On Fri, Aug 14, 2009 at 11:24:19AM -0700, David Daney wrote:
> In atomic.h for atomic_add we have this gem:
> __asm__ __volatile__(
> " .set mips3 \n"
> "1: ll %0, %1 # atomic_add \n"
> " addu %0, %2 \n"
> " sc %0, %1 \n"
> " beqz %0, 2f \n"
> " .subsection 2 \n"
> "2: b 1b \n"
> " .previous \n"
> " .set mips0 \n"
> What is the purpose of the .subsection here?
> It will not affect branch prediction in the beqz as nothing happens in
> .subsection 2.
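I'm not following. Most simple branch predictors assume that a backward
branch closes a loop and therefore predict it as taken. We, on the other
hand, assume that the SC instruction rarely fails, whether it is used in a
spinlock, a bit operation or an atomic operation. Putting the retry branch
into the subsection turns the common case into a forward branch that falls
through.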
It can even help on a CPU without branch prediction like the R4000, which
kills the two instructions following the delay slot for a taken branch.
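
For illustration only, here is a rough portable C analogue of the same idea.
It is a sketch, not the kernel's atomic_add(): the name sketch_atomic_add is
hypothetical, it uses a C11 weak compare-and-swap in place of LL/SC, and it
assumes a GCC-compatible compiler for __builtin_expect. The hint marks the
retry as the unlikely path, which is the same assumption the .subsection
trick encodes by hand.

```c
#include <stdatomic.h>

/* Hypothetical helper for illustration; not the kernel implementation. */
static inline void sketch_atomic_add(int i, atomic_int *v)
{
    int old = atomic_load_explicit(v, memory_order_relaxed);

    /*
     * The weak compare-and-swap may fail (even spuriously), much like SC.
     * On failure it reloads `old` with the current value, so the loop
     * simply retries. __builtin_expect(..., 0) tells the compiler that
     * failure is rare, so the retry path can be laid out off the hot path.
     */
    while (__builtin_expect(!atomic_compare_exchange_weak_explicit(
                                v, &old, old + i,
                                memory_order_relaxed,
                                memory_order_relaxed), 0)) {
        /* rare path: another CPU changed *v, try again */
    }
}
```

Whether the compiler really moves the retry out of line depends on the
target and optimization level; the hint only expresses the expectation.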
> For spin locks it is clear that this technique can help, but for
> atomic_add I don't think so. To make matters worse for some code the
> subsection is going out of branch range.
That problem should have been solved by building the kernel with
-ffunction-sections. Other architectures needed -ffunction-sections for
the same reason.