linux-mips

Re: .subsection madness

To: David Daney <ddaney@caviumnetworks.com>
Subject: Re: .subsection madness
From: Ralf Baechle <ralf@linux-mips.org>
Date: Fri, 14 Aug 2009 22:57:59 +0100
Cc: linux-mips <linux-mips@linux-mips.org>, Adam Nemet <anemet@caviumnetworks.com>
In-reply-to: <4A85ABD3.5040801@caviumnetworks.com>
Original-recipient: rfc822;linux-mips@linux-mips.org
References: <4A85ABD3.5040801@caviumnetworks.com>
Sender: linux-mips-bounce@linux-mips.org
User-agent: Mutt/1.5.18 (2008-05-17)

On Fri, Aug 14, 2009 at 11:24:19AM -0700, David Daney wrote:

> In atomic.h for atomic_add we have this gem:
>
>       __asm__ __volatile__(
>       "       .set    mips3                                   \n"
>       "1:     ll      %0, %1          # atomic_add            \n"
>       "       addu    %0, %2                                  \n"
>       "       sc      %0, %1                                  \n"
>       "       beqz    %0, 2f                                  \n"
>       "       .subsection 2                                   \n"
>       "2:     b       1b                                      \n"
>       "       .previous                                       \n"
>       "       .set    mips0                                   \n"
>
>
> What is the purpose of the .subsection here?
>
> It will not affect branch prediction in the beqz as nothing happens in  
> .subsection 2.

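I'm not following.  Most simple branch predictors will assume a backward
branch to be a loop-closing branch and thus predict it as taken, while
we expect the SC instruction to rarely fail, whether it is used in a
spinlock, a bit operation or an atomic operation.  Moving the retry
branch into .subsection 2 turns the beqz into a forward branch, which
such predictors predict as not taken, matching the common case.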
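To make the argument concrete, here is a small C model that counts mispredictions of the conditional branch after SC under a static "backward taken, forward not taken" rule. It is a sketch under stated assumptions: the predictor rule, the enum names and `count_mispredicts()` are hypothetical and do not describe any particular MIPS core. The in-line layout is the loop written as `sc; beqz %0, 1b`; the out-of-line layout is the quoted `.subsection 2` version.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Where the conditional retry branch after SC points (hypothetical names). */
enum retry_layout {
    RETRY_INLINE,      /* sc; beqz %0, 1b               : backward branch */
    RETRY_OUT_OF_LINE  /* sc; beqz %0, 2f ... 2: b 1b   : forward branch  */
};

/* Assumed static rule: backward branches are predicted taken,
 * forward branches are predicted not taken. */
static bool predicted_taken(enum retry_layout layout)
{
    return layout == RETRY_INLINE;
}

/* sc_ok[k] is true if the k-th SC attempt succeeded.  The conditional
 * branch after SC is taken exactly when SC fails.  The unconditional
 * "b 1b" is not counted because its direction is never in doubt.
 * Returns the number of mispredicted conditional branches. */
static size_t count_mispredicts(enum retry_layout layout,
                                const bool *sc_ok, size_t n)
{
    assert(sc_ok != NULL || n == 0);

    size_t misses = 0;
    for (size_t k = 0; k < n; k++) {
        bool taken = !sc_ok[k];
        if (taken != predicted_taken(layout))
            misses++;
    }
    return misses;
}
```

With ten SC attempts of which only one fails, the in-line layout mispredicts the nine successful attempts, while the out-of-line layout mispredicts only the single failure.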

It can even help on a CPU without branch prediction like the R4000, which
kills the two instructions following the delay slot for a taken branch.
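The quoted fragment stops before the asm operand lists. For reference, a self-contained sketch of the same LL/SC loop follows. The function name `atomic_add_sketch()` and the operand constraints (`"=&r"` scratch, `"+m"` counter, `"Ir"` addend) are assumptions for illustration, not text from the mail; on non-MIPS hosts the sketch falls back to the GCC/Clang `__atomic_fetch_add` builtin so it still builds.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sketch: add i to *counter atomically, with the SC retry
 * branch moved out of line into .subsection 2 as in the quoted code. */
static inline void atomic_add_sketch(int i, int *counter)
{
    assert(counter != NULL);
#if defined(__mips__)
    int temp;

    __asm__ __volatile__(
    "       .set    mips3                                   \n"
    "1:     ll      %0, %1          # atomic_add_sketch     \n"
    "       addu    %0, %2                                  \n"
    "       sc      %0, %1                                  \n"
    "       beqz    %0, 2f                                  \n"
    "       .subsection 2                                   \n"
    "2:     b       1b                                      \n"
    "       .previous                                       \n"
    "       .set    mips0                                   \n"
    : "=&r" (temp), "+m" (*counter)   /* assumed constraints */
    : "Ir" (i));
#else
    /* Non-MIPS host: use the compiler builtin instead of LL/SC. */
    __atomic_fetch_add(counter, i, __ATOMIC_RELAXED);
#endif
}
```

Usage: `int v = 40; atomic_add_sketch(2, &v);` leaves `v` equal to 42 on either path.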

> For spin locks it is clear that this technique can help, but for  
> atomic_add I don't think so.  To make matters worse for some code the  
> subsection is going out of branch range.

That problem should have been solved by building the kernel with
-ffunction-sections.  Other architectures needed -ffunction-sections for
the same reason.
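Why the flag helps: GNU as emits all subsection-0 code of a section before its subsection-2 code, so in one large .text section a beqz near the start of a big file can end up far from its out-of-line `b 1b`. With -ffunction-sections each function gets its own section, so its .subsection 2 fragment lands right after that function. MIPS conditional branches encode a signed 16-bit word offset relative to the delay slot, giving a reach of roughly 128 KiB in either direction. The helper below is a hypothetical sketch that checks this limit; it assumes 32-bit, word-aligned addresses for simplicity.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: can a MIPS conditional branch (beq, bne, beqz, ...)
 * at branch_pc reach target?  The encoded offset is a signed 16-bit count
 * of words, relative to the delay slot address (branch_pc + 4). */
static bool mips_branch_in_range(uint32_t branch_pc, uint32_t target)
{
    /* MIPS instructions are word aligned. */
    assert((branch_pc & 3u) == 0 && (target & 3u) == 0);

    int64_t byte_off = (int64_t)target - ((int64_t)branch_pc + 4);
    int64_t word_off = byte_off / 4;   /* exact: both addresses aligned */

    return word_off >= INT16_MIN && word_off <= INT16_MAX;
}
```

The reachable window is therefore -131072 to +131068 bytes from the delay slot; a subsection-2 stub placed past that point can no longer be reached by the beqz.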

  Ralf
