On Wed, 1 Dec 2004, Ralf Baechle wrote:
> this problem here is specific to inline assembler. The splitlock code for
> a reasonable CPU is:
>
> static __inline__ void atomic_add(int i, atomic_t * v)
> {
> unsigned long temp;
>
> __asm__ __volatile__(
> "1: ll %0, %1 # atomic_add \n"
> " addu %0, %2 \n"
> " sc %0, %1 \n"
> " beqz %0, 1b \n"
> : "=&r" (temp), "=m" (v->counter)
> : "Ir" (i), "m" (v->counter));
> }
>
> For the average atomic op the generated code is going to look something like:
>
> 80100634: lui a0,0x802c
> 80100638: ll a0,-24160(a0)
> 8010063c: addu a0,a0,v0
> 80100640: lui at,0x802c
> 80100644: addu at,at,v1
> 80100648: sc a0,-24160(at)
> 8010064c: beqz a0,80100634 <init+0x194>
> 80100650: nop
>
> It's significantly worse for 64-bit due to the excessive code sequence
> generated for loading a 64-bit address. One outside CKSEGx that is.
Only for old compilers. For current (>= 3.4) ones you can use the "R"
constraint and get exactly what you need. Rewriting inline asms to use
"R" for GCC >= 3.4 has actually been on my to-do list for some time,
even predating the current working implementation.
> On 32-bit Thiemo's patch would cut that down to something like:
>
> 80100630: lui t0,0x802c
> 80100634: addiu t0,t0,-24160
> 80100638: ll a0,0(t0)
> 8010063c: addu a0,a0,v0
> 80100648: sc a0,0(t0)
> 8010064c: beqz a0,80100638 <init+0x194>
> 80100650: nop
Plus it clobbers memory, requiring a writeback and a refetch of all
unrelated variables that happen to be cached in registers.
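To make that cost concrete, here is a rough sketch of what a variant that
passes the address in a register and declares a "memory" clobber might look
like. It is a hypothetical illustration, not the actual patch: the names
atomic_add_clobber(), unrelated and demo() are made up, the MIPS path assumes
a target ISA that has ll/sc, and the non-MIPS fallback (the GCC
__atomic_fetch_add builtin plus an empty asm with a "memory" clobber) is only
there so that the fragment builds on other hosts.

```c
#include <assert.h>

typedef struct { volatile int counter; } atomic_t;

/* Hypothetical variant: address held in a register, "memory" clobber. */
static inline void atomic_add_clobber(int i, atomic_t *v)
{
#if defined(__mips__)
	int temp;

	__asm__ __volatile__(
	"1:	ll	%0, 0(%1)	# atomic_add_clobber	\n"
	"	addu	%0, %2					\n"
	"	sc	%0, 0(%1)				\n"
	"	beqz	%0, 1b					\n"
	: "=&r" (temp)
	: "r" (&v->counter), "Ir" (i)
	: "memory");
#else
	/* Assumed fallback for non-MIPS hosts, not part of the original. */
	__atomic_fetch_add(&v->counter, i, __ATOMIC_RELAXED);
	__asm__ __volatile__("" : : : "memory");
#endif
}

/* Hypothetical global that has nothing to do with the counter. */
static int unrelated;

static int demo(atomic_t *v)
{
	int before = unrelated;		/* may be kept in a register */

	atomic_add_clobber(1, v);
	/*
	 * Because of the "memory" clobber the compiler must assume that
	 * 'unrelated' may have changed, so it reloads it here instead of
	 * reusing the value it already holds.
	 */
	return before + unrelated;
}
```

With a plain memory operand for v->counter and no "memory" clobber, the
compiler would be free to reuse the first load of 'unrelated'.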
> On 64-bit the savings would be even more significant. But what we actually
> want would be using the "o" constraint, which, at least on the compilers
> where I've tried it, didn't produce code any different from "m".
No surprise as the "o" constraint doesn't mean anything particular for
MIPS. All addresses are offsettable -- there is no addressing mode that
would preclude it, so "o" is exactly the same as "m".
> The expected code would be something like:
>
> 80100634: lui t0,0x802c
> 80100638: ll a0,-24160(t0)
> 8010063c: addu a0,a0,v0
> 80100648: sc a0,-24160(t0)
> 8010064c: beqz a0,80100634 <init+0x194>
> 80100650: nop
>
> So another instruction less.
That's exactly what's emitted with "R". Should I accelerate my work on
it? It's nothing that would require a lot of effort -- it's more boring
than challenging.
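For reference, a minimal sketch of the atomic_add() quoted above with "m"
swapped for "R" could look like the following. Treat it as an illustration
under assumptions rather than a finished patch: the MIPS path assumes
GCC >= 3.4 and a target ISA that has ll/sc, while the non-MIPS fallback
(the GCC __atomic_fetch_add builtin) and the atomic_add_selftest() helper are
additions made up for this example so that it builds on other hosts.

```c
#include <assert.h>

typedef struct { volatile int counter; } atomic_t;

static inline void atomic_add(int i, atomic_t *v)
{
#if defined(__mips__)
	unsigned long temp;

	/*
	 * "R" asks for an address usable in a single non-macro load or
	 * store, so ll and sc can use offset(base) directly.
	 */
	__asm__ __volatile__(
	"1:	ll	%0, %1		# atomic_add		\n"
	"	addu	%0, %2					\n"
	"	sc	%0, %1					\n"
	"	beqz	%0, 1b					\n"
	: "=&r" (temp), "=R" (v->counter)
	: "Ir" (i), "R" (v->counter));
#else
	/* Assumed fallback for non-MIPS hosts, not part of the original. */
	__atomic_fetch_add(&v->counter, i, __ATOMIC_RELAXED);
#endif
}

/* Hypothetical helper: checks the arithmetic only, not the codegen. */
static void atomic_add_selftest(void)
{
	atomic_t v = { 40 };

	atomic_add(2, &v);
	assert(v.counter == 42);
}
```

Since only v->counter is named as a memory operand and there is no "memory"
clobber, unrelated variables can stay cached in registers across the call.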
Maciej