> Except that the compiler does not always have the knowledge, particularly
>when inline assembly bits (insolvable) or macros such as "la" (unless gcc
>gets a full-blown ABI-dependent machinery implemented) are involved.
There is a natural conflict between compiler optimization and assembler
optimization/assembler macro expansion. If you want the best possible
compiler optimization, then you need to be willing to give up use of
assembler optimizations and assembler macros. That includes uses in extended
asms. We can make that work if we have to, but it is better if we don't have to.
> For the latter, gas could be able to move parts of macro expansions into
>delay slots and it sometimes succeeds, though it isn't particularly good
This is ISA confusion. When you ask gas to generate o32/PIC code, it assumes
the least common denominator, which is the R2000. The R2000 does not have
hardware interlocks on loads. It requires a nop in between a load and the
instruction that uses the result of the load. Therefore, we cannot put a
load in a delay slot unless we know that the instruction at the branch target
does not use the result of the load. Since gas doesn't bother to construct
a control flow graph, we have no idea what is at the branch target, and
therefore we can't put a load in the branch delay slot.
When you ask gas to generate n32/PIC code, the least common denominator is
the R4000, which does have hardware interlocks on loads, and thus we can put
a load into a delay slot.
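To make the rule above concrete, here is a rough Python model of the decision.
It is only a sketch of the reasoning in this message, not gas's actual code.
The names Insn, HAS_LOAD_INTERLOCKS and may_fill_delay_slot are made up for
illustration, and the sketch models only the load hazard at the branch target;
it ignores every other constraint on delay slot filling.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Insn:
    """Hypothetical, minimal description of one instruction."""
    op: str
    dest: Optional[str] = None
    srcs: Tuple[str, ...] = ()
    is_load: bool = False


# Baseline CPU assumed for each case discussed above:
# o32/PIC -> R2000 (no load interlocks), n32/PIC -> R4000 (interlocks).
HAS_LOAD_INTERLOCKS = {"R2000": False, "R4000": True}


def may_fill_delay_slot(candidate: Insn, baseline_cpu: str,
                        branch_target: Optional[Insn] = None) -> bool:
    """Return True if `candidate` may go in the branch delay slot.

    `branch_target` is the first instruction at the branch target, or None
    when it is unknown (the situation for an assembler that builds no
    control flow graph).
    """
    if not candidate.is_load:
        # Only the load hazard is modelled in this sketch.
        return True
    if HAS_LOAD_INTERLOCKS[baseline_cpu]:
        # The hardware stalls if the next instruction needs the result.
        return True
    if branch_target is None:
        # No interlocks and no knowledge of the target: must be conservative.
        return False
    # No interlocks: safe only if the target does not read the loaded value.
    return candidate.dest not in branch_target.srcs


lw = Insn("lw", dest="$t0", srcs=("$gp",), is_load=True)
print(may_fill_delay_slot(lw, "R2000"))  # target unknown, no interlocks
print(may_fill_delay_slot(lw, "R4000"))  # interlocks make it safe
```

With the target unknown, the R2000 case is refused and the R4000 case is
allowed, which is the o32 versus n32 difference described above.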
If you ask gas to generate R4000 o32/PIC code, it will fill the delay slot
exactly like you wanted, but the code may fail at run time on some MIPS
> It can't be optimized by gcc, if to be emitted,
It can be optimized if we use direct CPU instructions instead of relying
on assembler macros. Then gcc would know about the load instructions, and
would be able to place one in the branch delay slot (assuming an R4000 or
The MIPS gcc target is the only one that has this problem, because it is the
only one that relies on assembler macros for PIC support.
>So there is still a small gain from letting gas try to fill slots usefully
>when gcc can't. ...
> This isn't ever going to hurt, whether gcc gets smarter
Yes, it can hurt. If gcc decides the optimal code for a loop requires putting
a nop in a branch delay slot, then the assembler would hurt performance if
it put another instruction there.
If your main concern is only extended asm code written using assembler macros,
then that can be fixed by turning on assembler optimization within the
extended asm code. In the long run though, you are better off if you stop
using assembler macros.