Florian,
Could you do me a huge favor and try a build that
uses 3 or 4 nops instead of the branch to the instruction
after the delay slot? There was a reason that I eliminated
the branch construct from the MIPS-internal Linux source
base: it's a hack that works perfectly on R4000s, but
it's pretty much a coincidence that it does so. Yes,
the code fragment in question is R4K-specific, but
we really need to migrate towards the use of consistent
mechanisms that work across the full range of MIPS
CPUs. Ideally, *all* CP0 hazards should someday be
padded out with "ssnops" (sll $0,$0,1, if I recall), which
force a one-cycle delay per instruction even on superscalar
MIPS CPUs.
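For anyone wondering how an ssnop differs from a plain nop at
the bit level, here is a rough sketch. It assumes the standard
MIPS R-type layout (opcode | rs | rt | rd | sa | funct). Under
that layout sll $0,$0,1 encodes to 0x00000040, while the ordinary
nop (sll $0,$0,0) is all zeros. The helper names are hypothetical
and don't come from any kernel source.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed R-type layout: opcode(6) rs(5) rt(5) rd(5) sa(5) funct(6). */
#define MIPS_OP_SPECIAL 0x00u /* SPECIAL major opcode */
#define MIPS_FUNCT_SLL  0x00u /* SLL function code */

/* Hypothetical helper: encode "sll rd, rt, sa" (rs field is always 0). */
static uint32_t mips_encode_sll(unsigned rd, unsigned rt, unsigned sa)
{
    assert(rd < 32 && rt < 32 && sa < 32); /* each field is 5 bits wide */
    return (MIPS_OP_SPECIAL << 26)
         | ((uint32_t)rt << 16)
         | ((uint32_t)rd << 11)
         | ((uint32_t)sa << 6)
         | MIPS_FUNCT_SLL;
}

/* Hypothetical helper: fill buf with n ssnops (sll $0,$0,1),
 * e.g. to pad out a CP0 hazard in generated code. */
static void mips_emit_ssnops(uint32_t *buf, unsigned n)
{
    for (unsigned i = 0; i < n; i++)
        buf[i] = mips_encode_sll(0, 0, 1);
}
```

The only difference from a plain nop is the shift amount of 1.
Because it's still a shift into $0, it has no architectural
effect; the ssnop semantics are what force the single-issue
behavior on superscalar parts.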
Kevin K.