Re: sti() does not work.

To: "Ralf Baechle" <>, "Maciej W. Rozycki" <>
Subject: Re: sti() does not work.
From: "Kevin D. Kissell" <>
Date: Sat, 14 Jul 2001 13:39:58 +0200
Cc: "Thiemo Seufer" <>, <>
References: <> <> <>
> Real wild pig hackers on R3000 were writing code which knows that in the
> load delay slot they still have the old register value available.  So you
> can implement var1++; var2++ as:
> .set noreorder
> lw $reg, var1($gp)
> nop
> addiu $reg, $reg, 1
> lw $reg, var2($gp)
> sw $reg, var1($gp)
> addiu $reg, $reg, 1
> sw $reg, var2($gp)
> .common var1, 4, 4
> .common var2, 4, 4
> Of course only safe with interrupts disabled.  So in a sense introducing
> the load interlock broke semantics of MIPS machine code ;-)

Architecturally, the target register value is UNDEFINED during
the load delay slot on a MIPS I CPU.  Anyone who coded to any
particular assumption regarding its value was coding to a 
specific CPU implementation.  Introducing the load interlock
in later versions of the ISA and later implementations did not
reach backward in time and break the old hardware.  The
implementation-specific code still works for its specific 
implementation.  Refining the spec did not break the code for later
implementations - it was *always* broken for later implementations! ;-)
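
For contrast, a sketch of the same var1++; var2++ sequence that makes
no assumption about the delay-slot value.  It assumes a second scratch
register is free (t1 here is my choice for illustration, not something
from the quoted code) and keeps the quoted var1($gp) addressing:

    # Default .set reorder: the assembler handles any load delay hazard.
    lw      t0,var1($gp)    # t0 = var1
    lw      t1,var2($gp)    # t1 = var2
    addiu   t0,t0,1         # each load is two instructions ahead of its use
    addiu   t1,t1,1
    sw      t0,var1($gp)    # var1 = old var1 + 1
    sw      t1,var2($gp)    # var2 = old var2 + 1

The cost is one extra register, which is exactly what the single-register
trick was avoiding.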

In a less pedantic tone, there actually is an architecturally
legal case where an assembly coder can justify the use of
noreorder for something other than CP0 pipeline hazards.
If what I want to do is to test a value, branch on the result,
and modify that value regardless of whether the branch is
taken, I can code something like:

    .set noreorder
    bltz    t0,foo
    sra    t0,t0,2
    .set reorder
    <other code>

Otherwise I need to either consume another register or
replicate the shift both after the branch and at foo.
If I'm very, very lucky, the assembler will "hoist" such a
replicated instruction into the delay slot - a good
compiler back-end optimiser certainly would.  But I'm not
aware of any MIPS assembler that performs that
optimisation - certainly the GNU assembler does not.
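
As a rough sketch of those two alternatives under the default
.set reorder (t1 is assumed to be a free scratch register, and the
placement of foo is only illustrative):

    # Alternative 1: spend a second register on the test value.
    move    t1,t0           # keep the unshifted value for the test
    sra     t0,t0,2         # modify t0 on both paths
    bltz    t1,foo          # branch on the original value
    <other code>

    # Alternative 2: replicate the shift on both paths.
    bltz    t0,foo          # assembler fills the delay slot itself
    sra     t0,t0,2         # fall-through path
    <other code>
foo:
    sra     t0,t0,2         # taken path

Either way the result costs an extra register or an extra instruction
compared with the noreorder version above.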

            Kevin K.
