In a thread in this group some weeks ago, I suggested
that the problems observed with running Linux on R5K
Indys might be related to the fact that the TLB miss
handler did not respect the rules set out in the R5000
users' manual, specifically, that there be two integer instructions
between any modification of the EntryHi/EntryLo/PageMask
registers and a TLB Write operation. The response of
the old hands in the group was that this couldn't be a problem,
since IRIX didn't respect that rule, and IRIX empirically works.

Perhaps, but in the course of tormenting our various systems with
"crashme", we discovered that, while we could make crashme
run for unbounded periods of time on our new MIPS "Jade" CPUs,
it would seize up in less than a minute on a QED R5260 running
on the same hardware platform. Logic analyser traces suggested
a problem with TLB miss service when the instruction causing the
fault was a load/store using k0/k1 as a base register - something
no sane program would do, of course.

On a hunch, I modified the except_vec0_nevada routine to insert
two nops between the mtc0 to EntryLo1 and the tlbwr. I also took
out one of the nops between the tlbwr and the eret. The documentation
implies that none is necessary, but I note that the IRIX handler has
a single nop, and I didn't want to push my luck. So there was a net
addition of 1 nop. Bingo. The system is as stable as with a Jade.
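
For anyone who wants to check other handlers for the same pattern, here is a minimal sketch of the rule as a small Python checker. It is not the actual handler source: the two sequences are my simplified reconstruction of the tail of the routine before and after the change, the symbolic register names and the function name find_hazards are hypothetical, and it assumes that any instruction other than a hazardous mtc0 or a TLB write counts toward the two-instruction gap.

```python
# Hypothetical hazard checker: each list entry models one instruction as text.
# Register names are symbolic stand-ins, not real assembler operands.
HAZARD_REGS = {"entryhi", "entrylo0", "entrylo1", "pagemask"}
TLB_WRITES = {"tlbwr", "tlbwi"}


def find_hazards(instructions, required_gap=2):
    """Return (mtc0_index, tlb_write_index) pairs that have fewer than
    required_gap instructions between the CP0 write and the TLB write."""
    hazards = []
    gap = None          # instructions seen since the last hazardous mtc0
    write_index = None  # list index of that mtc0
    for i, text in enumerate(instructions):
        parts = text.replace(",", " ").lower().split()
        if not parts:
            continue  # ignore blank entries
        op, last = parts[0], parts[-1]
        if op in ("mtc0", "dmtc0") and last in HAZARD_REGS:
            write_index, gap = i, 0
        elif op in TLB_WRITES:
            if gap is not None and gap < required_gap:
                hazards.append((write_index, i))
            gap = None
        elif gap is not None:
            gap += 1
    return hazards


# Assumed shape of the handler tail before the change (illustrative only).
BEFORE = ["mtc0 k0, EntryLo0", "mtc0 k1, EntryLo1", "tlbwr", "nop", "nop", "eret"]

# Assumed shape after the change: two nops before tlbwr, one nop after it.
AFTER = ["mtc0 k0, EntryLo0", "mtc0 k1, EntryLo1", "nop", "nop", "tlbwr", "nop", "eret"]
```

Calling find_hazards(BEFORE) flags the mtc0/tlbwr pair at indices 1 and 2, while find_hazards(AFTER) returns an empty list. Note that AFTER is exactly one entry longer than BEFORE, matching the net addition of one nop.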

Now, that's on an R5260, not an R5000, but from what the engineers
at QED have told me, the CP0 design is the same for both families.
I am for once checking this change directly into the SGI repository,
but only for the "Nevada" CPUs. Someone with an R5000 Indy needs
to repeat the experiment for the R5000, and check in the change if
it helps there as well.

Kevin D. Kissell
MIPS Technologies European Architecture Lab