On Mon, Jun 25, 2001 at 01:36:15PM +0200, Maciej W. Rozycki wrote:
> After extensive debugging I managed to track down the bug that was
> preventing me from building binutils since the beginning of February.
> Once again the culprit turned out to be the explicit nature of MIPS'
> caches.
>
> The problem lies in r3k_flush_cache_sigtramp(). It flushes three
> consecutive word-wide locations starting from the address passed as an
> argument. The argument is normally a sigreturn trampoline that is set up
> by setup_frame() or setup_rt_frame(). But these functions set up two
> opcodes only -- the third word is left untouched. In my case the address
> was something like 0x7???bff8. So the area to be flushed spanned a page
> boundary and since the third word was unreferenced, a TLB entry for the
> page the word was located in was absent. As a result, a TLB refill
> exception happened with caches isolated, which is not necessarily a win.
> The symptom was a solid crash.
>
> I don't see any reason to flush the third word location, so I removed the
> code doing it. This fixed the crashes I was observing, but since we are
> using mapped (KUSEG) addresses in r3k_flush_cache_sigtramp(), I believe we
> need more protection against unwanted TLB exceptions. The point is we are
> running with interrupts enabled and a reschedule may happen between
> touching the trampoline in setup*_frame() and flushing the cache. Hence
> the TLB entries for the trampoline area, even once present, may get
> removed meanwhile. So I added some code to explicitly load the entries,
> if needed, with interrupts disabled just before isolating caches.
> Following is a resulting patch.
>
> Ralf, this is a showstopper bug -- please apply the fix ASAP.
Applied.
Ralf
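
For illustration only, and not part of the patch discussed above: the
page-boundary arithmetic from the report can be checked with a small
user-space sketch. It assumes 4 KiB pages and 4-byte words; the helper
name spans_page_boundary() and the sample address 0x7000bff8 are
hypothetical (the report only gives 0x7???bff8).

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SIZE 4096u /* assumption: 4 KiB pages */
#define WORD_SIZE 4u    /* one MIPS opcode per 32-bit word */

/*
 * Hypothetical helper: returns 1 if `nwords` consecutive words starting
 * at `addr` touch more than one page, 0 otherwise.
 */
static int spans_page_boundary(uint32_t addr, unsigned nwords)
{
    assert(nwords > 0);

    /* Address of the last byte covered by the flushed range. */
    uint32_t last = addr + nwords * WORD_SIZE - 1;

    /* Different page numbers mean the range crosses a boundary. */
    return (addr / PAGE_SIZE) != (last / PAGE_SIZE);
}
```

With an address ending in bff8, three words cover offsets bff8, bffc and
c000, so the third word falls on the next page, while the two words
actually written by setup_frame() stay within one page.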