Re: a really really weird crash on swarm

To: "Maciej W. Rozycki" <macro@ds2.pg.gda.pl>
Subject: Re: a really really weird crash on swarm
From: Ralf Baechle <ralf@linux-mips.org>
Date: Mon, 19 Aug 2002 15:28:17 +0200
Cc: Jun Sun <jsun@mvista.com>, linux-mips@oss.sgi.com
In-reply-to: <Pine.GSO.3.96.1020819144136.14441E-100000@delta.ds2.pg.gda.pl>; from macro@ds2.pg.gda.pl on Mon, Aug 19, 2002 at 02:57:14PM +0200
References: <20020811185138.A2133@dea.linux-mips.net> <Pine.GSO.3.96.1020819144136.14441E-100000@delta.ds2.pg.gda.pl>
Sender: owner-linux-mips@oss.sgi.com
User-agent: Mutt/1.2.5.1i

On Mon, Aug 19, 2002 at 02:57:14PM +0200, Maciej W. Rozycki wrote:

> > Really odd because the register only lost the upper 16 bits; the lower 16
> > bits still have their expected value.
> 
>  It is a typical symptom of a register being corrupted between a "lui" and
> an "addiu"  -- an exception must have done it in the immediately preceding
> code.  You might be able to track a reason down by carefully studying
> possible exception paths at the place of the problem.  Unfortunately you
> don't have much of the state preserved at this stage -- you only know
> which register was corrupted. 

Little exception potential in this case, as interrupts were disabled and
the addresses used were, or at least should all have been, in KSEG0.
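
For readers unfamiliar with the symptom discussed above, here is a minimal
user-space sketch in C of how a 32-bit address is assembled from a "lui"
followed by an "addiu", and what the result looks like if the register is
zeroed between the two. The helper names, the clobber flag and the example
address are hypothetical and only model the instruction semantics; this is
not kernel code and does not claim to reproduce the actual failure.

```c
#include <stdint.h>
#include <stdio.h>

/* lui rt, imm: place imm in the upper 16 bits, clear the lower 16. */
static uint32_t lui(uint16_t imm)
{
    return (uint32_t)imm << 16;
}

/* addiu rt, rs, imm: add the sign-extended 16-bit immediate (no trap). */
static uint32_t addiu(uint32_t rs, uint16_t imm)
{
    return rs + (uint32_t)(int32_t)(int16_t)imm;
}

/* %hi/%lo split: the high half compensates for sign extension of the low. */
static uint16_t hi16(uint32_t addr) { return (uint16_t)((addr + 0x8000u) >> 16); }
static uint16_t lo16(uint32_t addr) { return (uint16_t)(addr & 0xffffu); }

/*
 * Build addr the way "lui reg, %hi(addr); addiu reg, reg, %lo(addr)" would.
 * If clobber is nonzero, the register is zeroed between the two steps
 * (a hypothetical stand-in for whatever corrupted it).
 */
static uint32_t load_address(uint32_t addr, int clobber)
{
    uint32_t reg = lui(hi16(addr));
    if (clobber)
        reg = 0;
    return addiu(reg, lo16(addr));
}

/* Print the intact and the corrupted result for one example address. */
static void print_demo(void)
{
    uint32_t addr = 0x80123456u;  /* hypothetical KSEG0 address */
    printf("intact:    0x%08x\n", (unsigned)load_address(addr, 0));
    printf("clobbered: 0x%08x\n", (unsigned)load_address(addr, 1));
}
```

With the example address, the clobbered case keeps the low half 0x3456 and
loses the high half 0x8012, which is the pattern described in the report.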

>  Another possible approach is to add some code that compares the values of
> the register upon an exception entry and exit and wait for it to trigger
> -- for a single register it shouldn't be too tough and you have still much
> of the state available before an "rfe" or "eret".

Don't try to think too deterministically - Jun was working on first silicon,
so not necessarily on as deterministic a platform as we'd like.  Fortunately,
as you may have seen in the kernel code, there's already newer silicon, so
I'd simply file this one to /dev/null for now.
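
Should anyone want to try the entry/exit comparison suggested in the quoted
text anyway, the idea can be sketched in plain C roughly as follows. The
struct layout, the function names and the choice of watched register are all
assumptions made for illustration; a real version would have to hook into
the kernel's actual exception entry and exit code, and would only make sense
for a register the handler is not expected to modify.

```c
#include <stdint.h>
#include <stdio.h>

#define WATCHED_REG 16  /* hypothetical: index of the suspect register */

/* Hypothetical saved-register frame, filled in at exception entry. */
struct saved_regs {
    uint32_t regs[32];
    uint32_t epc;
};

/* Per-exception snapshot; kept by the caller so nesting stays safe. */
struct reg_watch {
    uint32_t value;
};

/* Call at exception entry: remember the watched register. */
static void watch_on_entry(struct reg_watch *w, const struct saved_regs *r)
{
    w->value = r->regs[WATCHED_REG];
}

/*
 * Call just before returning from the exception: report a mismatch while
 * the rest of the state is still available. Returns 1 if the register
 * changed, 0 otherwise.
 */
static int watch_on_exit(const struct reg_watch *w, const struct saved_regs *r)
{
    if (r->regs[WATCHED_REG] == w->value)
        return 0;
    fprintf(stderr, "reg %d changed: 0x%08x -> 0x%08x (epc 0x%08x)\n",
            WATCHED_REG, (unsigned)w->value,
            (unsigned)r->regs[WATCHED_REG], (unsigned)r->epc);
    return 1;
}
```

Keeping the snapshot in caller-provided storage rather than in a global is a
deliberate choice here, so that a nested exception does not overwrite the
value recorded by the outer one.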

  Ralf

