On 03/29/2010 03:02 PM, Andreas Barth wrote:
* David Daney (firstname.lastname@example.org) [100329 18:54]:
On 03/27/2010 04:07 PM, Andreas Barth wrote:
* David Daney (email@example.com) [100326 19:57]:
Also you could try running with the attached patch. It is not the best
watchdog, but it will print the register state for each core when things
get stuck. Occasionally that is enough to see where the problem is.
As our logging has only a limited buffer size, I'd be happy about a
variant of the patch which doesn't reboot but just lets the machine
hang after the third occurrence.
Any chance of that?
You could just sit in a loop kicking the watchdog timer after you get to
the NMI handler. That should prevent a reset, but still print the
I need to admit that I'm totally unable to make code from that
Could you (or someone else) give me a hand? Also please note that it
usually takes a few hours to crash the machine, and I didn't see
anything in the normal syslog.
At the end of octeon_watchdog_nmi_stage3, instead of returning, do:
for(;;) watchdog_poke_irq(0, NULL);
That should prevent it from rebooting. The messages will appear on the
serial port, not in syslog.
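
To illustrate why the endless poke loop keeps the box from resetting, here is a rough user-space model, not kernel code. Everything in it is a made-up stand-in: mock_watchdog_poke() only mimics the call signature of watchdog_poke_irq(0, NULL), and the countdown, WDT_TIMEOUT_TICKS and simulate_stage3_tail() are assumptions for the sake of the example, not how the Octeon hardware or the patch actually work.

```c
/* User-space simulation only; none of this is the real driver. */
#include <assert.h>
#include <stddef.h>

#define WDT_TIMEOUT_TICKS 3      /* hypothetical countdown length */

static int wdt_counter;          /* pretend hardware countdown */
static int wdt_reset_fired;      /* set once the pretend reset triggers */

/* Stand-in for watchdog_poke_irq(0, NULL): reloads the countdown. */
void mock_watchdog_poke(int irq, void *dev_id)
{
    (void)irq;
    (void)dev_id;
    wdt_counter = WDT_TIMEOUT_TICKS;
}

/* One tick of the pretend hardware timer. */
void mock_hw_tick(void)
{
    if (wdt_counter > 0 && --wdt_counter == 0)
        wdt_reset_fired = 1;
}

/*
 * Models the tail of the stage 3 handler for a bounded number of
 * iterations (the real loop is for(;;) and never exits).
 * Returns 1 if the pretend reset would have fired, 0 otherwise.
 */
int simulate_stage3_tail(int iterations, int poke)
{
    wdt_counter = WDT_TIMEOUT_TICKS;
    wdt_reset_fired = 0;
    for (int i = 0; i < iterations; i++) {
        if (poke)
            mock_watchdog_poke(0, NULL);
        mock_hw_tick();
    }
    return wdt_reset_fired;
}

/* Small self-check: poking holds off the reset, not poking does not. */
void run_demo(void)
{
    assert(simulate_stage3_tail(100, 1) == 0);
    assert(simulate_stage3_tail(100, 0) == 1);
}
```

In the real handler the loop has no iteration limit, so the core stays there with the register dump still readable on the serial console until you power-cycle it by hand.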