On 06/20/2010 10:55 PM, Jan Rovins wrote:
Some additions & corrections to the previous:
From: firstname.lastname@example.org [mailto:linux-mips-bounce@linux-mips.org]
On Behalf Of Jan Rovins
Sent: Saturday, June 19, 2010 3:14 PM
To: David Daney
Cc: Kevin D. Kissell; email@example.com
Subject: Re: Help with decoding a NMI Watchdog interrupt on an Octeon
David Daney wrote:
On 06/17/2010 02:26 PM, Kevin D. Kissell wrote:
NMI is just an input pin, so you'd really need to know what it's
connected to in the system you're working on.
In this case, the NMI is likely being asserted by the watchdog. So if
you are stuck in a loop with interrupts disabled, the register dump
might help you figure out where things are stuck. But as you say
below, knowing the value of the ErrorEPC register is critical.
Thank you David & Kevin for the detailed information.
Yes, in my case it's the watchdog. When I turn the watchdog off, the
machine just hangs, with no NMI dump.
Ok, I added the code to print out the ErrorEPC, and got:
This address is not in vmlinux, but is the address of a loaded module.
So, I poked around in /sys/module/ until I found one that had that
cat /sys/module/linux_bcm_core/sections/.text :0xc000000001c4e000
And then did an objdump on this module. Since the module dump did not
contain the actual addresses that it was running from, I doctored up the
offsets by using the .text address from /sys/module/ of where the module
objdump.cavium -d --adjust-vma 0xc000000001c4e000 linux-bcm-core.ko
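For anyone repeating this, the lookup and the offset arithmetic can be sketched in a few lines of Python. This is only a rough sketch: the function name locate and the text_bases table are made up for illustration, the base address is the one quoted above, and the example address is the one from the objdump line quoted in this thread. Since the section size is not checked, the result is a candidate module, not a proof that the address really lies inside it.

```python
def locate(addr, text_bases):
    """Return (module, offset) for the module whose .text base is the
    highest one not above addr, or None if no base qualifies.

    Section sizes are not known here, so the result is only a candidate.
    """
    candidates = [(base, name) for name, base in text_bases.items() if base <= addr]
    if not candidates:
        return None
    base, name = max(candidates)
    return name, addr - base


# Hypothetical table; in practice it would be filled in by reading
# /sys/module/*/sections/.text for each loaded module.
text_bases = {"linux_bcm_core": 0xC000000001C4E000}

hit = locate(0xC0000000023C5004, text_bases)
print(hit[0], hex(hit[1]))  # linux_bcm_core 0x777004
```

The offset is what you would look for in a plain objdump -d of the .ko without --adjust-vma, because .text in a relocatable module object starts at 0. Other sections (.init.text and so on) have their own base addresses under /sys/module/.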
When looking at kernel modules, it can be helpful to show the
relocations as well, so add '-r' to your objdump command line...
Just want to check: does all this sound correct so far? Is my objdump
valid with the .text offset?
I got a hit on the ErrorEPC value in my dump:
c0000000023c5004: 08000000 j c000000000000000
... Once you turn on display of relocations, you can see where the jump
is really going. The relocations are applied by the kernel when loading
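To see why the unrelocated instruction disassembles as a jump to c000000000000000, it may help to decode the word 08000000 by hand. The sketch below follows the standard MIPS J-type encoding (6-bit opcode, 26-bit instr_index); the helper name decode_j is made up for illustration, and the instruction word and address are the ones from the dump line above.

```python
def decode_j(word, pc):
    """Decode a MIPS J-type `j` instruction located at address pc.

    Returns the jump target, or None if the word is not a `j` (opcode 2).
    """
    opcode = word >> 26
    if opcode != 0x02:
        return None
    instr_index = word & 0x03FFFFFF
    # The target keeps the upper bits of the delay-slot address (pc + 4)
    # and takes its low 28 bits from instr_index << 2.
    delay_slot = (pc + 4) & 0xFFFFFFFFFFFFFFFF
    return (delay_slot & ~0x0FFFFFFF) | (instr_index << 2)


target = decode_j(0x08000000, 0xC0000000023C5004)
print(hex(target))  # 0xc000000000000000
```

The instr_index field is all zeros in the .ko, so the disassembler can only show the base of the 256 MB region the instruction sits in. The real destination comes from the relocation entry, which is what objdump -r makes visible.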
This line of code was inside a function called _default_assert, which on
assertion failure, did a printk() and went into an intentional infinite
loop, which explains the NMI dump. The only thing that puzzles me now is
that the assert failure printk was rarely displayed. Could that be because it
was called while interrupts were turned off? I suppose that would stop it
from showing up in /var/log/messages.
The assembly still does not make sense to me (first time with MIPS assembly)
but on examining the C code I think I understand what's going on here.
It seems like you may be onto the cause of the watchdog expiring, all
that's left is to figure out how you get into this spot in the first place.