On Sat, May 27, 2006 at 05:13:21PM -0400, Kumba wrote:
> Finally managed to track down the git commit causing SGI IP32 (O2) systems
> to lock up really early in the boot cycle, but I'm at a loss to understand
> why.
>
> Effect:
> It appears the system silently hangs somewhere in the void between function
> calls when trying to invoke the memset() call in __alloc_bootmem_core() in
> mm/bootmem.c. This puts the machine hardware in a state such that a simple
> soft reset doesn't clear it -- the machine has to be cold booted to get it
> to boot a working kernel again.
>
> Determined Cause:
> It seems this commit:
> 78eef01b0fae087c5fadbd85dd4fe2918c3a015f
> [PATCH] on_each_cpu(): disable local interrupts
>
> Is the cause. I've verified this by reversing this one change on a
> 2.6.17-rc4 tree, and it'll boot to a mini-userland (initramfs-based) and
> appears to function normally.
>
>
> But this is as far as I can trace this. I'm not sure what this change is
> doing internally that's triggering this lockup on O2 systems. It doesn't
> appear to affect Octane (IP30) systems or Origin (IP27). I haven't
> test-ran it on IP22/IP28 hardware yet, so only IP32 is known to be
> affected. Unsure about non-SGI MIPS hardware.

on_each_cpu() re-enables interrupts. This may crash the system if it
happens before interrupt handlers have been installed. A while ago I
fixed all such calls, but I may have missed some instances.
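
To make the failure mode concrete, here is a minimal user-space sketch, not
kernel code. It assumes the helper is built around a disable/enable pair; the
flag irqs_enabled and every model_* / call_* / state_* name are hypothetical
stand-ins invented for this illustration. It contrasts that pair with a
save/restore pair, which puts back whatever state the caller had.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the CPU's interrupt-enable bit. */
static bool irqs_enabled;

static void model_irq_disable(void) { irqs_enabled = false; }
static void model_irq_enable(void)  { irqs_enabled = true; }

/* Remember the current state, then disable. */
static bool model_irq_save(void)
{
    bool flags = irqs_enabled;
    irqs_enabled = false;
    return flags;
}

/* Put back exactly the state that was saved. */
static void model_irq_restore(bool flags) { irqs_enabled = flags; }

static void noop(void *info) { (void)info; }

/* Variant A: disable/enable pair. Interrupts are on afterwards, always. */
static void call_with_disable_enable(void (*func)(void *), void *info)
{
    model_irq_disable();
    func(info);
    model_irq_enable();
}

/* Variant B: save/restore pair. The caller's state is preserved. */
static void call_with_save_restore(void (*func)(void *), void *info)
{
    bool flags = model_irq_save();
    func(info);
    model_irq_restore(flags);
}

/* Return the interrupt state after the call, given the state before it. */
static bool state_after_disable_enable(bool before)
{
    irqs_enabled = before;
    call_with_disable_enable(noop, 0);
    return irqs_enabled;
}

static bool state_after_save_restore(bool before)
{
    irqs_enabled = before;
    call_with_save_restore(noop, 0);
    return irqs_enabled;
}

/* Early boot starts with interrupts off; compare the two variants. */
static void model_check(void)
{
    assert(state_after_disable_enable(false) == true);  /* turned on behind the caller's back */
    assert(state_after_save_restore(false) == false);   /* still off */
}
```

Calling model_check() from a main() runs both cases. In the model, only
variant A leaves interrupts enabled when the caller entered with them
disabled, which is the situation an early-boot caller would be in.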

Andrew, what was the reason for 78eef01b0fae087c5fadbd85dd4fe2918c3a015f?

Ralf