On Sat, 7 Sep 2013, Maciej W. Rozycki wrote:
> > > Unfortunately that can't be said of the 64-bit kernel that hangs solidly
> > > (reset does not help, need to power-cycle) early on, after:
> > >
> > > Linux version 3.11.0-rc2 (macro@tp) (gcc version 4.1.2) #1 Sun Sep 1
> > > 18:06:20 BST 2013
> > > bootconsole [prom0] enabled
> > >
> > > has been printed. The next line should be:
> > >
> > > This is a DECstation 5000/2x0
> > No idea why this might be hanging. You might try git-bisect, if that's
> > not too painful?
> Given that printk works and it's just a couple of lines to examine I
> think I'll do it in the old fashion. ;)

So it turned out to be harder to debug than I anticipated and I resorted
to `git-bisect' after all. That took several hours, including the fun of
figuring out how to actually make the states of the tree chosen by `git'
(or any nearby candidates) build at all -- the period covered was clearly
the dark age of DECstation support.

Eventually I tracked it down to commit
231a35d37293ab88d325a9cb94e5474c156282c0, which introduced an incompatible
copy of arch/mips/dec/prom/call_o32.S in arch/mips/fw/lib/, built
unconditionally. The copy happens to come earlier of the two in the link
order and is therefore chosen for the DECstation rather than the intended
original. As a result
random kernel data is corrupted because a pointer to the "%s" formatted
output template is used as a temporary stack pointer rather than being
passed down to prom_printf. This also explains why prom_printf still
works, up to a point -- the next argument is the actual string to output
so it works just fine as the output template until enough kernel data has
been corrupted to cause a crash.
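
To illustrate the argument shift described above, here is a minimal
userspace sketch in C. It is not the actual assembly: the names
fake_prom_printf, call_o32_dec and call_o32_fw are hypothetical, and the
two signatures are assumptions inferred from the symptom (the older
wrapper takes the function and then its arguments, the newer one takes a
stack pointer in between).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical firmware entry point taking two word-sized arguments. */
typedef intptr_t (*fw_fn)(intptr_t a0, intptr_t a1);

static intptr_t seen_template;  /* what the "firmware" took as its template */
static intptr_t seen_stack;     /* what the wrapper took as a stack pointer */

/* Stand-in for prom_printf: it only records its first argument. */
static intptr_t fake_prom_printf(intptr_t a0, intptr_t a1)
{
    (void)a1;
    seen_template = a0;
    return 0;
}

/* Assumed convention the DECstation callers expect: (fn, arg0, arg1). */
static intptr_t call_o32_dec(fw_fn fn, intptr_t a0, intptr_t a1)
{
    assert(fn != NULL);
    seen_stack = 0;             /* no stack switching here */
    return fn(a0, a1);
}

/* Assumed convention of the newer copy: (fn, stack, arg0). */
static intptr_t call_o32_fw(fw_fn fn, intptr_t stack, intptr_t a0)
{
    assert(fn != NULL);
    seen_stack = stack;         /* the real code would load this into $sp */
    return fn(a0, 0);
}
```

Passing the same two values, a pointer to "%s" followed by a pointer to
the message, through call_o32_dec leaves "%s" as the template. Passing
them through call_o32_fw instead records "%s" as the stack pointer and the
message itself as the template, which matches the behaviour seen here.
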
Now that the cause is known it is straightforward to correct one way or
another -- the only question remaining is which one to choose.

Thomas, what was the rationale behind arranging things in this way? Did
you mean to make this code shared among platforms needing it? I would
guess so, but then the copy in arch/mips/dec/prom/ would have to be
removed and the macros in <asm/dec/prom.h> adjusted to the new API, which
you didn't do in your change. Also, why the need for stack switching? It
looks like an unnecessary complication to me: any firmware callbacks
exported have to maintain stack integrity or they would be unusable. Is
that to work around some SNI firmware quirk?

[And does it work for the SNI in the first place? -- it looks to me like
`o32_stk' has an alignment problem (the o32 ABI requires 8-byte alignment
of the stack pointer, though 4 bytes will often be enough to satisfy
hardware); but perhaps it just happens to get correct alignment merely by
virtue of always following a data object that enforces it, hmm...]
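
As a rough illustration of the alignment concern, the C sketch below
makes the 8-byte alignment of a temporary stack explicit instead of
relying on whatever object happens to precede it. The buffer name, its
size and the helper functions are hypothetical and are not taken from the
kernel source; in assembly an explicit alignment directive ahead of the
stack object would presumably serve the same purpose.

```c
#include <assert.h>
#include <stdint.h>

#define O32_STACK_ALIGN 8u      /* stack pointer alignment under o32 */
#define DEMO_STACK_SIZE 4096u   /* hypothetical size, a multiple of 8 */

/* Hypothetical buffer; _Alignas states the alignment explicitly. */
static _Alignas(8) unsigned char demo_o32_stk[DEMO_STACK_SIZE];

/* Round an address down to a power-of-two alignment. */
static uintptr_t align_down(uintptr_t addr, uintptr_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return addr & ~(align - 1);
}

/* Initial stack pointer: the highest suitably aligned address, since
 * the stack grows towards lower addresses. */
static uintptr_t demo_stack_top(void)
{
    uintptr_t end = (uintptr_t)demo_o32_stk + sizeof demo_o32_stk;
    return align_down(end, O32_STACK_ALIGN);
}
```

With the buffer itself aligned and its size a multiple of 8,
demo_stack_top() is simply the end of the buffer; align_down() only
matters if either of those assumptions is dropped.
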
No wonder it was so weird to debug, and I guess I need to build the
64-bit config a bit more often...