linux-mips

Re: Qube2 slowly dies

To: Glyn Astill <glynastill@yahoo.co.uk>
Subject: Re: Qube2 slowly dies
From: "Kevin D. Kissell" <kevink@paralogos.com>
Date: Wed, 10 Jun 2009 20:39:20 -0700
Cc: linux-mips@linux-mips.org
In-reply-to: <137040.69938.qm@web23605.mail.ird.yahoo.com>
Original-recipient: rfc822;linux-mips@linux-mips.org
References: <137040.69938.qm@web23605.mail.ird.yahoo.com>
Sender: linux-mips-bounce@linux-mips.org
User-agent: Thunderbird 2.0.0.21 (X11/20090318)
Your description sounds an awful lot like failures I've seen when interrupts get lost or blocked for some reason (could be hardware, the kernel, or some interaction between them). Have you looked at /proc/interrupts to see if "Spurious" interrupts are occurring, or if the rate of serviced timer and I/O interrupts decreases or increases as the system degrades? When the system becomes unresponsive, by any chance does it "wake up" after 10-20 minutes (the time for the Count register to wrap)?
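
One way to make the /proc/interrupts check concrete is to sample the file twice and compare the counts. What follows is only a minimal Python sketch, not something tested on a Qube2. The helper names parse_interrupts and interrupt_rates are hypothetical, and the code assumes the usual layout: a header row of CPU columns followed by "label: count(s) description" lines.

```python
import time


def parse_interrupts(text):
    """Parse /proc/interrupts text into {label: (total_count, description)}."""
    lines = text.splitlines()
    if not lines:
        return {}
    ncpu = len(lines[0].split())  # header row, e.g. "CPU0" (assumed layout)
    counts = {}
    for line in lines[1:]:
        label, sep, rest = line.partition(":")
        if not sep:
            continue  # not a "label: ..." line
        fields = rest.split()
        total = 0
        used = 0
        # Sum the per-CPU counters; summary rows such as ERR may have fewer.
        for tok in fields[:ncpu]:
            if not tok.isdigit():
                break
            total += int(tok)
            used += 1
        if used == 0:
            continue
        counts[label.strip()] = (total, " ".join(fields[used:]))
    return counts


def interrupt_rates(before, after, seconds):
    """Interrupts per second for every label present in both snapshots."""
    return {
        irq: (after[irq][0] - before[irq][0]) / seconds
        for irq in after
        if irq in before
    }


def sample(path="/proc/interrupts"):
    with open(path) as f:
        return parse_interrupts(f.read())


if __name__ == "__main__":
    interval = 5.0  # seconds between snapshots; an arbitrary choice
    first = sample()
    time.sleep(interval)
    second = sample()
    for irq, rate in sorted(interrupt_rates(first, second, interval).items()):
        print(f"{irq:>8} {rate:10.1f}/s  {second[irq][1]}")
```

Running this periodically while the machine degrades would show whether the timer and I/O interrupt rates are changing, and whether any error or spurious counter is climbing.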

If other Qube2s don't exhibit this behavior with a given Linux kernel, but yours does, and yet yours runs NetBSD OK, it suggests that there's a difference in interrupt setup/handling between the two systems that just happens to work around a hardware problem on your board.

         Regards,

         Kevin K.

Glyn Astill wrote:
Hi people,

I've been directed here from the Debian lists by Martin Michlmayr. I'm running 
Debian lenny on a Qube2 with 128 MB RAM and a 40 GB disk.

I've tried kernels 2.6.26 and 2.6.30~rc8, and the issue I'm about to describe is 
present in both. I haven't tried any other kernels, but I will try 2.6.22 when 
I can.

Essentially the machine gets more and more sluggish until it finally dies. I've 
had a quick look in /proc/meminfo and I can't see that it's running out of memory, 
and I'm not sure what else to check.

I find it hard to describe what's going on, but here's a scenario I hope 
illustrates the problem. The configure script is just an example of doing 
something; I could just as easily have extracted an archive with tar for 
the same results:

- I start 2 ssh sessions; in one I start configure for the postgres source, and 
in the other I just start top.

- And for a while all seems fine; configure ticks away and top refreshes every 
second.

- Then top stops ticking over - but it'll refresh with a keypress. Anyway I 
exit top and try to run it again... nothing. I hit ctrl-c which brings me back 
to the prompt and I try again... nothing.

- The configure script is still ticking over slowly.

- I try "ps ax" - it works; so I try it again... nothing.

- I try "ipcs" and "lsof"; they both work and seem to keep working.

- I try "ps ax" again... nothing. I hit ctrl-c and now it doesn't come back to 
the command prompt for a while, say 5 minutes, but eventually it's back.

- It's still going. Some commands still work, some just do nothing. 
/proc/meminfo shows it hasn't eaten all the memory.

- If I try to start another ssh session I can log in, I get the motd, but I 
don't get to the shell.

- Eventually the configure script ends, and all shells come back to the prompt. 
But the machine now seems totally brain-damaged: I can run "ps ax", but "top" and 
other commands still do nothing. Here's strace attached to the top process:

deb:~# strace -p 7228
Process 7228 attached - interrupt to quit
_newselect(0, NULL, NULL, NULL, {0, 500013}

- Then after a little while the whole thing becomes unresponsive.


Can anyone confirm they've seen the same behaviour, or point me to what to look 
into?

Thanks
Glyn