linux-mips

IP27: Random hard locks after ~16hrs uptime

To: Linux MIPS List <linux-mips@linux-mips.org>
Subject: IP27: Random hard locks after ~16hrs uptime
From: Joshua Kinard <kumba@gentoo.org>
Date: Sat, 07 Feb 2015 21:58:29 -0500

I've had my Onyx2 running quite a bit lately doing compile runs, and it seems
that after roughly 16 hours of uptime, the machine will sometimes just stop
completely.  No errors are printed anywhere, and the serial console becomes
completely unresponsive.  I have to issue a 'rst' from the MSC to bring it
back up again.

It currently has dual IP31 R14000 node boards (500MHz) and, for the most
part, runs great (I'll regret the electric bill later...).  This is clearly a
bug, but I'm not sure where to start debugging it on this platform, since I
can't trigger it manually.  I even tried sending an NMI, since this machine
has an NMI handler in the kernel, but all that does is reset the machine.

I already ran an extensive memory test from the PROM and had no issues with
that.  I haven't tried running any of the more thorough hardware tests from
IRIX, though.

Ideas?

-- 
Joshua Kinard
Gentoo/MIPS
kumba@gentoo.org
4096R/D25D95E3 2011-03-28

"The past tempts us, the present confuses us, the future frightens us.  And our
lives slip away, moment by moment, lost in that vast, terrible in-between."

--Emperor Turhan, Centauri Republic
