linux-mips-fnet

Re: Problems with multiple harddisks on my Indigo2

To: "Ian Chilton" <mailinglist@ichilton.co.uk>, linux-mips@fnet.fr
Subject: Re: Problems with multiple harddisks on my Indigo2
From: Matthias Heidbrink <mh@cs.tu-berlin.de>
Date: Thu, 22 Jun 2000 19:51:45 +0200
In-reply-to: <NAENLMKGGBDKLPONCDDOAELFCNAA.mailinglist@ichilton.co.uk>; from mailinglist@ichilton.co.uk on Wed, Jun 21, 2000 at 01:43:07PM +0100
References: <20000621112343.A19912@spock> <NAENLMKGGBDKLPONCDDOAELFCNAA.mailinglist@ichilton.co.uk>
Hi,


On Wed, Jun 21, 2000 at 01:43:07PM +0100, Ian Chilton wrote:
> > cp: /mnt/redhat/kernel23/linux/arch/sparc/lib/COPYING.LIB: 
> > Input/output error
> > And dmesg shows:
> > attempt to access beyond end of device
> > 08:03: rw=0, want=959546428, limit=1888830

> I have an SGI Indy, with 2 internal hard disks, a 4GB and a 1GB
> 
> I get the same thing!
> 
> 
> > Also I'm getting a message
> > sc1,2,0: cmd=0x12 timeout after 2 sec.  Resetting SCSI bus
> 
> I started getting this when I added the 2nd HD (1GB).

I've got an Indy (still running IRIX) with an IBM DCAS drive, and I know this
problem. It happens nearly every time I cold-start the machine; if I switch
it off and on again immediately, it does not happen. So I think it must be
related to the spin-up time of some drives.

Furthermore, the Indy seems to be extremely sensitive to bad SCSI wiring. It
requires active termination both internally and externally. The original
internal SCSI cable already has a terminator, so termination must be disabled
on all drives (except an external drive that is last in the chain, if no
external SCSI terminator is used). Also be careful about the quality of the
cables for external devices. In theory, very old SCSI-I devices that get
confused when Fast SCSI signals are on the bus could also cause problems.

But the first problem does not sound like a SCSI hardware problem; it looks
more like something at the file system level.
The first possibility is an error in calculating offsets on the disk. But I
don't think that is very likely, because it should then also cause problems
on other Linux systems with a similar CPU architecture, or with other
applications.
The second possibility is that something (a driver, or perhaps other
system-dependent code) is writing to random memory locations (probably
through an invalid pointer, but it could be anything, including errors in
cache management) and just by chance hits a place where file system metadata
are stored.
I've had errors like this in "ordinary" applications, and they can cause very
interesting effects...
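A rough sanity check of the quoted kernel message fits the file-system-level
theory. The sketch below assumes that "want" and "limit" are both counted in
1 KiB blocks (as in 2.2/2.4-era kernels) and reads "08:03" as a major:minor
pair; major 8 is the SCSI disk driver, so minor 3 is /dev/sda3. The helper
names decode_kdev and beyond_end are hypothetical, chosen only for
illustration.

```python
# Sanity-check the numbers from the quoted kernel message:
#   08:03: rw=0, want=959546428, limit=1888830
# Assumption: "want" and "limit" are both in 1 KiB blocks.

BLOCK_SIZE = 1024  # bytes per block (assumed, see above)


def decode_kdev(dev):
    """Split a 'MM:mm' device string (hex digits) into (major, minor)."""
    major, minor = dev.split(":")
    return int(major, 16), int(minor, 16)


def beyond_end(want, limit):
    """True if the request ends past the last block of the device."""
    return want > limit


major, minor = decode_kdev("08:03")  # major 8 = SCSI disk, minor 3 = sda3
want, limit = 959546428, 1888830

print(f"device major={major} minor={minor}")
print(f"partition size:   {limit * BLOCK_SIZE / 2**30:.2f} GiB")
print(f"requested offset: {want * BLOCK_SIZE / 2**30:.1f} GiB")
print(f"ratio: {want / limit:.0f}x the partition size")
print("beyond end:", beyond_end(want, limit))
```

Under these assumptions the request lands about 915 GiB into a partition of
roughly 1.8 GiB, some 508 times its size. That fits a garbage block number
read from corrupted metadata better than a small offset miscalculation.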

Ciao, Matthias
-- 
Matthias Heidbrink     E-Mail: 
Bundesratufer 12       Matthias_Heidbrink@b.maus.de  
10555 Berlin, Germany  mh@cs.tu-berlin.de         
Tel. +49-30-8536361    mh@carano.de   (at work; http://www.carano.com)
