linux-mips
[Top] [All Lists]

Re: sgiwd93 multiple disk problem

To: "Kevin D. Kissell" <kevink@mips.com>
Subject: Re: sgiwd93 multiple disk problem
From: Florian Lohoff <flo@rfc822.org>
Date: Tue, 10 Apr 2001 00:55:28 +0200
Cc: ralf@oss.sgi.com, linux-mips@oss.sgi.com, rmurray@cyberhqz.com
In-reply-to: <20010409213632.B22835@paradigm.rfc822.org>; from flo@rfc822.org on Mon, Apr 09, 2001 at 09:36:32PM +0200
Organization: rfc822 - pure communication
References: <20010403174749.B4135@paradigm.rfc822.org> <013801c0bc58$7abe2700$0deca8c0@Ulysses> <20010409213632.B22835@paradigm.rfc822.org>
Sender: owner-linux-mips@oss.sgi.com
User-agent: Mutt/1.3.15i
Ok - more info

1. The bug only happens when accessing 2 disks/devices on the SAME controller
2. The bug is in stopping the HPC DMA to early.
3. Stopping the DMA to early happens only on reads.


If applying this patch on the driver 

Index: drivers/scsi/sgiwd93.c
===================================================================
RCS file: /cvs/linux/drivers/scsi/sgiwd93.c,v
retrieving revision 1.27
diff -u -r1.27 sgiwd93.c
--- drivers/scsi/sgiwd93.c      2001/03/26 00:38:20     1.27
+++ drivers/scsi/sgiwd93.c      2001/04/09 22:12:10
@@ -183,6 +187,10 @@
        printk("dma_stop: status<%d> ", status);
 #endif
 
+       if (hregs->ctrl & HPC3_SCTRL_ACTIVE)
+               printk("DMA still active dir %d bresid %d\n",
+                       hdata->dma_dir,
+                       SCpnt->SCp.buffers_residual);
        /* First stop the HPC and flush it's FIFO. */
        if(hdata->dma_dir) {
                hregs->ctrl |= HPC3_SCTRL_FLUSH;


I get this output on the console which means files/data
got corrupted.

---------------schnipp---------------------
DMA still active dir 1 bresid 6
DMA still active dir 1 bresid 1
DMA still active dir 1 bresid 3
DMA still active dir 1 bresid 1
DMA still active dir 1 bresid 0
DMA still active dir 1 bresid 0
DMA still active dir 1 bresid 0
DMA still active dir 1 bresid 2
DMA still active dir 1 bresid 0
DMA still active dir 1 bresid 2
---------------schnapp---------------------

This might also be the cause why metadata has never been corrupted.
Metadata are mostly not read in large chunks which get unchecked
dumped to disk again.

One suspicion i had was around the modification of the wd33 registers in
the sgi driver (Its the only driver to do so) which based on the thesis
that on the wd33 driver we act on the current scatter_gather item
and subtract the total length - But this is simply wrong.

Flo
-- 
Florian Lohoff                  flo@rfc822.org             +49-5201-669912
     Why is it called "common sense" when nobody seems to have any?


<Prev in Thread] Current Thread [Next in Thread>