linux-mips
[Top] [All Lists]

Re: corruption of load instruction offset

To: "Chuck Meade" <chuckmeade@mindspring.com>, <linux-mips@linux-mips.org>
Subject: Re: corruption of load instruction offset
From: "Kevin D. Kissell" <kevink@mips.com>
Date: Mon, 3 Apr 2006 09:25:42 +0200
Cc: "Chuck Meade \(mindspring\)" <chuckmeade@mindspring.com>
Original-recipient: rfc822;linux-mips@linux-mips.org
References: <IIEEICKJLNEPBBDJICNGEECHKIAA.chuckmeade@mindspring.com>
Sender: linux-mips-bounce@linux-mips.org
That's pretty twisted - one could almost believe that the fetch from
0x8021e28c got corrupted to pick up the most significant 16 bits
of the instruction at 0x8021e22c or 0x8021e26c - but given that
instructions are fetched and issued word-by-word, it's hard to see
where that could happen, in either CPU hardware or software. 
What is the I-cache line size? If it  were me, I'd check my clocks, 
voltages, and above all my RAM timing, and I'd re-seat my CPU 
and RAM in their sockets...

            Regards,

            Kevin K.

----- Original Message ----- 
From: "Chuck Meade" <chuckmeade@mindspring.com>
To: <linux-mips@linux-mips.org>
Cc: "Chuck Meade (mindspring)" <chuckmeade@mindspring.com>
Sent: Monday, April 03, 2006 6:12 AM
Subject: corruption of load instruction offset


> Hello,
> 
> I am seeing a very interesting/worrisome bug on an RM7965 cpu, which has
> an E9000 core.  I am running 2.6.14-rc1.  Please take a look at the
> behavior I describe and send me your thoughts.  Thanks.
> 
> The error message is immediately below.  Notice that the epc is 8021e28c,
> and the BadVA is 87e39681, and register 4 (a0) is 87e38660.
> 
> Now scan down below the error message, to the disassembly of move_32bytes.
> If you look at the instruction at 8021e28c, it appears harmless enough.  
> Nothing to cause an unaligned access or invalid instruction.  But look
> about 6 lines above that, and we are loading at offsets from a0.  The
> offsets from a0 in those 4 load instructions are 16, 20, 24, and 28.  If
> you look at the opcodes in the column to the left, those offsets appear in
> the least significant 16 bits of the opcode.
> 
> Now look again at the value of a0 in the register dump:  87e38660.  And
> at the BadVA value:  87e39681.  The BadVA is offset exactly 0x1021 from
> a0.  This indicates that we somehow tried to access memory at offset 
> 0x1021 from a0.  However, we never should have done that according to
> the disassembly.  *But* there are many instructions in the vicinity which
> have a least significant 16 bits of 0x1021.  None of them are loads from a0,
> but I believe that this is the root of the problem.  Something is happening
> here, possibly an interrupt, or a cpu bug(?) that is causing the load from
> a0 to use an offset of 0x1021 (the least significant 16-bits of many of
> the nearby instructions) rather than the correct offset for the load
> instruction, which is found in the least significant 16-bits of the actual
> load instructions.
> 
> This is not "quickly" reproducible.  I run a TCP blaster/blastee test between
> this machine and Linux PC, and at some point during the run (sometimes much
> later) this error appears.
> 
> Thanks for your ideas,
> Chuck
> 
> Error message:
> 
> Unhandled kernel unaligned access or invalid instruction in 
> arch/mips/kernel/unaligned.c::emulate_load_store_insn, line 487[#1]:
> Cpu 0
> $ 0   : 00000000 10004ce8 00000000 00000000
> $ 4   : 87e38660 000005a8 00000000 00000000
> $ 8   : 00000000 00000000 00000020 00000000
> $12   : 00000000 80402000 00000001 00000000
> $16   : 00000000 87e171a0 000005a8 87c1f060
> $20   : 87e380e0 004009e0 10004740 00002ad8
> $24   : 00000008 803171c0
> $28   : 8120a000 8120bd48 00000000 802deb30
> Hi    : 0000000c
> Lo    : 000d4bf8
> epc   : 8021e28c move_32bytes+0x64/0x88     Not tainted
> ra    : 802deb30 tcp_sendmsg+0x460/0xd80
> Status: 90018403    KERNEL EXL IE
> Cause : 00000010
> BadVA : 87e39681
> PrId  : 00003422
> Modules linked in:
> Process blaster (pid: 162, threadinfo=8120a000, task=8050b3f8)
> Stack : 8120bdd0 00000000 812fd4a0 8120bdf0 8120bd70 87e18520 00000001 
> 00000000
>         8120be40 7fffffff 00000000 8120bf18 8120be14 00000000 000005a8 
> 000005a8
>         000032e8 00000001 00000000 90018400 8120be40 00005dc0 10001458 
> 8120bf18
>         00000005 004009e0 10011044 10010000 10010fd4 8028e7a8 00000020 
> ffffffff
>         00000001 00000000 00005dc0 10001458 87e18520 00005dc0 812fd4a0 
> 004009e0
>         ...
> Call Trace:
>  [<8028e7a8>] sock_aio_write+0x10c/0x12c
>  [<8016bef8>] do_sync_write+0xd0/0x128
>  [<801037d4>] do_IRQ+0x24/0x34
>  [<804203cc>] init+0xd8/0xe4
>  [<8013cf78>] autoremove_wake_function+0x0/0x44
>  [<8016c020>] vfs_write+0xd0/0x144
>  [<8016c020>] vfs_write+0xd0/0x144
>  [<8016c074>] vfs_write+0x124/0x144
>  [<8016c150>] sys_write+0x24/0x98
>  [<8016c180>] sys_write+0x54/0x98
>  [<8016c154>] sys_write+0x28/0x98
>  [<801037d4>] do_IRQ+0x24/0x34
>  [<8010b260>] stack_done+0x20/0x3c
> 
> 
> 
> Disassembly of relevant portion of move_32bytes:
> 
> 8021e228 <move_32bytes>:
> 8021e228:       8c880000        lw      t0,0(a0)
> 8021e22c:       8c890004        lw      t1,4(a0)
> 8021e230:       8c8b0008        lw      t3,8(a0)
> 8021e234:       8c8c000c        lw      t4,12(a0)
> 8021e238:       00481021        addu    v0,v0,t0
> 8021e23c:       0048182b        sltu    v1,v0,t0
> 8021e240:       00431021        addu    v0,v0,v1
> 8021e244:       00491021        addu    v0,v0,t1
> 8021e248:       0049182b        sltu    v1,v0,t1
> 8021e24c:       00431021        addu    v0,v0,v1
> 8021e250:       004b1021        addu    v0,v0,t3
> 8021e254:       004b182b        sltu    v1,v0,t3
> 8021e258:       00431021        addu    v0,v0,v1
> 8021e25c:       004c1021        addu    v0,v0,t4
> 8021e260:       004c182b        sltu    v1,v0,t4
> 8021e264:       00431021        addu    v0,v0,v1
> 8021e268:       8c880010        lw      t0,16(a0)
> 8021e26c:       8c890014        lw      t1,20(a0)
> 8021e270:       8c8b0018        lw      t3,24(a0)
> 8021e274:       8c8c001c        lw      t4,28(a0)
> 8021e278:       00481021        addu    v0,v0,t0
> 8021e27c:       0048182b        sltu    v1,v0,t0
> 8021e280:       00431021        addu    v0,v0,v1
> 8021e284:       00491021        addu    v0,v0,t1
> 8021e288:       0049182b        sltu    v1,v0,t1
> 8021e28c:       00431021        addu    v0,v0,v1
> 8021e290:       004b1021        addu    v0,v0,t3
> 8021e294:       004b182b        sltu    v1,v0,t3
> 8021e298:       00431021        addu    v0,v0,v1
> 8021e29c:       004c1021        addu    v0,v0,t4
> 8021e2a0:       004c182b        sltu    v1,v0,t4
> 8021e2a4:       00431021        addu    v0,v0,v1
> 8021e2a8:       30b8001c        andi    t8,a1,0x1c
> 8021e2ac:       24840020        addiu   a0,a0,32
> 
> 
> 
> 
> 

<Prev in Thread] Current Thread [Next in Thread>