linux-mips

Re: Octeon crash in virt_to_page(&core0_stack_variable)

To: Cosmin Ratiu <cratiu@ixiacom.com>
Subject: Re: Octeon crash in virt_to_page(&core0_stack_variable)
From: David Daney <david.daney@cavium.com>
Date: Fri, 09 Sep 2011 09:59:05 -0700
Cc: linux-mips@linux-mips.org, netdev@vger.kernel.org
In-reply-to: <201109091623.29000.cratiu@ixiacom.com>
References: <201109091623.29000.cratiu@ixiacom.com>
Sender: linux-mips-bounce@linux-mips.org
User-agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.1.15) Gecko/20101027 Fedora/3.0.10-1.fc12 Thunderbird/3.0.10

On 09/09/2011 06:23 AM, Cosmin Ratiu wrote:
Hello,

I've been investigating a strange crash and I wanted to ask for your help.
The crash happens when virt_to_page is called with an address from the softirq
stack of core 0 on Cavium Octeon. It may happen on other MIPS processors as
well, but I'm not sure.

I've attached a simple kernel module to demonstrate the problem and the output
of dmesg + the crash. Two seconds after inserting the module, the kernel
should crash.

From what I've dug up in the kernel sources, it seems the stack for the first
idle task resides in the data segment (mapped in kseg2) while the rest are
allocated with kmalloc in __cpu_up() and reside in a different area (CAC_BASE
upwards).
It seems virt_to_phys produces bogus results for kseg2 and after that,
virt_to_page crashes trying to access invalid memory.
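
The arithmetic behind that failure can be sketched in user space. This is only an illustration, not the kernel's code: it assumes virt_to_phys() is a plain linear offset (address minus PAGE_OFFSET), and the PAGE_OFFSET value, the 4 KiB page size, the 4 GiB RAM limit and every sketch_* name are hypothetical. The real values depend on the kernel configuration and on any patches applied.

```c
#include <stdint.h>

/* Assumed values, for illustration only. */
#define SKETCH_PAGE_OFFSET 0x8000000000000000ULL /* hypothetical base of the linear map */
#define SKETCH_PAGE_SHIFT  12                    /* assumes 4 KiB pages */
#define SKETCH_MAX_PFN     (1ULL << 20)          /* assumes 4 GiB of RAM */

/* Linear translation: only meaningful for addresses inside the linear map. */
static uint64_t sketch_virt_to_phys(uint64_t vaddr)
{
    return vaddr - SKETCH_PAGE_OFFSET;
}

/* Page frame number that a virt_to_page()-style lookup would index with. */
static uint64_t sketch_virt_to_pfn(uint64_t vaddr)
{
    return sketch_virt_to_phys(vaddr) >> SKETCH_PAGE_SHIFT;
}

/* A pfn at or above the RAM limit has no page descriptor behind it. */
static int sketch_pfn_valid(uint64_t pfn)
{
    return pfn < SKETCH_MAX_PFN;
}
```

Under these assumptions, an address inside the linear map such as 0x8000000004000000 yields a small, valid pfn, while a kseg2 address such as 0xffffffffc1230000 yields a pfn far beyond the end of RAM, so indexing a page array with it would touch invalid memory.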

This problem was discovered when doing BGP traffic with the TCP MD5 option
activated, where the following call chain caused a crash:

  * tcp_v4_rcv
  *  tcp_v4_timewait_ack
  *   tcp_v4_send_ack ->  follow stack variable rep.th
  *    tcp_v4_md5_hash_hdr
  *     tcp_md5_hash_header
  *      sg_init_one
  *       sg_set_buf
  *        virt_to_page

I noticed that tcp_v4_send_reset uses a similar stack variable and also calls
tcp_v4_md5_hash_hdr, so it has the same problem.

I don't fully understand octeon mm details, so I wanted to bring up this issue
in order to find a proper fix.
To avoid the problem, I've implemented a quick hack to declare those variables
percpu instead of on the stack, so they would also reside in CAC_BASE upwards.
I've attached a patch against 2.6.32 for reference.

Cosmin.


[...]
[ 2040.300/0] Call Trace:
[ 2040.300/0] [<ffffffffc123a054>] vcrash+0x54/0x80 [vcrash]
[ 2040.300/0] [<ffffffffc0065f28>] run_timer_softirq+0x198/0x23c
[ 2040.300/0] [<ffffffffc00609e0>] __do_softirq+0xd8/0x188

                ^^^^^^^^^ CKSEG2 addresses detected!

You are using the out-of-tree mapped kernel patch which mucks about with the implementation of virt_to_phys().
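
For readers who want to check trace addresses themselves, here is a minimal user-space sketch that maps a 64-bit MIPS virtual address to its kernel segment. The segment boundaries are the architectural ones; the function name mips64_segment is made up for this example, and everything outside XKPHYS and the four CKSEG windows is lumped into "other".

```c
#include <stdint.h>

/* Return the name of the 64-bit MIPS kernel segment containing vaddr. */
static const char *mips64_segment(uint64_t vaddr)
{
    if (vaddr >= 0xffffffffe0000000ULL)
        return "CKSEG3";
    if (vaddr >= 0xffffffffc0000000ULL)
        return "CKSEG2";   /* mapped through the TLB */
    if (vaddr >= 0xffffffffa0000000ULL)
        return "CKSEG1";   /* unmapped, uncached */
    if (vaddr >= 0xffffffff80000000ULL)
        return "CKSEG0";   /* unmapped, cached */
    if (vaddr >= 0x8000000000000000ULL && vaddr <= 0xbfffffffffffffffULL)
        return "XKPHYS";   /* unmapped window onto physical memory */
    return "other";        /* user space, XKSEG, or an unused hole */
}
```

Passing 0xffffffffc0065f28 (the run_timer_softirq entry above) returns "CKSEG2".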

Can you reproduce the TCP related crash in an unpatched kernel?

If not, then it would point to problems in the out-of-tree patches you have applied.

David Daney
