Re: system lockup with 2.6.29 on Cavium/Octeon

To: Greg Ungerer <gerg@snapgear.com>
Subject: Re: system lockup with 2.6.29 on Cavium/Octeon
From: Ralf Baechle <ralf@linux-mips.org>
Date: Wed, 20 May 2009 15:26:04 +0100
Cc: linux-mips@linux-mips.org
In-reply-to: <4A139F50.7050409@snapgear.com>
Original-recipient: rfc822;linux-mips@linux-mips.org
References: <4A139F50.7050409@snapgear.com>
Sender: linux-mips-bounce@linux-mips.org
User-agent: Mutt/1.5.18 (2008-05-17)
On Wed, May 20, 2009 at 04:12:32PM +1000, Greg Ungerer wrote:

> I have a system lockup problem that I have been looking at on a custom
> Cavium/Octeon 5010 based design. I am running on linux-2.6.29 with
> David Daney's latest round of PCI and ethernet patches (posted here
> on this list).
>
> I have tracked the problem back to local_flush_tlb_kernel_range() in
> arch/mips/mm/tlb-r4k.c. At the top of this function is:
>
>     void local_flush_tlb_kernel_range(unsigned long start, unsigned long end)
>     {
>         unsigned long flags;
>         int size;
>
>         ENTER_CRITICAL(flags);
>         size = (end - start + (PAGE_SIZE - 1)) >> PAGE_SHIFT;
>         size = (size + 1) >> 1;
>         if (size <= current_cpu_data.tlbsize / 2) {
>
> The problem is that typical example values I see passed in for start
> and end are:
>
>     start = c000000000006000
>     end   = ffffffffc01d8000
>
> Now the vmalloc area starts at 0xc000000000000000 and the kernel code
> and data is all at 0xffffffff80000000 and above. I don't know if the
> start and end are reasonable values, but I can see some logic as to
> where they come from. The code path that leads to this is via
> __vunmap() and __purge_vmap_area_lazy(). So it is not too difficult
> to see how we end up with values like this.

Each address is sensible on its own, but the combination is not: both
addresses should be in the same segment.  Start is in XKSEG, end is in
CKSEG2, and in between there are vast wastelands of unused address space,
exabytes in size.
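
To make the segment point concrete, here is a rough userspace sketch
(not kernel code) that classifies a 64-bit MIPS kernel virtual address
by segment and checks whether a range stays within one.  The enum and
the helper names classify() and same_segment() are made up for this
illustration, and the upper bound of XKSEG is deliberately not checked
because it depends on the CPU's implemented address bits.

```c
#include <stdint.h>

/* Illustrative segment labels; these are not kernel identifiers. */
enum seg { SEG_OTHER, SEG_XKSEG, SEG_CKSEG0, SEG_CKSEG1, SEG_CKSEG2, SEG_CKSEG3 };

/* Classify a 64-bit MIPS virtual address (hypothetical helper). */
static enum seg classify(uint64_t va)
{
	if (va >= 0xffffffff80000000ULL) {
		/* Bits 30:29 select one of the four 512 MiB compatibility segments. */
		switch ((va >> 29) & 3) {
		case 0:  return SEG_CKSEG0;
		case 1:  return SEG_CKSEG1;
		case 2:  return SEG_CKSEG2;
		default: return SEG_CKSEG3;
		}
	}
	/* Top two address bits set: XKSEG (upper limit not checked here). */
	if ((va >> 62) == 3)
		return SEG_XKSEG;
	return SEG_OTHER;
}

/* Treat end as exclusive, so test the last byte of the range. */
static int same_segment(uint64_t start, uint64_t end)
{
	return classify(start) == classify(end - 1);
}
```

Fed the values from the report, classify() puts start in XKSEG and end
in CKSEG2, so same_segment() returns 0 for that pair.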

> But the size calculation above with these kinds of values still produces
> a very large number, larger than the 32-bit "int" that holds "size".
> I see large negative values fall out as size, and so the following
> tlbsize check becomes true, and the code spins inside the loop inside
> that if statement for a _very_ long time trying to flush tlb entries.
>
> This is of course easily fixed, by making that size "unsigned long".
> The patch below trivially does this.
>
> But is this analysis correct?

Yes - but I think we have two issues here.  One is the calculation
overflowing int for the arguments you're seeing.  The other is that
the arguments themselves look wrong.
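
The overflow itself can be reproduced outside the kernel.  The sketch
below repeats the size arithmetic once with a 32-bit int and once with a
64-bit unsigned value.  It assumes 4 KiB pages (EX_PAGE_SHIFT of 12),
which the report does not state, and the names pairs_as_int(),
pairs_as_ulong() and the EX_ macros are invented for the example.  It
also relies on the usual GCC behaviour for narrowing conversions and for
right shifts of negative values, both of which are implementation-defined
in C.

```c
#include <stdint.h>

#define EX_PAGE_SHIFT 12                          /* assumption: 4 KiB pages */
#define EX_PAGE_SIZE  (UINT64_C(1) << EX_PAGE_SHIFT)

/* Old arithmetic: the page count is truncated into a 32-bit int. */
static int32_t pairs_as_int(uint64_t start, uint64_t end)
{
	int32_t size = (int32_t)((end - start + (EX_PAGE_SIZE - 1)) >> EX_PAGE_SHIFT);

	/* Each TLB entry maps an even/odd page pair, hence the halving. */
	return (size + 1) >> 1;
}

/* Patched arithmetic: the page count keeps its full 64-bit width. */
static uint64_t pairs_as_ulong(uint64_t start, uint64_t end)
{
	uint64_t size = (end - start + (EX_PAGE_SIZE - 1)) >> EX_PAGE_SHIFT;

	return (size + 1) >> 1;
}
```

With start = 0xc000000000006000 and end = 0xffffffffc01d8000 the int
variant comes out as -130839, which passes any "size <= tlbsize / 2"
test, while the unsigned variant yields 0x1fffffffe00e9 and fails it.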

There are a few more instances of the same overflow issue which the patch
below is fixing.

  Ralf


 arch/mips/mm/tlb-r3k.c |    6 ++----
 arch/mips/mm/tlb-r4k.c |    6 ++----
 arch/mips/mm/tlb-r8k.c |    3 +--
 3 files changed, 5 insertions(+), 10 deletions(-)

diff --git a/arch/mips/mm/tlb-r3k.c b/arch/mips/mm/tlb-r3k.c
index f0cf46a..1c0048a 100644
--- a/arch/mips/mm/tlb-r3k.c
+++ b/arch/mips/mm/tlb-r3k.c
@@ -82,8 +82,7 @@ void local_flush_tlb_range(struct vm_area_struct *vma, unsigned long start,
        int cpu = smp_processor_id();
 
        if (cpu_context(cpu, mm) != 0) {
-               unsigned long flags;
-               int size;
+               unsigned long size, flags;
 
 #ifdef DEBUG_TLB
                printk("[tlbrange<%lu,0x%08lx,0x%08lx>]",
@@ -121,8 +120,7 @@ void local_flush_tlb_range(struct vm_area_struct *vma, unsigned long start,
 
 void local_flush_tlb_kernel_range(unsigned long start, unsigned long end)
 {
-       unsigned long flags;
-       int size;
+       unsigned long size, flags;
 
 #ifdef DEBUG_TLB
        printk("[tlbrange<%lu,0x%08lx,0x%08lx>]", start, end);
diff --git a/arch/mips/mm/tlb-r4k.c b/arch/mips/mm/tlb-r4k.c
index 9619f66..892be42 100644
--- a/arch/mips/mm/tlb-r4k.c
+++ b/arch/mips/mm/tlb-r4k.c
@@ -117,8 +117,7 @@ void local_flush_tlb_range(struct vm_area_struct *vma, unsigned long start,
        int cpu = smp_processor_id();
 
        if (cpu_context(cpu, mm) != 0) {
-               unsigned long flags;
-               int size;
+               unsigned long size, flags;
 
                ENTER_CRITICAL(flags);
                size = (end - start + (PAGE_SIZE - 1)) >> PAGE_SHIFT;
@@ -160,8 +159,7 @@ void local_flush_tlb_range(struct vm_area_struct *vma, unsigned long start,
 
 void local_flush_tlb_kernel_range(unsigned long start, unsigned long end)
 {
-       unsigned long flags;
-       int size;
+       unsigned long size, flags;
 
        ENTER_CRITICAL(flags);
        size = (end - start + (PAGE_SIZE - 1)) >> PAGE_SHIFT;
diff --git a/arch/mips/mm/tlb-r8k.c b/arch/mips/mm/tlb-r8k.c
index 4f01a3b..4ec95cc 100644
--- a/arch/mips/mm/tlb-r8k.c
+++ b/arch/mips/mm/tlb-r8k.c
@@ -111,8 +111,7 @@ out_restore:
 /* Usable for KV1 addresses only! */
 void local_flush_tlb_kernel_range(unsigned long start, unsigned long end)
 {
-       unsigned long flags;
-       int size;
+       unsigned long size, flags;
 
        size = (end - start + (PAGE_SIZE - 1)) >> PAGE_SHIFT;
        size = (size + 1) >> 1;
