Re: Tracking down exception in sched.c

To: Mark E Mason <mark.e.mason@broadcom.com>
Subject: Re: Tracking down exception in sched.c
From: Rojhalat Ibrahim <imr@rtschenk.de>
Date: Mon, 20 Feb 2006 13:09:12 +0100
Cc: linux-mips@linux-mips.org
In-reply-to: <7E000E7F06B05C49BDBB769ADAF44D0773A636@NT-SJCA-0750.brcm.ad.broadcom.com>
Original-recipient: rfc822;linux-mips@linux-mips.org
References: <7E000E7F06B05C49BDBB769ADAF44D0773A636@NT-SJCA-0750.brcm.ad.broadcom.com>
Sender: linux-mips-bounce@linux-mips.org
User-agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.12) Gecko/20050919
Hi,

I tracked this one down to 88a2a4ac6b671a4b0dd5d2d762418904c05f4104
(percpu data: only iterate over possible CPUs). I don't know if this
is the correct way to fix this, but the following patch makes the
problem go away for me.

--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -6021,7 +6021,7 @@ void __init sched_init(void)
        runqueue_t *rq;
        int i, j, k;

-       for_each_cpu(i) {
+       for (i = 0; i < NR_CPUS; i++) {
                prio_array_t *array;

                rq = cpu_rq(i);

Any other suggestions on how to fix this?
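
To make the suspected failure mode concrete, here is a small user-space sketch in C. It is not kernel code. It assumes (this thread does not confirm it) that the possible-CPU mask only has the boot CPU set when sched_init() runs, so a masked loop leaves the remaining runqueues with active == NULL. All names below (init_masked, init_all, count_uninitialised, the 4-CPU limit) are hypothetical stand-ins.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define NR_CPUS 4 /* hypothetical compile-time maximum */

struct prio_array { int nr_active; };

struct runqueue {
    struct prio_array *active;   /* stays NULL until the runqueue is set up */
    struct prio_array *expired;
    struct prio_array arrays[2];
};

/* Static storage is zero-initialised, so every active pointer starts NULL. */
static struct runqueue runqueues[NR_CPUS];

/* Put all runqueues back into the "never initialised" state. */
static void reset_runqueues(void)
{
    memset(runqueues, 0, sizeof(runqueues));
}

/* Minimal stand-in for the per-CPU setup done inside sched_init(). */
static void init_rq(int cpu)
{
    struct runqueue *rq;

    assert(cpu >= 0 && cpu < NR_CPUS);
    rq = &runqueues[cpu];
    rq->active = &rq->arrays[0];
    rq->expired = &rq->arrays[1];
}

/* Stand-in for a masked iteration: only CPUs with their bit set are visited. */
static void init_masked(unsigned long possible_mask)
{
    int i;

    for (i = 0; i < NR_CPUS; i++)
        if (possible_mask & (1UL << i))
            init_rq(i);
}

/* Stand-in for the patched loop: every slot up to NR_CPUS is visited. */
static void init_all(void)
{
    int i;

    for (i = 0; i < NR_CPUS; i++)
        init_rq(i);
}

/* Count runqueues that would trip a check like BUG_ON(rq->active == NULL). */
static int count_uninitialised(void)
{
    int i, n = 0;

    for (i = 0; i < NR_CPUS; i++)
        if (runqueues[i].active == NULL)
            n++;
    return n;
}
```

With a mask of 0x1, init_masked() sets up only runqueue 0, while init_all() sets up all of them. If the assumption holds, an alternative to iterating over NR_CPUS would be to make sure the mask is complete before sched_init() runs.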

Thanks,
Rojhalat Ibrahim


Mark E Mason wrote:
> [Cross-posted from LKML]
>  
> Hello all,
>  
> Working from the linux-mips.org repository (which just recently merged
> from the kernel.org repository), we've been getting exceptions on
> several different processors due to NULL pointer dereferences in
> sched.c.  These happen on SMP systems only (but both 32 and 64-bit
> systems trigger this problem).
>  
> The Oops output and surrounding text (w/ backtrace) is below.  What I've
> traced it down to so far is that enqueue_task() gets called with a run
> queue (rq) where (rq->active == NULL).
> 
> Backtracing a bit, the following patch triggers an earlier, slightly
> more controlled failure:
> 
> [mason@hawaii linux.git]$ git diff kernel/sched.c
> diff --git a/kernel/sched.c b/kernel/sched.c
> --- a/kernel/sched.c
> +++ b/kernel/sched.c
> @@ -1264,6 +1264,7 @@ static int try_to_wake_up(task_t *p, uns
>  #endif
> 
>         rq = task_rq_lock(p, &flags);
> +       BUG_ON(rq->active == NULL);
>         old_state = p->state;
>         if (!(old_state & state))
>                 goto out;
> 
> 
> My question is, is the above assert valid (i.e., should rq->active always
> be non-NULL at this point)?  It seems like it should be, but I'm pretty
> new to this code, and thought I should double-check before going off
> into the weeds.
> 
> If anyone has any ideas about where specifically to look for the
> underlying problem, I'd appreciate it.
> 
> Thanks (very much) in advance,
> Mark Mason
> mason@broadcom.com
> Newberg, Oregon
>  
