Hi,
I tracked this one down to commit 88a2a4ac6b671a4b0dd5d2d762418904c05f4104
("percpu data: only iterate over possible CPUs"). I don't know whether
this is the correct fix, but the following patch makes the problem go
away for me.
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -6021,7 +6021,7 @@ void __init sched_init(void)
 	runqueue_t *rq;
 	int i, j, k;
 
-	for_each_cpu(i) {
+	for (i = 0; i < NR_CPUS; i++) {
 		prio_array_t *array;
 
 		rq = cpu_rq(i);
Are there any other suggestions for how to fix this?
Thanks,
Rojhalat Ibrahim
Mark E Mason wrote:
> [Cross-posted from LKML]
>
> Hello all,
>
> Working from the linux-mips.org repository (which recently merged
> from the kernel.org repository), we've been getting exceptions on
> several different processors due to NULL pointer dereferences in
> sched.c. These happen only on SMP systems (both 32- and 64-bit
> systems trigger the problem).
>
> The Oops output and surrounding text (with backtrace) are below. What
> I've traced it down to so far is that enqueue_task() gets called with
> a runqueue (rq) where rq->active == NULL.
>
> Backtracing a bit, the following patch triggers an earlier, slightly
> more controlled failure:
>
> [mason@hawaii linux.git]$ git diff kernel/sched.c
> diff --git a/kernel/sched.c b/kernel/sched.c
> --- a/kernel/sched.c
> +++ b/kernel/sched.c
> @@ -1264,6 +1264,7 @@ static int try_to_wake_up(task_t *p, uns
>  #endif
>  
>  	rq = task_rq_lock(p, &flags);
> +	BUG_ON(rq->active == NULL);
>  	old_state = p->state;
>  	if (!(old_state & state))
>  		goto out;
>
>
> My question is: is the above assertion valid (i.e., should rq->active
> always be non-NULL at this point)? It seems like it should be, but I'm
> pretty new to this code and thought I should double-check before going
> off into the weeds.
>
> If anyone has any ideas about where specifically to look for the
> underlying problem, I'd appreciate it.
>
> Thanks (very much) in advance,
> Mark Mason
> mason@broadcom.com
> Newberg, Oregon
>