[RFC & PATCH] fixing tlb flush race problem on smp

Subject: [RFC & PATCH] fixing tlb flush race problem on smp
From: Jun Sun <>
Date: Tue, 21 Jan 2003 14:37:26 -0800
Original-recipient: rfc822;
User-agent: Mutt/1.2.5i
Many of us are aware of a hole in the current TLB flushing code that
can cause different processes to end up using the same ASID on an
SMP machine.

Actually there are several problems:

1) get_new_mmu_context() and the following set_entryhi, etc. are
not called atomically in switch_mm() and activate_mm().  If
an IPI arrives in between and requests a local TLB flush, bad
things happen.

2) if local_flush_tlb_range() and local_flush_tlb_mm() are
called from an IPI, they may call get_new_mmu_context(), which
can bump up the ASID generation number without the current
active_mm being aware of it.  Bad things will happen later.

3) during the time window after schedule() calls switch_mm() but
before switch_to(), current->active_mm may be valid but does
not really mean "current->active_mm" anymore.  This is because
the "current" process will soon become "prev".  The real active_mm
is actually "next->active_mm".  Because of this, it is not
enough for those two IPI'ed flushing routines to just check
against current->active_mm.  Long story made short - bad
things will happen.
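
To make the ASID collision concrete, here is a small user-space
simulation of a generation-based ASID allocator.  It is only a sketch:
the 8-bit ASID field, the constants and all the names (struct mm,
get_new_context(), etc.) are assumptions made for illustration, not
the kernel code.  It shows how an mm that still holds an ASID from an
old generation can end up with the same hardware ASID as an mm that
was allocated later, which is what goes wrong if the generation is
bumped behind the back of the mm that is live on the CPU.

```c
#include <assert.h>
#include <stdint.h>

#define ASID_MASK          0xffu   /* assumed: low 8 bits go to the hardware */
#define ASID_FIRST_VERSION 0x100u  /* assumed: upper bits are the generation */

struct mm { uint32_t context; };   /* hypothetical, one context per mm */

static uint32_t asid_cache = ASID_FIRST_VERSION;

/* Hand out the next ASID; rolling over the low bits starts a new generation. */
static void get_new_context(struct mm *mm)
{
    uint32_t asid = asid_cache + 1;

    if ((asid & ASID_MASK) == 0) {
        /* a real kernel would flush the whole local TLB here */
        if (asid == 0)
            asid = ASID_FIRST_VERSION;
    }
    mm->context = asid_cache = asid;
}

/* Does this mm's context belong to the current generation? */
static int context_is_current(const struct mm *mm)
{
    return ((mm->context ^ asid_cache) & ~ASID_MASK) == 0;
}

/* Would the hardware see the same ASID for both? */
static int same_hw_asid(const struct mm *a, const struct mm *b)
{
    return (a->context & ASID_MASK) == (b->context & ASID_MASK);
}
```

If mm A gets a context and 256 more allocations follow (for example
from flush routines running in IPIs), the last mm shares A's hardware
ASID while A's generation is stale.  Nothing is wrong as long as A is
revalidated before it runs again; the trouble starts when A is already
loaded in EntryHi and nobody tells it.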

It turns out that other arches have similar problems and have solved
them in various ways.  Unfortunately I like none of them.

Here is one I am pretty happy with.  It is very small and efficient.
And conceptually it is clean too.  We basically keep the semantics
of ->mm and ->active_mm unchanged and only introduce a new bit
to mark which mm is the true owner of mmu hardware on a cpu.

The only downside is that the cpu_vm_mask variable does not really
mean "mask for blocking IPI" in this approach.  It actually
indicates whether current->active_mm is really active or not.
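
The idea can be sketched in user space roughly as follows.  This is
not the attached patch, just a single-threaded model under some
assumptions: a per-cpu context array in the mm, a cpu_vm_mask bit per
cpu used as the "owner" bit, and made-up names (switch_mm_sketch(),
ipi_flush_mm_sketch(), hw_flushes).  ASID wrap-around is left out.
The point is that the IPI handler looks at the owner bit instead of
current->active_mm, and never allocates a new ASID for an mm that does
not own the MMU.

```c
#include <assert.h>
#include <stdint.h>

#define NR_CPUS 2

struct mm {
    uint32_t context[NR_CPUS];  /* per-cpu ASID, 0 means "needs a new one" */
    uint32_t cpu_vm_mask;       /* bit n set: this mm owns the MMU on cpu n */
};

/* Simulated per-cpu hardware state (hypothetical). */
static uint32_t next_asid[NR_CPUS] = { 1, 1 };
static uint32_t entryhi[NR_CPUS];
static int hw_flushes[NR_CPUS];

static void switch_mm_sketch(struct mm *prev, struct mm *next, int cpu)
{
    /* a real kernel would disable local interrupts around all of this */
    prev->cpu_vm_mask &= ~(1u << cpu);      /* prev gives up the MMU */
    if (next->context[cpu] == 0)
        next->context[cpu] = next_asid[cpu]++;
    entryhi[cpu] = next->context[cpu];      /* load the ASID */
    next->cpu_vm_mask |= 1u << cpu;         /* next is now the true owner */
}

static void ipi_flush_mm_sketch(struct mm *mm, int cpu)
{
    if (mm->cpu_vm_mask & (1u << cpu)) {
        /* mm really owns the MMU here: flush its hardware entries */
        hw_flushes[cpu]++;
    } else {
        /* not the owner: just drop the ASID, a new one is picked
         * the next time this mm is switched in */
        mm->context[cpu] = 0;
    }
}
```

With this, a flush request for the outgoing mm that arrives after
switch_mm() but before switch_to() only invalidates that mm's context
on the cpu and leaves the ASID loaded for the incoming mm alone.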

Tested and passed the notorious fork/malloc test.

Let me know what you think.


Attachment: junk
Description: Text document