Date: Wed, 3 Apr 1996 20:53:20 -0600
From: Miguel de Icaza <email@example.com>
I took a quick glance at the patch, and I think that a bunch of people
would benefit if you tell us (with a cc to linux-kernel) what is the
general idea behind the patch (it's difficult to see the exact context
in some places of what is being done).
Right you are. Here is what is going on in the 1.3.83 changes I made.
First of all, the name invalidate() itself doesn't mean very much.
And if it means anything to someone, it only makes sense if your tlb and
cache architectures are coherent with each other, the i386 and alpha
(???) are like this so the traditional naming scheme works ok. But on
most RISC based cache/mmu architectures it makes more sense to split
up the operations in two (and as you will see later on, there is a
third operation needed).
Where '*' describes the extent to which page tables are changing. We
keep the old granularity levels for the new scheme:
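
As a rough illustration of those granularity levels (everything, one
address space, a range, a single page), here is a toy tlb in C. This
is a sketch only: the toy_* names, the integer "ctx" standing in for an
address space, and the 8-entry table are all assumptions made up for
the example, not kernel interfaces. A cache-side family would mirror it
at the same four extents.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_PAGE_SHIFT 12
#define TOY_PAGE_SIZE  (1UL << TOY_PAGE_SHIFT)
#define TOY_TLB_SIZE   8

/* One toy tlb entry: which address space (ctx) and which virtual page. */
struct toy_tlb_entry {
    bool          valid;
    int           ctx;
    unsigned long vpage;
};

static struct toy_tlb_entry toy_tlb[TOY_TLB_SIZE];

/* Load the toy tlb: pages 0..3 for ctx 1, then pages 0..3 for ctx 2. */
static void toy_tlb_fill(void)
{
    for (size_t i = 0; i < TOY_TLB_SIZE; i++) {
        toy_tlb[i].valid = true;
        toy_tlb[i].ctx   = (i < 4) ? 1 : 2;
        toy_tlb[i].vpage = i % 4;
    }
}

/* Number of entries that are still valid. */
static int toy_tlb_count(void)
{
    int n = 0;
    for (size_t i = 0; i < TOY_TLB_SIZE; i++)
        if (toy_tlb[i].valid)
            n++;
    return n;
}

/* Widest extent: every translation goes. */
static void toy_flush_tlb_all(void)
{
    for (size_t i = 0; i < TOY_TLB_SIZE; i++)
        toy_tlb[i].valid = false;
}

/* One whole address space. */
static void toy_flush_tlb_mm(int ctx)
{
    for (size_t i = 0; i < TOY_TLB_SIZE; i++)
        if (toy_tlb[i].ctx == ctx)
            toy_tlb[i].valid = false;
}

/* The pages covering [start, end) in one address space. */
static void toy_flush_tlb_range(int ctx, unsigned long start,
                                unsigned long end)
{
    unsigned long first = start >> TOY_PAGE_SHIFT;
    unsigned long last  = (end + TOY_PAGE_SIZE - 1) >> TOY_PAGE_SHIFT;

    for (size_t i = 0; i < TOY_TLB_SIZE; i++)
        if (toy_tlb[i].ctx == ctx &&
            toy_tlb[i].vpage >= first && toy_tlb[i].vpage < last)
            toy_tlb[i].valid = false;
}

/* Narrowest extent: the single page containing addr. */
static void toy_flush_tlb_page(int ctx, unsigned long addr)
{
    toy_flush_tlb_range(ctx, addr, addr + 1);
}
```

The narrower the extent passed in, the fewer translations are thrown
away, which is the whole point of keeping the granularity levels.
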
See, on many machines, the cache and the tlb are separate entities and
have different rules for getting rid of old stale data. As an example
I will show how this all functions on the HyperSparc MBUS module.
The HyperSparc has a VIPT (virtually indexed, physically tagged) level
2 cache, it also has an on-chip ICACHE (instruction cache). The level
2 cache can be either 128k or 256k in size, the ICACHE is 8k in size.
Flushing the HyperSparc cache is a process where one must be careful.
In order to flush properly, the chip must make sure that it doesn't
flush a cache line that supervisor software isn't asking to be
flushed. Therefore, a real page translation can be initiated to
check for a tag match during a cache flush (the tags are physical,
remember). So if we did something like:
whoops... we just got rid of the mapping that the chip needs to check
for a cache-tag match during the flush, and the processor will take a
fault due to the missed translation. This is undesirable.
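
The ordering problem can be shown with a toy model in C. This is only
a sketch under stated assumptions: struct toy_mmu and every toy_*
function are hypothetical, and the "fault" is just a counter. The one
property borrowed from the description above is that the cache flush
needs a live translation to compare physical tags.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of one mapped user page. All names are hypothetical. */
struct toy_mmu {
    bool pte_valid;    /* page table still maps the page */
    bool tlb_valid;    /* tlb still holds the translation */
    bool line_cached;  /* a cache line for the page is resident */
    int  faults;       /* translation faults taken while flushing */
};

/* The flush has to translate the address to compare physical tags. */
static void toy_flush_cache_page(struct toy_mmu *m)
{
    if (!m->line_cached)
        return;
    if (!m->tlb_valid && !m->pte_valid) {
        m->faults++;            /* no translation left: fault */
        return;
    }
    m->line_cached = false;
}

static void toy_clear_pte(struct toy_mmu *m)      { m->pte_valid = false; }
static void toy_flush_tlb_page(struct toy_mmu *m) { m->tlb_valid = false; }

/* A page that is mapped, present in the tlb, and cached. */
static struct toy_mmu toy_mapped_page(void)
{
    struct toy_mmu m = { true, true, true, 0 };
    return m;
}

/* Mapping torn down first, cache flushed last. */
static int toy_unmap_wrong_order(void)
{
    struct toy_mmu m = toy_mapped_page();

    toy_clear_pte(&m);
    toy_flush_tlb_page(&m);
    toy_flush_cache_page(&m);   /* translation is already gone */
    return m.faults;
}

/* Cache flushed while the translation still exists. */
static int toy_unmap_right_order(void)
{
    struct toy_mmu m = toy_mapped_page();

    toy_flush_cache_page(&m);
    toy_clear_pte(&m);
    toy_flush_tlb_page(&m);
    return m.faults;
}
```

In this model toy_unmap_wrong_order() returns 1 fault and
toy_unmap_right_order() returns 0: cache first, page tables second,
tlb last.
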
There are two more elements to fully support the vast array of cache
architectures out there. I've only partially added one of them at this
time. It is for the benefit of copy-back style caches (I use the
hypersparc in copy-back mode for SMP kernels, the sun4 cache
architecture is also copy-back in nature). Consider the following
section of code during a copy-on-write fault:
whoops, we have a lot of problems going on here, first the
aforementioned necessity to flush the cache before changing the page
tables, let's fix that first:
Ok, much better. There is still a problem here with copy-back
caches. Note that what the kernel is doing here is copying the kernel
space copies of a page to make sure the faulting process gets an
exclusive copy of the page which it can then write to. This is fine,
but after this stream of code the cache is in a very inconsistent state.
Watch what happens:
1) user COW faults at address 0x10000
2) Kernel copies exclusive page to the user
   using kernel addresses
       old_page = 0xf0ef0000
       new_page = 0xf0184000
   (addresses are for illustration purposes only
   to show the case we are trying to avoid)
At this point old_page and new_page are in the level 2 copy-back
cache, they have not reached real memory yet. The cache only knows
about the identity of this page based upon the virtual and physical
address of the page; HyperSparc is VIPT, remember.
3) Kernel flushes the user page
4) Kernel makes the page table entry for the user
5) Kernel flushes the tlb
6) User reads/writes to page 0x10000 but this misses the cache
   and goes to real memory
Oops... the cache doesn't know to go to the kernel aliases for the
pages which were used during the copy, the index and tag wouldn't
match. We need to validate main memory when the kernel does stuff
like this to keep copy-back caches happy. Here is the resultant code:
Much better. flush_page_to_ram() flushes a kernel address such that
it reaches main memory (and therefore when the user misses the cache
later on it will get the right data from main memory).
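
The copy-back problem and its fix can be sketched with a toy cache in
C. Everything here is an assumption made for illustration: a tiny
8-line cache indexed by virtual page and tagged by physical page, one
int of data per page, and toy_* helpers that only imitate what
flush_page_to_ram() is described as doing. The two addresses are the
illustrative ones from the steps above.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define TOY_NLINES 8
#define TOY_NPAGES 4

/* One cache line: indexed by virtual page, tagged by physical page. */
struct toy_line {
    bool     valid, dirty;
    unsigned ppage;
    int      data;
};

static int             toy_ram[TOY_NPAGES];
static struct toy_line toy_cache[TOY_NLINES];

static unsigned toy_index(unsigned long vaddr)
{
    return (unsigned)((vaddr >> 12) % TOY_NLINES);
}

/* Copy-back write: the data lands in the cache, not in memory. */
static void toy_write(unsigned long vaddr, unsigned ppage, int value)
{
    struct toy_line *l = &toy_cache[toy_index(vaddr)];

    if (l->valid && l->dirty && l->ppage != ppage)
        toy_ram[l->ppage] = l->data;     /* write back the victim */
    l->valid = true;
    l->dirty = true;
    l->ppage = ppage;
    l->data  = value;
}

/* Read: hit only if the index and the physical tag both match. */
static int toy_read(unsigned long vaddr, unsigned ppage)
{
    struct toy_line *l = &toy_cache[toy_index(vaddr)];

    if (l->valid && l->ppage == ppage)
        return l->data;
    if (l->valid && l->dirty)
        toy_ram[l->ppage] = l->data;     /* write back the victim */
    l->valid = true;
    l->dirty = false;
    l->ppage = ppage;
    l->data  = toy_ram[ppage];           /* miss: fill from memory */
    return l->data;
}

/* Push a kernel-alias line out to memory and drop it from the cache. */
static void toy_flush_page_to_ram(unsigned long kvaddr)
{
    struct toy_line *l = &toy_cache[toy_index(kvaddr)];

    if (l->valid && l->dirty)
        toy_ram[l->ppage] = l->data;
    l->valid = false;
    l->dirty = false;
}

/* Kernel fills the new page through its alias, then the user reads it. */
static int toy_cow_copy(bool flush_to_ram)
{
    const unsigned long new_page  = 0xf0184000UL;  /* kernel alias */
    const unsigned long user_addr = 0x10000UL;     /* user mapping */
    const unsigned      ppage     = 1;             /* same physical page */

    memset(toy_cache, 0, sizeof(toy_cache));
    toy_ram[ppage] = -1;                 /* stale contents in memory */

    toy_write(new_page, ppage, 42);      /* the kernel's copy */
    if (flush_to_ram)
        toy_flush_page_to_ram(new_page);
    return toy_read(user_addr, ppage);   /* different cache index */
}
```

Without the flush, toy_cow_copy(false) hands the user the stale -1
from memory; with it, toy_cow_copy(true) returns the 42 the kernel
wrote.
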
The last and final thing necessary for true multi-architecture support
for the mm code has to do with SMP. I won't discuss this one until I
work out the fine points with Linus and Alan as to how it should
really be used.
I hope this helps people understand my changes much better.
David S. Miller