Ulf Carlsson writes:
> Sometimes when this happens I think I only get a SIGSEGV or a SIGBUS,
> sometimes I get internal compiler errors. It's hard to say since these problems are
> hard to reproduce, and I forget what happens from time to time. I have
> unfortunately not written down the results. It sounds like this may be the
> cause of the type of file corruption I have when only a little part of the
> file is damaged (sounds like the problem covers both icache and dcache). That
> type of file corruption goes away after reboot. I haven't had a chance to try
> with my discard-disk-cache program since this happens very seldom.
> > What model of CPU do you have in your machine?
> I have a 133 MHz R4600 with 512kb board cache, 16kb dcache and 16kb icache.
I have been looking at the fault handling and the cache flushing routines
for the R4600. In do_no_page() in mm/memory.c, we have:
I don't see where any code invalidates the icache, which might have
cached lines from a previous incarnation of the page.
flush_page_to_ram(), for the R4600, essentially does a writeback of
the dcache, if I understand the code correctly. I believe that an
icache invalidate is also needed, at least for executable pages
(including any page for which mprotect() with PROT_EXEC has been
called, not just for text pages from an executable file). Also,
unless something has changed, my understanding is that conflicting
virtual aliases (in the dcache) are still possible, which will also
lead to data corruption when it happens.
In particular, if process A mmaps a file page at virtual index
0 and process B happens to mmap the same file page at virtual index
1, they will in general corrupt each other's view of the data.
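To make the aliasing case concrete, here is a minimal sketch of how the
"virtual index" (cache color) of a mapping could be computed. The page
size and per-way cache size are my assumptions for an R4600-like
configuration (4 KB pages, 16 KB two-way dcache, so 8 KB per way), and
the function names are hypothetical, not taken from the kernel source.

```c
#include <assert.h>

/* Assumed parameters for illustration only. */
#define SKETCH_PAGE_SIZE  4096UL  /* assumed 4 KB pages */
#define SKETCH_WAY_SIZE   8192UL  /* assumed 16 KB dcache, 2-way: 8 KB per way */
#define SKETCH_NUM_COLORS (SKETCH_WAY_SIZE / SKETCH_PAGE_SIZE)

/* Hypothetical helper: which page-sized slot of a cache way does this
 * virtual address index into? */
static unsigned long sketch_cache_color(unsigned long vaddr)
{
    /* The color is only meaningful if a way holds a whole number of pages. */
    assert(SKETCH_WAY_SIZE % SKETCH_PAGE_SIZE == 0);
    return (vaddr / SKETCH_PAGE_SIZE) % SKETCH_NUM_COLORS;
}

/* Hypothetical helper: two mappings of the same physical page can hold
 * inconsistent copies in the dcache when their colors differ. */
static int sketch_aliases_conflict(unsigned long va_a, unsigned long va_b)
{
    return sketch_cache_color(va_a) != sketch_cache_color(va_b);
}
```

With these assumed sizes there are two colors, so a mapping at
0x10000000 (color 0) and a mapping of the same page at 0x20001000
(color 1) would be a conflicting pair, while two mappings of equal
color would not.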
There is a comment in memory.c that a non-present page shouldn't
be cached, but it is not yet clear to me that this is guaranteed for
the icache. Also, flush_page_to_ram() slows down processing on
machines with physical cache tags, in cases where the virtual
index used by the kernel and the virtual index used by the application
are the same. It should take an extra argument giving the intended user virtual
address, so that it can decide whether to flush or not on architectures
such as MIPS.
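A rough sketch of what I mean by the extra argument follows. This is
not the existing flush_page_to_ram() interface; the function name, the
writeback stub, and the cache geometry (4 KB pages, 8 KB per cache way)
are all assumptions made for illustration.

```c
#include <assert.h>

/* Assumed geometry for illustration only. */
#define SK_PAGE_SHIFT 12        /* assumed 4 KB pages */
#define SK_WAY_SIZE   8192UL    /* assumed 8 KB per dcache way */

/* Counts writebacks so the effect of the check can be observed. */
static int sk_writebacks_done;

/* Stand-in for the real dcache writeback of one page (hypothetical). */
static void sk_writeback_dcache_page(unsigned long kernel_vaddr)
{
    (void)kernel_vaddr;
    sk_writebacks_done++;
}

/* Virtual index (color) of an address within one cache way. */
static unsigned long sk_color(unsigned long vaddr)
{
    return (vaddr & (SK_WAY_SIZE - 1)) >> SK_PAGE_SHIFT;
}

/* Hypothetical variant of flush_page_to_ram() taking the intended user
 * virtual address. With physical tags, lines written through the kernel
 * address are found by the user mapping when both index the same cache
 * lines, so the writeback is skipped in that case. */
static void sk_flush_page_to_ram(unsigned long kernel_vaddr,
                                 unsigned long user_vaddr)
{
    if (sk_color(kernel_vaddr) != sk_color(user_vaddr))
        sk_writeback_dcache_page(kernel_vaddr);
}
```

For example, with a kernel address of 0x80002000, a user address of
0x00404000 has the same color and needs no writeback, while a user
address of 0x00405000 has a different color and does.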
Handling the virtual index conflicts requires dynamic ownership
switching (including cache flushing), which means that we have to record
those hardware-valid PTEs currently referencing the page, so that we can
invalidate the PTEs and flush the cache when a fault happens for a mapping
of a different color. We could take a brute-force approach, and record
just one mapping, forcing a fault on each use of a different mapping,
which would allow us to keep the reverse map in an array parallel to mem_map,
or we could use some more complex structure to record mappings. Also,
to reduce the frequency of conflicts, address assignment in do_mmap()
should take cache color into account on machines with virtually indexed
caches which lack hardware cache coherency (such as the R4000PC, R4600,