On Mon, Dec 15, 2003 at 03:27:17AM +0100, Ralf Baechle wrote:
> On Sun, Dec 14, 2003 at 04:26:05PM +0000, Peter Horton wrote:
>
> > When mapping an executable image into user space the kernel reads data
> > into the page cache and then maps the page into user space. For an
> > executable page no copy is done as the mapping is read only.
>
> Correct.
>
> The kernel may also share writable pages until they're actually written to.
> This is called copy-on-write (COW). But executable pages usually aren't
> COW so this case isn't meaningful for us.
>
> > On my Qube
> > the act of reading data from the IDE via PIO causes the data to be
> > placed in the D-cache (the RM52xx cache does write allocate), but the
> > page never gets flushed to physical memory and so suffers from cache
> > aliasing problems when it's mapped into user space.
> >
> > By enabling DMA on the IDE interface (it's off in the default Cobalt
> > config) the kernel suddenly becomes stable (the page in the page cache
> > never gets pulled into the D-cache).
> >
> > This seems to be a generic kernel problem - all architectures with VI
> > caches and write allocate policies could trigger it.
>
> Now that's where I'm getting some doubts about your explanation. Assume
> we're paging in a page that isn't mapped yet:
>
> In this case do_no_page() will load the page. Any DMA cache coherency
> issues are supposed to be handled by the driver. That means for an
> executable page all that's missing is ensuring the I-cache is coherent.
> This is done in these two lines:
>
> [...]
> flush_page_to_ram(new_page);
> flush_icache_page(vma, new_page);
> [...]
> update_mmu_cache(vma, address, entry);
> [...]
>
> flush_page_to_ram is (and must be!) a no-op. So the burden is entirely
> up to flush_icache_page and update_mmu_cache. Note flush_dcache_page
> never enters the picture when mapping an executable because the file has
> not been written to. So let's see flush_icache_page:
>
> static void r4k_flush_icache_page(struct vm_area_struct *vma,
> 	struct page *page)
> {
> 	/*
> 	 * If there's no context yet, or the page isn't executable, no icache
> 	 * flush is needed.
> 	 */
> 	if (!(vma->vm_flags & VM_EXEC))
> 		return;
>
> All this is only about I-cache coherence. That is, we do nothing at all
> if this isn't an executable page.
>
> 	/*
> 	 * Tricky ... Because we don't know the virtual address we've got the
> 	 * choice of either invalidating the entire primary and secondary
> 	 * caches or invalidating the secondary caches also. With the subset
> 	 * enforcement on R4000SC, R4400SC, R10000 and R12000 invalidating the
> 	 * secondary cache will result in any entries in the primary caches
> 	 * also getting invalidated which hopefully is a bit more economical.
> 	 */
> 	if (cpu_has_subset_pcaches) {
> 		unsigned long addr = (unsigned long) page_address(page);
> 		r4k_blast_scache_page(addr);
>
> 		return;
> 	}
>
> This section is only needed for certain processors such as the R4000SC.
> That is, it's not of interest here either.
>
> 	if (!cpu_has_ic_fills_f_dc) {
> 		unsigned long addr = (unsigned long) page_address(page);
> 		r4k_blast_dcache_page(addr);
> 	}
>
> But cpu_has_ic_fills_f_dc is always zero on Nevada. Which means we're
> going to flush the page's kernel address from the D-cache here.
>
> 	/*
> 	 * We're not sure of the virtual address(es) involved here, so
> 	 * we have to flush the entire I-cache.
> 	 */
> 	if (cpu_has_vtag_icache) {
> 		int cpu = smp_processor_id();
>
> 		if (cpu_context(cpu, vma->vm_mm) != 0)
> 			drop_mmu_context(vma->vm_mm, cpu);
>
> ... cpu_has_vtag_icache is zero on Nevada so the else case will be taken:
> 	} else
> 		r4k_blast_icache();
>
> so we just blast away the entire I-cache. Coherency the hard way. At
> this point we've established I-cache coherency for executable pages.
>
> But what if this was a non-executable page? Then flush_icache_page would do
> nothing at all - nor would update_mmu_cache. The page will be copied to
> userspace and ... whoops, data may still be in the wrong cache segment,
> game over. This also explains a few other bugs.
>
I could see the aliases at the end of do_no_page() (using memcmp()) and
from the code knew they had to be read only so I just assumed they were
executable pages. I missed the fact that flush_icache_page() flushed the
D-cache page. So like you say it must be non-executable read only pages
that cause the problem.
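To make the aliasing condition concrete, here is a minimal user-space
sketch (not kernel code). It assumes a virtually indexed cache with an
8 KB way size and 4 KB pages; those are illustrative numbers, not figures
from the RM52xx datasheet, and cache_colour(), addrs_alias() and both
constants are hypothetical names. Two virtual addresses mapping the same
physical page can land in different cache lines exactly when their
colours differ:

```c
/* Assumed geometry, for illustration only. */
#define TOY_PAGE_SHIFT 12UL             /* 4 KB pages (assumption) */
#define TOY_WAY_SIZE   (8UL * 1024)     /* bytes per cache way (assumption) */

/*
 * The "colour" of a virtual address is the set of cache index bits that
 * lie above the page offset. With the sizes above there are two colours.
 */
static unsigned long cache_colour(unsigned long vaddr)
{
	return (vaddr & (TOY_WAY_SIZE - 1)) >> TOY_PAGE_SHIFT;
}

/*
 * Two virtual mappings of the same physical page can alias in a
 * virtually indexed cache only if their colours differ.
 */
static int addrs_alias(unsigned long vaddr_a, unsigned long vaddr_b)
{
	return cache_colour(vaddr_a) != cache_colour(vaddr_b);
}
```

With these assumed sizes, a kernel address of 0x80123000 has colour 1 and
a user address of 0x10004000 has colour 0, so addrs_alias() reports a
possible alias for that pair.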

> > So where's the correct place to put the flush_dcache_page() ? :-)
> >
> > I don't know whether the problem could affect any other IO subsystems
> > ... probably SCSI at least.
>
> As you describe it it doesn't seem specific to any particular kind of
> device - only DMA or PIO matters; and the DMA coherency thing happens to
> paint over the issue which must be why it wasn't discovered for so long.
>
So how do we fix it ? Flushing the page really needs to be done just
after the IO into the page cache is complete so we only do it once per
page cache page ?
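A toy user-space model of the failure and of the "flush once after the
IO" idea, offered as a sketch only. Nothing here is kernel API: the
toy_* names, the two-colour cache and the single word of RAM are all
assumptions made to keep the example small. The PIO copy is modelled as
a CPU store through the kernel colour, and the user mapping reads
through a different colour:

```c
/* One cache line per colour; write-back, write-allocate (toy model). */
struct toy_line {
	int valid;
	int dirty;
	unsigned int data;
};

struct toy_machine {
	unsigned int ram;          /* one word of physical memory */
	struct toy_line cache[2];  /* indexed by virtual colour */
};

/* CPU store: allocates the line and marks it dirty, RAM is not updated. */
static void toy_store(struct toy_machine *m, int colour, unsigned int v)
{
	m->cache[colour].valid = 1;
	m->cache[colour].dirty = 1;
	m->cache[colour].data = v;
}

/* CPU load: a miss refills the line from RAM, a hit returns the line. */
static unsigned int toy_load(struct toy_machine *m, int colour)
{
	struct toy_line *l = &m->cache[colour];

	if (!l->valid) {
		l->data = m->ram;
		l->valid = 1;
		l->dirty = 0;
	}
	return l->data;
}

/* Write a dirty line back to RAM, then invalidate it. */
static void toy_writeback_inv(struct toy_machine *m, int colour)
{
	struct toy_line *l = &m->cache[colour];

	if (l->valid && l->dirty)
		m->ram = l->data;
	l->valid = 0;
	l->dirty = 0;
}

/* PIO fills the page via colour 1; user space then reads via colour 0. */
static unsigned int toy_pio_then_user_read(int flush_after_io)
{
	struct toy_machine m = { 0 };

	toy_store(&m, 1, 0xdeadbeefu);       /* PIO copy, kernel address */
	if (flush_after_io)
		toy_writeback_inv(&m, 1);    /* once, right after the IO */
	return toy_load(&m, 0);              /* user mapping */
}
```

In this model toy_pio_then_user_read(0) returns the stale 0 from RAM,
while toy_pio_then_user_read(1) returns 0xdeadbeef, which is the
behaviour a single flush per page cache page is meant to give.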

P.