I had some time this weekend to revisit an old problem:
When testing Dezhong's patch, I found that un-kmapped pages could get
passed into dma_map_sg(). The original patch did not handle this case.
This resulted in serious DMA coherency issues on my system, along the
lines of "/sbin/init segfaults on boot."
I found that a calling sequence seen during SATA disk I/O was the
culprit: __do_page_cache_readahead() allocates a bunch of pages for the
I/O requests, but I do not see them getting mapped to kernel addresses
via
kmap() / kmap_atomic(). This explains why (PageHighMem(page) &&
!kmap_high_get(page)) can be true.
I do not think there is anything wrong with this behavior, so it seems
like dma_map_sg() will need to handle it somehow. Looking at how other
architectures cope:
The PPC approach (__dma_sync_page_highmem()) is to disable IRQs and call
kmap_atomic() to create a temporary mapping for the page. Disabling
IRQs is necessary because it is possible (but not required) for the
dma_* functions to be invoked from interrupt context; it also disables
preemption. kmap_atomic() guarantees a unique mapping per CPU.
Interestingly, PPC does not use kmap_high_get() at all.
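As a sanity check of that ordering, here is a toy user-space model (not
kernel code; every identifier below is an illustrative stand-in for the
kernel primitive named in its comment):

```c
#include <assert.h>
#include <stddef.h>

static int fake_irqs_on = 1;           /* models the local IRQ enable flag */
static const void *fake_kmap_slot;     /* models the per-CPU kmap_atomic slot */

/* PPC-style highmem DMA sync: IRQs off -> temporary mapping -> cache
 * sync -> unmap -> IRQs back on, strictly in that order. */
static void model_dma_sync_page(const void *page, size_t size)
{
	fake_irqs_on = 0;              /* local_irq_save() */
	fake_kmap_slot = page;         /* kmap_atomic() */

	/* the actual cache writeback/invalidate of 'size' bytes goes here */
	assert(!fake_irqs_on && fake_kmap_slot == page);
	(void)size;

	fake_kmap_slot = NULL;         /* kunmap_atomic() */
	fake_irqs_on = 1;              /* local_irq_restore() */
}
```

The point of the model is just the nesting: the mapping only ever exists
while "interrupts" are off, so an IRQ-context caller can never observe a
half-torn-down mapping on its own CPU.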
The ARM approach (kmap_high_l1_vipt()) recognizes that it is not always
desirable to keep IRQs disabled during DMA flushes, so the ARM
maintainers implemented a sort of "reentrant kmap_atomic()" that allows
multiple contexts to share the same pte by saving and restoring whatever
was there prior to the DMA sync.
Due to its use of kmap_high_get(), the ARM approach suffers from problem
#2, below. It is also more complex and harder to test. I'm not sure
how to recreate some of the worst corner cases, e.g. a hardirq
cacheflush interrupting a softirq cacheflush that interrupted a user
cacheflush.
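The save/restore idea itself can be sketched in plain user-space C (a
toy model, not the actual ARM code; all names below are made up):

```c
#include <assert.h>

/* Toy model of the ARM "reentrant kmap_atomic" trick: every user of the
 * shared slot saves whatever was mapped there and restores it on the way
 * out, so nested contexts (user -> softirq -> hardirq) compose safely. */
static unsigned long shared_pte;       /* the one pte all contexts share */

static unsigned long map_save(unsigned long new_pte)
{
	unsigned long saved = shared_pte;  /* remember the interrupted mapping */
	shared_pte = new_pte;
	return saved;
}

static void unmap_restore(unsigned long saved)
{
	shared_pte = saved;                /* put the old mapping back */
}
```

Nesting works because each level restores its predecessor's mapping on
exit, which is what makes the hardirq-over-softirq-over-user case safe
in principle, however hard it is to exercise in testing.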
Dezhong's patch was largely based on the ARM scheme, minus the
kmap_high_l1_vipt() logic. I am sending an update to this patch which
attempts to imitate the PPC approach instead. There is a considerable
amount of reuse since both strategies require similar modifications to
dma-default.c, in order to get that code to pass around "struct page"
pointers rather than kseg0 addresses.
Consider this sequence: kmap_high() can wind up calling
flush_all_zero_pkmaps(), which flushes the pkmap TLB range and, on SMP,
lands in smp_call_function(). But the first thing kmap_high() does is
lock_kmap(), which disables interrupts on kmap_high_get() architectures:
    #ifdef ARCH_NEEDS_KMAP_HIGH_GET
    #define lock_kmap()             spin_lock_irq(&kmap_lock)
    #else
    #define lock_kmap()             spin_lock(&kmap_lock)
    #endif
smp_call_function() may not be called with interrupts disabled:
* You must not call this function with disabled interrupts or from a
* hardware interrupt handler or from a bottom half handler.
So, on SMP, we get warnings like:
------------[ cut here ]------------
WARNING: at kernel/smp.c:293 smp_call_function_single+0x17c/0x260()
Modules linked in:
---[ end trace 8abde6adefbcc81f ]---
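The deadlock that warning guards against can be shown with a toy two-CPU
model (illustrative only; none of these names are kernel symbols). An
IPI is only serviced when the target CPU has interrupts enabled, and the
caller spins until its IPI is acknowledged, so two CPUs cross-calling
each other with IRQs disabled make no progress:

```c
#include <assert.h>

struct model_cpu {
	int irqs_on;       /* can this CPU service interrupts? */
	int ipi_pending;   /* has someone sent it an IPI? */
	int ipi_acked;     /* has its pending IPI been serviced? */
};

static void model_send_ipi(struct model_cpu *target)
{
	target->ipi_pending = 1;
}

/* One scheduler tick: a CPU services a pending IPI iff IRQs are on. */
static void model_tick(struct model_cpu *c)
{
	if (c->irqs_on && c->ipi_pending) {
		c->ipi_pending = 0;
		c->ipi_acked = 1;
	}
}
```

With both CPUs' IRQs disabled, the pending IPIs are never acknowledged
no matter how many ticks elapse; that is the scenario the comment quoted
above rules out.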
I did some digging and found that ARM runs into the same problem. For
processors that cannot "broadcast" TLB operations in hardware, SMP and
HIGHMEM are mutually exclusive.
So, I would opt for the PPC approach in order to avoid this conflict.
Disabling SMP will result in a much nastier performance problem than
blocking interrupts during flushes.
Regarding David's flush_data_cache_page() concern:
There are at least 4 reasons to flush the cache:
1) Boot time (cache contents are undefined at reset)
2) DMA coherence
3) I$/D$ coherence (self-modifying code)
4) Zap cache aliases
My interpretation of the code is that flush_dcache_page() is only called
for #3 and #4:
    #define ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE 1

    static inline void flush_dcache_page(struct page *page)
    {
    	if (cpu_has_dc_aliases || !cpu_has_ic_fills_f_dc)
    		__flush_dcache_page(page);
    }
The fact that the L2 / board cache is not flushed in c-r4k.c leads me to
believe that #2 is not an intended use of this function:
    static inline void local_r4k_flush_data_cache_page(void * addr)
    {
    	r4k_blast_dcache_page((unsigned long) addr);
    }
Besides, we already have a standard, documented DMA API that should be
used for #2.
Since this is not boot time, and HIGHMEM is incompatible with cache
aliases, that leaves #3 as a possible issue. Is this function ever used
for I$/D$ coherence?
I should note that ARM has special handling in __flush_dcache_page() for
high pages. IIRC, at one point they tried to support HIGHMEM on systems
with cache aliases, before deciding it was not feasible. Maybe this
code is just a relic - or maybe it's really needed for something.