On Mon, Apr 9, 2012 at 9:13 AM, Steven J. Hill <sjhill@mips.com> wrote:
> This reverts commit d759e6bf49a41edd66de7c6480b34cceb88ee29a.
It may be clearer to say "refactors" instead of "reverts," as this
patch reimplements the same feature in a different way.
If the change was completely reverting the MIPS *_kernel_vmap_range
functions, that would make me very uneasy as XFS will not run reliably
on my systems without them.
Also, the commit hash is d9cdc901af0f92da7f90c750d8c187f5500be067 in
Linus' tree.
> The
> flush_kernel_vmap_range() and invalidate_kernel_vmap_range() are DMA
> cache flushing operations and should be done via _dma_cache_wback_inv()
> and _dma_cache_inv().
I think there is some ambiguity here.
Documentation/cachetlb.txt says:
"Since kernel I/O goes via physical pages, the I/O subsystem assumes
that the user mapping and kernel offset mapping are the only aliases.
This isn't true for vmap aliases, so anything in the kernel trying to
do I/O to vmap areas must manually manage coherency. It must do this
by flushing the vmap range before doing I/O and invalidating it after
the I/O returns."
"The design is to make this area safe to perform I/O on."
This appears to support your statement.
However, actual usage suggests that the *_kernel_vmap_range functions
are used way up at the filesystem layer just to get the data out of
the "wrong colored" D$ lines on systems with cache aliases:
if (xfs_buf_is_vmapped(bp)) {
flush_kernel_vmap_range(bp->b_addr,
xfs_buf_vmap_len(bp));
}
submit_bio(rw, bio);
Then eventually, something like dma_map_sg() is called from a lower
level driver (ala ata_sg_setup()), to perform the full L1+L2
cacheflush on each page involved in the transaction.
I would also note that on ARM these operations turn into NOPs on VIPT
non-aliasing CPUs, even if their caches are noncoherent:
static inline void flush_kernel_vmap_range(void *addr, int size)
{
if ((cache_is_vivt() || cache_is_vipt_aliasing()))
__cpuc_flush_dcache_area(addr, (size_t)size);
}
I suspect that reimplementing the *_kernel_vmap_range functions using
_dma_cache_* would result in a double L2 flush on the same memory
regions on systems with cache aliases, and an unnecessary L1+L2 flush
on systems without aliases.
I don't see any way that the *_kernel_vmap_range functions could be
called in lieu of the dma_map_* functions, as they are missing
critical elements such as "returning the mapped PA" and calls to the
plat_* functions. So, any memory getting flushed through
*_kernel_vmap_range will still need to use dma_map_* for DMA.
Is there a reason why Ralf's original approach was not workable?
|