
Re: Question concerning cache coherency

To: "Jeff Harrell" <>, "sgi-mips" <>
Subject: Re: Question concerning cache coherency
From: "Kevin D. Kissell" <>
Date: Thu, 20 Jan 2000 01:49:04 +0100
Cc: "Ralf Baechle" <>, "bbrown" <>, "vwells" <>, "kmcdonald" <>, "mhassler" <>
>I have an interesting issue that I would like to run past the
>MIPS/Linux newsgroup.  I am currently porting the MIPS/Linux code to
>a development board that has a IDT64475 MIPS core (64-bit R4xxx
>core).  I notice that this part does not have any method of
>maintaining cache coherency (i.e., no hardware support for cache
>coherency).  It is highly likely that we will be plugging in a
>network card on a PCI bus that would be DMA'ing to a shared memory
>space in SDRAM.  I assume that the problem of cache coherency is
>fixed by mapping the shared memory as uncached.  I have not dug into
>the network drivers (or the kernel) enough to know whether this is
>how the problem is addressed on typical MIPS architectures.  I guess
>I have two questions related to this issue;  Do devices that DMA,
>typically access uncached memory  and if so, is a second buffer
>required to copy from kernel to user space?  The second question is
>concerning the performance hit in running out of uncached memory,
>Have people seen significant performance degradation when using
>uncached memory.  Any insight that anybody can provide would be
>greatly appreciated.

While some MIPS CPUs have mechanisms for hardware
cache coherence, many of them do not, and even systems
with coherent-I/O-capable CPUs often do not implement
the necessary protocol.

There are two basic options for dealing with caches
and DMA I/O: flush the caches, or operate on
non-cached memory.  Sometimes one does both.
An arbitrary buffer being handed to a driver must be
assumed to have some portion of its contents cached,
and must be explicitly flushed to memory (via
hit_writeback_invalidate Cache instructions, or
dma_cache_wback_invalidate() calls in Linux)
before being presented to a DMA device.
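
As a rough illustration of what "flushing a buffer" involves, here is
a minimal user-space sketch that walks every cache line overlapped by
an arbitrary buffer.  The names for_each_cache_line() and line_op_fn
are hypothetical, not kernel APIs; on real hardware the per-line hook
would issue the writeback-invalidate Cache instruction, and the line
size would come from the CPU configuration rather than a parameter.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-line hook.  On real hardware this is where a
 * hit-writeback-invalidate Cache instruction would be issued. */
typedef void (*line_op_fn)(uintptr_t line_addr, void *ctx);

/* Visit every cache line overlapped by [addr, addr + len).
 * Returns the number of lines visited.  line_size must be a
 * power of two. */
size_t for_each_cache_line(uintptr_t addr, size_t len, size_t line_size,
                           line_op_fn op, void *ctx)
{
    assert(line_size != 0 && (line_size & (line_size - 1)) == 0);
    if (len == 0)
        return 0;

    uintptr_t mask = (uintptr_t)line_size - 1;
    uintptr_t first = addr & ~mask;              /* line holding first byte */
    uintptr_t last = (addr + len - 1) & ~mask;   /* line holding last byte  */
    size_t count = 0;

    for (uintptr_t line = first; ; line += line_size) {
        if (op)
            op(line, ctx);
        count++;
        if (line == last)
            break;
    }
    return count;
}
```

Note that a buffer which is not line-aligned drags in partial lines at
both ends, so whatever shares those lines gets written back and
invalidated along with it.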

There's a bit more discretion for data structures that
are private to the driver/device.  If a data structure
is going to be manipulated a great deal by the CPU
before being DMAed, it will be worthwhile to treat it
as cached and flush it out to memory when it is
released to the I/O device.  If a data structure is
constantly shared between CPU and I/O, it may be
better to treat it as uncached rather than constantly
invoke the cache flush procedure.  There's a lot of
grey area in between where the optimal choice is
implementation and application dependent.

In an ethernet driver for a chip like a Lance or a Tulip,
for example, which autonomously processes lists of
buffers, the shared buffer descriptor lists might be treated
as uncached by the CPU, but transmit buffers coming
in from further up the protocol stack and empty receive
buffers allocated from the general memory pool might
be explicitly flushed before being turned over to the I/O
device.

Simple OS's like Linux (at least through 2.2.x) map the
kernel code and data through the kseg0/kseg1 mappings 
to physical memory, which makes it really simple to create 
an uncached data structure.  Including asm/io.h provides
a KSEG1ADDR() macro which just does an AND and an 
OR to generate an uncached alias.  This only works
for systems with 512M or less of memory, BTW.
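
For illustration, the AND/OR arithmetic can be sketched in plain C as
below.  This is a stand-alone model for 32-bit addresses, not the
kernel header itself; the names kseg1_alias() and phys_fits_kseg() are
made up for the example.  It also shows where the 512M limit comes
from: the kseg0 and kseg1 windows each cover only the low 29 bits of
physical address space.

```c
#include <assert.h>
#include <stdint.h>

#define KSEG0_BASE 0x80000000u  /* cached, unmapped window   */
#define KSEG1_BASE 0xa0000000u  /* uncached, unmapped window */
#define KSEG_SIZE  0x20000000u  /* each window covers 512M   */
#define PHYS_MASK  (KSEG_SIZE - 1)

/* Turn a kseg0 (or kseg1) address into its uncached kseg1 alias:
 * AND away the segment bits, then OR in the kseg1 base. */
uint32_t kseg1_alias(uint32_t addr)
{
    /* Only addresses inside the two unmapped windows have an alias. */
    assert(addr >= KSEG0_BASE && addr < KSEG1_BASE + KSEG_SIZE);
    return (addr & PHYS_MASK) | KSEG1_BASE;
}

/* A physical address is reachable this way only if it lies in the
 * low 512M. */
int phys_fits_kseg(uint64_t phys)
{
    return phys < KSEG_SIZE;
}
```

So kseg1_alias(0x80123456) yields 0xa0123456, both of which refer to
physical address 0x00123456.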

Great care must be taken with uncached aliases, since
the behaviour of MIPS CPUs is not well defined if uncached
and cached accesses to the same location (or cache line)
are mixed.  I recommend allocating twice the maximum
cache line size (less 1 byte if you like) of kernel memory
in addition to the size of any data structure, and forcing
the alignment of the structure to the first cache line
boundary within the allocated block.  This should ensure
that no cached allocation of memory (or cached malloc
control structure) overlaps with the data structure, and
that it is thus safe to transform the pointer to the new
data structure to the kseg1 uncached form.  Of course,
if the structure is ever to be deallocated, the original
allocation address must be recoverable somehow.
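
A minimal sketch of that padding scheme follows, written against
malloc() so it can be tried in user space; in a driver the allocation
would come from the kernel allocator and the aligned pointer would
then be converted to its kseg1 form.  The struct and function names
(padded_block, padded_alloc, padded_free) are hypothetical, and the
maximum line size is passed in as an assumption rather than probed.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

struct padded_block {
    void  *raw;       /* original allocation address, kept for freeing */
    void  *aligned;   /* first cache line boundary inside the block    */
    size_t raw_size;  /* size + 2 * max_line - 1                       */
};

/* Allocate `size` bytes padded so that every cache line touched by
 * the aligned structure lies wholly inside our own allocation.
 * max_line must be a power of two.  Returns 0 on success, -1 on
 * allocation failure. */
int padded_alloc(struct padded_block *b, size_t size, size_t max_line)
{
    assert(max_line != 0 && (max_line & (max_line - 1)) == 0);

    b->raw_size = size + 2 * max_line - 1;
    b->raw = malloc(b->raw_size);
    if (b->raw == NULL) {
        b->aligned = NULL;
        return -1;
    }

    /* Round up to the first line boundary at or after raw. */
    uintptr_t mask = (uintptr_t)max_line - 1;
    b->aligned = (void *)(((uintptr_t)b->raw + mask) & ~mask);
    return 0;
}

/* Free through the recorded original address, never the aligned one. */
void padded_free(struct padded_block *b)
{
    free(b->raw);
    b->raw = NULL;
    b->aligned = NULL;
}
```

The alignment step consumes at most max_line - 1 bytes at the front,
which leaves at least max_line bytes of slack after the structure, so
its last (possibly partial) line is never shared with a neighbouring
cached allocation.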


            Kevin K.
