On Sat, 2002-06-08 at 00:15, Vivien Chappelier wrote:
> Here is a proposal for a software workaround to speculative
> execution on a non-coherent system such as the i2 R10k and the o2 R10k.
> 1. Problem:
> The R10000 processor can (and will) execute instructions ahead.
> These instructions will be cancelled if they're not supposed to execute,
> e.g. if a jump happened. If a load or store instruction is executed
> speculatively, and the accessed memory is not in the cache, the cache
> line will be fetched from main memory and, on a store, be marked dirty.
> These speculative loads and stores can happen anywhere, since a
> speculative load/store instruction that is cancelled afterwards may
> have used stale values left in registers as its address.
> The problem is:
> - on a speculative load, the fetched cache line will remain in the
> cache even if the speculative load is cancelled
> - on a speculative store, the *dirty* cache line will remain in
> the cache even if the speculative store is cancelled
> On non-coherent systems we need to flush the cache lines to main
> memory before doing DMA to device, so that the device can see them. We
> also need to invalidate lines before reading from a DMA'd buffer to make
> sure the CPU will read main memory and not the cache.
> However, if a speculative load or store happens during DMA
> transfer, the cache line will be fetched from memory and, on a store,
> be marked dirty. That means this cache line could be evicted when the
> line is needed for something else, being written back to memory if it
> was dirty and thus overwriting the data a device could have put in the
> DMA buffer. Something we really don't want to happen ;)
> 2. Proposed solution
> Speculative execution will not happen in the following conditions:
> - access to memory is uncached
> - the speculatively executed instruction causes an exception:
> that also means a speculative load/store will not happen in a mapped
> memory region which doesn't have a TLB entry for it.
> This second point means that any mapped space can be made safe by
> removing the DMA'd buffer address translations from the TLB or by marking
> them 'uncached' during DMA transfer.
> The remaining unmapped address spaces are:
> - kseg1, which is safe since uncached
> - kseg0, which can be turned uncached with the K0 bits
> from the CP0 Config register
> - xkphys, which will cause an address error if the KX bit
> is not set, thus aborting the speculative load/store before it can do harm ;)
> Since we need to turn KX off, xkseg will not be accessible
> either... and since we need to have KSEG0 uncached, we need to remap the
> kernel elsewhere if we want performance ;). We could use the xsseg
> segment, available in Supervisor mode, which is mapped (safe) and
> moreover allows access to all memory (on the o2 that can be up to 2GB I
> think, whereas in 32-bit mode only 512MB would be accessible). So the
> proposed workaround is
> to permanently map the lower 16MB of memory in xsseg using a wired TLB
> entry and a page size of 16MB. This memory would not be usable for
> DMA. Everything else would be, so we could for example reserve the upper
> 16MB for DMA (and give them to the DMA zoned memory allocator). On
> exception or error, the handler (in KSEG0) would set CU0 to allow access
> to CP0, then switch to Supervisor mode and jump to the equivalent xsseg
> location and continue execution in Supervisor mode. The code for
> returning to userland would need to clear the CU0 bit to prevent user
> access to CP0.
> Before a DMA transfer, the DMA buffer's cache lines would be
> flushed, and the buffer would then be remapped 'uncached', thus
> preventing any speculative load or store to this memory during the
> transfer. After the DMA transfer, the cache would be invalidated to make
> sure main memory is read, and the DMA buffer would be remapped
> 'cacheable'.
> A diagram is attached to illustrate the workaround. Comments,
> suggestions (and even flames) are welcome before anyone starts coding
> the workaround ;)
Looks like a good start, but unfortunately your concept ignores
userspace entirely. You might have to deal with mmapped pages that are
being written to backing storage. In such a case you'll have to
track down all mappings and disable access to them, and that's something
that is pretty hard in the current memory management code. The solution
would be Rik's rmap patches, which themselves are still under
development. Initially I guess you can simply avoid this hairy job by
doing all DMA using bounce buffers only. Expensive but doable ...
(Testing evolution ...)