firstname.lastname@example.org (Thomas Bogendoerfer) writes:
> On Wed, Dec 05, 2007 at 01:16:13AM -0500, Kumba wrote:
>> I've been out of it lately -- did the gcc side of things ever make it in,
>> or do we need to go push on that some more?
> We need to push on that. Looking at
> there seems to be some missing understanding of why
> the cache barriers are needed.
Heh. Quite probably. Which bit of my message don't you agree with?
FWIW, I was going off the original message as posted here:
The explanation of the chosen workaround seemed to be left to this bit:
All is well with coherent IO systems. On non coherent
systems like Indigo2 and O2 this creates a race
condition with DMA reads (IO->mem) where a stale
cached data can be written back over the DMAed data.
This issue was figured out late in the R10K I2
design cycle. The problem was fixed by modifying
the compiler and assembler to issue a cache barrier
instruction to address 0(sp) as the first instruction
in basic blocks that contain stores to registers
other than $0 and $sp.
and from a compiler point of view, it would be nice to know
_why_ that was a reasonable workaround. What I was really
looking for was: (a) a short description of the problem,
(b) a list of assumptions that the compiler is going to
make when working around the problem and (c) a description
of what said workarounds are.
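For concreteness, here's my reading of the transformation that the quoted
text describes (a hand-written sketch, not real compiler output; the 0x14
encoding of the CACHE barrier op is my assumption, and the register names
are made up):

```asm
# before: basic block with a store through a register other than $0/$sp
        lw      $t0, 0($a0)
        sw      $t1, 0($a1)     # potentially-problematic store

# after: barrier emitted as the first instruction of the block
        cache   0x14, 0($sp)    # cache barrier: no speculation past here
        lw      $t0, 0($a0)
        sw      $t1, 0($a1)
```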
My understanding of (a) is that, if a store is speculatively executed,
the target of the store might be fetched into cache and marked dirty.
We therefore want to avoid the speculative execution of stores if:
(1) the addressed memory might be the target of a later DMA operation.
If the DMA completes before the "dirty" cache line is flushed,
the cached data might overwrite the DMAed data.
(2) the addressed memory might refer to IO-mapped cached memory
(usually through the address being garbage). The cached
data will be written back to the IO region when flushed.
We also want to avoid speculative execution of loads if:
(3) the addressed memory might refer to load-sensitive IO-mapped cached
memory (usually through the address being garbage). The hardware
would "see" loads that aren't actually executed.
Is that vaguely accurate?
I tried to piece together (b) by asking questions in the reviews,
but it would be great to have a single explanation.
The idea behind (c) is simple, of course: we insert a cache barrier
before the potentially-problematic stores (and, for certain
configurations, loads, although the original gcc patch had the
associated macro hard-wired to false). The key is explaining how,
from a compiler internals viewpoint, we decide what is "potentially-
problematic". This ties in with the assumptions for (b).
I'm sure my attempt at (a) above can be improved upon even if it's
vaguely right. But...
> I guess the patch could be improved
> by pointing directly to the errata section of the R10k
> user manual.
...I think an integrated explanation of (a), (b) and (c) above
would be better than quoting large parts of the processor manual.
The processor manual is aimed at a much broader audience and has
a lot of superfluous info. It also doesn't explain what _our_
assumptions are and what our chosen workaround is.