On Tue, May 04, 1999 at 04:00:00PM -0700, William J. Earl wrote:
> > Let me point out that SGI has invented an almost ingenious workaround for an
> > R10000 bug that only hits systems without I/O cache coherency, that is the
> > Indigo2 and O2.
> The R10000 "bug" is, in a sense, a feature, in that it improves
> performance, and is harmless on machines with cache-coherent I/O.
> Specifically, on a speculative store miss (a cache miss due to a
> speculatively executed store instruction), the R10000 fetches the line
> dirty-exclusive and marks it modified, in anticipation of the store.
> If, however, the speculatively executed store never graduates (is
> never committed), the line is left dirty, even though it has not been
> modified. If the line happens to be part of a buffer into which data
> is being DMAed, a subsequent victim writeback of the dirty cache line
> might overwrite good data from the DMA with the obsolete data in the
> cache line. This means that, one way or the other, a system with
> non-cache-coherent I/O and an R10000 must avoid allowing the
> processor to perform a speculative store miss with respect to memory
> into which a DMA is taking place.
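
For anyone trying to picture the failure, here is a toy model of the
sequence described above. It is only a sketch: the class, the method
names and the 4-word line size are made up for illustration, and it
models nothing of the real R10000 beyond "a speculative store miss
leaves a dirty line that later gets written back over DMA data".

```python
LINE = 4  # words per cache line; arbitrary value for this sketch


class ToySystem:
    """Hypothetical model: memory plus a cache that DMA cannot see."""

    def __init__(self, words):
        self.memory = [0] * words
        self.cache = {}  # line index -> {"data": [...], "dirty": bool}

    def speculative_store_miss(self, addr):
        # The line is fetched and marked dirty in anticipation of the
        # store, but the store never graduates, so the data is unchanged.
        line = addr // LINE
        base = line * LINE
        self.cache[line] = {
            "data": self.memory[base:base + LINE],  # copy of current memory
            "dirty": True,
        }

    def dma_write(self, addr, values):
        # Non-coherent DMA: goes straight to memory, cache is not updated.
        self.memory[addr:addr + len(values)] = values

    def evict(self, line):
        # Victim writeback: a dirty line is copied back to memory.
        entry = self.cache.pop(line, None)
        if entry is not None and entry["dirty"]:
            base = line * LINE
            self.memory[base:base + LINE] = entry["data"]


def demo(speculate):
    system = ToySystem(8)
    if speculate:
        system.speculative_store_miss(1)   # dirties line 0, stale contents
    system.dma_write(0, [11, 22, 33, 44])  # device fills the buffer
    system.evict(0)                        # stale line may clobber the buffer
    return system.memory[:4]
```

In this model demo(False) leaves the DMAed values in memory, while
demo(True) ends with the stale zeros written back over them.
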
> Note that the Indigo2 and O2 have somewhat different workarounds.
> The Indigo2 deals with the kernel side using a special compilation mode,
> and the O2 deals with the kernel side using a special hardware feature
> plus a generalization of the solution for the user mode part of the problem.
> Both deal with the user mode by invalidating TLB entries for pages into
> which data is being transferred via DMA, so that the processor cannot
> resolve the virtual address, and hence cannot speculatively fetch
> a cache line at that address, while the DMA is in progress. The kernel
> side is harder, since the TLB is not used for K0SEG and XKPHYS address
> spaces, which is where things get complicated.
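
The user mode part can be pictured the same way. Again this is just a
sketch under my own assumptions (hypothetical names, page granularity
instead of cache lines, nothing taken from the IRIX sources): if the
TLB entry is gone for the duration of the DMA, a speculative store has
no translation and so cannot fetch the line.

```python
class ToyMMU:
    """Hypothetical model of a TLB guarding speculative accesses."""

    def __init__(self):
        self.tlb = {}             # virtual page -> physical page
        self.dirty_pages = set()  # physical pages holding a dirtied line

    def map(self, vpage, ppage):
        self.tlb[vpage] = ppage

    def invalidate(self, vpage):
        self.tlb.pop(vpage, None)

    def speculative_store(self, vpage):
        # Without a translation the address cannot be resolved, so no
        # line is fetched and nothing is left dirty.
        ppage = self.tlb.get(vpage)
        if ppage is None:
            return False
        self.dirty_pages.add(ppage)
        return True


def dma_into_page(mmu, vpage, ppage):
    """Return True if a speculative store dirtied the page during the DMA."""
    mmu.invalidate(vpage)         # drop the TLB entry before starting the DMA
    mmu.speculative_store(vpage)  # a speculative store arriving mid-transfer
    hazard = ppage in mmu.dirty_pages
    mmu.map(vpage, ppage)         # restore the mapping once the DMA is done
    return hazard
```

K0SEG and XKPHYS bypass the TLB entirely, so no trick of this kind is
available for kernel addresses, which matches the point made above.
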
> I can provide the details to someone who is really interested
> in working on this, but, as Dave Olson indicated, you don't want to
> start on this unless you have a LOT of spare time.
There are a number of embedded systems which need top-end horsepower and
are therefore based on the R10000. I bet many of these systems also
had to work around this R10000 non-coherent I/O problem using the same or
similar tricks as SGI did. So I hope somebody will be interested in solving