
Re: building an elf64 R10k kernel

To: "William J. Earl" <>
Subject: Re: building an elf64 R10k kernel
From: Ralf Baechle <>
Date: Thu, 6 May 1999 14:11:35 +0200
Cc: Dave Olson <>, Charles Lepple <>,,,
In-reply-to: <>; from William J. Earl on Tue, May 04, 1999 at 04:00:00PM -0700
References: <> <> <> <>
On Tue, May 04, 1999 at 04:00:00PM -0700, William J. Earl wrote:

>  > Let me point out that SGI has invented an almost ingenious workaround for an
>  > R10000 bug that only hits systems without I/O cache coherency, that is the
>  > Indigo2 and O2.
> ...
>      The R10000 "bug" is, in a sense, a feature, in that it improves
> performance, and is harmless on machines with cache-coherent I/O.
> Specifically, on a speculative store miss (a cache miss due to a
> speculatively executed store instruction), the R10000 fetches the line
> dirty-exclusive and marks it modified, in anticipation of the store.
> If, however, the speculatively executed store never graduates (is
> never committed), the line is left dirty, even though it has not been
> modified.  If the line happens to be part of a buffer into which data
> is being DMAed, a subsequent victim writeback of the dirty cache line
> might overwrite good data from the DMA with the obsolete data in the
> cache line.  This means that, one way or the other, a system with
> non-cache-coherent I/O and an R10000 must avoid allowing the
> processor to perform a speculative store miss with respect to memory
> into which a DMA is taking place.
>      Note that the Indigo2 and O2 have somewhat different workarounds.
> The Indigo2 deals with the kernel side using a special compilation mode,
> and the O2 deals with the kernel side using a special hardware feature
> plus a generalization of the solution for the user mode part of the problem.
> Both deal with the user mode by invalidating TLB entries for pages into
> which data is being transferred via DMA, so that the processor cannot
> resolve the virtual address, and hence cannot speculatively fetch
> a cache line at that address, while the DMA is in progress.  The kernel
> side is harder, since the TLB is not used for K0SEG and XKPHYS address
> spaces, which is where things get complicated.
>      I can provide the details to someone who is really interested
> in working on this, but, as Dave Olson indicated, you don't want to
> start on this unless you have a LOT of spare time. 
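
To make the failure sequence described above concrete, here is a toy model in
C. It is only an illustrative sketch, not SGI's code and not anything from a
real kernel: the struct, the function names, the 32-byte line size, and the
fill values are all assumptions made up for this example. It models a single
cache line, a DMA engine that does not snoop the cache, and the user-mode
workaround of invalidating the TLB entry while the DMA is in progress. It does
not model the kernel side (K0SEG/XKPHYS), where the TLB is not involved.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define LINE_SIZE 32            /* illustrative value, not the real line size */

/* Hypothetical toy model: one line of RAM and the cache line shadowing it. */
struct toy_system {
    uint8_t memory[LINE_SIZE];  /* DMA target buffer in RAM */
    uint8_t cache[LINE_SIZE];   /* CPU's cached copy of that line */
    bool    cached;             /* line is present in the cache */
    bool    dirty;              /* line is marked modified */
    bool    mapped;             /* TLB holds a valid entry for the page */
};

void toy_init(struct toy_system *s)
{
    assert(s != NULL);
    memset(s->memory, 0xAA, LINE_SIZE);  /* stale data before the DMA */
    memset(s->cache, 0x00, LINE_SIZE);
    s->cached = false;
    s->dirty  = false;
    s->mapped = true;
}

/* A speculative store that misses the cache and never graduates. */
void toy_speculative_store_miss(struct toy_system *s)
{
    if (!s->mapped || s->cached)
        return;                 /* address cannot be resolved: no fetch */
    memcpy(s->cache, s->memory, LINE_SIZE);
    s->cached = true;
    s->dirty  = true;           /* marked modified in anticipation of the store */
}

/* Non-coherent DMA: writes RAM directly, the cache is not snooped. */
void toy_dma_write(struct toy_system *s, uint8_t value)
{
    memset(s->memory, value, LINE_SIZE);
}

/* Victim writeback: a dirty line is written back to RAM on eviction. */
void toy_evict(struct toy_system *s)
{
    if (s->cached && s->dirty)
        memcpy(s->memory, s->cache, LINE_SIZE);
    s->cached = false;
    s->dirty  = false;
}

/* Returns true if the DMAed data is still intact after the eviction. */
bool toy_dma_survives(bool unmap_during_dma)
{
    struct toy_system s;

    toy_init(&s);
    if (unmap_during_dma)
        s.mapped = false;       /* workaround: invalidate the TLB entry */
    toy_speculative_store_miss(&s);
    toy_dma_write(&s, 0x55);
    toy_evict(&s);
    s.mapped = true;            /* remap once the DMA has completed */
    return s.memory[0] == 0x55;
}
```

In this model toy_dma_survives(false) returns false, because the writeback of
the never-modified dirty line clobbers the DMA data, while
toy_dma_survives(true) returns true, because the processor cannot resolve the
address and so never fetches the line.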

There are a number of embedded systems which need top-end horsepower and
are therefore based on the R10000.  I bet many of these systems also had
to work around this R10000 non-coherent I/O problem using the same or
similar tricks as SGI did.  So I hope somebody will be interested in solving
that problem.

