Re: building an elf64 R10k kernel

To: "William J. Earl" <wje@fir.engr.sgi.com>
Subject: Re: building an elf64 R10k kernel
From: Ralf Baechle <ralf@gnu.org>
Date: Thu, 6 May 1999 14:11:35 +0200
Cc: Dave Olson <olson@anchor.engr.sgi.com>, Charles Lepple <clepple@foo.tho.org>, linux@engr.sgi.com, linux-mips@fnet.fr, linux-mips@vger.rutgers.edu
In-reply-to: <199905042300.QAA17970@fir.engr.sgi.com>; from William J. Earl on Tue, May 04, 1999 at 04:00:00PM -0700
References: <372E6AA0.505A6071@foo.tho.org> <199905040354.UAA16791@anchor.engr.sgi.com> <19990505001606.E1063@uni-koblenz.de> <199905042300.QAA17970@fir.engr.sgi.com>
On Tue, May 04, 1999 at 04:00:00PM -0700, William J. Earl wrote:

>  > Let me point out that SGI has invented an almost ingenious workaround for an
>  > R10000 bug that only hits systems without I/O cache coherency, that is the
>  > Indigo2 and O2.
> ...
> 
>      The R10000 "bug" is, in a sense, a feature, in that it improves
> performance, and is harmless on machines with cache-coherent I/O.
> Specifically, on a speculative store miss (a cache miss due to a
> speculatively executed store instruction), the R10000 fetches the line
> dirty-exclusive and marks it modified, in anticipation of the store.
> If, however, the speculatively executed store never graduates (is
> never committed), the line is left dirty, even though it has not been
> modified.  If the line happens to be part of a buffer into which data
> is being DMAed, a subsequent victim writeback of the dirty cache line
> might overwrite good data from the DMA with the obsolete data in the
> cache line.  This means that, one way or the other, a system with
> non-cache-coherent I/O and an R10000 must avoid allowing the
> processor to perform a speculative store miss with respect to memory
> into which a DMA is taking place.
> 
>      Note that the Indigo2 and O2 have somewhat different workarounds.
> The Indigo2 deals with the kernel side using a special compilation mode,
> and the O2 deals with the kernel side using a special hardware feature
> plus a generalization of the solution for the user mode part of the problem.
> Both deal with the user mode by invalidating TLB entries for pages into
> which data is being transferred via DMA, so that the processor cannot
> resolve the virtual address, and hence cannot speculatively fetch
> a cache line at that address, while the DMA is in progress.  The kernel
> side is harder, since the TLB is not used for K0SEG and XKPHYS address
> spaces, which is where things get complicated.
> 
>      I can provide the details to someone who is really interested
> in working on this, but, as Dave Olson indicated, you don't want to
> start on this unless you have a LOT of spare time. 

There are a number of embedded systems which need top-end horsepower and
are therefore based on the R10000.  I bet many of these systems also
had to work around this R10000 non-coherent I/O problem using the same or
similar tricks as SGI did.  So I hope somebody will be interested in solving
that problem.

  Ralf
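
For readers who want to see the hazard from the quoted explanation in
concrete terms, here is a minimal sketch in C.  It is a toy model and not
kernel code: the struct, the function names, the 8-byte line size and the
'O'/'N' fill bytes are all made up for illustration.  It assumes a single
cache line backing a DMA buffer, and it models only the user-mode
workaround (no valid TLB entry means no speculative fetch); the K0SEG and
XKPHYS case is not covered.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define LINE_SIZE 8 /* toy line size, not the real R10000 value */

/* Hypothetical toy model: one line of RAM, one cache line, one mapping. */
struct toy_system {
    unsigned char memory[LINE_SIZE]; /* DMA target buffer in RAM */
    unsigned char cache[LINE_SIZE];  /* CPU's copy of that line */
    bool line_valid;                 /* line is present in the cache */
    bool line_dirty;                 /* line is marked modified */
    bool tlb_mapped;                 /* page has a valid TLB entry */
};

void toy_init(struct toy_system *s, bool tlb_mapped)
{
    assert(s != NULL);
    memset(s->memory, 'O', LINE_SIZE); /* 'O' = old buffer contents */
    memset(s->cache, 0, LINE_SIZE);
    s->line_valid = false;
    s->line_dirty = false;
    s->tlb_mapped = tlb_mapped;
}

/* A speculative store that misses the cache and never graduates. */
void toy_speculative_store_miss(struct toy_system *s)
{
    if (!s->tlb_mapped)
        return; /* address cannot be resolved, so no line is fetched */
    if (s->line_valid)
        return; /* already cached, so this is not a miss */
    memcpy(s->cache, s->memory, LINE_SIZE); /* fetch dirty-exclusive */
    s->line_valid = true;
    s->line_dirty = true; /* marked modified although nothing was stored */
}

/* Non-coherent DMA: the device writes RAM and the cache is not told. */
void toy_dma_write(struct toy_system *s, unsigned char value)
{
    memset(s->memory, value, LINE_SIZE);
}

/* Victim writeback: a dirty line is written back to RAM on eviction. */
void toy_evict(struct toy_system *s)
{
    if (s->line_valid && s->line_dirty)
        memcpy(s->memory, s->cache, LINE_SIZE);
    s->line_valid = false;
    s->line_dirty = false;
}

/* Runs miss, DMA, eviction in order; true if the DMA data survived. */
bool toy_dma_data_survives(bool tlb_mapped)
{
    struct toy_system s;

    toy_init(&s, tlb_mapped);
    toy_speculative_store_miss(&s); /* line fetched with old data */
    toy_dma_write(&s, 'N');         /* 'N' = new data from the device */
    toy_evict(&s);                  /* stale dirty line may be written back */
    return s.memory[0] == 'N';
}
```

In this model toy_dma_data_survives(true) returns false, because the
stale dirty line is written back over the DMA data, while
toy_dma_data_survives(false) returns true, because the page has no valid
TLB entry during the transfer and the line is never fetched.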
