linux-mips

Re: Performance bug in c-r4k.c cache handling code

To: "Maciej W. Rozycki" <macro@linux-mips.org>
Subject: Re: Performance bug in c-r4k.c cache handling code
From: Dominic Sweetman <dom@mips.com>
Date: Tue, 20 Sep 2005 14:18:39 +0100
Cc: Dominic Sweetman <dom@mips.com>, Thiemo Seufer <ths@networkno.de>, linux-mips@linux-mips.org
In-reply-to: <Pine.LNX.4.61L.0509201017220.23494@blysk.ds.pg.gda.pl>
Original-recipient: rfc822;linux-mips@linux-mips.org
References: <20050919154056.GG3386@hattusa.textio> <Pine.LNX.4.61L.0509191733180.5551@blysk.ds.pg.gda.pl> <17199.53696.27856.801284@mips.com> <Pine.LNX.4.61L.0509201017220.23494@blysk.ds.pg.gda.pl>
Sender: linux-mips-bounce@linux-mips.org

Maciej W. Rozycki (macro@linux-mips.org) writes:

> Besides new CPUs more often than not 
> require changes to kernel-level software anyway.

Making sure that isn't so is the reason why there's a MIPS32/64 spec
(with all the privileged operations defined).  That also avoids the
undesirable development step of bringing up new hardware and new
kernel software at the same time...

> > How did you measure the high throughput?  Have you got a
> > machine with DMA-coherency you can turn on and off?
> 
>  I just disabled invalidations. ;-)

Ouch.  So the effect could have come from a variety of sources.

> That was an R4400 with 1MB of S-cache.

With an R4400 S-cache, any difference between "would write it back but
it's clean" and "just invalidate" is likely to be small, since in
either case the time will be dominated by the (external) cache tag
memory RMW operation.

> Eventually I should benchmark both invalidation variations against each 
> other with the system in question and see if it makes any difference.  

Indeed.  And it might be a good idea to test a more modern system,
too, to see how big an effect this might be.

> Ironically this is where the write-back cache of the R4k gives loss
> rather than gain as compared to the write-through cache of the R3k
> (the system supports daughtercards with either CPU, so useful
> comparison is possible)...

Maybe.  But remember, on the R3K every write was a write-through, and
they all had a cost in bus congestion, which may have delayed a
following read and held up the CPU (or the write buffer may have
filled and stalled the CPU).

I think up to about 33MHz write-through remained a tolerable policy
for 1988-era memory systems; any faster than that and you just sank
under a flood of writes.  2005-era memory systems are much faster when
bursting, but the time they take to process a single write cycle has
improved by less than 2x.  So write-through is still a really bad idea
for 100MHz CPUs using off-chip memory.

Even when your device requires you to push out all the data it can be
more efficient to write data to the cache and then force writeback to
memory: at least that way the data goes to the memory in efficient
burst cycles.
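
To put rough numbers on that, here is a minimal sketch that just counts
bus transactions for pushing a buffer out to memory under each policy.
It is a counting model only, not kernel code: the 32-byte line size,
the 4-byte store width and both function names are assumptions made up
for the illustration, and it says nothing about how long each
transaction takes on any real system.

```c
#include <assert.h>
#include <stddef.h>

#define LINE_BYTES 32u  /* assumed cache line size; differs between systems */
#define WORD_BYTES 4u   /* assumed width of a single store */

/* Write-through: every store goes to memory as its own single write
 * cycle, so the count is the number of words stored (rounded up). */
static size_t write_through_transactions(size_t nbytes)
{
    return (nbytes + WORD_BYTES - 1) / WORD_BYTES;
}

/* Write-back followed by a forced writeback: stores land in the cache,
 * and each dirty line overlapping [start, start + nbytes) is pushed out
 * as one burst. */
static size_t write_back_bursts(size_t start, size_t nbytes)
{
    size_t first_line, last_line;

    assert(LINE_BYTES != 0);
    if (nbytes == 0)
        return 0;
    first_line = start / LINE_BYTES;
    last_line = (start + nbytes - 1) / LINE_BYTES;
    return last_line - first_line + 1;
}
```

With those assumptions a line-aligned 1536-byte buffer costs 384 single
write cycles under write-through, against 48 bursts when written to the
cache and then forced out.  A buffer that starts mid-line touches one
extra line.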

--
Dominic

