> Like the original R3000 (but not the R4000 and R4400) the R4600 has a
> 5-stage pipeline, and always executes the instruction which follows
> any taken branch (the "branch delay slot"). So long as the branch
> delay slot instruction does useful work (and it usually can on the
> R4x00) there's no branch penalty for any R4600 CPU.
> The R4600 was a complete redesign of the R4000-without-secondary-cache
> made faster by:
> o Changing the 8-stage R4000/4400 pipeline back to 5 stages;
> o Bigger, cleverer (2x16Kbytes, 2-way set associative) caches;
> o On a data cache miss the R4600 restarts as soon as it gets the data
> it wants; the R4000/4400 waits for the whole line.
> It really did work.
> "Version 2.0" which Ralf referred to might be the R4700, an updated
> pin-compatible design. But I think the only tweaks in the R4700 are
> to the floating point unit. The real speed-up comes with the R5000,
> which all reports indicate is pretty neat.
No, it's definitely an R4600.
> > So far this is the highest-scoring MIPS box that I've tested myself :-)
> There's an interesting point. Potential MIPS users often ask us what
> indication we can give them of performance for MIPS vs other
> architectures. Does anyone out there on planet Linux have some
> big-program "benchmarks" which give any leads to MIPS vs x86
> system performance?
Oh, I was talking about BogoMIPS which, just as the name says, are
bogus. They're not a real benchmark, but a measure of how fast
1: bnez reg,1b
is executed. The kernel uses this loop internally for short delays.
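The one-liner above does not show how the loop counter gets decremented. The following is a minimal simulation sketch. It assumes a `subu reg,1` sits in the branch delay slot of the `bnez`; that instruction is an assumption, not something quoted here. It also models MIPS delay-slot semantics: the delay-slot instruction runs whether or not the branch is taken. The function name `simulate_delay_loop` is hypothetical.

```python
def simulate_delay_loop(count: int) -> int:
    """Return the number of instructions executed by the delay loop

        1: bnez reg,1b
           subu reg,1      # assumed delay-slot decrement (hypothetical)

    using MIPS semantics: the delay-slot instruction always executes,
    whether or not the branch is taken.
    """
    reg = count & 0xFFFFFFFF            # model a 32-bit register
    executed = 0
    while True:
        taken = reg != 0                # bnez tests reg before the delay slot runs
        executed += 1                   # the bnez itself
        reg = (reg - 1) & 0xFFFFFFFF    # subu in the delay slot: always executed
        executed += 1
        if not taken:
            return executed             # branch not taken: fall out of the loop


# Hypothetical loop count of 1000.
print(simulate_delay_loop(1000))        # 2002: the bnez runs 1001 times, each with its delay slot
```

Because the delay slot also runs on the final, untaken branch, a starting count of n executes the pair n+1 times.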
The loop executes entirely in the primary cache. The interesting thing is
that the first MIPS machine I ported Linux to, a Deskstation Tyne with an
R4600 at 133 MHz, needed three cycles per pass through this loop, while
this machine seems to execute it in *one* cycle. As I understand it,
nothing outside the CPU could have caused this, so it looks as if the CPU
itself was changed.
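The cycle count per pass feeds straight into the BogoMIPS figure. Here is a rough worked sketch. It assumes the Linux convention of that era, BogoMIPS = loops per second / 500000, which counts each pass as two "instructions". It also assumes both machines run at 133 MHz, because the clock of the second box is not stated here. The helper name `bogomips` is hypothetical.

```python
def bogomips(clock_hz: float, cycles_per_loop: float) -> float:
    """BogoMIPS for a delay loop taking `cycles_per_loop` cycles per pass.

    Assumes the Linux convention loops_per_second / 500000, i.e. each
    pass through the loop is counted as two 'instructions'.
    """
    loops_per_second = clock_hz / cycles_per_loop
    return loops_per_second / 500_000


clock = 133e6                           # 133 MHz, as on the Deskstation Tyne
print(round(bogomips(clock, 3), 2))     # three cycles per pass -> 88.67
print(round(bogomips(clock, 1), 2))     # one cycle per pass    -> 266.0
```

At the same clock, going from three cycles per pass to one triples the BogoMIPS number. That is why it says more about this one loop than about overall performance.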
There are two versions of the R4600; as far as I remember my Tyne had
version 1.0, while the SNI box has version 2.0. The 2.0 chip has some
bugfixes; the most important is probably the fix to the way it accesses
the primary caches. On the older chip this bug can be worked around by
disabling interrupts during cache flushes or (untested ...) by flushing
way B of the cache before way A.
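The untested second workaround is just a change of loop order. This minimal sketch reads it as "flush all of way B, then all of way A". It assumes a 16 KB, 2-way primary cache with 32-byte lines, which gives 256 lines per way; the line size is an assumption. `flush_line` is a hypothetical stand-in for the real cache instruction, and here it only records which (way, index) it was asked to flush.

```python
CACHE_SIZE = 16 * 1024                    # bytes per primary cache (I or D)
WAYS = 2
LINE_SIZE = 32                            # assumed line size in bytes
SETS = CACHE_SIZE // (WAYS * LINE_SIZE)   # 256 lines per way

WAY_A, WAY_B = 0, 1


def flush_cache(flush_line) -> None:
    """Flush every line, visiting all of way B before any of way A.

    `flush_line(way, index)` is a hypothetical callback standing in for
    the real index-flush cache operation.
    """
    for way in (WAY_B, WAY_A):
        for index in range(SETS):
            flush_line(way, index)


# Record the order in which lines would be flushed.
ops = []
flush_cache(lambda way, index: ops.append((way, index)))
print(len(ops), ops[0], ops[-1])          # 512 (1, 0) (0, 255)
```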
What kind of benchmark would you count as a real-world benchmark?
If you need "real-world" benchmarks, someone at MTI could probably
supply more representative data than I could.