Daniel Veillard writes:
> just a few remarks concerning The Board design :
> - maybe we should put the L2 cache memory management
> even if most of us won't use it (SRAM == $$$), in order
> to preserve the expandability of the beast. Upgrading
> to R4400 without L2 cache may not be very interesting.
> - What format of SIMMs will be needed, 36 or 72 bits?
> I bet most of us spent much $$$ in 36 bit SIMMs and
> would be disappointed if these cannot fit onto their
> Super Board. But there is probably a bandwidth problem
> with 36 bit SIMMs.
> Maybe the HW specialists here could comment on these points,
Although I'm not one of the "HW specialists" you've asked for comments,
I have been thinking about the whole memory subsystem for a while now.
I have the following items to throw out for discussion:
1. The R4200 has a 64-bit data bus. This means we either run a 64-bit
memory bus or we add bus sizing hardware. Both have their
pros and cons:
A 64-bit bus means twice as many traces on the board, twice as
many signals to glitch in a flakey design, more board real
estate for those additional traces, and memory must be either
72-bit SIMMs (less common = more $$) or pairs of 36-bit SIMMs
(= more board real estate, and memory must be added in pairs).
A 32-bit bus has fewer traces (and therefore fewer of the problems
cited above), but it requires bus sizing H/W. Bus sizing H/W
is tricky business at 33 MHz and beyond. Plus, it *seriously*
slows down the processor whenever the processor stalls on
memory, since *2* memory accesses (at page mode rates) are
required for every CPU memory access.
On balance, I'd say that a 64-bit data bus is easier to get
right, and it should be the higher performing design.
2. Since the R4200 has a 64-bit data bus, a level 2 cache must
be 64 bits wide. This strikes me as being *very* hard to do
in a cost-effective manner.
3. If we do without a level 2 cache (see #2), then the next obvious
place to try to squeeze more performance is by interleaving the
DRAM. Let's see, a 64-bit data bus 2-way interleaved is 128
traces (yuck!), and 4-way interleaved would be 256 traces
(har! har! har!). The 2-way (128 trace) interleave is do-able,
but I think the 4-way (256 trace) is unreasonable.
4. What about the RAMTRON EDRAM part? (DM 1M36SJ) It uses a JEDEC
standard form (= least cost sockets), has a 512x36 integrated
cache with a 2Kbit-wide interface to the DRAM (yowza!) so a
cache-miss causes a 2048-bit cache fill in a 35ns cycle time.
All memory access goes through the integrated cache, so cache
coherency between the CPU and any funky DMA peripherals is a
Cons: - it's a 36-bit part, so they'd need to be added in pairs.
- it's not the ubiquitous PC SIMM, so those folks who
are/were planning to use PC SIMMs are out-of-luck.
(or they run without cache: the RAMTRON DRAM controller
handles standard SIMMs, too!)
- it is not really a set-associative cache; it only has one
working-set. This means that highly localized references
win, but random accesses cause cache misses.
That last con I think could be a killer. That and the cost of
the part. Does anyone have a cost for the EDRAM?
I suppose a hybrid 2-way interleaved EDRAM solution could be done,
but for that much trouble it seems like a level 2 cache would be
no harder, and probably higher performing.
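
For anyone who wants to play with the numbers in items 1, 3 and 4, here is
a rough Python sketch. It counts data traces only (no address, control or
parity lines), and it models the EDRAM cache as a single open row of 512
words, which is my reading of the "one working-set" behaviour and not a
datasheet-accurate model. The names (data_traces, SingleRowCache, run) and
the 1M-word address space are assumptions made purely for illustration.

```python
import random

CPU_BUS_BITS = 64  # R4200 data bus width, from item 1


def data_traces(bus_bits, interleave=1):
    """Data traces only: one per data bit per interleaved bank."""
    return bus_bits * interleave


def accesses_per_cpu_transfer(mem_bus_bits, cpu_bus_bits=CPU_BUS_BITS):
    """Memory accesses needed to move one full CPU-bus-wide word."""
    return -(-cpu_bus_bits // mem_bus_bits)  # ceiling division


class SingleRowCache:
    """Toy model (assumption): only the most recently opened row is cached."""

    def __init__(self, row_words=512):
        self.row_words = row_words
        self.open_row = None
        self.hits = 0
        self.misses = 0

    def access(self, word_addr):
        row = word_addr // self.row_words
        if row == self.open_row:
            self.hits += 1
        else:
            # A miss replaces the whole cached row with the new one.
            self.misses += 1
            self.open_row = row

    def hit_rate(self):
        total = self.hits + self.misses
        return self.hits / total if total else 0.0


def run(addresses, row_words=512):
    """Feed a sequence of word addresses through the toy cache."""
    cache = SingleRowCache(row_words)
    for addr in addresses:
        cache.access(addr)
    return cache.hit_rate()


if __name__ == "__main__":
    print("2-way / 4-way data traces:", data_traces(64, 2), data_traces(64, 4))
    print("accesses per CPU transfer on a 32-bit bus:",
          accesses_per_cpu_transfer(32))

    rng = random.Random(0)           # fixed seed so runs are repeatable
    total_words = 1 << 20            # assumed 1M-word part
    sequential = range(4096)
    scattered = [rng.randrange(total_words) for _ in range(4096)]
    ping_pong = [0, 512] * 2048      # two streams in different rows

    print("sequential hit rate:", run(sequential))
    print("random hit rate:    ", run(scattered))
    print("ping-pong hit rate: ", run(ping_pong))
```

The ping-pong case is the one that worries me: two streams that each look
perfectly local (say, a copy loop with source and destination in different
rows) evict each other on every access in this model.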
Texas Instruments firstname.lastname@example.org -or-
PO Box 655012 M/S 3624 email@example.com
Dallas, TX 75265 TI-MSG: RAK9
Voice: (214) 917-2285 FAX: (214) 917-5112