> sun4m SS10 115mhz hypersparc, 256k cache
> csum_partial: sz 10000 iterations takes 17009430 microseconds
> csum_partial: sz 1 iteration takes <1700 microseconds>==<332
Maybe I'm dense this morning, but I don't understand the numbers.
As I understand it, you measure the real time to do a sequential IP
checksum on a 1K buffer, you always invalidate the processor cache
(I$, D$, and Secondary$, or just some subset of these?) first, and
you don't include the cost of doing the cache invalidate in the time
above. 10000 of these cost a hair over 17s; dividing by 10000 gives
1.7ms per iteration. That seems way too high, so it must include the
cost of the cache invalidate or something?
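To make sure I'm not botching the division, here is the arithmetic as a small C sketch. The helper names are mine, purely for illustration, and the 1024-byte size is my reading of "1K buffer", not something taken from your code.

```c
#include <assert.h>
#include <stdint.h>

/* Average cost of one iteration, in microseconds. */
static double per_iter_us(uint64_t total_us, uint64_t iterations)
{
    assert(iterations > 0);
    return (double)total_us / (double)iterations;
}

/* The same cost expressed per byte, in nanoseconds. */
static double ns_per_byte(double iter_us, uint64_t buf_bytes)
{
    assert(buf_bytes > 0);
    return iter_us * 1000.0 / (double)buf_bytes;
}
```

With your figures, per_iter_us(17009430, 10000) comes out at about 1700.9us, i.e. 1.7ms, and ns_per_byte() of that over 1024 bytes is about 1661ns/byte.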
What does the 332ns refer to?
My back of the envelope:
If I recall, the cache line on a sun4m is 32 bytes, so 1K of data
is 32 lines. Assuming something in the 300-600ns per secondary
cache miss range and a single pending cache miss at a time, most
any "touch the data" operation on 1K of data lands in the
9.6-19.2us ballpark, or roughly 9-19ns/byte.
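Spelled out as code, under the same assumptions: 32-byte lines, fully serialized misses, and a buffer that starts on a line boundary. The function names are hypothetical, just to show the estimate.

```c
#include <assert.h>
#include <stdint.h>

/* Number of cache lines a buffer spans, assuming it starts line-aligned. */
static uint64_t lines_touched(uint64_t buf_bytes, uint64_t line_bytes)
{
    assert(line_bytes > 0);
    return (buf_bytes + line_bytes - 1) / line_bytes;
}

/* Cold-cache touch time in ns when only one miss is outstanding at a time. */
static uint64_t cold_touch_ns(uint64_t buf_bytes, uint64_t line_bytes,
                              uint64_t miss_ns)
{
    return lines_touched(buf_bytes, line_bytes) * miss_ns;
}
```

cold_touch_ns(1024, 32, 300) gives 9600ns and cold_touch_ns(1024, 32, 600) gives 19200ns, which is about 9.4 to 18.8ns/byte. Overlapping misses or prefetch would bring those numbers down.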
Does the hypersparc processor support multiple concurrent cache
misses? Or does it have a Viking-like sequential reference
detector and automatic cache prefetch logic?
> sun4m MicroSparcI, 40mhz, 16k icache 20k dcache
> Thats around 2.5us/Kbyte for csum,
This must be a hot-cache number that you're talking about? Exactly
when do you invalidate the caches?
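For comparison, here is roughly what I would expect a cold-cache measurement to look like. This is a user-space sketch, not your kernel code: csum16() is a plain RFC 1071 style sum standing in for csum_partial, evict_caches() simply walks a buffer that I assume is larger than every cache level, and all the names and sizes are my own assumptions.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* 16-bit ones'-complement sum over big-endian words, not inverted. */
static uint16_t csum16(const unsigned char *p, size_t len)
{
    uint64_t sum = 0;
    while (len > 1) {
        sum += ((uint64_t)p[0] << 8) | p[1];
        p += 2;
        len -= 2;
    }
    if (len)
        sum += (uint64_t)p[0] << 8;      /* odd trailing byte, zero padded */
    while (sum >> 16)                    /* fold carries back into 16 bits */
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)sum;
}

/* Assumption: 4MB is bigger than any data or secondary cache involved. */
#define EVICT_BYTES (4u * 1024u * 1024u)
static unsigned char evict_buf[EVICT_BYTES];

/* Push the test buffer out of D$ and secondary cache; I$ is not touched. */
static void evict_caches(void)
{
    volatile unsigned char *p = evict_buf;
    for (size_t i = 0; i < EVICT_BYTES; i += 16)   /* stride below line size */
        p[i]++;
}

static uint64_t now_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/* Total ns spent inside csum16 over `iters` cold runs. */
static uint64_t time_cold_csum(const unsigned char *buf, size_t len,
                               unsigned iters, uint16_t *out)
{
    uint64_t total = 0;
    uint16_t sum = 0;
    assert(out != NULL);
    for (unsigned i = 0; i < iters; i++) {
        evict_caches();                  /* deliberately outside the timing */
        uint64_t t0 = now_ns();
        sum = csum16(buf, len);
        total += now_ns() - t0;
    }
    *out = sum;
    return total;
}
```

The point of the structure is that only the checksum sits between the two clock reads. If the invalidate is inside the timed region instead, the per-iteration figure measures mostly the invalidate.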