There was a thread about optimizing the TCP/IP checksum in comp.arch
(the ``Plenty of registers'' thread) recently, so I took some time out
to look at the Linux code.
After having stared at the i386 code, I wondered what we did for MIPS.
After musing a bit over it, I have come to the conclusion that at least
one of the following two statements is true for Linux/MIPS:
o Nobody uses TCP/IP.
o All TCP/IP fragments are word aligned.
Why? -- well, it's quite simple if you look in the csum_partial code:
__asm__("
.set noreorder
.set noat
andi $1,%5,2 # Check alignment
beqz $1,2f # Branch if ok
subu $1,%4,2 # delay slot, Alignment uses up two bytes
bgez $1,1f # Jump if we had at least two bytes
move %4,$1 # delay slot
j 4f
addiu %4,2 # delay slot; len was < 2. Deal with it
1: lw %2,(%5)
addiu %4,2
addu %0,%2
sltu $1,%0,%2
addu %0,$1
We reach label '1' if we have at least 2 bytes to check and the
address is aligned to an *ODD* halfword boundary. So the 'lw' would
take an address error (unaligned word access), and we do not even
increment the address pointer, so *all* word accesses would be
unaligned from then on.
So, the patch would be to write
1: lhu %2,(%5)
addiu %5,2
instead of the 'lw' line.
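To make the intent of the fix concrete, here is a rough C sketch of
the same logic. It is not the kernel's csum_partial; the names
sketch_csum_partial, add_carry and fold16 are made up for
illustration, and it assumes the buffer is at least halfword aligned
and the length is even.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Add v into sum with end-around carry (what addu/sltu/addu does). */
static uint32_t add_carry(uint32_t sum, uint32_t v)
{
    sum += v;
    return sum + (sum < v);
}

/* Hypothetical helper: fold a 32-bit partial sum down to 16 bits. */
static uint16_t fold16(uint32_t sum)
{
    sum = (sum & 0xffff) + (sum >> 16);
    sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)sum;
}

/* Sketch only. Assumes buf is 2-byte aligned and len is even. */
static uint32_t sketch_csum_partial(const unsigned char *buf, size_t len,
                                    uint32_t sum)
{
    /* Odd halfword boundary: load ONE halfword (the 'lhu') and
     * advance the pointer (the missing 'addiu %5,2'). */
    if (((uintptr_t)buf & 2) && len >= 2) {
        uint16_t h;
        memcpy(&h, buf, sizeof h);
        sum = add_carry(sum, h);
        buf += 2;
        len -= 2;
    }
    /* From here on buf is word aligned, so word loads are safe. */
    while (len >= 4) {
        uint32_t w;
        memcpy(&w, buf, sizeof w);
        sum = add_carry(sum, w);
        buf += 4;
        len -= 4;
    }
    /* Trailing halfword, if any. */
    if (len >= 2) {
        uint16_t h;
        memcpy(&h, buf, sizeof h);
        sum = add_carry(sum, h);
    }
    return sum;
}
```

Folding the result with fold16() should give the same 16-bit value
whether the data starts on a word boundary or two bytes past one.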
PS: Why don't we use the 64-bit registers in the R4x00 chips when
they are available? Yes, we'd need separate 32-bit and 64-bit versions,
but I think it would be worth it.
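As a rough illustration of what a 64-bit accumulator buys: the
per-word carry check goes away, because carries simply pile up in the
upper 32 bits and are folded back once at the end. This is a C sketch
rather than R4x00 assembly; the name sketch_csum64 is made up, and it
assumes the length is a multiple of 4.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Sketch only. Assumes len is a multiple of 4. */
static uint16_t sketch_csum64(const unsigned char *buf, size_t len)
{
    uint64_t acc = 0;

    while (len >= 4) {
        uint32_t w;
        memcpy(&w, buf, sizeof w);
        acc += w;               /* no sltu/addu pair per word */
        buf += 4;
        len -= 4;
    }
    /* Fold 64 -> 32 bits, twice to absorb the carry of the fold. */
    acc = (acc & 0xffffffffu) + (acc >> 32);
    acc = (acc & 0xffffffffu) + (acc >> 32);
    /* Fold 32 -> 16 bits, same idea. */
    acc = (acc & 0xffff) + (acc >> 16);
    acc = (acc & 0xffff) + (acc >> 16);
    return (uint16_t)acc;
}
```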
PPS: Ralf, after thinking about csum_partial_copy() I do understand
why you preferred the easy option (memcpy from address 0x1 to 0x7,
anyone?).
Kai
--
Kai Harrekilde-Petersen <khp@dolphinics.no> #include <std/disclaimer.h>
http://www.dolphinics.no/~khp/ Linux: the choice of a GNU generation
"Argue for your limitations, and sure enough - they're yours" --Richard Bach.