On Wed, 17 Sep 2008 11:40:01 +0100 (BST), "Maciej W. Rozycki"
<macro@linux-mips.org> wrote:
> > and it seems to fix the problem for me. Can you comment?
>
> It seems obvious that a carry from the bit #15 in the last addition of
> the passed checksum -- ADDC(sum, a2) -- will negate the effect of the
> folding. However a simpler fix should do as well. Try if the following
> patch works for you. Please note this is completely untested and further
> optimisation is possible, but I've skipped it in this version for clarity.
Well, csum_partial()'s return value is __wsum (32-bit). So strictly
there is no need to folding into 16-bit.
I think this bug was introduced by my commit
ed99e2bc1dc5dc54eb5a019f4975562dbef20103 ("[MIPS] Optimize
csum_partial for 64bit kernel"). This commit changed ADDC to using
daddu for 64-bit kernel and that was wrong for the last addition of
partial csum which should be 32-bit addition.
Here is my proposal fix. Bryan, could you try it too?
diff --git a/arch/mips/lib/csum_partial.S b/arch/mips/lib/csum_partial.S
index 8d77841..8b15766 100644
--- a/arch/mips/lib/csum_partial.S
+++ b/arch/mips/lib/csum_partial.S
@@ -60,6 +60,14 @@
ADD sum, v1; \
.set pop
+#define ADDC32(sum,reg) \
+ .set push; \
+ .set noat; \
+ addu sum, reg; \
+ sltu v1, sum, reg; \
+ addu sum, v1; \
+ .set pop
+
#define CSUM_BIGCHUNK1(src, offset, sum, _t0, _t1, _t2, _t3) \
LOAD _t0, (offset + UNIT(0))(src); \
LOAD _t1, (offset + UNIT(1))(src); \
@@ -280,7 +288,7 @@ LEAF(csum_partial)
1:
.set reorder
/* Add the passed partial csum. */
- ADDC(sum, a2)
+ ADDC32(sum, a2)
jr ra
.set noreorder
END(csum_partial)
@@ -681,7 +689,7 @@ EXC( sb t0, NBYTES-2(dst), .Ls_exc)
.set pop
1:
.set reorder
- ADDC(sum, psum)
+ ADDC32(sum, psum)
jr ra
.set noreorder
|