wicked checksum optimization...

Subject: wicked checksum optimization...
From: "David S. Miller" <>
Date: Mon, 13 May 1996 00:05:04 -0400
I think I figured out how to do it all "the right way(tm)"

The big problem is alignment, but 97% of the time the buffer is
aligned how we like.  I've decided it is ok to take the hit of an
unaligned access trap for the other 3% of cases, as long as it is not
that much of a hit.
The implementation looks like this:

All loads and stores in the IP checksum routines will look the same;
the only time we do stores is in the csum/copy routines.  Anyway, the
eight instruction encodings recognized will be:

        ld      [%o0 + offset], %o4
        ld      [%o0 + offset], %o5
        lduh    [%o0 + offset], %o4
        lduh    [%o0 + offset], %o5
        st      %o4, [%g3 + offset]
        st      %o5, [%g3 + offset]
        sth     %o4, [%g3 + offset]
        sth     %o5, [%g3 + offset]
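
For reference, the eight forms listed above could be recognized with a
mask-and-compare over the fixed fields of the instruction word.  What
follows is only a sketch in C, assuming the SPARC V8 format-3 layout
(op in bits 31:30, rd in 29:25, op3 in 24:19, rs1 in 18:14, i in bit
13, simm13 in bits 12:0).  The names mk_pattern, csum_classify and
FIXED_FIELDS are made up for illustration and do not come from the
kernel.

```c
#include <stdint.h>

/* SPARC V8 register numbers used by the eight forms (assumed layout:
 * %g0-%g7 = 0-7, %o0-%o7 = 8-15). */
enum { REG_G3 = 3, REG_O0 = 8, REG_O4 = 12, REG_O5 = 13 };

/* op3 values for the four memory operations (format 3, op = 3). */
enum { OP3_LD = 0x00, OP3_LDUH = 0x02, OP3_ST = 0x04, OP3_STH = 0x06 };

/* Every field except the 13-bit immediate has to match exactly. */
#define FIXED_FIELDS 0xffffe000u

/* Fixed part of "op [rs1 + simm13], rd" with the i bit set.
 * Hypothetical helper, for illustration only. */
static uint32_t mk_pattern(unsigned op3, unsigned rd, unsigned rs1)
{
    return (3u << 30) | (rd << 25) | (op3 << 19) | (rs1 << 14) | (1u << 13);
}

/* Returns 0..7 for the eight recognized forms, in the order they are
 * listed in the message, or -1 for any other instruction word. */
static int csum_classify(uint32_t insn)
{
    const uint32_t pat[8] = {
        mk_pattern(OP3_LD,   REG_O4, REG_O0),
        mk_pattern(OP3_LD,   REG_O5, REG_O0),
        mk_pattern(OP3_LDUH, REG_O4, REG_O0),
        mk_pattern(OP3_LDUH, REG_O5, REG_O0),
        mk_pattern(OP3_ST,   REG_O4, REG_G3),
        mk_pattern(OP3_ST,   REG_O5, REG_G3),
        mk_pattern(OP3_STH,  REG_O4, REG_G3),
        mk_pattern(OP3_STH,  REG_O5, REG_G3),
    };

    for (int i = 0; i < 8; i++)
        if ((insn & FIXED_FIELDS) == pat[i])
            return i;
    return -1;
}
```

Under these assumptions "ld [%o0 + 4], %o4" is the word 0xd8022004 and
classifies as form 0; anything else would fall through to the generic
unaligned-access path.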

The unaligned trap handler (before it even tries to save any state)
will look something like:

        andcc   %l0, PSR_PS, %g0
        be,a    mna_fromuser

        ld      [%l1], %l5
        sethi   %hi(LOAD_O4), %l4
        and     %l5, %l4, %l6
        cmp     %l6, %l4
        bne     1f
         sethi  %hi(LOAD_O5), %l4
        mov     %l1, %g6                                ! %pc
        sethi   %hi(C_LABEL(csum_ldo4_fixup)), %l1
        or      %l1, %lo(C_LABEL(csum_ldo4_fixup)), %l1
        wr      %l0, 0x0, %psr                          ! fix cond-codes
        and     %l5, LOAD_IMMEDIATE_FIELD, %g7
        srl     %g7, LOAD_IMMEDIATE_SHIFT, %g7          ! offset
        jmp     %l1
        rett    %l1 + 0x4

        /* etc. for other instructions recognized */

        /* From user mode or something we don't handle for the
         * kernel.
         */
        call    C_LABEL(do_mna)
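
The decision the handler makes can be modelled in C roughly as below.
This is a sketch under assumptions, not the handler itself: PSR_PS is
taken to be bit 6 (0x40) of %psr as in SPARC V8, the immediate is
taken to be the simm13 field in the low 13 bits (so no shift is
needed; the message does not define LOAD_IMMEDIATE_FIELD or
LOAD_IMMEDIATE_SHIFT), and the match is an exact compare on the fixed
fields.  The names mna_dispatch, insn_simm13 and enum mna_path are
hypothetical, and only the LOAD_O4 case is shown.

```c
#include <stdint.h>

#define PSR_PS 0x40u   /* "previous supervisor" bit in the SPARC V8 %psr */

enum mna_path { MNA_GENERIC, MNA_CSUM_LDO4_FIXUP };

/* Sign-extend the 13-bit immediate of a format-3 instruction. */
static int32_t insn_simm13(uint32_t insn)
{
    int32_t imm = (int32_t)(insn & 0x1fffu);
    return (imm & 0x1000) ? imm - 0x2000 : imm;
}

/* Decide where an unaligned-access trap should go.  On a match,
 * *offset receives the immediate that the fixup routine will use. */
static enum mna_path mna_dispatch(uint32_t psr, uint32_t insn, int32_t *offset)
{
    const uint32_t LOAD_O4 = 0xd8022000u;   /* ld [%o0 + simm13], %o4 */

    if (!(psr & PSR_PS))                    /* trapped from user mode */
        return MNA_GENERIC;
    if ((insn & 0xffffe000u) != LOAD_O4)    /* not a form we recognize */
        return MNA_GENERIC;

    *offset = insn_simm13(insn);
    return MNA_CSUM_LDO4_FIXUP;
}
```

MNA_GENERIC stands in for the call to do_mna; the other seven forms
would each get their own pattern and fixup target in the same way.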

Ok, now the fixup routines just look like:

        ldub    [%o0 + %g7], %g4
        add     %g7, 1, %g7
        ldub    [%o0 + %g7], %g5
        sll     %g4, 24, %g4
        add     %g7, 1, %g7
        sll     %g5, 16, %g5
        or      %g4, %g5, %o4
        ldub    [%o0 + %g7], %g4
        add     %g7, 1, %g7
        ldub    [%o0 + %g7], %g5
        sll     %g4, 8, %g4
        or      %g5, %g4, %g4
        jmp     %g6             ! wheee...
        or      %o4, %g4, %o4
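
In C terms the fixup above amounts to four byte loads merged in
big-endian order.  A minimal sketch follows; the function names are
invented for illustration, and the halfword variant is only a guess at
what the lduh fixup would look like, since the message does not show
it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Model of the word fixup: read four bytes one at a time and merge
 * them most-significant byte first, so alignment never matters. */
static uint32_t load_be32_bytewise(const uint8_t *base, size_t offset)
{
    assert(base != NULL);

    uint32_t b0 = base[offset];
    uint32_t b1 = base[offset + 1];
    uint32_t b2 = base[offset + 2];
    uint32_t b3 = base[offset + 3];

    return (b0 << 24) | (b1 << 16) | (b2 << 8) | b3;
}

/* Assumed halfword counterpart (zero-extended, like lduh). */
static uint32_t load_be16_bytewise(const uint8_t *base, size_t offset)
{
    assert(base != NULL);

    return ((uint32_t)base[offset] << 8) | base[offset + 1];
}
```

Both helpers give the same result at any offset, which is the point of
the fixup: the caller sees the value an aligned load would have
produced.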

and so on...  then csum_partial and friends can just blaze through
assuming proper alignment for all pointers to the packet contents
etc.  Nifty eh?  Sparc is fun...
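
To show what "assuming proper alignment" buys, here is a rough C
sketch of a ones-complement sum that reads whole 32-bit words and
never checks the pointer.  It is not the kernel's checksum routine;
csum_words is a hypothetical name, and the caller is assumed to pass a
word-aligned buffer holding a whole number of words.

```c
#include <stddef.h>
#include <stdint.h>

/* Ones-complement sum over aligned 32-bit words, folded to 16 bits.
 * No alignment checks: on the scheme described above, a misaligned
 * pointer would simply trap and be fixed up. */
static uint16_t csum_words(const uint32_t *p, size_t nwords)
{
    uint64_t sum = 0;

    for (size_t i = 0; i < nwords; i++)
        sum += p[i];

    /* Fold the carries back in until the sum fits in 16 bits. */
    while (sum >> 16)
        sum = (sum & 0xffffu) + (sum >> 16);

    return (uint16_t)sum;
}
```

The loop body is a single load and add per word, which is the fast
path the trap-based fixup is meant to protect.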

David S. Miller
