On Sat, Aug 20, 2011 at 12:46:25PM +0200, David Kuehling wrote:
> this patch adds a MIPS assembler version of the twofish cipher
> algorithm. x86(_64) had an assembler version of twofish for some time
> now, giving it an "unfair" advantage against the not so common
> The code wast put through various tests and benchmarks in a user-space
> testbench, testing its correctness (also via qemu) for 32bit vs. 64bit
> and big vs. little endian MIPS.
> Also the same code backported to a 2.6.39 kernel (64bit) has been
> working flawlessly on my loongson-2f machine (with a twofish dm-crypt
> root) for a few weeks now.
> Performance gain for the pure crypto code in 32-bit mode is about 15% on
> both Ingenic Xburst and Loongson-2f. With 64-bit it is 24-29%, as the C
> code suffers from 32->64 bit type conversions inserted by gcc. The code
> uses less unrolling than the C version, and should be smaller by at
> least a factor of 8. Full results in file Benchres.txt at the SVN URL
> given above.
> Signed-off-by: David Kühling <email@example.com>
Lots of trailing whitespace in that patch. scripts/checkpatch.pl would
have warned about those …
+# define HAVE_64BIT
+#if _MIPS_SZPTR == 32
+# define PTRADD addu
+# warning "detected pointer-size as 32-bit"
+# ifndef HAVE_64BIT
+# error "something is broken: detected pointer size > GPR size"
+# define PTRADD daddu
+# warning "detected pointer-size as 64-bit"
PTR_ADDU from <asm/asm.h> does the same thing as PTRADDU so the entire
ifdefery above can go away.
Finally the use of register names starting with $ is a bit obscure. Kernel
code needs to build for N64 and O32 but of those only N64 has registers
called $ta0 .. $ta3 which are the equivalent to registers $8 .. $11.
And in O32 registers $t0 ... $t3 also aliases for $8 .. $11. I haven't
fully analyzed the code to ensure that there is no register conflict
arising from that.
I was surprised that gas assembles the code at all. $ta0 .. $ta3 are
N32 / N64 register names and consider gas permitting the use of these
registers in O32 a bug - but see my other posting to the binutils list
Improvment suggestion for le32_fromto_cpu - MIPS 32/64 R2 CPUs can use
the wsbh instruction to faster endianess swapping:
static inline __attribute_const__ __u32 __arch_swab32(__u32 x)
" wsbh %0, %1 \n"
" rotr %0, %0, 16 \n"
: "=r" (x)
: "r" (x));
#define __arch_swab32 __arch_swab32
This code fragment is from <asm/swab.h>.
+ /* if we turned this into 64-bit ops, we get endianess issues on
+ big-endian mips, plus alignment problems */
Some CPUs (Cavium Octeon) handle unaligned loads in hardware, on others
a combination of LDL / LDR and SDL / SDR could be used to handle the
unaligned loads and DSBH / DSHD (see __arch_swab64 in swab.h) could be
used on MIPS64 R2 CPUs to handle the alignment issues. Or a rotate -
have to think about it.