Alignment means placing data at a memory address that the processor can access in a single memory access; in practice this is an address that is a multiple of the data's size (natural alignment). On most CISC architectures the CPU handles misaligned addresses the way the programmer expects. For a load, the processor performs as many memory accesses as required, then reassembles the final value transparently to the programmer. Stores are handled the same way. The problem: a single load or store instruction now requires multiple memory accesses, which means it is no longer an atomic operation.
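To make the definition concrete, here is a small C sketch. It assumes, as a simplification, that memory is accessed in naturally aligned words of `align` bytes; `is_aligned` and `words_touched` are illustrative helpers, not part of any library. A misaligned access that straddles a word boundary touches two words, which is why it needs more than one memory access.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* True if addr is a multiple of align. align must be a power of two,
 * as it is for the natural alignment of C scalar types. */
static bool is_aligned(uintptr_t addr, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (addr & (align - 1)) == 0;
}

/* Number of aligned words of `align` bytes touched by an access of
 * `size` bytes starting at addr: 1 if it fits in one word, 2 if it
 * straddles a word boundary (for size <= align). */
static size_t words_touched(uintptr_t addr, size_t size, size_t align)
{
    assert(size != 0 && align != 0 && (align & (align - 1)) == 0);
    uintptr_t first = addr & ~(uintptr_t)(align - 1);
    uintptr_t last = (addr + size - 1) & ~(uintptr_t)(align - 1);
    return (size_t)((last - first) / align) + 1;
}
```

For example, a 4-byte access at address 0x1001 touches the words at 0x1000 and 0x1004, so `words_touched(0x1001, 4, 4)` returns 2.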
Unaligned loads and stores on MIPS
The MIPS architecture tries to avoid the extra complexity of handling unaligned loads in the pipeline or microcode. For this purpose the instructions LWL, LWR, SWL and SWR were designed: used in pairs, LWL and LWR load an unaligned 32-bit word, and SWL and SWR store one. For unaligned 64-bit loads and stores there are LDL, LDR, SDL and SDR.
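The following C sketch models what an LWL/LWR pair does on a big-endian MIPS core. It is an emulation written from the instructions' documented behaviour, not processor or kernel code; the helper names are hypothetical, memory is a plain byte array, and little-endian cores mirror the byte placement. Each instruction touches only one aligned word and merges its bytes into part of the register; together the pair assembles the unaligned value.

```c
#include <assert.h>
#include <stdint.h>

/* Read the naturally aligned big-endian word at byte offset base. */
static uint32_t load_be32(const uint8_t *mem, uint32_t base)
{
    assert((base & 3u) == 0);
    return (uint32_t)mem[base] << 24 | (uint32_t)mem[base + 1] << 16 |
           (uint32_t)mem[base + 2] << 8 | (uint32_t)mem[base + 3];
}

/* LWL model: fill the most significant bytes of rt with the bytes from
 * addr up to the end of its aligned word; the low bytes of rt are kept. */
static uint32_t lwl_be(uint32_t rt, const uint8_t *mem, uint32_t addr)
{
    unsigned shift = 8u * (addr & 3u);
    uint32_t mask = 0xFFFFFFFFu << shift;
    return (load_be32(mem, addr & ~3u) << shift) | (rt & ~mask);
}

/* LWR model: fill the least significant bytes of rt with the bytes from
 * the start of addr's aligned word up to and including addr. */
static uint32_t lwr_be(uint32_t rt, const uint8_t *mem, uint32_t addr)
{
    unsigned shift = 8u * (3u - (addr & 3u));
    uint32_t mask = 0xFFFFFFFFu >> shift;
    return (load_be32(mem, addr & ~3u) >> shift) | (rt & ~mask);
}

/* Unaligned 32-bit load at addr, modelled on the big-endian idiom
 * "lwl rt, 0(a); lwr rt, 3(a)". Between them the two steps overwrite
 * all four bytes of rt for any alignment of addr. */
static uint32_t unaligned_lw_be(const uint8_t *mem, uint32_t addr)
{
    uint32_t rt = 0;
    rt = lwl_be(rt, mem, addr);
    rt = lwr_be(rt, mem, addr + 3);
    return rt;
}
```

With memory holding the bytes 00 11 22 33 44 55 66 77, `unaligned_lw_be(mem, 1)` returns 0x11223344: LWL supplies the three high bytes from the first aligned word and LWR the low byte from the second.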
Problems of the MIPS Unaligned Load Instructions
For one, they are specific to the MIPS architecture. However, GCC allows a variable to be declared as packed through __attribute__((packed)). A packed data structure carries no alignment guarantees, so GCC will emit code that uses the MIPS unaligned load and store instructions to access it. This is a portable way of making use of these instructions, but it requires access to the source code and recompilation.
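A minimal sketch of the packed approach follows. The struct and function names are hypothetical; the point is that the compiler knows `value` may be misaligned. On classic (pre-Release 6) MIPS targets GCC typically compiles these member accesses into LWL/LWR and SWL/SWR pairs, while the same source builds unchanged for other architectures.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical on-the-wire record: in the packed layout `value` sits at
 * offset 1, so it is not naturally aligned for a 32-bit type. */
struct __attribute__((packed)) wire_record {
    uint8_t  tag;
    uint32_t value;
};

/* Compile-time checks of the packed layout. */
static_assert(offsetof(struct wire_record, value) == 1, "value at offset 1");
static_assert(sizeof(struct wire_record) == 5, "no padding when packed");

/* The compiler must assume p->value is misaligned; on classic MIPS it
 * can use an LWL/LWR pair here instead of a plain LW. */
static uint32_t record_value(const struct wire_record *p)
{
    return p->value;
}

/* Likewise an SWL/SWR pair instead of a plain SW. */
static void record_set_value(struct wire_record *p, uint32_t v)
{
    p->value = v;
}
```

Accessing the member through the packed struct is what keeps the misalignment visible to the compiler; copying its address into a plain `uint32_t *` discards that information, and GCC warns about it with `-Waddress-of-packed-member`.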
The downside is that every 32-bit or 64-bit access to packed data is now replaced by a sequence of two instructions. This inflates code, sometimes considerably, which increases I-cache pressure and may slow execution. These instructions are almost always used in pairs, for example an LWL/LWR sequence. They execute reasonably quickly, and even the oldest processors have a bypass in the pipeline so the second instruction does not have to wait for the first one to complete. Still, there is an extra instruction to execute.
Transparent fixing by the kernel
On the MIPS architecture a misaligned load or store results in an address error exception. If the access is otherwise legitimate, the kernel then emulates the operation in software and resumes the program. This emulation has high overhead: it is on the order of 1000 times slower than a properly aligned memory access.
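What the kernel fixup has to do with the data itself can be sketched in C as below. This is a simplified illustration, not the Linux implementation: the real handler also decodes the faulting instruction, verifies that the address is accessible, and advances the program counter past the instruction, taking branch delay slots into account. The function names are hypothetical and memory is modelled as a byte array. Byte accesses can never be misaligned, so splitting the word into single-byte accesses always works, at the cost of several accesses per emulated instruction.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical fixup helper: emulate a big-endian 32-bit load from a
 * misaligned address with four single-byte reads. */
static uint32_t emulate_lw_be(const uint8_t *mem, uint32_t addr)
{
    assert((addr & 3u) != 0); /* only reached for misaligned addresses */
    uint32_t v = 0;
    for (int i = 0; i < 4; i++)
        v = (v << 8) | mem[addr + i];
    return v;
}

/* Matching store: write the word as four bytes, least significant last
 * in memory (big-endian order). */
static void emulate_sw_be(uint8_t *mem, uint32_t addr, uint32_t v)
{
    assert((addr & 3u) != 0);
    for (int i = 3; i >= 0; i--) {
        mem[addr + i] = (uint8_t)v;
        v >>= 8;
    }
}
```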
Advantage of using the kernel fixup
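For some code, misaligned accesses are expected to be very rare. In that case letting the kernel handle them is the best choice for performance: aligned accesses run at full speed, and the rare trap costs less in total than an extra instruction on every access. Also, no recompilation is required and thus the resulting binaries are not bloated.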
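The trade-off can be put into numbers with a rough cost model. The constants are assumptions for illustration only: an aligned access costs 1 unit, an LWL/LWR pair 2 units (one extra instruction on every access), and a trapped access 1000 units, matching the order of magnitude given above. The model ignores I-cache effects and pipeline details, and `struct cost_model` and its helpers are hypothetical.

```c
#include <assert.h>

/* Hypothetical cost model, in units of one aligned load or store.
 * All constants are illustrative assumptions, not measurements. */
struct cost_model {
    double aligned; /* aligned access, never traps */
    double pair;    /* LWL/LWR or SWL/SWR pair, used for every access */
    double trap;    /* misaligned access emulated by the kernel */
};

/* Average cost per access when relying on the kernel fixup and a
 * fraction p of all accesses is misaligned. */
static double fixup_cost(const struct cost_model *m, double p)
{
    assert(p >= 0.0 && p <= 1.0);
    return (1.0 - p) * m->aligned + p * m->trap;
}

/* Misalignment rate at which kernel fixup and the instruction pair
 * cost the same; below this rate the kernel fixup is cheaper. */
static double break_even_rate(const struct cost_model *m)
{
    assert(m->trap > m->aligned);
    return (m->pair - m->aligned) / (m->trap - m->aligned);
}
```

With `struct cost_model m = {1.0, 2.0, 1000.0}`, `break_even_rate(&m)` is 1/999, about 0.1%: if fewer than roughly one access in a thousand is misaligned, the kernel fixup is cheaper on average than using the pair everywhere.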
Hardware-specific considerations
Cavium cnMIPS cores
Cavium cnMIPS cores feature an advanced pipeline design that can transparently handle misaligned memory accesses. The Linux kernel enables this feature, which, short of redesigning software to guarantee alignment, provides the best possible performance without any software engineering pain.
The problem with Cavium's approach, if there is one, is that performance no longer carries over between cores. A binary using the MIPS unaligned load/store instructions suffers a small performance penalty on a cnMIPS core, while a binary that does not use these instructions may perform optimally on a cnMIPS core yet crawl on another MIPS core, where its misaligned accesses are not handled in hardware and fall back to the kernel fixup. Still, a hardware-based approach like this should be optimal.
Sony R5900 / PlayStation 2
This MIPS core features 128-bit load and store instructions as part of its multimedia extensions. These 128-bit memory operations do not raise an address error on a misaligned address, so fixup in the kernel is not possible.
The unaligned load/store instructions were covered by US patent 4,814,976, which expired on December 23, 2006. Some of the corresponding international patents expired later. For a little more on the history of this patent, see the article on Lexra.