Difference between revisions of "Alignment"

From LinuxMIPS
Latest revision as of 08:35, 16 August 2012

Alignment means choosing a memory address such that the processor can access the data in a single memory access. On most CISC architectures the CPU handles misaligned memory addresses as the programmer expects: for a load, the processor performs as many memory accesses as required and then reassembles the final value transparently to the programmer. For a store the equivalent happens. The problem is that a single load/store operation now requires multiple memory accesses, which means it is no longer an atomic operation.
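As a minimal sketch of the definition above, the following helper tests whether an address is a multiple of the access size. The name is_aligned is made up for this illustration and is not taken from any library; the size is assumed to be a power of two.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: returns true if p is a multiple of size.
 * size must be a power of two (1, 2, 4, 8, ...).
 */
static bool is_aligned(const void *p, size_t size)
{
    return ((uintptr_t)p & (size - 1)) == 0;
}
```

For example, a 32-bit access at an address for which is_aligned(p, 4) is false is a misaligned access in the sense used in this article.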

Unaligned loads and stores on MIPS

The MIPS architecture tries to get away without the extra complexity of handling unaligned loads in the pipeline or microcode. For this purpose the instructions LWL, LWR, SWL and SWR were designed. Used in pairs, these instructions load or store a 32-bit word at an arbitrary address. For unaligned 64-bit loads and stores there are LDL, LDR, SDL and SDR.
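To illustrate how an LWL/LWR pair assembles a word, here is a small software model in C. It is only a sketch: it assumes a big-endian system, models memory as a byte array indexed from address 0, and all function names are invented for this example.

```c
#include <stdint.h>

/* Fetch the aligned 32-bit word that contains addr (big-endian order). */
static uint32_t aligned_word_be(const uint8_t *mem, uint32_t addr)
{
    const uint8_t *p = mem + (addr & ~3u);
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Model of LWL: the bytes from addr to the end of its aligned word
 * are placed in the most significant part of the register. */
static uint32_t lwl_be(uint32_t reg, const uint8_t *mem, uint32_t addr)
{
    unsigned shift = 8 * (addr & 3u);
    uint32_t keep = (1u << shift) - 1u;   /* low register bytes stay */
    return (aligned_word_be(mem, addr) << shift) | (reg & keep);
}

/* Model of LWR: the bytes from the start of the aligned word up to
 * addr are placed in the least significant part of the register. */
static uint32_t lwr_be(uint32_t reg, const uint8_t *mem, uint32_t addr)
{
    unsigned shift = 8 * (3u - (addr & 3u));
    uint32_t loaded = 0xFFFFFFFFu >> shift; /* register bits replaced */
    return (reg & ~loaded) | (aligned_word_be(mem, addr) >> shift);
}

/* The usual pair on big-endian: lwl rt, 0(addr); lwr rt, 3(addr). */
static uint32_t load_unaligned_be(const uint8_t *mem, uint32_t addr)
{
    uint32_t reg = 0;
    reg = lwl_be(reg, mem, addr);
    reg = lwr_be(reg, mem, addr + 3);
    return reg;
}
```

Each of the two steps touches only one aligned word, which is why neither instruction raises an address error. On a little-endian system the roles of the two offsets are swapped.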

Problems of the MIPS Unaligned Load Instructions

For one, they are specific to the MIPS architecture. However, GCC allows a data structure to be declared as packed through __attribute__((packed)). In a packed data structure there are no alignment guarantees, so GCC will emit code that uses the MIPS unaligned load and store instructions. This gives a portable way of accessing these instructions, but it may require recompilation and access to the source code.
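A common way to use the packed attribute for a single access is to wrap the value in a one-member packed structure. The following is a sketch of that idiom, not code from the kernel or from GCC; the type and function names are made up, and the may_alias attribute is added here as an assumption to keep the casts safe under GCC's strict-aliasing rules.

```c
#include <stdint.h>

/* One-member packed struct: the compiler may assume no alignment
 * for value and must pick an access sequence that works anywhere. */
struct unaligned_u32 {
    uint32_t value;
} __attribute__((packed, may_alias));

/* Load a 32-bit value from an arbitrarily aligned address. */
static uint32_t load_u32(const void *p)
{
    return ((const struct unaligned_u32 *)p)->value;
}

/* Store a 32-bit value to an arbitrarily aligned address. */
static void store_u32(void *p, uint32_t v)
{
    ((struct unaligned_u32 *)p)->value = v;
}
```

When compiled for MIPS, GCC is expected to turn these accesses into LWL/LWR and SWL/SWR pairs; on other architectures it chooses whatever is appropriate there.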

The downside is that every 32-bit or 64-bit memory access has now been replaced by a sequence of two instructions. This inflates code - sometimes considerably - resulting in higher I-cache pressure and so possibly slower execution. These instructions are usually used in pairs, for example as an LWL/LWR sequence. They execute reasonably quickly, and even the oldest processors have a bypass in the pipeline so the second instruction does not have to wait for the first one to complete. Still, there is an extra instruction to execute.

Transparent fixing by the kernel

On the MIPS architecture a misaligned load or store will result in an address error exception. If everything else is looking ok the kernel will then execute the operation in software. This emulation has high overhead and is on the order of 1000 times slower than a properly aligned memory access.

Advantage of using the kernel fixup

For some code misalignment is expected to be very rare. In such a case letting the kernel do the job is the best choice for performance. Also there is no recompilation required and thus no bloat of the resulting binaries.
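The trade-off can be made concrete with a rough cost model. The numbers below are only the figures quoted in this article (two instructions per access for the LWL/LWR approach, roughly 1000 times the cost of an aligned access for a kernel fixup); the model ignores cache effects and the function names are invented for this sketch.

```c
/* Cost of n accesses when every access uses an LWL/LWR style pair,
 * measured in units of one aligned access. */
static double cost_inline(double n)
{
    return n * 2.0;
}

/* Cost of n plain accesses when a fraction f of them is misaligned
 * and trapped into the kernel at about 1000 units each. */
static double cost_fixup(double n, double f)
{
    return n * ((1.0 - f) * 1.0 + f * 1000.0);
}
```

Under these assumptions the two approaches break even when 1 + 999 f = 2, that is at about one misaligned access in a thousand. Below that rate the kernel fixup is cheaper; above it, the unaligned instructions win.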

Controlling the kernel unalignment handling via debugfs

There is no truly standardized mountpoint for debugfs. For brevity's sake this section will assume /debug, but /sys/kernel/debug is also common. Debugfs provides two files to control and monitor the kernel's unaligned handling.

/debug/mips/unaligned_instructions
/debug/mips/unaligned_action

The unaligned_instructions file counts the number of unaligned instructions. Note that this is a system-wide, not per-process counter.

The unaligned_action file selects the action the kernel takes when it encounters an address error due to an unaligned load/store instruction. There are three possible modes:

Mode  Action
0     silently fix up the unaligned access
1     send SIGBUS
2     dump registers, process name, etc. and fix up

The default mode is 0.
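The two files can also be accessed from a program. The sketch below assumes that both files hold a plain decimal number, which is how simple debugfs integer files usually behave; the helper names are hypothetical and the path is passed in by the caller because the debugfs mountpoint varies.

```c
#include <stdio.h>

/* Read the system-wide counter from the given file,
 * e.g. "/debug/mips/unaligned_instructions".
 * Returns 0 on success, -1 on failure. */
static int read_unaligned_count(const char *path, unsigned long *count)
{
    FILE *f = fopen(path, "r");
    int ok;

    if (!f)
        return -1;
    ok = (fscanf(f, "%lu", count) == 1);
    fclose(f);
    return ok ? 0 : -1;
}

/* Write mode 0, 1 or 2 to the given file,
 * e.g. "/debug/mips/unaligned_action".
 * Returns 0 on success, -1 on failure. */
static int set_unaligned_action(const char *path, int mode)
{
    FILE *f;
    int ok;

    if (mode < 0 || mode > 2)
        return -1;
    f = fopen(path, "w");
    if (!f)
        return -1;
    ok = (fprintf(f, "%d\n", mode) > 0);
    if (fclose(f) != 0)
        ok = 0;
    return ok ? 0 : -1;
}
```

Reading the counter before and after running a program gives a quick idea of how many fixups happened in between, keeping in mind that other processes contribute to the same counter.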

Controlling the kernel unalignment handling in your own code

Fixing address errors is a per-process option. The option is inherited across fork(2) and execve(2) calls. To control it from your own user programs, use something like the following code:

#include <sys/sysmips.h>
...
sysmips(MIPS_FIXADE, x);
...

An argument x of 0 disables the software emulation; any other value enables it.

Below is a little program to explore this feature:

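#include <stdio.h>
#include <stdlib.h>             /* atoi() */
#include <sys/sysmips.h>

struct foo {
        unsigned char bar[8];
};

int main(int argc, char *argv[])
{
        struct foo x = { {0, 1, 2, 3, 4, 5, 6, 7} };
        /* Deliberately misaligned pointer (assuming x itself is word-aligned) */
        unsigned int *p = (unsigned int *) (x.bar + 3);
        int i;

        /* Optional argument: 0 disables the kernel fixup, non-zero enables it */
        if (argc > 1)
                sysmips(MIPS_FIXADE, atoi(argv[1]));

        printf("*p = %08x\n", *p);      /* unaligned load */

        *p = 0xdeadface;                /* unaligned store */

        for (i = 0; i <= 7; i++)
                printf("%02x ", x.bar[i]);
        printf("\n");

        return 0;
}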

Hardware-specific considerations

Cavium cnMIPS cores

Cavium cnMIPS cores feature an advanced pipeline design that can transparently handle misaligned memory accesses. The Linux kernel enables this feature which, short of redesigning software to guarantee alignment, provides the best possible performance without any software engineering pain.

If there is a problem with Cavium's approach, it is this: a binary that uses the MIPS unaligned load/store instructions will suffer a small performance penalty on a cnMIPS core, while a binary that does not use these instructions may perform optimally on a cnMIPS core but crawl on another MIPS core because its alignment handling has never been addressed. Still, a hardware-based approach like this should be optimal.

Sony R5900 / Playstation 2

This MIPS core features 128-bit load and store instructions as part of its multimedia extensions. These 128-bit memory operations will not take an address error so fixup in the kernel is not possible.
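Since there is no exception to catch, the only safe option is to make sure the data is aligned in the first place. The following sketch uses C11 alignas for that; it assumes that the 128-bit accesses want natural 16-byte alignment, and the type and function names are invented for this example.

```c
#include <stdalign.h>
#include <stdint.h>

/* Buffer whose storage starts on a 16-byte boundary (assumed to be
 * the natural alignment for 128-bit loads and stores). */
struct quad_buffer {
    alignas(16) uint8_t bytes[64];
};

/* Returns non-zero if p is suitable for a 128-bit access. */
static int is_quad_aligned(const void *p)
{
    return ((uintptr_t)p & 15u) == 0;
}
```

A check like is_quad_aligned can be used in debug builds as an assertion in front of code that relies on the 128-bit instructions.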

Patent issue

The unaligned instructions were covered by US patent 4,814,976, which expired on December 23, 2006. Some of the international patents expired later. For a little more on the history of this patent see the article on Lexra.