Why does the MIPS Architecture need a new ABI?
An ABI (Application Binary Interface, though that hardly helps) is a set of rules governing compiled programs which - if followed - make the programs able to be linked together (for calling and to share data) and to be comprehensible to various useful bits of software - that includes debuggers, the Linux kernel, and run-time loaders.
Earlier MIPS ABIs were interpreted as machine-specific extensions to the cross-architecture [SVR4]; but the definition of OS services in a unix-like system now falls to POSIX and (specifically) Linux. This specification does not include the machine-indepedent parts of SVR4 ABI by reference or otherwise.
The existing MIPS ABIs were evolved substantially by Silicon Graphics Inc (SGI) for various versions of their Irix OS; they are fairly typical of ABIs for Linux and other sophisticated operating systems. At least to date most MIPS embedded systems have got by using a subset of SGI's complicated ABIs - o32, n32 and n64. So What's wrong with o32, n32 and n64?.
The MIPS ABI took shape as a set of register usage and calling conventions established from the earliest days of MIPS CPUs. It picked up the ABI acronym and a defined binding to object code with the AT&T-inspired UnixSystem V document which is rooted with SVR4.
That process had coalesced as early as 1990 into much of the o32 ABI which is widely used today. By about 1994 the ABI was expanded to encompass position-independent code and the ELF object code syntax, and there have been no substantive and intentional changes since.
SGI pioneered 64-bit operating systems for MIPS in the early 1990s, and the o32 ABI was quite unsuitable for real 64-bit computing. SGI defined a 64-bit ABI called n64 suitable for the largest applications; and then - belatedly realising that n64's 64-bit pointer and long types bloated programs and caused portability problems to many applications which didn't need them - produced the very similar standard n32, which differs primarily in having 32-bit pointers.
From 1995 or so SGI used solely 64-bit-capable MIPS CPUs, so they had no need to revisit a 32-bit ABI. As a result the embedded MIPS world is still stuck on the 20-year-old o32 standard. A series of talks five years ago failed to come up with a replacement.
Meanwhile, the perceived deficiencies of o32 have led to the proliferation of variants and more narrowly-focussed alternatives, to the point where there are now as many as 15 incompatible MIPS ABIs. It may yet prove the least worst decision for us all to continue to use o32 forever: but escaping from o32 could noticeably improve performance and ease various kinds of compatibility. So this is MIPS Technologies' proposal to do so: but this won't make sense unless we can take the community with us and end up with fewer ABIs - not just another family to add to the overlong list.
Specific Goals for NUBI
- Replace multiple existing ABIs : a new family should be good enough for all the applications we can make contact with, and a seed for consolidation on a single standard.
- Make better use of registers : o32's limit of four argument registers causes unnecessary stack shuffling, and programs would run slightly better if we reserved more. Eight argument registers has been tried with success, notably by n64/n32.
We will define more saved registers. Not only is this common for other architectures, but it is evident that the GNU C compiler would quite often generate better code with a few more of these.
- Add a thread pointer : a per-thread pointer in a reserved register makes for efficient thread-local storage.
- Avoid unnecessary trouble with MIPS16e: the MIPS16 compact-code standard uses half-size (16-bit) instructions, and one of the trade-offs made means it only has first-class access to eight general-purpose registers. We want to ensure the ABI's register use does not cause avoidable pain to MIPS16e programs (1).
- Better position-independent code (PIC) : all Linux shared libraries and applications are built PIC, so PIC efficiency matters. The general PIC code sequences for external data access and subroutine linkage are quite slow: we want to permit more optimisations of PIC calling and data referencing sequences.
- Reduce 32-/64-bit incompatibility : 64-bit hardware is already with us. It will be easier for applications to migrate to 64-bit computing if we can offer transition tools which can build applications from a mixture of 32- and 64-bit modules. This will never be seamless, but we believe it's worth making it practicable in controlled circumstances.
We've chosen this name, for now. We intend to improve on o32 with something simpler, but which prefers being trouble-free for the MIPS programming community over ground-breaking innovation.
Unfortunately there can't be just one NUBI. We believe there are three basic choices (related to the hardware) which will create NUBI variants, but which it's essential to support:
- Endianness: big- and little-endian programs are wholly incompatible. However, with some care this document covers both. When we need to distinguish them we'll use a suffix L or B for little/big-endian.
- Use of 64-bit integer instructions: whether the software is restricted to a MIPS32 instruction set (NUBI32), or can use the whole of MIPS64 (NUBI64). If you only ever use MIPS32 instructions all general-purpose registers might as well be 32-bits wide. Even NUBI32 software is assumed to have access to double-precision floating point operations,
- Size of basic C types: we need to recognise a variant of NUBI64 with 64-bit pointers and long. I'm going to use NUBI64W (W for wide) to denote this for now. Better suggestions welcomed.
In the medium term we expect the narrow 64-bit ABI which uses 32-bit pointers and long type to be more popular.
That leads to the following list of six main variants: NUBI32L, NUBI32B, NUBI64L, NUBI64B, NUBI64WL and NUBI64WB. We'll use NUBI32 to mean NUBI32L and NUBI32B, and - at a pinch - NUBI-L to mean NUBI32L and NUBI64L.
How's NUBI different from o32?
- More argument registers : eight instead of four.
- Argument registers shared with return-value registers : helps avoid having too many registers with pre-defined roles.
- Adds a thread pointer : for efficient TLS.
- o32 was stack based, NUBI is register-based: at bottom o32 used a stack-based calling convention, though it's disguised because the first, notional, 4Ã—32-bit locations of the underlying stack argument structure are left unwritten, with the real data passed in four argument registers.
At the time o32 was introduced C programs were frequently written without function prototypes to describe the types of the arguments expected by an external function. Without any prototypes calls to functions with non-standard arguments or (worse) with a variable number of arguments, like printf(), were difficult to get right. The underlying stack structure helped; a troubled function could save the four registers onto the stack to obtain a completely predictable memory structure for its arguments.
The stack-based structure also made it possible to pass data of derived types (structures, principally) by value. This has advantages: sometimes the data types you think are obscure turn out to be common.
In o32's generation, the critical example of this was the Fortran complex-number data type (which is a pair of floating point values).
However, in NUBI32's time the problem is more likely to be that we'd like to handle long long arguments and return values more efficiently. This might be worth a special case (which would cover complex and double-precision floating point numbers too).
- o32 was irredeemably 32-bit, NUBI makes interworking possible: a call between a NUBI64 program and a NUBI32 program will always need gasket code, but a gasket which is automatically produced would be a useful tool for complicated applications which are migrating from 32-bit to 64-bit, and where not all the components are recompiled together.
The v0.16 from September , 2005 can be downloaded at ftp://ftp.linux-mips.org//pub/linux/mips/doc/NUBI/MD00438-2C-NUBIDESC-SPC-00.16.pdf.
- Since the normal 32-bit MIPS register set treats pretty much all registers the same, the MIPS16e constraint should be unproblematic.