Difference between revisions of "NPTL"

From LinuxMIPS

Revision as of 01:10, 3 December 2004

Status

Currently, NPTL for Linux/MIPS is a work in progress. Ulrich Drepper's NPTL Design Document contains some information on the NPTL design, including implementation details on other architectures. One prerequisite for NPTL is TLS, as documented in Thread Local Storage.

Overview

This document presents a design for implementing Thread Local Storage (TLS) for MIPS Linux, in both 32-bit and 64-bit mode. This design specifies the code that must be generated by the compiler, the relocations that must be generated by the assembler, and the processing that must be performed by the linker.


Design Choices

  • There are no available hardware registers to designate as the thread register.

Therefore, kernel magic will be used to make the thread pointer available to userspace. This specification does not prescribe a mechanism for that; the mechanism for obtaining the thread pointer will be encapsulated in the __tls_get_addr function.

  • Use TLS Variant II (in which the TLS data areas precede the TCB in memory).

As noted in Drepper's paper, this design permits the compiler to generate efficient code for the case that the main executable accesses TLS variables from the executable itself.

  • The __tls_get_addr function has the prototype:
  extern void *__tls_get_addr (tls_index *ti);

where the type tls_index is defined as:

  typedef struct {
          unsigned long ti_module;
          unsigned long ti_offset;
  } tls_index;

The type unsigned long is used because it is a 32-bit type in 32-bit mode and a 64-bit type in 64-bit mode; thus, the members will fit correctly into two consecutive GOT entries in both modes.

  • The Initial Exec and Local Exec models are not yet specified.

These models require that the compiler be able to directly access the thread pointer without using __tls_get_addr. Whether or not that is possible will depend on the kernel mechanism used to implement the thread pointer.

  • The compiler is not allowed to schedule the sequences below.

The sequences below must appear exactly as written in the code generated by the compiler. This restriction is present because we have not yet specified the Initial Exec and Local Exec models, and so it is not clear what linker optimizations may be possible. In order to facilitate adding linker optimizations in the future, without recompiling current code, the compiler is restricted from scheduling these sequences.
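
The claim above that a tls_index fits into two consecutive GOT entries can be checked mechanically. The following is a minimal sketch, assuming a Linux ILP32 or LP64 data model in which unsigned long and pointers have the same size; the helper name tls_index_got_slots is hypothetical and not part of this specification.

```c
#include <assert.h>
#include <stddef.h>

/* tls_index exactly as defined in this specification. */
typedef struct {
        unsigned long ti_module;
        unsigned long ti_offset;
} tls_index;

/* Compile-time check (C11): the two members are laid out back to back
   with no padding between or after them. */
static_assert(sizeof(tls_index) == 2 * sizeof(unsigned long),
              "tls_index must be exactly two unsigned longs");
static_assert(offsetof(tls_index, ti_offset) == sizeof(unsigned long),
              "ti_offset must directly follow ti_module");

/* Hypothetical helper: number of pointer-sized GOT slots that one
   tls_index occupies. Assumes a GOT entry is as wide as a pointer. */
size_t tls_index_got_slots(void)
{
        return sizeof(tls_index) / sizeof(void *);
}
```

On both a 32-bit and a 64-bit Linux target the helper is expected to return 2, which is the property the choice of unsigned long is meant to guarantee.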

Conventions

In what follows, all references to registers other than $2 (when it is used as the return register), $4 (when it is used as an argument register), $25 (the address of a called function), and $28 (the global pointer) are arbitrary; the compiler is free to use other registers instead.

Where ... appears in a code sequence the compiler may insert zero or more arbitrary instructions.


   General Dynamic TLS Model

Code sequence (32-bit mode):

   0x00 lw $25, %call16(__tls_get_addr)($28)   R_MIPS_CALL16   g
   0x04 jalr $25
   0x08 addiu $4, $28, %tlsgd(x)               R_MIPS_TLS_GD   x

Outstanding relocations (32-bit mode):

   GOT[n]                                      R_MIPS_TLS_DTPMOD32 x
   GOT[n+1]                                    R_MIPS_TLS_DTPOFF32 x

Code sequence (64-bit mode):

   0x00 lw $25, %call16(__tls_get_addr)($28)   R_MIPS_CALL16   g
   0x04 jalr $25
   0x08 daddiu $4, $28, %tlsgd(x)              R_MIPS_TLS_GD   x

Outstanding relocations (64-bit mode):

   GOT[n]                                      R_MIPS_TLS_DTPMOD64 x
   GOT[n+1]                                    R_MIPS_TLS_DTPOFF64 x

At the end of the code sequence the address of x is available in $2.
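
What the General Dynamic sequence computes can be modelled in C. The sketch below is an illustration only: demo_dtv, demo_tls_get_addr and demo_general_dynamic are hypothetical names, and the per-thread table of module blocks is an assumption made for the example, since this specification does not say how __tls_get_addr finds a module's TLS block.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
        unsigned long ti_module;
        unsigned long ti_offset;
} tls_index;

#define DEMO_MAX_MODULES 4

/* Hypothetical per-thread table: demo_dtv[m] is the start of module m's
   TLS block for the current thread. */
char *demo_dtv[DEMO_MAX_MODULES];

/* Stand-in for __tls_get_addr: module block base plus offset. */
void *demo_tls_get_addr(const tls_index *ti)
{
        assert(ti != NULL);
        assert(ti->ti_module < DEMO_MAX_MODULES);
        assert(demo_dtv[ti->ti_module] != NULL);
        return demo_dtv[ti->ti_module] + ti->ti_offset;
}

/* Models the General Dynamic sequence: GOT[n] holds the module ID
   (DTPMOD relocation) and GOT[n+1] the offset (DTPOFF relocation).
   The real sequence passes the address of GOT[n] in $4; the pair is
   copied here only to keep the C example free of pointer casts. */
void *demo_general_dynamic(const unsigned long *got, size_t n)
{
        tls_index ti;

        ti.ti_module = got[n];
        ti.ti_offset = got[n + 1];
        return demo_tls_get_addr(&ti);   /* result corresponds to $2 */
}
```

A caller would first point demo_dtv[1] at a buffer and fill two adjacent GOT slots with the values 1 and 12; demo_general_dynamic then returns the address 12 bytes into that buffer.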


   Local Dynamic TLS Model

Code sequence I (32-bit mode):

   0x00 lw $25, %call16(__tls_get_addr)($28)   R_MIPS_CALL16   g
   0x04 jalr $25
   0x08 addiu $4, $28, %tlsldm(x)              R_MIPS_TLS_LDM   x
        ...
   0x10 lui $3, %hi(%dtpoff(x1))               R_MIPS_TLS_LDO_HI16 x1
   0x14 addiu $3, $3, %lo(%dtpoff(x1))         R_MIPS_TLS_LDO_LO16 x1
   0x18 addu $3, $3, $2
        ...
   0x1c lui $3, %hi(%dtpoff(x2))               R_MIPS_TLS_LDO_HI16 x2
   0x20 addiu $3, $3, %lo(%dtpoff(x2))         R_MIPS_TLS_LDO_LO16 x2
   0x24 addu $3, $3, $2

Outstanding relocations (32-bit mode):

   GOT[n]                                      R_MIPS_TLS_DTPMOD32 x1

Code sequence I (64-bit mode):

   0x00 lw $25, %call16(__tls_get_addr)($28)   R_MIPS_CALL16   g
   0x04 jalr $25
   0x08 daddiu $4, $28, %tlsldm(x)             R_MIPS_TLS_LDM   x
        ...
   0x10 lui $3, %hi(%dtpoff(x1))               R_MIPS_TLS_LDO_HI16 x1
   0x14 addiu $3, $3, %lo(%dtpoff(x1))         R_MIPS_TLS_LDO_LO16 x1
   0x18 daddu $3, $3, $2
        ...
   0x1c lui $3, %hi(%dtpoff(x2))               R_MIPS_TLS_LDO_HI16 x2
   0x20 addiu $3, $3, %lo(%dtpoff(x2))         R_MIPS_TLS_LDO_LO16 x2
   0x24 daddu $3, $3, $2

Outstanding relocations (64-bit mode):

   GOT[n]                                      R_MIPS_TLS_DTPMOD64 x1

After the instruction at 0x18 has executed, the address of x1 is available in $3.
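
The point of the Local Dynamic model is that a single __tls_get_addr call yields the base of the module's TLS block, after which each variable costs only an addition. The sketch below models that step in C; demo_local_dynamic_resolve is a hypothetical name, and the offsets passed in stand for the %dtpoff values that the linker would supply.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* module_base models the value left in $2 by the single call to
   __tls_get_addr. dtpoffs[i] models %dtpoff(x_i) for each variable.
   out[i] receives the address of x_i, as $3 would after the addu/daddu. */
void demo_local_dynamic_resolve(char *module_base,
                                const int32_t *dtpoffs,
                                size_t count,
                                char **out)
{
        size_t i;

        assert(module_base != NULL && dtpoffs != NULL && out != NULL);
        for (i = 0; i < count; i++) {
                /* lui/addiu build the offset; addu adds the base. */
                out[i] = module_base + dtpoffs[i];
        }
}
```

With two variables at offsets 8 and 40, the function produces two addresses from one base, which corresponds to the x1 and x2 parts of Code sequence I sharing one call.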

If, rather than needing the address of the variable, the variable is to be read or written, then the following code sequences may be used instead of either of the sequences above.

Code Sequence II (32-bit mode):

   0x00 lw $25, %call16(__tls_get_addr)($28)   R_MIPS_CALL16   g
   0x04 jalr $25
   0x08 addiu $4, $28, %tlsldm(x)              R_MIPS_TLS_LDM  x
        ...
   0x10 lui $3, %hi(%dtpoff(x1))               R_MIPS_TLS_LDO_HI16 x1
   0x14 addu $3, $3, $2
   0x18 lw $3, %lo(%dtpoff(x1))($3)            R_MIPS_TLS_LDO_LO16 x1

Code Sequence II (64-bit mode):

   0x00 lw $25, %call16(__tls_get_addr)($28)   R_MIPS_CALL16   g
   0x04 jalr $25
   0x08 daddiu $4, $28, %tlsldm(x)             R_MIPS_TLS_LDM  x
        ...
   0x10 lui $3, %hi(%dtpoff(x1))               R_MIPS_TLS_LDO_HI16 x1
   0x14 daddu $3, $3, $2
   0x18 lw $3, %lo(%dtpoff(x1))($3)            R_MIPS_TLS_LDO_LO16 x1

Here, lw may be replaced with any other load/store instruction, using the same opcode format as lw, such as lb, lbu, lh, lwu, ld, sb, sh, sw, sd, ll, lld, lwl, lwr, ldl, or ldr.
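
The load in Code Sequence II can absorb the low half of the offset because the 16-bit displacement of a MIPS load or store is sign-extended, exactly like the immediate of addiu. The sketch below shows the usual %hi/%lo arithmetic under that assumption; the function names are hypothetical, and the offset is treated as a plain 32-bit value.

```c
#include <assert.h>
#include <stdint.h>

/* %hi(value): upper 16 bits, rounded so that adding the sign-extended
   low half restores the original value. */
uint16_t demo_hi16(uint32_t value)
{
        return (uint16_t)((value + 0x8000u) >> 16);
}

/* %lo(value): lower 16 bits, later interpreted as a signed immediate. */
uint16_t demo_lo16(uint32_t value)
{
        return (uint16_t)(value & 0xffffu);
}

/* Recombines the halves the way lui followed by addiu (or by a load
   displacement) does, using 32-bit wraparound arithmetic. */
uint32_t demo_combine_hi_lo(uint16_t hi, uint16_t lo)
{
        uint32_t upper = (uint32_t)hi << 16;                  /* lui */
        uint32_t lower = (lo & 0x8000u)
                       ? ((uint32_t)lo | 0xffff0000u)         /* sign-extend */
                       : (uint32_t)lo;

        return upper + lower;
}
```

For any 32-bit offset v, demo_combine_hi_lo(demo_hi16(v), demo_lo16(v)) gives back v, which is why the high half can go into lui and the low half into either addiu or the load itself.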

If the size of the TLS area is known to be smaller than 32K, then the following sequences can be used instead of those above.

Code sequence III (32-bit mode):

   0x00 lw $25, %call16(__tls_get_addr)($28)   R_MIPS_CALL16   g
   0x04 jalr $25
   0x08 addiu $4, $28, %tlsldm(x)              R_MIPS_TLS_LDM  x
        ...
   0x10 addiu $3, $2, %lo(%dtpoff(x1))         R_MIPS_TLS_LDO_LO16 x1

Code sequence III (64-bit mode):

   0x00 lw $25, %call16(__tls_get_addr)($28)   R_MIPS_CALL16   g
   0x04 jalr $25
   0x08 daddiu $4, $28, %tlsldm(x)             R_MIPS_TLS_LDM  x
        ...
   0x10 daddiu $3, $2, %lo(%dtpoff(x1))        R_MIPS_TLS_LDO_LO16 x1

The outstanding relocations are as for Code Sequence I.
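
The 32K condition for Code sequence III follows from the range of a signed 16-bit immediate. A minimal check might look like the sketch below; demo_fits_lo16 is a hypothetical name, and the sketch assumes that %dtpoff is an unbiased offset from the start of the module's TLS block, which this specification does not state explicitly.

```c
#include <assert.h>

/* Returns 1 if dtpoff can be encoded directly in the signed 16-bit
   immediate of addiu/daddiu, so that no lui is needed. */
int demo_fits_lo16(long dtpoff)
{
        return dtpoff >= -32768L && dtpoff <= 32767L;
}
```

An offset of 32767 passes the check and 32768 does not, which matches the requirement that the TLS area be smaller than 32K.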

If, rather than needing the address of the variable, the variable is to be read or written, and the size of the TLS area is known to be smaller than 32K, then the following code sequences may be used instead of either of the sequences above.

Code sequence IV (32-bit mode):

   0x00 lw $25, %call16(__tls_get_addr)($28)   R_MIPS_CALL16   g
   0x04 jalr $25
   0x08 addiu $4, $28, %tlsldm(x)              R_MIPS_TLS_LDM  x
        ...
   0x10 lw $3, %lo(%dtpoff(x1))($2)            R_MIPS_TLS_LDO_LO16 x1

Code sequence IV (64-bit mode):

   0x00 lw $25, %call16(__tls_get_addr)($28)   R_MIPS_CALL16   g
   0x04 jalr $25
   0x08 daddiu $4, $28, %tlsldm(x)             R_MIPS_TLS_LDM  x
        ...
   0x10 lw $3, %lo(%dtpoff(x1))($2)            R_MIPS_TLS_LDO_LO16 x1

Here, lw may be replaced with any other load/store instruction, as above.


Linker Optimizations

Not yet specified.

Additional relocations may be required to mark instructions that the linker can transform.


ELF Definitions

New relocations:

   #define R_MIPS_TLS_DTPMOD32         38
   #define R_MIPS_TLS_DTPOFF32         39
   #define R_MIPS_TLS_DTPMOD64         40
   #define R_MIPS_TLS_DTPOFF64         41
   #define R_MIPS_TLS_GD               42
   #define R_MIPS_TLS_LDM              43
   #define R_MIPS_TLS_LDO_HI16         44
   #define R_MIPS_TLS_LDO_LO16         45
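
A tool that dumps relocations might map these numbers back to names. The following sketch repeats the definitions above so that it is self-contained; demo_tls_reloc_name is a hypothetical helper and not part of any existing library.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define R_MIPS_TLS_DTPMOD32         38
#define R_MIPS_TLS_DTPOFF32         39
#define R_MIPS_TLS_DTPMOD64         40
#define R_MIPS_TLS_DTPOFF64         41
#define R_MIPS_TLS_GD               42
#define R_MIPS_TLS_LDM              43
#define R_MIPS_TLS_LDO_HI16         44
#define R_MIPS_TLS_LDO_LO16         45

/* Returns the name of a TLS relocation type defined in this document,
   or NULL if the number is not one of them. */
const char *demo_tls_reloc_name(unsigned int type)
{
        switch (type) {
        case R_MIPS_TLS_DTPMOD32: return "R_MIPS_TLS_DTPMOD32";
        case R_MIPS_TLS_DTPOFF32: return "R_MIPS_TLS_DTPOFF32";
        case R_MIPS_TLS_DTPMOD64: return "R_MIPS_TLS_DTPMOD64";
        case R_MIPS_TLS_DTPOFF64: return "R_MIPS_TLS_DTPOFF64";
        case R_MIPS_TLS_GD:       return "R_MIPS_TLS_GD";
        case R_MIPS_TLS_LDM:      return "R_MIPS_TLS_LDM";
        case R_MIPS_TLS_LDO_HI16: return "R_MIPS_TLS_LDO_HI16";
        case R_MIPS_TLS_LDO_LO16: return "R_MIPS_TLS_LDO_LO16";
        default:                  return NULL;
        }
}
```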

Authors

  • Mark Mitchell (mark@codesourcery.com), CodeSourcery, LLC
  • Daniel Jacobowitz (dan@debian.org), CodeSourcery, LLC