Difference between revisions of "NPTL"

Latest revision as of 11:13, 19 December 2011

Status

NPTL for Linux/MIPS is functionally complete. Ulrich Drepper's NPTL Design Document contains some information on the NPTL design, including implementation details on other architectures. One prerequisite for NPTL is TLS, as documented in Thread Local Storage.

Overview

This document presents a design for implementing Thread Local Storage (TLS) for MIPS Linux, in both 32-bit and 64-bit mode. This design specifies the code that must be generated by the compiler, the relocations that must be generated by the assembler, and the processing that must be performed by the linker.

Design Choices

  • There are no available hardware registers to designate as the thread register.

Therefore, kernel magic will be used to make the thread pointer available to userspace. The mechanism for obtaining the thread pointer will be encapsulated in the __tls_get_addr function. For the Initial Exec and Local Exec models, a rdhwr instruction will be used, and this will be emulated by the kernel as necessary.

  • Use TLS Variant I (in which the TLS data areas follow the TCB in memory).
  • The thread pointer is offset by 0x7000 from the start of the TLS data areas for modules loaded at startup, and the DTP for a module is offset by 0x8000 from the start of the TLS data area for that module. The use of these offsets, taken from Power, makes more data accessible with signed 16-bit offsets.
  • The __tls_get_addr function has the prototype:
  extern void *__tls_get_addr (tls_index *ti);

where the type tls_index is defined as:

  typedef struct {
          unsigned long ti_module;
          unsigned long ti_offset;
  } tls_index;

The type unsigned long is used because it is a 32-bit type in 32-bit mode and a 64-bit type in 64-bit mode; thus, the members will fit correctly into two consecutive GOT entries in both modes.

  • The compiler is not allowed to schedule the sequences below.

The sequences below must appear exactly as written in the code generated by the compiler. This restriction is present because we have not yet determined what linker optimizations may be possible. In order to facilitate adding linker optimizations in the future, without recompiling current code, the compiler is restricted from scheduling these sequences.
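
The effect of the two bias values on addressing range can be worked through in a short C sketch. The macro and function names used here (TLS_TP_OFFSET, TLS_DTP_OFFSET, reach_low, reach_high, biased_offset) are illustrative assumptions and not part of this specification; only the constants 0x7000 and 0x8000 come from the text above.

```c
#include <assert.h>

/* Bias values stated in the design choices above (names are illustrative). */
#define TLS_TP_OFFSET  0x7000L  /* thread pointer bias */
#define TLS_DTP_OFFSET 0x8000L  /* per-module DTP bias */

/* Range of a signed 16-bit immediate, as used by addiu and lw. */
#define SIMM16_MIN (-0x8000L)
#define SIMM16_MAX 0x7fffL

/* Lowest TLS data offset reachable from a pointer biased by `bias`. */
static inline long reach_low(long bias)
{
        return bias + SIMM16_MIN;
}

/* Highest TLS data offset reachable from a pointer biased by `bias`. */
static inline long reach_high(long bias)
{
        return bias + SIMM16_MAX;
}

/* Immediate needed to reach byte `offset` of the TLS data from the
   biased pointer. */
static inline long biased_offset(long offset, long bias)
{
        return offset - bias;
}
```

Under these assumptions, a pointer biased by 0x7000 reaches TLS data offsets -0x1000 through 0xefff with a single immediate, and one biased by 0x8000 reaches 0 through 0xffff. An unbiased pointer would reach only up to 0x7fff.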

Conventions

In what follows, all references to registers other than $2 (when it is used as the return register), $3 (when it is used for the output of rdhwr), $4 (when it is used as an argument register), $25 (the address of a called function), and $28 (the global pointer) are arbitrary; the compiler is free to use other registers instead. $3 must be used in the rdhwr instruction for fast-path emulation; using any other register in its place will be much slower.

Where ... appears in a code sequence the compiler may insert zero or more arbitrary instructions.

.set noreorder is assumed; the instruction after a jalr is in its delay slot.

General Dynamic TLS Model

Code sequence (32-bit mode):

   0x00 lw $25, %call16(__tls_get_addr)($28)   R_MIPS_CALL16   g
   0x04 jalr $25
   0x08 addiu $4, $28, %tlsgd(x)               R_MIPS_TLS_GD   x

Outstanding relocations (32-bit mode):

   GOT[n]                                      R_MIPS_TLS_DTPMOD32 x
   GOT[n+1]                                    R_MIPS_TLS_DTPREL32 x

Code sequence (64-bit mode):

   0x00 lw $25, %call16(__tls_get_addr)($28)   R_MIPS_CALL16   g
   0x04 jalr $25
   0x08 daddiu $4, $28, %tlsgd(x)              R_MIPS_TLS_GD   x

Outstanding relocations (64-bit mode):

   GOT[n]                                      R_MIPS_TLS_DTPMOD64 x
   GOT[n+1]                                    R_MIPS_TLS_DTPREL64 x

At the end of the code sequence the address of x is available in $2.
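
What the call computes from the two GOT entries can be modelled in C. This is only a sketch under stated assumptions: the per-thread table dtv, the function name model_tls_get_addr, and the macro TLS_DTP_OFFSET are hypothetical, and a real implementation would also have to allocate TLS blocks on demand, which is omitted here.

```c
#include <assert.h>
#include <stddef.h>

/* Same layout as the tls_index type given under Design Choices. */
typedef struct {
        unsigned long ti_module;
        unsigned long ti_offset;
} tls_index;

/* DTP bias from the design choices (macro name is illustrative). */
#define TLS_DTP_OFFSET 0x8000UL

/* Hypothetical model: dtv[m] is assumed to hold the start of the TLS
   data area of module m for the current thread.  ti_offset is the
   DTP-relative value written for R_MIPS_TLS_DTPREL32/64, so the bias
   is added back before indexing into the module's data. */
static void *model_tls_get_addr(char *const *dtv, const tls_index *ti)
{
        /* Unsigned arithmetic wraps, so a "negative" biased offset
           still yields the intended in-block offset. */
        unsigned long off = ti->ti_offset + TLS_DTP_OFFSET;
        return dtv[ti->ti_module] + off;
}
```

For a variable 16 bytes into the data of module 2, the GOT pair would hold module index 2 and the value 16 - 0x8000, and the model returns the address 16 bytes past the start of that module's block.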


Local Dynamic TLS Model

Code sequence I (32-bit mode):

   0x00 lw $25, %call16(__tls_get_addr)($28)   R_MIPS_CALL16   g
   0x04 jalr $25
   0x08 addiu $4, $28, %tlsldm(x)              R_MIPS_TLS_LDM   x
        ...
   0x10 lui $3, %dtprel_hi(x1)               R_MIPS_TLS_DTPREL_HI16 x1
   0x14 addiu $3, $3, %dtprel_lo(x1)         R_MIPS_TLS_DTPREL_LO16 x1
   0x18 addu $3, $3, $2
        ...
   0x1c lui $3, %dtprel_hi(x2)               R_MIPS_TLS_DTPREL_HI16 x2
   0x20 addiu $3, $3, %dtprel_lo(x2)         R_MIPS_TLS_DTPREL_LO16 x2
   0x24 addu $3, $3, $2

Outstanding relocations (32-bit mode):

   GOT[n]                                      R_MIPS_TLS_DTPMOD32 x1

Code sequence I (64-bit mode):

   0x00 lw $25, %call16(__tls_get_addr)($28)   R_MIPS_CALL16   g
   0x04 jalr $25
   0x08 daddiu $4, $28, %tlsldm(x)             R_MIPS_TLS_LDM   x
        ...
   0x10 lui $3, %dtprel_hi(x1)               R_MIPS_TLS_DTPREL_HI16 x1
   0x14 addiu $3, $3, %dtprel_lo(x1)         R_MIPS_TLS_DTPREL_LO16 x1
   0x18 daddu $3, $3, $2
        ...
   0x1c lui $3, %dtprel_hi(x2)               R_MIPS_TLS_DTPREL_HI16 x2
   0x20 addiu $3, $3, %dtprel_lo(x2)         R_MIPS_TLS_DTPREL_LO16 x2
   0x24 daddu $3, $3, $2

Outstanding relocations (64-bit mode):

   GOT[n]                                      R_MIPS_TLS_DTPMOD64 x1

After the instruction at 0x18 has executed, the address of x1 is available in $3.
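
The split of an offset into the halves consumed by lui and addiu can be checked with a small C sketch. It assumes the usual MIPS convention that the high half is rounded up whenever the low half will be sign-extended as negative; the function names rel_hi, rel_lo and lui_addiu are illustrative, and the model covers only the low 32 bits of the register.

```c
#include <assert.h>
#include <stdint.h>

/* High 16 bits, adjusted by 0x8000 so that adding the sign-extended
   low half restores the original value. */
static uint16_t rel_hi(uint32_t v)
{
        return (uint16_t)((v + 0x8000u) >> 16);
}

/* Low 16 bits, taken unchanged. */
static uint16_t rel_lo(uint32_t v)
{
        return (uint16_t)(v & 0xffffu);
}

/* Models `lui r, hi` followed by `addiu r, r, lo` (32-bit view). */
static uint32_t lui_addiu(uint16_t hi, uint16_t lo)
{
        uint32_t upper = (uint32_t)hi << 16;            /* lui */
        uint32_t low = (lo & 0x8000u)                   /* addiu sign-extends */
                     ? (0xffff0000u | lo)
                     : lo;
        return upper + low;                             /* wraps modulo 2^32 */
}
```

For example, the value 0x00018000 splits into a high half of 2 and a low half of 0x8000; the low half is sign-extended to -0x8000, and 0x20000 - 0x8000 gives back 0x18000.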

If, rather than needing the address of the variable, the variable is to be read or written, then the following code sequences may be used instead of either of the sequences above.

Code Sequence II (32-bit mode):

   0x00 lw $25, %call16(__tls_get_addr)($28)   R_MIPS_CALL16   g
   0x04 jalr $25
   0x08 addiu $4, $28, %tlsldm(x)              R_MIPS_TLS_LDM  x
        ...
   0x10 lui $3, %dtprel_hi(x1)               R_MIPS_TLS_DTPREL_HI16 x1
   0x14 addu $3, $3, $2
   0x18 lw $3, %dtprel_lo(x1)($3)            R_MIPS_TLS_DTPREL_LO16 x1

Code Sequence II (64-bit mode):

   0x00 lw $25, %call16(__tls_get_addr)($28)   R_MIPS_CALL16   g
   0x04 jalr $25
   0x08 daddiu $4, $28, %tlsldm(x)             R_MIPS_TLS_LDM  x
        ...
   0x10 lui $3, %dtprel_hi(x1)               R_MIPS_TLS_DTPREL_HI16 x1
   0x14 daddu $3, $3, $2
   0x18 lw $3, %dtprel_lo(x1)($3)            R_MIPS_TLS_DTPREL_LO16 x1

Here, lw may be replaced with any other load/store instruction that uses the same opcode format as lw, such as lb, lbu, lh, lwu, ld, sb, sh, sw, sd, ll, lld, lwl, lwr, ldl, or ldr.

If the size of the TLS area is known to be smaller than 32K, then the following sequences can be used instead of those above.

Code sequence III (32-bit mode):

   0x00 lw $25, %call16(__tls_get_addr)($28)   R_MIPS_CALL16   g
   0x04 jalr $25
   0x08 addiu $4, $28, %tlsldm(x)              R_MIPS_TLS_LDM  x
        ...
   0x10 addiu $3, $2, %dtprel_lo(x1)         R_MIPS_TLS_DTPREL_LO16 x1

Code sequence III (64-bit mode):

   0x00 lw $25, %call16(__tls_get_addr)($28)   R_MIPS_CALL16   g
   0x04 jalr $25
   0x08 daddiu $4, $28, %tlsldm(x)             R_MIPS_TLS_LDM  x
        ...
   0x10 daddiu $3, $2, %dtprel_lo(x1)        R_MIPS_TLS_DTPREL_LO16 x1

The outstanding relocations are as for Code Sequence I.

If, rather than needing the address of the variable, the variable is to be read or written, and the size of the TLS area is known to be smaller than 32K, then the following code sequences may be used instead of either of the sequences above.

Code sequence IV (32-bit mode):

   0x00 lw $25, %call16(__tls_get_addr)($28)   R_MIPS_CALL16   g
   0x04 jalr $25
   0x08 addiu $4, $28, %tlsldm(x)              R_MIPS_TLS_LDM  x
        ...
   0x10 lw $3, %dtprel_lo(x1)($2)            R_MIPS_TLS_DTPREL_LO16 x1

Code sequence IV (64-bit mode):

   0x00 lw $25, %call16(__tls_get_addr)($28)   R_MIPS_CALL16   g
   0x04 jalr $25
   0x08 daddiu $4, $28, %tlsldm(x)             R_MIPS_TLS_LDM  x
        ...
   0x10 lw $3, %dtprel_lo(x1)($2)            R_MIPS_TLS_DTPREL_LO16 x1

Here, lw may be replaced with any other load/store instruction, as above.

Initial Exec TLS Model

Code sequence (32-bit mode):

   0x00 rdhwr $3, $29
   0x04 lw $2, %gottprel(x1)($28)         R_MIPS_TLS_GOTTPREL        x1
   0x08 addu $2, $2, $3
   ...
   0x0c lw $2, %gottprel(x2)($28)         R_MIPS_TLS_GOTTPREL        x2
   0x10 addu $2, $2, $3                

Outstanding relocations (32-bit mode):

   GOT[n]                              R_MIPS_TLS_TPREL32      x1

Code sequence (64-bit mode):

   0x00 rdhwr $3, $29
   0x04 ld $2, %gottprel(x1)($28)         R_MIPS_TLS_GOTTPREL        x1
   0x08 daddu $2, $2, $3
   ...
   0x0c ld $2, %gottprel(x2)($28)         R_MIPS_TLS_GOTTPREL        x2
   0x10 daddu $2, $2, $3

Outstanding relocations (64-bit mode):

   GOT[n]                              R_MIPS_TLS_TPREL64      x1

The first instruction loads the virtual thread register. The kernel emulates the rdhwr instruction if necessary to return the right value.

The second instruction loads the offset of x1, relative to the thread pointer. The instruction at address 0x08 computes the address of x1 itself.

The second code sequence (beginning at address 0x0c) demonstrates that the thread register can be reused after it has been loaded.

The use of register $2 in the above code sequence is arbitrary. The compiler is free to use alternative registers if convenient.

Local Exec TLS Model

Relative to the Initial Exec TLS Model, the sequences in this section save one dynamic relocation.

Code sequence I (32-bit mode):

   0x00 rdhwr $3, $29
   0x04 lui $2, %tprel_hi(x)         R_MIPS_TLS_TPREL_HI16   x
   0x08 addiu $2, $2, %tprel_lo(x)   R_MIPS_TLS_TPREL_LO16   x
   0x0c addu $2, $2, $3

No outstanding relocations.

Code sequence I (64-bit mode):

   0x00 rdhwr $3, $29
   0x04 lui $2, %tprel_hi(x)         R_MIPS_TLS_TPREL_HI16   x
   0x08 addiu $2, $2, %tprel_lo(x)   R_MIPS_TLS_TPREL_LO16   x
   0x0c daddu $2, $2, $3

No outstanding relocations.

The first instruction loads the virtual thread register, as for the Initial Exec TLS Model. The next two instructions load the offset of the variable. The offsets in these instructions can be resolved by the static linker. The instruction at address 0x0c computes the address of x itself.

As with the Initial Exec TLS Model, additional variables may be accessed without reloading the virtual thread register.

If the size of the TLS area is known to be smaller than 32K, then the following sequences can be used instead of those above.

Code sequence II (32-bit mode):

   0x00 rdhwr $3, $29
   0x04 addiu $2, $3, %tprel_lo(x)    R_MIPS_TLS_TPREL_LO16   x

Code sequence II (64-bit mode):

   0x00 rdhwr $3, $29
   0x04 daddiu $2, $3, %tprel_lo(x)   R_MIPS_TLS_TPREL_LO16   x

No outstanding relocations, as for Code Sequence I.

If, rather than needing the address of the variable, the variable is to be read or written, and the size of the TLS area is known to be smaller than 32K, then the following code sequences may be used instead of either of the sequences above.

Code sequence III (32-bit mode):

   0x00 rdhwr $3, $29
   0x04 lw $2, %tprel_lo(x)($3)       R_MIPS_TLS_TPREL_LO16   x

Code sequence III (64-bit mode):

   0x00 rdhwr $3, $29
   0x04 lw $2, %tprel_lo(x)($3)       R_MIPS_TLS_TPREL_LO16   x
 

Here, lw may be replaced with any other load/store instruction, as for Code Sequence II in the Local Dynamic TLS Model.
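
From C source, the four access models described in the preceding sections can be requested per variable with GCC's tls_model attribute (or globally with -ftls-model=). The sketch below is illustrative only: the variable and function names are made up, and whether the emitted instructions match the sequences in this document depends on the compiler version, target and options.

```c
#include <assert.h>

/* Each variable asks the compiler for one of the four TLS models. */
__thread int gd_var __attribute__((tls_model("global-dynamic"))) = 1;
static __thread int ld_var __attribute__((tls_model("local-dynamic"))) = 2;
__thread int ie_var __attribute__((tls_model("initial-exec"))) = 3;
static __thread int le_var __attribute__((tls_model("local-exec"))) = 4;

/* Accessors; each read goes through the model requested above. */
int read_gd(void) { return gd_var; }
int read_ld(void) { return ld_var; }
int read_ie(void) { return ie_var; }
int read_le(void) { return le_var; }
```

Compiling such a file for a MIPS target with -S and inspecting the output is one way to compare a given toolchain against the sequences specified here.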

Debug Information

For DWARF-2 debug information for thread-local variables, a single static relocation for the whole 32-bit or 64-bit TLS offset is needed. The relocations R_MIPS_TLS_DTPREL32 and R_MIPS_TLS_DTPREL64, generated by assembler directives .dtprelword and .dtpreldword, are resolved by the static linker to a DTP-relative offset for the given variable. A typical use is of the form

   DW_OP_addr
   .dtprelword x+0x8000
   DW_OP_GNU_push_tls_address
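
One plausible reading of the +0x8000 addend is that it cancels the DTP bias, so that the value stored in the debug information is the plain offset of x within its module's TLS data. The C sketch below works through that arithmetic; the function name dtprel32 and the macro TLS_DTP_OFFSET are illustrative assumptions, not part of this specification.

```c
#include <assert.h>
#include <stdint.h>

/* DTP bias from the design choices (macro name is illustrative). */
#define TLS_DTP_OFFSET 0x8000u

/* Assumed model of the value stored for R_MIPS_TLS_DTPREL32: the
   symbol's offset within its module's TLS data, plus the addend,
   minus the DTP bias (all modulo 2^32). */
static uint32_t dtprel32(uint32_t sym_offset, uint32_t addend)
{
        return sym_offset + addend - TLS_DTP_OFFSET;
}
```

With an addend of 0x8000 the bias drops out and a variable at offset 0x24 is recorded as 0x24; with no addend the stored value would be the biased offset 0x24 - 0x8000.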

Linker Optimizations

Not yet specified.

Additional relocations may be required to mark instructions that the linker can transform.

ELF Definitions

New relocations:

   #define R_MIPS_TLS_DTPMOD32         38
   #define R_MIPS_TLS_DTPREL32         39
   #define R_MIPS_TLS_DTPMOD64         40
   #define R_MIPS_TLS_DTPREL64         41
   #define R_MIPS_TLS_GD               42
   #define R_MIPS_TLS_LDM              43
   #define R_MIPS_TLS_DTPREL_HI16      44
   #define R_MIPS_TLS_DTPREL_LO16      45
   #define R_MIPS_TLS_GOTTPREL         46
   #define R_MIPS_TLS_TPREL32          47
   #define R_MIPS_TLS_TPREL64          48
   #define R_MIPS_TLS_TPREL_HI16       49
   #define R_MIPS_TLS_TPREL_LO16       50

Analogous static relocations for MIPS16 and microMIPS (dynamic relocations are the same as R_MIPS_TLS_*):

   #define R_MIPS16_TLS_GD              106
   #define R_MIPS16_TLS_LDM             107
   #define R_MIPS16_TLS_DTPREL_HI16     108
   #define R_MIPS16_TLS_DTPREL_LO16     109
   #define R_MIPS16_TLS_GOTTPREL        110
   #define R_MIPS16_TLS_TPREL_HI16      111
   #define R_MIPS16_TLS_TPREL_LO16      112

   #define R_MICROMIPS_TLS_GD           162
   #define R_MICROMIPS_TLS_LDM          163
   #define R_MICROMIPS_TLS_DTPREL_HI16  164
   #define R_MICROMIPS_TLS_DTPREL_LO16  165
   #define R_MICROMIPS_TLS_GOTTPREL     166
   #define R_MICROMIPS_TLS_TPREL_HI16   169
   #define R_MICROMIPS_TLS_TPREL_LO16   170
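
As a usage sketch, a tool that processes these relocations might separate the types that the code sequences above list as outstanding (left for the dynamic linker) from those consumed entirely at static link time. The function name mips_tls_reloc_is_dynamic is hypothetical; the numeric values are the ones defined above.

```c
#include <assert.h>
#include <stdbool.h>

/* Values copied from the list of new relocations above. */
#define R_MIPS_TLS_DTPMOD32 38
#define R_MIPS_TLS_DTPREL32 39
#define R_MIPS_TLS_DTPMOD64 40
#define R_MIPS_TLS_DTPREL64 41
#define R_MIPS_TLS_GD       42
#define R_MIPS_TLS_TPREL32  47
#define R_MIPS_TLS_TPREL64  48

/* Returns true for the types that appear as outstanding relocations
   on GOT entries in the code sequences of this document.  Note that
   DTPREL32/64 may also be resolved statically for debug information. */
static bool mips_tls_reloc_is_dynamic(unsigned type)
{
        switch (type) {
        case R_MIPS_TLS_DTPMOD32:
        case R_MIPS_TLS_DTPREL32:
        case R_MIPS_TLS_DTPMOD64:
        case R_MIPS_TLS_DTPREL64:
        case R_MIPS_TLS_TPREL32:
        case R_MIPS_TLS_TPREL64:
                return true;
        default:
                return false;
        }
}
```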

History

MIPS Technologies has reserved hardware register $29 for ABI use. Earlier drafts of this page were using $5.

$3 was chosen for fast-path emulation in rdhwr; in earlier drafts the register used was said to be arbitrary.

The names of some of the relocations and associated assembler operations were changed in the course of inclusion into GNU binutils.

R_MIPS_TLS_DTPREL32 and R_MIPS_TLS_DTPREL64, originally only resolved by the dynamic linker, were reused as static relocations as well for use in debug information.

Authors

  • Mark Mitchell (mark@codesourcery.com), CodeSourcery
  • Daniel Jacobowitz (dan@codesourcery.com), CodeSourcery
  • Joseph Myers (joseph@codesourcery.com), CodeSourcery