[Top] [All Lists]

Re: [PATCH] fast path for rdhwr emulation for TLS

To: Atsushi Nemoto <>
Subject: Re: [PATCH] fast path for rdhwr emulation for TLS
From: "Maciej W. Rozycki" <>
Date: Fri, 7 Jul 2006 17:58:44 +0100 (BST)
In-reply-to: <>
Original-recipient: rfc822;
References: <> <> <>
On Sat, 8 Jul 2006, Atsushi Nemoto wrote:

> >  For a VIVT I-cache this can result in a TLB exception.  TLB handlers are 
> > not currently prepared for being called at the exception level.
> Thanks, now I understand the problem.  Are there any good solutions?
> Only I can think now is using handle_ri_slow for such CPUs.

 I have implemented an appropriate update to the TLB handlers (or actually 
it's enough to care for this case for the TLBL exception), but it predates 
the current synthesized ones.  There is a small impact resulting from 
this change and the synthesized handlers have the advantage of making it 
only necessary for these chips that do need such handling.

 There are two possible ways of handling TLB exceptions from the exception 
level, both requiring checking cp0.index.p (which we do not do at the 
moment under the assumption a TLB refill exception has already been taken 
and handled) and if a failure is indicated either:

1. jumping to the TLB refill handler,


2. executing "tlbwr" rather than "tlbwi".

Both are good, but I have not benchmarked them -- note that a failure is 
expected to be an extremely rare event, so it's the performance for the 
probe success that matters.

> >  Also I am fairly sure gas won't fill the branch delay slot above -- a 
> > trivial rearrangement of code would save a cycle here (and this is a fast 
> > path, so we do not want wasting time).
> Well, here is a code compiled by binutils 2.17.  This version of gas
> can put MFC0 on the delay slot.  But it might be better to use
> noreorder by myself.
> 80012a80 <handle_ri>:
> 80012a80:     401a6800        mfc0    k0,c0_cause
> 80012a84:     0740fd2e        bltz    k0,80011f40 <handle_ri_slow>
> 80012a88:     401b7000        mfc0    k1,c0_epc
> 80012a8c:     8f7a0000        lw      k0,0(k1)

 Still bad -- you have a stall on $k1 here.  And on $k0 two instructions 

> 80012a90:     3c1b7c03        lui     k1,0x7c03
> 80012a94:     377be83b        ori     k1,k1,0xe83b
> 80012a98:     175bfd29        bne     k0,k1,80011f40 <handle_ri_slow>
> 80012a9c:     00000000        nop

 And this "nop" is a waste of time.

> 80012aa0:     3c1b801b        lui     k1,0x801b
> 80012aa4:     8f7b4008        lw      k1,16392(k1)
> 80012aa8:     401a7000        mfc0    k0,c0_epc
> 80012aac:     275a0004        addiu   k0,k0,4
> 80012ab0:     409a7000        mtc0    k0,c0_epc
> 80012ab4:     377b1fff        ori     k1,k1,0x1fff
> 80012ab8:     3b7b1fff        xori    k1,k1,0x1fff
> 80012abc:     8f63000c        lw      v1,12(k1)
> 80012ac0:     42000018        eret

 I'd restructure the code more or less like this, taking care for (almost) 
all stalls resulting from interlocks on coprocessor moves and memory loads 
and likewise avoiding the need for "nop" fillers there for MIPS I 

        .set    push
        .set    noat
        .set    noreorder
        mfc0    k0, CP0_CAUSE
        MFC0    k1, CP0_EPC
        bltz    k0, handle_ri_slow      /* if delay slot */
         lui    k0, 0x7c03
        lw      k1, (k1)
        ori     k0, 0xe83b              /* k0 := rdhwr v1,$29 */
        bne     k0, k1, handle_ri_slow  /* if not ours */
         get_saved_sp                   /* k1 := current_thread_info */
        MFC0    k0, CP0_EPC
#if defined(CONFIG_CPU_R3000) || defined(CONFIG_CPU_TX39XX)
        ori     k1, _THREAD_MASK
        xori    k1, _THREAD_MASK
        LONG_L  v1, TI_FLAGS(k1)
        PTR_ADDIU k0, 4
        jr      k0
        PTR_ADDIU k0, 4                 /* stall on $k0 */
        MTC0    k0, CP0_EPC
        ori     k1, _THREAD_MASK
        xori    k1, _THREAD_MASK
        LONG_L  v1, TI_FLAGS(k1)
        .set    pop

I hope I got this right. ;-)


<Prev in Thread] Current Thread [Next in Thread>