linux-mips-fnet

Re: get_mmu_context()

To: ralf@uni-koblenz.de, linux-mips@fnet.fr, linux@engr.sgi.com, linux-mips@vger.rutgers.edu
Subject: Re: get_mmu_context()
From: Vladimir Roganov <roganov@niisi.msk.ru>
Date: Thu, 15 Oct 1998 15:08:09 +0400
Organization: NIISI RAS
References: <19981013215927.A2692@uni-koblenz.de>
Sender: vladimir@niisi.msk.ru
ralf@uni-koblenz.de wrote:
> 
> Ok, here is a draft version of an aggressively optimized version of
> get_mmu_context().  I just didn't like the idea of referencing
> global variables in get_mmu_context() if avoidable.  The code below
> will work on both R3000 and R4000 with no performance penalty for
> being generic.  The trick is to patch the operands of two machine
> instructions at runtime, shoot me.
> 
> The code below should be integrated into <asm/mmu_context.h>.  The
> CPU specific code in arch/mips/mm/... will then just have to include
> that header file and call r3000_asid_setup resp. r4xx0_asid_setup.
> 
> Have fun,
> 
>   Ralf
> 
> #define ASID_VERSION_SHIFT 16
> #define ASID_VERSION_MASK  ((~0UL) << ASID_VERSION_SHIFT)
> #define ASID_FIRST_VERSION (1UL << ASID_VERSION_SHIFT)
> 
> unsigned long asid_cache = ASID_FIRST_VERSION;
> 
> /* The next two macros know that they will only be assembled once
>    per kernel.  */
> #define ASID_VERSION_INC                                        \
>  ({ unsigned long __asid_inc;                                   \
>    __asm__(".globl\tasid_inc\n"                                 \
>            "asid_inc:\n\t"                                      \
>            "li\t%0,0\t\t\t#patched\n\t"                         \
>            :"=r" (__asid_inc));                                 \
>    __asid_inc; })
> 
> #define ASID_OVERFLOW(asid)                                     \
>  ({ unsigned long __res;                                        \
>    __asm__(".global\tasid_overflow\n"                           \
>            "asid_overflow:\n\t"                                 \
>            "sltu\t%0,%1,0\t\t\t#patched\n\t"                    \
>            :"=r" (__res)                                        \
>            :"r" (asid));                                        \
>    __res; })
> 
> extern inline void get_new_mmu_context(struct mm_struct *mm, unsigned long asid)
> {
>         /* check if it's legal.. */
>         if (ASID_OVERFLOW(asid & ~ASID_VERSION_MASK)) {
>                 /* start a new version, invalidate all old asid's */
>                 flush_tlb_all();
>                 asid = (asid & ASID_VERSION_MASK) + ASID_FIRST_VERSION;
>                 if (!asid)
>                         asid = ASID_FIRST_VERSION;
>         }
>         asid_cache = asid + ASID_VERSION_INC;
>         mm->context = asid;                      /* full version + asid */
> }
> 
> extern void get_mmu_context(struct task_struct *p)
> {
>         struct mm_struct *mm = p->mm;
> 
>         if (mm) {
>                 unsigned long asid = asid_cache;
>                 /* Check if our ASID is of an older version and thus invalid */
>                 if ((mm->context ^ asid) & ASID_VERSION_MASK)
>                         get_new_mmu_context(mm, asid);
>         }
> }
> 
> extern inline void __asid_setup(unsigned long inc, unsigned long cmp)
> {
>         extern u32 asid_inc;
>         extern u32 asid_overflow;
> 
>         asid_inc = (asid_inc & 0xffff0000) | inc;
>         flush_icache_range((unsigned long) &asid_inc,
>                            (unsigned long) &asid_inc + 4);
>         asid_overflow = (asid_overflow & 0xffff0000) | cmp;
>         flush_icache_range((unsigned long) &asid_overflow,
>                            (unsigned long) &asid_overflow + 4);
> }
> 
> extern inline void r3000_asid_setup(void)
> {
>         __asid_setup(0x40, 0xfc1);
> }
> 
> extern inline void r4xx0_asid_setup(void)
> {
>         __asid_setup(1, 0x100);
> }





;-----------------------------------------------------------------------


Hello Ralf, hello others!

Thanks for your comments about optimization and the 'asid_cache' parameter,
but I see no reason to make 'get_new_mmu_context' so complex, because
the code we suggested can easily be optimized without problems.
I looked at the assembler output generated by gcc and see no performance
penalty compared with the code-patching version.


The optimized (and tested) code is simpler:

<<<<<<<<<<<<
/* 
 *  All unused by hardware upper bits will be considered 
 *  as software asid extension   --   asid version. 
 */
#define ASID_VERSION_MASK  ((unsigned long)~(ASID_MASK|(ASID_MASK-1))) 
#define ASID_FIRST_VERSION ((unsigned long)(~ASID_VERSION_MASK) + 1)

extern inline void get_new_mmu_context(struct mm_struct *mm, unsigned long asid)
{
        if (! ((asid += (1<<ASID_SHIFT)) & ASID_MASK) ) {
                flush_tlb_all(); /* start new asid cycle */
                if (!asid)      /* fix version if needed */ 
                        asid = ASID_FIRST_VERSION;
        }
        mm->context = asid_cache = asid;
}
>>>>>>>>>>>>

    800d7d54:   26100040        addiu   $s0,$s0,64
    800d7d58:   32020fc0        andi    $v0,$s0,0xfc0
    800d7d5c:   14400009        bnez    $v0,800d7d84 
    800d7d60:   afbf0018        sw      $ra,24($sp)
    800d7d64:   3c028012        lui     $v0,0x8012
    800d7d68:   8c423708        lw      $v0,14088($v0)
    800d7d6c:   00000000        nop
    800d7d70:   0040f809        jalr    $v0
    800d7d74:   00000000        nop
    800d7d78:   16000002        bnez    $s0,800d7d84 
    800d7d7c:   00000000        nop
    800d7d80:   24101000        li      $s0,4096
    800d7d84:   3c018010        lui     $at,0x8010
    800d7d88:   ac307010        sw      $s0,28688($at)
    800d7d8c:   ae300020        sw      $s0,32($s1)

So in the usual case (no ASID overflow) it takes 6 instructions.


Your suggested code compiles to the following
(the immediates shown are the unpatched placeholders; the startup patch
replaces them with the real values):

    800d7df8:   3202ffff        andi    $v0,$s0,0xffff
    800d7dfc:   2c420000        sltiu   $v0,$v0,0
    800d7e00:   1040000d        beqz    $v0,800d7e38 
    800d7e04:   00000000        nop
    800d7e08:   3c028012        lui     $v0,0x8012
    800d7e0c:   8c423708        lw      $v0,14088($v0)
    800d7e10:   00000000        nop
    800d7e14:   0040f809        jalr    $v0
    800d7e18:   00000000        nop
    800d7e1c:   3c02ffff        lui     $v0,0xffff
    800d7e20:   02021024        and     $v0,$s0,$v0
    800d7e24:   3c030001        lui     $v1,0x1
    800d7e28:   00438021        addu    $s0,$v0,$v1
    800d7e2c:   16000002        bnez    $s0,800d7e38 
    800d7e30:   00000000        nop
    800d7e34:   3c100001        lui     $s0,0x1
    800d7e38:   24020000        li      $v0,0
    800d7e3c:   02021021        addu    $v0,$s0,$v0
    800d7e40:   3c018010        lui     $at,0x8010
    800d7e44:   ac227010        sw      $v0,28688($at)
    800d7e48:   ae300020        sw      $s0,32($s1)
 
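As far as I can see, it gains no performance: even when the branch
skips the TLB flush, this path is 9 instructions (including a delay-slot
nop), against 6 above.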


So I don't understand why the idea of using all free upper bits as an ASID
extension is bad: it increases safety and, at the same time, lets the
version be incremented automatically when the ASID overflows.

Best wishes,
Vladimir
