ralf@uni-koblenz.de wrote:
>
> Ok, here is a draft version of an aggressively optimized version of
> get_mmu_context(). I just didn't like the idea of referencing
> global variables in get_mmu_context() if avoidable. The code below
> will work on both R3000 and R4000 with no performance penalty for
> being generic. The trick is to patch the operands of two machine
> instructions at runtime, shoot me.
>
> The code below should be integrated into <asm/mmu_context.h>. The
> CPU specific code in arch/mips/mm/... will then just have to include
> that header file and call r3000_asid_setup or r4xx0_asid_setup.
>
> Have fun,
>
> Ralf
>
> #define ASID_VERSION_SHIFT 16
> #define ASID_VERSION_MASK ((~0UL) << ASID_VERSION_SHIFT)
> #define ASID_FIRST_VERSION (1UL << ASID_VERSION_SHIFT)
>
> unsigned long asid_cache = ASID_FIRST_VERSION;
>
> /* The next two macros know that they will only be assembled once
> per kernel. */
> #define ASID_VERSION_INC \
> ({ unsigned long __asid_inc; \
> __asm__(".globl\tasid_inc\n" \
> "asid_inc:\n\t" \
> "li\t%0,0\t\t\t#patched\n\t" \
> :"=r" (__asid_inc)); \
> __asid_inc; })
>
> #define ASID_OVERFLOW(asid) \
> ({ unsigned long __res; \
> __asm__(".global\tasid_overflow\n" \
> "asid_overflow:\n\t" \
> "sltu\t%0,%1,0\t\t\t#patched\n\t" \
> :"=r" (__res) \
> :"r" (asid)); \
> __res; })
>
> extern inline void get_new_mmu_context(struct mm_struct *mm, unsigned long asid)
> {
> /* check if it's legal.. */
> if (ASID_OVERFLOW(asid & ~ASID_VERSION_MASK)) {
> /* start a new version, invalidate all old asid's */
> flush_tlb_all();
> asid = (asid & ASID_VERSION_MASK) + ASID_FIRST_VERSION;
> if (!asid)
> asid = ASID_FIRST_VERSION;
> }
> asid_cache = asid + ASID_VERSION_INC;
> mm->context = asid; /* full version + asid */
> }
>
> extern void get_mmu_context(struct task_struct *p)
> {
> struct mm_struct *mm = p->mm;
>
> if (mm) {
> unsigned long asid = asid_cache;
> /* Check if our ASID is of an older version and thus invalid */
> if ((mm->context ^ asid) & ASID_VERSION_MASK)
> get_new_mmu_context(mm, asid);
> }
> }
>
> extern inline void __asid_setup(unsigned long inc, unsigned long cmp)
> {
> extern u32 asid_inc;
> extern u32 asid_overflow;
>
> /* Patch the 16-bit immediate fields of the two labelled instructions. */
> asid_inc = (asid_inc & 0xffff0000) | inc;
> flush_icache_range(&asid_inc, 4);
> asid_overflow = (asid_overflow & 0xffff0000) | cmp;
> flush_icache_range(&asid_overflow, 4);
> }
>
> extern inline void r3000_asid_setup(void)
> {
> __asid_setup(0x40, 0xfc1);
> }
>
> extern inline void r4xx0_asid_setup(void)
> {
> __asid_setup(1, 0x100);
> }
;-----------------------------------------------------------------------
Hello Ralf, hello others!
Thanks for your comments about optimization and the 'asid_cache' parameter,
but I see no reason to make 'get_new_mmu_context' so complex,
because the code we suggested can easily be optimized without problems.
I looked at the assembler generated by gcc and don't see any performance
penalty compared with the code-patching version.
The optimized (and tested) code looks simpler:
<<<<<<<<<<<<
/*
 * All upper bits unused by the hardware are treated
 * as a software asid extension -- the asid version.
 */
#define ASID_VERSION_MASK ((unsigned long)~(ASID_MASK|(ASID_MASK-1)))
#define ASID_FIRST_VERSION ((unsigned long)(~ASID_VERSION_MASK) + 1)
extern inline void get_new_mmu_context(struct mm_struct *mm, unsigned long asid)
{
	if (! ((asid += (1<<ASID_SHIFT)) & ASID_MASK) ) {
		flush_tlb_all(); /* start new asid cycle */
		if (!asid) /* fix version if needed */
			asid = ASID_FIRST_VERSION;
	}
	mm->context = asid_cache = asid;
}
>>>>>>>>>>>>
800d7d54: 26100040 addiu $s0,$s0,64
800d7d58: 32020fc0 andi $v0,$s0,0xfc0
800d7d5c: 14400009 bnez $v0,800d7d84
800d7d60: afbf0018 sw $ra,24($sp)
800d7d64: 3c028012 lui $v0,0x8012
800d7d68: 8c423708 lw $v0,14088($v0)
800d7d6c: 00000000 nop
800d7d70: 0040f809 jalr $v0
800d7d74: 00000000 nop
800d7d78: 16000002 bnez $s0,800d7d84
800d7d7c: 00000000 nop
800d7d80: 24101000 li $s0,4096
800d7d84: 3c018010 lui $at,0x8010
800d7d88: ac307010 sw $s0,28688($at)
800d7d8c: ae300020 sw $s0,32($s1)
So it usually takes 6 instructions.
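As a cross-check of the constants in this listing (64, 0xfc0 and 4096), here
is a small sketch of the mask arithmetic in plain C. ASID_MASK and ASID_SHIFT
are not defined in this mail, so the R3000 values 0xfc0 and 6 and the R4000
mask 0xff are assumptions, and the three helper functions are hypothetical
names that only restate the macros above:

```c
/*
 * Sketch only: restates ASID_VERSION_MASK / ASID_FIRST_VERSION as
 * functions so they can be evaluated for more than one CPU type.
 * The mask values passed in are assumptions (R3000: 0xfc0, R4000: 0xff).
 */

/* Every bit above the hardware asid field belongs to the version. */
static unsigned long asid_version_mask(unsigned long asid_mask)
{
	return ~(asid_mask | (asid_mask - 1));
}

/* Lowest version bit, i.e. the first valid version number. */
static unsigned long asid_first_version(unsigned long asid_mask)
{
	return ~asid_version_mask(asid_mask) + 1;
}

/* Step added per allocation: one unit of the hardware asid field. */
static unsigned long asid_step(unsigned int asid_shift)
{
	return 1UL << asid_shift;
}
```

With the assumed R3000 values, asid_step(6) is 64 and
asid_first_version(0xfc0) is 4096, matching the addiu and li immediates;
with the assumed R4000 mask 0xff the first version would be 0x100.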
The code you suggested compiles as
(the constants will differ according to the startup-patched values):
800d7df8: 3202ffff andi $v0,$s0,0xffff
800d7dfc: 2c420000 sltiu $v0,$v0,0
800d7e00: 1040000d beqz $v0,800d7e38
800d7e04: 00000000 nop
800d7e08: 3c028012 lui $v0,0x8012
800d7e0c: 8c423708 lw $v0,14088($v0)
800d7e10: 00000000 nop
800d7e14: 0040f809 jalr $v0
800d7e18: 00000000 nop
800d7e1c: 3c02ffff lui $v0,0xffff
800d7e20: 02021024 and $v0,$s0,$v0
800d7e24: 3c030001 lui $v1,0x1
800d7e28: 00438021 addu $s0,$v0,$v1
800d7e2c: 16000002 bnez $s0,800d7e38
800d7e30: 00000000 nop
800d7e34: 3c100001 lui $s0,0x1
800d7e38: 24020000 li $v0,0
800d7e3c: 02021021 addu $v0,$s0,$v0
800d7e40: 3c018010 lui $at,0x8010
800d7e44: ac227010 sw $v0,28688($at)
800d7e48: ae300020 sw $s0,32($s1)
It does not add any performance that I can see.
So I don't understand why the idea of using all the free upper bits as an asid
extension is bad: as well as increasing security, it allows the version to be
incremented automatically when asid overflow occurs.
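To illustrate the automatic version increment, here is a minimal user-space
sketch of the allocator. It is not kernel code: ASID_SHIFT and ASID_MASK are
assumed to be the R3000 values 6 and 0xfc0, struct mm_sketch is a hypothetical
stand-in for mm_struct, and flush_tlb_all() is replaced by a counter.

```c
/* Assumed R3000 values; neither macro is defined in this mail. */
#define ASID_SHIFT 6
#define ASID_MASK  0xfc0UL

#define ASID_VERSION_MASK  ((unsigned long)~(ASID_MASK|(ASID_MASK-1)))
#define ASID_FIRST_VERSION ((unsigned long)(~ASID_VERSION_MASK) + 1)

/* Hypothetical stand-in for struct mm_struct: only the context field. */
struct mm_sketch {
	unsigned long context;	/* version bits + hardware asid bits */
};

static unsigned long asid_cache = ASID_FIRST_VERSION;
static unsigned long tlb_flushes;	/* how often the stub flush ran */

/* Stub: the real function invalidates the TLB; here we only count. */
static void flush_tlb_all(void)
{
	tlb_flushes++;
}

static void get_new_mmu_context(struct mm_sketch *mm, unsigned long asid)
{
	/* The carry out of the asid field bumps the version by itself. */
	if (!((asid += (1UL << ASID_SHIFT)) & ASID_MASK)) {
		flush_tlb_all();	/* start new asid cycle */
		if (!asid)		/* version wrapped to zero */
			asid = ASID_FIRST_VERSION;
	}
	mm->context = asid_cache = asid;
}

static void get_mmu_context(struct mm_sketch *mm)
{
	/* A context from an older version is stale and must be renewed. */
	if ((mm->context ^ asid_cache) & ASID_VERSION_MASK)
		get_new_mmu_context(mm, asid_cache);
}
```

Under these assumptions the first context handed out is 0x1040, and the 64th
allocation carries into the version bits, giving 0x2000 with exactly one TLB
flush. A context still holding version 0x1000 then fails the version check
and is renewed on its next get_mmu_context() call.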
Best wishes,
Vladimir