On Sat, 8 Jul 2006, Atsushi Nemoto wrote:
> > For a VIVT I-cache this can result in a TLB exception. TLB handlers are
> > not currently prepared for being called at the exception level.
>
> Thanks, now I understand the problem. Are there any good solutions?
> The only one I can think of now is using handle_ri_slow for such CPUs.

I have implemented an appropriate update to the TLB handlers (actually,
it is enough to handle this case for the TLBL exception), but it predates
the current synthesized ones. The change has a small performance impact,
and the synthesized handlers have the advantage that it need only be
applied for the chips that actually require such handling.

There are two possible ways of handling TLB exceptions taken at the
exception level. Both require checking cp0.index.p (which we do not do at
the moment, on the assumption that a TLB refill exception has already been
taken and handled) and, if a failure is indicated, either:
1. jumping to the TLB refill handler,
or:
2. executing "tlbwr" rather than "tlbwi".

Both are good, but I have not benchmarked them -- note that a failure is
expected to be an extremely rare event, so it is the performance of the
probe-success path that matters.
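
As a rough illustration of the check described above, here is a small C
model (not kernel code). It assumes the probe-failure flag P sits in bit 31
of the CP0 Index register; the names tlb_write_op, probe_failed and
choose_tlb_write are made up for this sketch, and it only models option 2.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names, for illustration only. */
enum tlb_write_op { TLB_WRITE_INDEXED, TLB_WRITE_RANDOM };

/* Assumption: the probe-failure flag P is bit 31 of CP0 Index. */
#define INDEX_P UINT32_C(0x80000000)

/* Nonzero if the preceding "tlbp" found no matching entry. */
int probe_failed(uint32_t index)
{
	return (index & INDEX_P) != 0;
}

/*
 * Option 2 from the list: on a failed probe fall back to "tlbwr",
 * otherwise rewrite the matching entry with "tlbwi".
 */
enum tlb_write_op choose_tlb_write(uint32_t index)
{
	return probe_failed(index) ? TLB_WRITE_RANDOM : TLB_WRITE_INDEXED;
}

/* Small self-check of the model. */
void check_tlb_write_choice(void)
{
	assert(choose_tlb_write(UINT32_C(0x80000000)) == TLB_WRITE_RANDOM);
	assert(choose_tlb_write(UINT32_C(0x00000005)) == TLB_WRITE_INDEXED);
}
```

In assembly the same test is a single branch on the sign of the value read
from the Index register, which is why the common probe-success path would
pay little for it.
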
> > Also I am fairly sure gas won't fill the branch delay slot above -- a
> > trivial rearrangement of code would save a cycle here (and this is a fast
> > path, so we do not want wasting time).
>
> Well, here is the code as assembled by binutils 2.17. This version of
> gas can put MFC0 in the delay slot. But it might be better to use
> noreorder myself.
>
> 80012a80 <handle_ri>:
> 80012a80: 401a6800 mfc0 k0,c0_cause
> 80012a84: 0740fd2e bltz k0,80011f40 <handle_ri_slow>
> 80012a88: 401b7000 mfc0 k1,c0_epc
> 80012a8c: 8f7a0000 lw k0,0(k1)
Still bad -- you have a stall on $k1 here. And on $k0 two instructions
earlier.
> 80012a90: 3c1b7c03 lui k1,0x7c03
> 80012a94: 377be83b ori k1,k1,0xe83b
> 80012a98: 175bfd29 bne k0,k1,80011f40 <handle_ri_slow>
> 80012a9c: 00000000 nop
And this "nop" is a waste of time.
> 80012aa0: 3c1b801b lui k1,0x801b
> 80012aa4: 8f7b4008 lw k1,16392(k1)
> 80012aa8: 401a7000 mfc0 k0,c0_epc
> 80012aac: 275a0004 addiu k0,k0,4
> 80012ab0: 409a7000 mtc0 k0,c0_epc
> 80012ab4: 377b1fff ori k1,k1,0x1fff
> 80012ab8: 3b7b1fff xori k1,k1,0x1fff
> 80012abc: 8f63000c lw v1,12(k1)
> 80012ac0: 42000018 eret

I'd restructure the code more or less like this, taking care of (almost)
all stalls resulting from interlocks on coprocessor moves and memory loads,
and likewise avoiding the need for "nop" fillers there for MIPS I
processors:

	.set	push
	.set	noat
	.set	noreorder
	mfc0	k0, CP0_CAUSE
	MFC0	k1, CP0_EPC
	bltz	k0, handle_ri_slow	/* if delay slot */
	 lui	k0, 0x7c03
	lw	k1, (k1)
	ori	k0, 0xe83b		/* k0 := rdhwr v1,$29 */
	bne	k0, k1, handle_ri_slow	/* if not ours */
	 get_saved_sp			/* k1 := current_thread_info */
	MFC0	k0, CP0_EPC
#if defined(CONFIG_CPU_R3000) || defined(CONFIG_CPU_TX39XX)
	ori	k1, _THREAD_MASK
	xori	k1, _THREAD_MASK
	LONG_L	v1, TI_FLAGS(k1)
	PTR_ADDIU k0, 4
	jr	k0
	 rfe
#else
	PTR_ADDIU k0, 4			/* stall on $k0 */
	MTC0	k0, CP0_EPC
	ori	k1, _THREAD_MASK
	xori	k1, _THREAD_MASK
	LONG_L	v1, TI_FLAGS(k1)
	eret
#endif
	.set	pop
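
For readers who want to check the constants used above, the following C
sketch (not kernel code) reproduces the arithmetic. It assumes the usual
MIPS32 encoding of "rdhwr" (SPECIAL3 opcode 0x1f, function 0x3b), that
Cause.BD is bit 31, and that "ori"/"xori" take zero-extended 16-bit
immediates; the function names are made up for illustration, and 0x1fff is
simply the mask value visible in the quoted disassembly.

```c
#include <assert.h>
#include <stdint.h>

/* Encode "rdhwr rt, $rd": opcode 0x1f, rt in bits 20:16, rd in 15:11. */
uint32_t encode_rdhwr(unsigned rt, unsigned rd)
{
	return (UINT32_C(0x1f) << 26) | ((uint32_t)(rt & 31) << 16) |
	       ((uint32_t)(rd & 31) << 11) | UINT32_C(0x3b);
}

/* Value built by "lui reg, hi" followed by "ori reg, lo". */
uint32_t lui_ori(uint16_t hi, uint16_t lo)
{
	return ((uint32_t)hi << 16) | lo;
}

/* "bltz k0" on the Cause value tests bit 31, the BD (delay slot) flag. */
int in_delay_slot(uint32_t cause)
{
	return (int32_t)cause < 0;
}

/* "ori x, mask" then "xori x, mask" clears the mask bits of x. */
uint32_t clear_low_bits(uint32_t x, uint16_t mask)
{
	x |= mask;
	x ^= mask;
	return x;
}

/* Small self-check against the values seen in the disassembly. */
void check_handle_ri_constants(void)
{
	assert(encode_rdhwr(3, 29) == UINT32_C(0x7c03e83b)); /* v1 is $3 */
	assert(lui_ori(0x7c03, 0xe83b) == encode_rdhwr(3, 29));
	assert(clear_low_bits(UINT32_C(0x801b5f34), 0x1fff) ==
	       UINT32_C(0x801b4000));
}
```

The ori/xori pair is equivalent to masking with the complement of the mask,
but it needs no extra register to hold the inverted constant.
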
I hope I got this right. ;-)
Maciej