On Tue, 12 May 2009, David Daney wrote:
> > > + /*
> > > + * Find the split point.
> > > + */
> > > + if (uasm_insn_has_bdelay(relocs, split - 1))
> > > + split--;
> > > + }
> > The code itself makes sense. Does this case actually happen much, or was
> > this just an itch?
> For my CPU it was happening 100% of the time when I add the soon to be
> submitted hugeTLBfs support patch. Although I have not measured it, this code
> is so hot that keeping the normal case fitting on a single cache line should
> be a big win.
Rather than this hack, I'd suggest micro-optimising the code by shuffling
it so that, unless the handler fits in 128 bytes entirely (I'm not sure
if that ever happens for XTLB refill), the part built by
build_get_pgd_vmalloc64() is placed in the TLB handler slot, saving an
unnecessary unconditional branch there. This way the problem of an
unconditional branch to ERET will be solved automagically as a side effect.
The exception is when the vmalloc part itself does not fit in 128 bytes,
in which case it would have to overflow back to the XTLB slot. It should
be pretty straightforward to code. ;)
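
To make the placement rule concrete, here is a rough sketch in plain C.
It is only an illustration and not tlbex.c code: choose_layout(), the
enum names and SLOT_INSNS are made up for this example, and it assumes
4-byte instructions (so a 128-byte slot holds 32 of them) and that the
fast path on its own always fits in one slot. It also does not check
whether the overflowing tail of the vmalloc part still fits next to the
fast path.

```c
#include <assert.h>

/* Assumption: one 128-byte handler slot, 4 bytes per MIPS instruction. */
#define SLOT_INSNS 32

/* Hypothetical names, for illustration only. */
enum layout {
	LAYOUT_SINGLE_SLOT,	/* whole handler fits in the XTLB slot */
	LAYOUT_VMALLOC_IN_TLB,	/* fast path in XTLB slot, vmalloc part in TLB slot */
	LAYOUT_OVERFLOW,	/* vmalloc part too big, tail overflows back to XTLB slot */
};

/*
 * fast_insns:    instructions in the common-case path that ends in ERET
 * vmalloc_insns: instructions in the part built for the vmalloc case
 */
static enum layout choose_layout(unsigned int fast_insns,
				 unsigned int vmalloc_insns)
{
	/* Assumption stated above: the fast path alone fits in one slot. */
	assert(fast_insns <= SLOT_INSNS);

	/* Everything fits in 128 bytes: no split is needed at all. */
	if (fast_insns + vmalloc_insns <= SLOT_INSNS)
		return LAYOUT_SINGLE_SLOT;

	/* Keep the fast path whole; move the vmalloc part to the TLB slot. */
	if (vmalloc_insns <= SLOT_INSNS)
		return LAYOUT_VMALLOC_IN_TLB;

	/* The vmalloc part does not fit in 128 bytes either. */
	return LAYOUT_OVERFLOW;
}
```

With these made-up numbers, choose_layout(24, 12) picks the middle case:
the fast path stays in one piece in the XTLB slot, so nothing ever has to
branch to a lone ERET.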