Maciej W. Rozycki wrote:
On Tue, 12 May 2009, David Daney wrote:
+ * Find the split point.
+ if (uasm_insn_has_bdelay(relocs, split - 1))
The code itself makes sense. Does this case actually happen much, or was
this just an itch?
On my CPU it happens 100% of the time once I add the soon-to-be-submitted
hugeTLBfs support patch. Although I have not measured it, this code is so
hot that keeping the common case within a single cache line should be a big
win.
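The split-point rule in the quoted patch can be illustrated as follows. A refill slot holds 128 bytes, which is 32 MIPS instructions. When the handler is longer than that, it has to be cut so that a branch never ends up separated from its delay slot. The sketch below models only that rule in plain C. `struct insn`, `find_split_point()` and the choice to reserve two instructions for the jump to the second part are hypothetical illustrations, not the kernel's uasm code.

```c
#include <stdbool.h>

/* Hypothetical model of one generated instruction. */
struct insn {
	unsigned int word;	/* encoded instruction (not inspected here) */
	bool has_bdelay;	/* true if the next insn is this branch's delay slot */
};

/*
 * Return the index at which to split a handler of n instructions so that
 * the first part fits into a slot of slot_len instructions. Two
 * instructions are reserved at the end of the first part for the branch
 * to the second part and its delay slot (an assumption of this sketch).
 * Returns n when no split is needed.
 */
static int find_split_point(const struct insn *buf, int n, int slot_len)
{
	int split;

	if (n <= slot_len)
		return n;		/* everything fits: no split */

	split = slot_len - 2;		/* room for "b rest" + delay slot */

	/*
	 * If the instruction just before the split is a branch, the split
	 * would land on its delay slot; back up one so the branch and its
	 * delay slot both move to the second part together.
	 */
	if (buf[split - 1].has_bdelay)
		split--;

	return split;
}
```

Backing up by a single instruction is enough here because MIPS does not allow a branch to sit in another branch's delay slot.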
Rather than this hack,
I don't really know what to say about that comment.
* We are synthesizing optimized TLB refill handlers; even small
improvements yield big gains in system performance.
* The optimization you suggest below, although a good one, is somewhat
different and would make a good follow-on patch.
* I am trying to make forward progress and not let the perfect be the
enemy of the good.
I'd suggest microoptimising the code by shuffling
it such that unless the handler fits in 128 bytes entirely (I'm not sure
if that ever happens for XTLB refill) the part built by
build_get_pgd_vmalloc64() is placed in the TLB handler slot, saving an
unnecessary unconditional branch there. This way the problem of an
unconditional branch to ERET will solve itself automagically as a side effect.
Unless the vmalloc part does not fit in 128 bytes, that is, in which case
it would have to overflow back to the XTLB slot. It should be pretty
straightforward to code. ;)
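The proposed placement boils down to a three-way decision over instruction counts. The following is a hypothetical sketch of that decision. `choose_layout()`, the enum names and the assumption that the fast path alone always fits in the XTLB slot are illustrative only. The sketch also ignores branch delay slots and the extra branches each layout needs.

```c
/* Size of one exception vector slot: 128 bytes / 4-byte instructions. */
#define SLOT_INSNS 32

enum refill_layout {
	LAYOUT_ALL_IN_XTLB,		/* whole handler fits in the XTLB slot */
	LAYOUT_VMALLOC_IN_TLB,		/* vmalloc part moved to the TLB slot */
	LAYOUT_VMALLOC_OVERFLOWS,	/* vmalloc part spills back into XTLB slot */
};

/*
 * Hypothetical decision for the proposed placement. fast_len is the
 * length of the normal path and vmalloc_len that of the part built by
 * build_get_pgd_vmalloc64(), both in instructions. Assumes the fast
 * path on its own always fits in the XTLB slot.
 */
static enum refill_layout choose_layout(int fast_len, int vmalloc_len)
{
	if (fast_len + vmalloc_len <= SLOT_INSNS)
		return LAYOUT_ALL_IN_XTLB;
	if (vmalloc_len <= SLOT_INSNS)
		return LAYOUT_VMALLOC_IN_TLB;
	return LAYOUT_VMALLOC_OVERFLOWS;
}
```

In the middle case the normal path stays in the XTLB slot and only the rarely taken vmalloc path lives elsewhere.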