linux-mips

Re: [PATCH] MIPS: Don't branch to eret in TLB refill.

To: "Maciej W. Rozycki" <macro@linux-mips.org>, ralf@linux-mips.org
Subject: Re: [PATCH] MIPS: Don't branch to eret in TLB refill.
From: David Daney <ddaney@caviumnetworks.com>
Date: Mon, 18 May 2009 09:25:12 -0700
Cc: David VomLehn <dvomlehn@cisco.com>, linux-mips@linux-mips.org
In-reply-to: <alpine.LFD.1.10.0905160706300.12158@ftp.linux-mips.org>
Original-recipient: rfc822;linux-mips@linux-mips.org
References: <1242168316-4009-1-git-send-email-ddaney@caviumnetworks.com> <20090513002337.GA12536@cuplxvomd02.corp.sa.net> <4A0A1E6B.6050908@caviumnetworks.com> <alpine.LFD.1.10.0905160706300.12158@ftp.linux-mips.org>
Sender: linux-mips-bounce@linux-mips.org
User-agent: Thunderbird 2.0.0.21 (X11/20090320)
Maciej W. Rozycki wrote:
> On Tue, 12 May 2009, David Daney wrote:
>
>>>> +                       /*
>>>> +                        * Find the split point.
>>>> +                        */
>>>> +                       if (uasm_insn_has_bdelay(relocs, split - 1))
>>>> +                               split--;
>>>> +               }
>>> The code itself makes sense. Does this case actually happen much, or was
>>> this just an itch?

>> On my CPU it happens 100% of the time once I add the soon-to-be-submitted
>> hugeTLBfs support patch.  Although I have not measured it, this code
>> is so hot that keeping the normal case fitting on a single cache line should
>> be a big win.

>  Rather than this hack,

I don't really know what to say about that comment.

* We are synthesizing optimized TLB refill handlers; even small improvements yield big gains in system performance.

* The optimization you suggest below, although a good one, is somewhat different and would make a good follow-on patch.

* I am trying to make forward progress and not let the perfect be the enemy of the good.

> I'd suggest microoptimising the code by shuffling it such that unless the handler fits in 128 bytes entirely (I'm not sure if that ever happens for XTLB refill) the part built by build_get_pgd_vmalloc64() is placed in the TLB handler slot, saving an unnecessary unconditional branch there. This way the problem of an unconditional branch to ERET is solved automagically as a side-effect. Unless the vmalloc part does not fit in 128 bytes, that is, in which case it would have to overflow back to the XTLB slot. It should be pretty straightforward to code. ;)
>
>   Maciej
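
For readers following the thread, the split-point adjustment in the quoted diff and the 128-byte slot check can be sketched in standalone C. This is only an illustration under stated assumptions: `enum insn_kind`, `has_bdelay()`, `pick_split()`, `fits_in_slot()` and `SLOT_INSNS` are hypothetical names invented here, and `has_bdelay()` merely stands in for the kernel's `uasm_insn_has_bdelay()`, whose real implementation inspects relocations rather than a tag. It is not the code from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified instruction tags; not the kernel's uasm types. */
enum insn_kind { INSN_PLAIN, INSN_BRANCH, INSN_ERET };

/* One 128-byte handler slot holds 32 four-byte MIPS instructions. */
#define SLOT_INSNS (128 / 4)

/* Stand-in for uasm_insn_has_bdelay(): does instruction i have a delay slot? */
static bool has_bdelay(const enum insn_kind *buf, size_t i)
{
    return buf[i] == INSN_BRANCH;
}

/* True when a handler of n instructions needs no splitting at all. */
static bool fits_in_slot(size_t n)
{
    return n <= SLOT_INSNS;
}

/*
 * Choose where to cut a handler of n instructions into two parts.
 * 'want' is the preferred index of the first instruction of the second part.
 * If the instruction just before the cut is a branch, cutting there would
 * separate the branch from its delay slot, so the cut moves back by one.
 */
static size_t pick_split(const enum insn_kind *buf, size_t n, size_t want)
{
    size_t split = want;

    assert(want <= n);
    if (split > 0 && has_bdelay(buf, split - 1))
        split--;
    return split;
}
```

With a six-instruction buffer whose third entry is a branch, asking for a cut at index 3 yields index 2, so the branch and its delay slot stay in the same part; asking for a cut at index 4 leaves the cut where it was requested.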


