linux-mips

Re: [patch] MIPS/gcc: Revert removal of DImode shifts for 32-bit targets

To: "Maciej W. Rozycki" <macro@linux-mips.org>
Subject: Re: [patch] MIPS/gcc: Revert removal of DImode shifts for 32-bit targets
From: Nigel Stephens <nigel@mips.com>
Date: Mon, 02 Aug 2004 21:03:49 +0100
Cc: Ralf Baechle <ralf@linux-mips.org>, Richard Henderson <rth@redhat.com>, Richard Sandiford <rsandifo@redhat.com>, gcc-patches@gcc.gnu.org, linux-mips@linux-mips.org
In-reply-to: <Pine.LNX.4.58L.0407261325470.3873@blysk.ds.pg.gda.pl>
Organization: MIPS Technologies
Original-recipient: rfc822;linux-mips@linux-mips.org
References: <Pine.LNX.4.55.0407191648451.3667@jurand.ds.pg.gda.pl> <87hds49bmo.fsf@redhat.com> <Pine.LNX.4.55.0407191907300.3667@jurand.ds.pg.gda.pl> <20040719213801.GD14931@redhat.com> <Pine.LNX.4.55.0407201505330.14824@jurand.ds.pg.gda.pl> <20040723202703.GB30931@redhat.com> <20040723211232.GB5138@linux-mips.org> <Pine.LNX.4.58L.0407261325470.3873@blysk.ds.pg.gda.pl>
Sender: linux-mips-bounce@linux-mips.org
User-agent: Mozilla/5.0 (X11; U; Linux i686; en-GB; rv:1.4) Gecko/20030624
Maciej W. Rozycki wrote:

On Fri, 23 Jul 2004, Ralf Baechle wrote:

With a bit of hand-waving, because I haven't done benchmarks, I guess Richard
might be right.  The subroutine calling overhead on modern processors is
rather low and smaller code means better cache hit rates ...

Well, I just worry the call may itself include at least the same number
of instructions as the callee if inlined.  There would be no way for it to
be faster.

That may happen for a leaf function -- the call itself, plus $ra
saving/restoration is already four instructions.  Now it's sufficient for
two statics to be needed to preserve temporaries across such a call and
the size of the caller is already the same.  With three statics, you lose
even for a non-leaf function.  That's for a function containing a single
call to such a shift -- if there are more, then you may win (but is it
common?).

So not only may it not be faster, but the resulting code may be bigger as
well.  That said, GCC's current implementation of these operations is
not exactly optimal for current MIPS processors.  That's trivial to deal
with in Linux, but would it be possible to pick a different implementation
from libgcc based on the "-march=" setting, too?




I second Maciej. My own recent experience when tuning the hell out of a software floating-point emulator was that efficient 64-bit shifts were really critical. I have a patch against gcc-3.4 which makes the inline 64-bit shifts somewhat smaller on ISAs which include the conditional move (movz/movn) instructions. More importantly, it removes all branches from the inline code; branches can be very expensive on long-pipeline CPUs, since in this sort of code they tend to cause many mispredicts.

Let me know if you want me to extract the patch. Here's a table of the number of instructions generated by the original md pattern and by the patched version:

                Instructions
                Old     New
ashldi3         12      9
ashrdi3         12      12
lshrdi3         12      9
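
To make the branch-free idea concrete, here is a rough C sketch of a 64-bit shift built from 32-bit halves with no branches. It is only an illustration, not the patch itself: the names u64_halves, shl64_branchless and lshr64_branchless are made up for this example, and the mask-and-select step merely stands in for what a movz/movn pair would do in the generated code. The shift count is assumed to be in the range 0..63.

```c
#include <stdint.h>

/* Hypothetical helper type: a 64-bit value held as two 32-bit words. */
typedef struct {
    uint32_t lo;
    uint32_t hi;
} u64_halves;

/* Branch-free 64-bit left shift (illustrative counterpart of ashldi3). */
static u64_halves shl64_branchless(uint32_t lo, uint32_t hi, unsigned n)
{
    unsigned s = n & 31u;                  /* shift count within one word */
    uint32_t lo_s = lo << s;
    /* Bits carried from lo into hi.  Shifting by 1 and then by (31 - s)
       avoids an undefined 32-bit shift when s == 0. */
    uint32_t hi_s = (hi << s) | ((lo >> 1) >> (31u - s));
    /* All ones when n >= 32, all zeros otherwise. */
    uint32_t big = 0u - ((n >> 5) & 1u);
    u64_halves r;
    r.hi = (lo_s & big) | (hi_s & ~big);   /* select without branching */
    r.lo = lo_s & ~big;
    return r;
}

/* Branch-free 64-bit logical right shift (counterpart of lshrdi3). */
static u64_halves lshr64_branchless(uint32_t lo, uint32_t hi, unsigned n)
{
    unsigned s = n & 31u;
    uint32_t hi_s = hi >> s;
    /* Bits carried from hi into lo, again avoiding a shift by 32. */
    uint32_t lo_s = (lo >> s) | ((hi << 1) << (31u - s));
    uint32_t big = 0u - ((n >> 5) & 1u);
    u64_halves r;
    r.lo = (hi_s & big) | (lo_s & ~big);
    r.hi = hi_s & ~big;
    return r;
}
```

An arithmetic right shift would additionally need the sign word (hi shifted right arithmetically by 31) in place of zero for the upper half when n >= 32, which may be part of why ashrdi3 does not shrink in the table above; that is a guess, not something measured here.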


If people really don't like the inline expansion, then maybe it could be enabled or disabled by a new -m option.

Nigel

--
                        Nigel Stephens         Mailto:nigel@mips.com
   _    _ ____  ___     MIPS Technologies      Phone.: +44 1223 706200
   |\  /|||___)(___     The Fruit Farm         Direct: +44 1223 706207
   | \/ |||    ____)    Ely Road, Chittering   Fax...: +44 1223 706250
    TECHNOLOGIES UK     Cambridge CB5 9PH      Cell..: +44 7976 686470
                        England                http://www.mips.com


