Re: Lmbench results for Linux/MIPS 2.1.90

From: "William J. Earl" <>
Date: Mon, 6 Apr 1998 09:35:12 -0700
Sender: writes:
 > That's implemented now.  I'm also pulling another trick.  Why the hell
 > should we save all the s-registers during system calls?  The MIPS calling
 > sequence guarantees to us that they will not be destroyed.  Whoops,
 > another 150us or so.  It's what brought us down to 861ns, faster than
 > big bad Pentium from Borg.  All that it takes is adding some extra code;
 > sys_fork(), sys_clone() and do_signal expect the s-registers to be in
 > the stackframe, so I save them only in these routines.

     Yes, UMIPS-BSD (but not IRIX or RISC/os) did that too.  The 861 ns
figure also means that you are not getting any cache misses at all, which is very
nice.  (Some systems seem to think a system call without a cache miss
is a day without sunshine.  :-) )

 > I'm thinking about changing the calling sequence of syscalls as well.
 > When we get more than four arguments passed, we have to dig them out of
 > the userstack.  While this is fast on Linux we still need time for the
 > safety checks.  The t registers which are being clobbered anyway would
 > be sooo nice to pass them.
 > (Hey people, remember I told ya static linking is evil?  That change would
 > fry all your binaries ...  still time to relink :-)

     You could OR some suitable high-order bit into the system call number
to distinguish the two cases (and mask it off in syscall before indexing
the system call table).  Since you have the system call number in a register,
that should be pretty cheap to check.
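
     A rough sketch of the flag-bit idea in C, not actual kernel code.
Every name and value here (SYS_ARGS_IN_TREGS, regs_sketch, dispatch, the
two toy system calls) is made up for illustration, and the user stack is
modelled as a plain pointer:

```c
#include <assert.h>
#include <stdint.h>

/* All names and values below are hypothetical, for illustration only. */
#define SYS_ARGS_IN_TREGS 0x80000000u  /* flag bit ORed into the syscall number */
#define NR_SYSCALLS       2u
#define ENOSYS_SKETCH     38

typedef long (*syscall_fn)(const long *args);

/* Stand-in for the saved user registers at system call entry. */
struct regs_sketch {
    uint32_t    v0;    /* syscall number, possibly with the flag bit set */
    long        a[4];  /* arguments 1..4 */
    long        t[4];  /* arguments 5..8 under the new convention */
    const long *usp;   /* arguments 5..8 under the old convention */
};

static long sys_first(const long *args) { return args[0]; }

static long sys_sum8(const long *args)
{
    long sum = 0;
    for (int i = 0; i < 8; i++)
        sum += args[i];
    return sum;
}

static const syscall_fn sys_call_table[NR_SYSCALLS] = { sys_first, sys_sum8 };

long dispatch(const struct regs_sketch *r)
{
    long args[8];
    uint32_t nr;

    assert(r != 0);
    nr = r->v0 & ~SYS_ARGS_IN_TREGS;   /* mask the flag off before indexing */
    if (nr >= NR_SYSCALLS)
        return -ENOSYS_SKETCH;

    for (int i = 0; i < 4; i++)
        args[i] = r->a[i];

    if (r->v0 & SYS_ARGS_IN_TREGS) {
        /* New binaries: arguments 5..8 arrive in registers, no checks needed. */
        for (int i = 0; i < 4; i++)
            args[4 + i] = r->t[i];
    } else {
        /* Old binaries: a real kernel must validate this user pointer, and
         * would fetch only as many arguments as the call actually takes. */
        for (int i = 0; i < 4; i++)
            args[4 + i] = r->usp[i];
    }
    return sys_call_table[nr](args);
}
```

Old statically linked binaries never set the bit, so they keep taking the
stack path, while relinked ones skip it.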

     Instead of changing the calling sequence, perhaps you could
do the fetching in a special assembly subroutine, and have the trap
handler notice if $epc is in the routine at the instruction which fetches
from user space.  If so, it could change $epc to some recovery address
in the assembly routine, which would return the fault indication.
(There would probably be multiple load instructions in a fully
unrolled routine, but that would just be more locations to accept 
as valid exception points.)  This takes cycles out of the normal
path, at the cost of cycles in the trap path (for kernel traps only,
and then only for cases which are going to turn into EFAULT anyway).
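
     The trap-handler side of this could look roughly like the C below.
It is a user-space model only: the addresses, the fixup_entry table, the
trap_frame layout and fixup_kernel_fault() are all hypothetical, and the
assembly fetch routine itself is not shown.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One entry per load instruction that is allowed to fault on user memory. */
struct fixup_entry {
    uintptr_t fault_pc;     /* address of a load in the fetch routine */
    uintptr_t recovery_pc;  /* where to resume so the routine returns EFAULT */
};

/* Hypothetical addresses: four unrolled loads sharing one recovery point. */
static const struct fixup_entry fetch_fixups[] = {
    { 0x80001000u, 0x80001040u },
    { 0x80001004u, 0x80001040u },
    { 0x80001008u, 0x80001040u },
    { 0x8000100cu, 0x80001040u },
};

/* Minimal stand-in for the state saved by the exception entry code. */
struct trap_frame {
    uintptr_t epc;          /* faulting program counter ($epc) */
    int       from_kernel;  /* nonzero if the trap came from kernel mode */
};

/*
 * Returns 1 and redirects epc if the fault hit one of the known loads,
 * 0 if the fault must be handled the ordinary way.
 */
int fixup_kernel_fault(struct trap_frame *tf)
{
    assert(tf != 0);
    if (!tf->from_kernel)
        return 0;           /* user-mode faults never reach the table */

    for (size_t i = 0; i < sizeof fetch_fixups / sizeof fetch_fixups[0]; i++) {
        if (tf->epc == fetch_fixups[i].fault_pc) {
            tf->epc = fetch_fixups[i].recovery_pc;
            return 1;
        }
    }
    return 0;               /* a genuine kernel fault */
}
```

The table is searched only after a kernel-mode fault has already been
taken, so the normal path through the fetch routine pays nothing for it.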

