On Mon, Aug 23, 2004 at 07:12:57PM +0200, Ralf Baechle wrote:
> Thiemo and I have been compiling various pieces of code with different
> gcc versions trying to find the best possible register for that purpose.
> We used code bloat as a (weak ...) indicator for register pressure. It
> turned out that $t9 was the best choice for all tested compiler versions;
> thanks to the much improved register allocation of newer gcc the choice
> of a particular register made far less difference on recent compilers
> than on older compilers.
> I've also implemented a fast system call for reading the thread registers.
> Benchmarks did show that to have about half the latency of a regular
> syscall; the hope was that if gcc did some clever optimization, that
> overhead would effectively become zero.
> I was favoring this low-overhead syscall approach because it would avoid
> the loss of a register thus leaving performance of non-threaded code
> unchanged but other developers generally favor the permanent allocation
> of $t9 as a thread register.
Personally, I favor doing the low-overhead syscall for o32 and then
moving to the new ABI that MIPS is talking about with a thread
register. I'm not sure what to do about n32/n64.
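
To make the o32 option a bit more concrete, here is a rough sketch of how
the syscall cost could be amortized by fetching the thread pointer once per
function and reusing it. Everything named here is an assumption for
illustration only: read_thread_pointer() is a made-up stand-in for the fast
syscall (it just counts calls instead of trapping), and struct thread_block
is a hypothetical layout, not anything libc or the kernel defines.

```c
/* Hypothetical per-thread block; a real layout would be up to libc. */
struct thread_block {
    int errno_value;
    int counter;
};

static struct thread_block demo_block;
static unsigned long fast_call_count;

/*
 * Stand-in for the low-overhead syscall (assumption: on real o32 this
 * would trap into the kernel). Here it only counts how often it is
 * called, so the cost of each call stays visible.
 */
static struct thread_block *read_thread_pointer(void)
{
    fast_call_count++;
    return &demo_block;
}

/* Fetch the thread pointer once, then reuse it for every access. */
static int bump_counter(int times)
{
    struct thread_block *tp = read_thread_pointer();

    for (int i = 0; i < times; i++)
        tp->counter++;
    return tp->counter;
}
```

With this shape, the fast call is paid once per function no matter how many
thread-local accesses follow, which is roughly the effect one would hope the
compiler could achieve on its own.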
> Other crazy ideas did include a per-thread mapping containing the thread
> pointer - and possibly more information in the future.
Does MIPS have an efficient way to do this for SMP?
> On the positive side if we had multiple register sets on a MIPSxx V2
> processor we could exploit that to get rid of this overhead and do
> other nice optimizations for TLB reload also. Unfortunately these
> register sets are only an optional feature of the architecture.
That's more or less what was talked about for ARM v6.