On Wednesday 11 June 2008 Kevin D. Kissell wrote:
> Brian Foster wrote:
> >[ ... the FPU emulation ] trampoline, which is pushed on
> > the user-land stack is, unlike sigreturn, not fixed code.
> > It varies on a per-instance per-thread basis. Hence the
> > simple ‘vsyscall’ mechanism ((to be?) used for sigreturn)
> > is inappropriate.
> > The trampoline is only used to execute a non-FP instruction
> > (<instr>) in the delay slot of an FP-instruction [ ... ]
> > Belch! ;-\ Whilst I can think of a few things that may work
> > (temporarily change page permissions; or go ahead and use
> > the ‘vsyscall’ page with some interlocking magic; or a new
> > dedicated per-thread page; or ...?) none seem appealing.
>[ ... ]
> As the jerk who originally bolted the FP emulator into the MIPS kernel
> and came up with the stack trampoline hack, I can explain why it seemed
> sane at the time. If an FP branch is emulated and to be taken, we have to
> find a way for the instruction in the delay slot to be executed prior to the
> transfer of control to the branch target. It has to execute with the user's
> permissions. Putting it on the user's stack and building a trampoline was
> the fairly classical way of doing it, but note that it's architecturally
> illegal to put a branch in a branch delay slot (floating point or otherwise),
> so there's no possibility of recursion. So one only needs 3-4 words (one
> could substitute another means of validation for the cookie) per thread.
Yes, once I worked out what it was doing it all seemed cute
(albeit I don't quite see what the danger is with recursion?).
My “Belch!” was referring to the problems it now causes with
> It just has to be part of the user's address space. I suppose
> that instead of using a few words just above the stack, one could use
> a few words just below the current "brk()" point, or, better still (but
> far more invasive) pad the text segment, which should always be
> executable, with 4 words that the kernel can find in a hurry.
First, you need to be really careful about multithreaded code
concurrently doing FPU stuff. That is, it's possible there
may be more than one “live” emulated FPU delay slot in the
same address space. So stuffing the code into text, or
near the brk()-point, or similar, all has concurrency issues.
This is what makes the current on-the-stack approach neat;
the stack _is_ per-thread so there's no concurrency mess.
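
To make the concurrency point concrete, here is a rough user-space
sketch in C (not kernel code). Everything in it is an assumption made
for illustration: the four-word `ds_frame` layout, the names `thread`,
`ds_push`, `ds_read` and `ds_store_shared`, the cookie value, and the
use of `break 0` as the trap word. It only models where the frame is
stored; nothing is actually executed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical layout of one delay-slot trampoline (the "3-4 words"). */
struct ds_frame {
    uint32_t instr;   /* the non-FP delay-slot instruction to execute */
    uint32_t trap;    /* instruction that traps back into the kernel  */
    uint32_t cookie;  /* marks the frame as a genuine trampoline      */
    uint32_t target;  /* branch target to resume at afterwards        */
};

#define DS_TRAP   0x0000000du  /* encoding of MIPS "break 0" (assumed trap) */
#define DS_COOKIE 0x0000bd36u  /* arbitrary validation value (assumption)   */
#define STACK_WORDS 64

/* A simulated thread: its own stack, growing down from STACK_WORDS. */
struct thread {
    uint32_t stack[STACK_WORDS];
    size_t sp;  /* index of the lowest word in use */
};

/* Push a trampoline frame on this thread's stack; return its index. */
size_t ds_push(struct thread *t, uint32_t instr, uint32_t target)
{
    struct ds_frame f = { instr, DS_TRAP, DS_COOKIE, target };
    size_t words = sizeof f / sizeof(uint32_t);

    assert(t->sp >= words);  /* room left on the simulated stack */
    t->sp -= words;
    memcpy(&t->stack[t->sp], &f, sizeof f);
    return t->sp;
}

/* Copy the frame at index `at` into *out; return 1 if the cookie matches. */
int ds_read(const struct thread *t, size_t at, struct ds_frame *out)
{
    memcpy(out, &t->stack[at], sizeof *out);
    return out->cookie == DS_COOKIE && out->trap == DS_TRAP;
}

/* The alternative: one slot per address space (text padding, brk, ...). */
struct ds_frame shared_slot;

void ds_store_shared(uint32_t instr, uint32_t target)
{
    struct ds_frame f = { instr, DS_TRAP, DS_COOKIE, target };
    shared_slot = f;
}

/* Two threads, each with a live frame on its own stack: both survive. */
int demo_per_thread_frames_independent(void)
{
    struct thread a = { .sp = STACK_WORDS }, b = { .sp = STACK_WORDS };
    struct ds_frame ra, rb;
    size_t fa = ds_push(&a, 0x24420001u, 0x00400100u); /* addiu v0,v0,1 */
    size_t fb = ds_push(&b, 0x24630002u, 0x00400200u); /* addiu v1,v1,2 */

    return ds_read(&a, fa, &ra) && ds_read(&b, fb, &rb) &&
           ra.instr == 0x24420001u && ra.target == 0x00400100u &&
           rb.instr == 0x24630002u && rb.target == 0x00400200u;
}

/* Same two emulations through the shared slot: the first one is lost. */
int demo_shared_slot_clobbered(void)
{
    ds_store_shared(0x24420001u, 0x00400100u); /* thread A's frame      */
    ds_store_shared(0x24630002u, 0x00400200u); /* thread B overwrites   */
    return shared_slot.instr != 0x24420001u;
}
```

With a per-thread location both frames stay intact. With a single
shared slot the second emulation overwrites the first, so a real
implementation along those lines would need locking or per-thread
slots of some kind.
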
As for putting the trampoline near the brk()-point, besides
the concurrency problem, there's also the issue that the
containing page would have to be made user-executable (if only
temporarily). Unless I'm confused, that page is nominally
data (heap-ish). With the addition of XI support, I would
expect data to nominally also be non-executable.
“How many surrealists does it take to | Brian Foster
change a lightbulb? Three. One calms | somewhere in south of France
the warthog, and two fill the bathtub | Stop E$$o (ExxonMobil)!
with brightly-coloured machine tools.” | http://www.stopesso.com