To recap, Pete found a hole in the FPU emulator code,
wherein a signal delivered between the setup and the
execution of the trampoline code for an instruction in
the delay slot of an emulated floating point branch
would trash the trampoline code and cause Bad
Things to happen. Adding a more space between
the user stack pointer and the trampoline code
solves the problem in a particular case, but is
provably still at risk, since signal handlers can
allocate arbitrary amounts of stack storage.
Carsten and I both developed patches to inhibit
the delivery of signals during the trampoline.
This had the defect of causing processes to block
indefinitely in cases where the instruction being
executed by the trampoline itself causes an exception,
as Carsten was able to demonstrate using "crashme".
My patch used relatively heavyweight high-level
mechanisms, which have the virtue of being easily
configurable to allow certain signals from which the
signal handler is unable or unlikely to return (such
as SIGILL, SIGSEGV, and SIGBUS) to be delivered.
That seems to be OK for both crashme and Pete's
program, but I believe it is still broken for things like
trap instructions in the branch delay slot, and for
ptrace() single-stepping. So I've got another idea
that I think is much better, and I'm kicking myself for
not having thought of it earlier.
The idea is to simply ensure that signal stack
frames always leave a space between the
current user SP and the stack frame itself. If no
signals fire, fine. If a signal is delivered and
caught, its frame will be beyond the FP emulator
trampoline. This is a trivial hack to get_sigframe()
that should be completely harmless (aside from
increased memory consumption), since it's
aleady set up to accept signal frames that aren't
on the stack as per Posix signal stacks.
There are, however, a flies in the ointment, as always.
Once we commit that original sin of allowing signals to
"play through" during the trampoline sequence, we open
the door to a signal handler containing FP branches itself,
which would need to be emulated. Yes, we could create
another trampoline further up the stack, but the problem is
that the thread-specific data to allow recovery from the first
trampoline will be overwritten by the second. Furthermore,
the trampoline mechanism is terminated by catching the
next unaligned access fault for the thread. A signal handler
could also generate an unaligned access fault. The former
problem could probably be worked around acceptably by
simply having the FP trap handler notice that it is already
in a delay-slot-emulation sequence, and nail the process
with SIGFPE or SIGILL if it happens to do another one.
Not nice, but at least consistent, and the case is highly
unlikely to arise. But we've still got the unaligned access
case to consider. So I submit the following for your consideration.
In the FP emulator, when we set up the trampoline, we set
up the following above the user stack:
AdELOAD (unaligned load of r0 off of r0)
Rather than use thread.desmul_epc as a flag to indicate
to the unaligned access handler that there is emulation
going on, the unaligned access handler would test to see
if the unaligned access instruction itself was the AdELOAD,
and that it is followed by the magic cookie, and if so, treat
that as an indication to stuff the value following the sequence
(as indicated by EPC, modulo branch delays) into the EPC
and return. This way, one could nest an arbitrary number of
emulation/signal/emulation sequences, and they should unroll
correctly. The further magic cookie is needed since, even though
it's a completely useless instruction, the AdELOAD could be
encountered for other reasons, as a misguided attempt
at cache prefetch, or by executing crashme. It's a useless instruction
to emulate, having r0 as the destination, but the normal Linux semantics
as I understand them would be to turn it into a very expensive
no-op and continue in what might otherwise be safe and sane
execution. With the addtional code word after the AdELOAD
(and before the EPC) on the dsemul stack that would be (more-or-less)
guaranteed to really be an illegal instruction, the only risk we would
be taking would be to have a program really containing the
unaligned nonsense load followed by the illegal instruction
blow up, not on the illegal instruction, but by picking up a bogus
EPC. Not perfect. But maybe the lesser of several evils.
I'm working on a prototype, but do you think that this scheme