On Thu, 11 Jul 2002, Kevin D. Kissell wrote:
> > - SIG_IGN: return to EPC with no action. A program will loop
> > indefinitely, but if that's what a user wants...
> I don't think that this is the right thing to do, philosophically.
> Hanging in an infinite loop and making no forward progress
> is not, to me "ignoring" an event. The old X/Open specs I've
> got say that SIGFPE, SIGILL, and SIGSEGV behavior is
> undefined if bound to SIG_IGN (curiously, they don't call
> out SIGBUS), but I think that in practical terms we need to
> provide whatever behavior people expect from Linux on
> x86 and PPC. What happens on those platforms? A
> quick look at the x86 kernel code makes me think that
> they do, indeed, do the "wrong" thing and beat their
> heads against the ignored event for all eternity, but I'm
> insufficiently an expert in x86 trap semantics to know
> for certain whether that's the case. If it is, right or
> wrong, that's what we ought to do.
Yes, they loop indefinitely. That may be useful for debugging -- you may
attach to a running program and you'll be sure to get at the faulting
instruction. Otherwise the warning from the libc manual applies:
"If you block or ignore these signals or establish handlers for them that
return normally, your program will probably break horribly when such
signals happen, unless they are generated by `raise' or `kill' instead of
a real error."
So a user (programmer) has been warned.
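
To illustrate the exception the manual makes for `raise', here is a
minimal sketch, assuming a POSIX libc, of an ignored SIGFPE generated in
software. The helper name is made up for illustration. The signal is
simply discarded and execution continues; this says nothing about a
SIGFPE caused by a real fault.

```c
#include <signal.h>

/*
 * Hypothetical helper: ignore SIGFPE, send it with raise(), then restore
 * the previous disposition.  Returns 1 if raise() succeeded and execution
 * carried on normally, 0 otherwise.
 */
int ignored_sigfpe_from_raise_is_harmless(void)
{
    void (*old)(int) = signal(SIGFPE, SIG_IGN);
    int rc;

    if (old == SIG_ERR)
        return 0;

    rc = raise(SIGFPE);     /* software-generated, so it is just dropped */
    signal(SIGFPE, old);    /* put the previous disposition back */

    return rc == 0;
}
```
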
> > - HANDLER: call a handler with the signal context unmodified and let the
> > user code decide what to do.
> Independently of what we do for the SIG_IGN cases,
> this is important, and the user code cannot decide what
> to do if it cannot know what instruction caused the fault.
> Fixups on SIGFPE must be able to find the FP instruction,
> which is not currently possible if it was in a branch delay
> slot. Similarly, user-mode emulation of "memory" via
Well, the Cause register is passed to the userland, so only EPC needs to
> signal handlers cannot work unless the loads and stores
> can be identified. But, having "done the deed", return
> from the signal handler should resume at the instruction
> *following* the one generating the fault, and not replay
> the same instruction. We *could* punt that to the signal
> handler, but making every signal package carry its own
> copy of compute_return_epc() to handle the branch
> delay slot cases strikes me as being unfriendly to the
> user and is arguably slightly less reliable. I guess I'd like things
> to be rigged so that the sigcontext structure contains the address
> of the faulting instruction as the sc_pc, but where the return
> from signal goes to the address calculated by
> compute_return_epc(). But again, what do people expect
> in the "mainstream" world of x86 Linux?
On the x87, FP exceptions are raised just before the *following* FP
instruction (either a regular one or the special "wait" one). The context
of the faulting instruction (both the instruction and data addresses and
the opcode) is saved in special registers (as usual with i386, the most
complex way was chosen) and can be retrieved by dumping the FPU context to
memory (see the "fnstenv" and "fnsave" instructions).
So the i386 is very different and can't really be used as a reference.
However, a brief look at the Alpha port (which is mature, and the Alpha
CPU is quite similar to MIPS) reveals that the code never modifies the
saved PC in the kernel. But again, the FPU traps happen after the faulting
instructions (for older models even imprecisely -- see the search-back
code in alpha_fp_emul_imprecise()).
With the current specifications, I think the best way for the SIGFPE handler
(since it's somewhat special) would be to provide the address of the
faulting instruction in siginfo_t.si_addr and have the EPC in sigcontext
set up for a continuation (that would still allow longjmp(), etc.).
Ideally, I'd prefer the reverse, i.e. EPC unchanged and siginfo_t.si_addr
containing an address to continue, so that a handler would have to
explicitly copy the address to EPC if it decided it handled the signal
successfully (so that a program doesn't continue unpredictably after an
integer division by zero, because the handler expected only real FP
faults) -- maybe we should extend siginfo_t?
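
As a rough sketch of the handler side, assuming POSIX sigaction() with
SA_SIGINFO: the handler records si_code so that the program can tell an
integer division by zero (FPE_INTDIV) from a real FP fault before deciding
whether to continue. The function names are hypothetical, and nothing
here touches EPC or the sigcontext.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

static volatile sig_atomic_t sigfpe_count;   /* number of SIGFPEs seen */
static volatile sig_atomic_t last_si_code;   /* si_code of the last one */

/* Only records what happened; the decision is left to the main program. */
static void on_sigfpe(int sig, siginfo_t *info, void *uctx)
{
    (void)sig;
    (void)uctx;
    last_si_code = info->si_code;
    sigfpe_count++;
}

/* Hypothetical helper: install the handler, returns 0 on success. */
int install_sigfpe_handler(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = on_sigfpe;
    sa.sa_flags = SA_SIGINFO;    /* ask for the three-argument handler */
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGFPE, &sa, NULL);
}

int sigfpe_seen(void)
{
    return sigfpe_count > 0;
}

/* True only if the last SIGFPE was an integer division by zero. */
int last_sigfpe_was_integer_divide(void)
{
    return sigfpe_count > 0 && last_si_code == FPE_INTDIV;
}
```

The same siginfo_t carries si_addr for kernel-generated faults; whether it
should hold the faulting address or the continuation address is exactly
the open question above.
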
For other exceptions, I'd just leave EPC alone.
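
For a handler that wants the faulting instruction with EPC left alone, the
address follows from EPC and the BD bit of Cause (bit 31): with BD set,
EPC points at the branch and the faulting instruction sits in the delay
slot at EPC + 4. A minimal sketch, assuming 32-bit addresses; the macro
and function names are made up here, not taken from the kernel headers.

```c
#include <stdint.h>

/* Cause.BD: the exception was taken in a branch delay slot (assumed name). */
#define CAUSE_BD_BIT (UINT32_C(1) << 31)

/* Non-zero if the faulting instruction was in a branch delay slot. */
int fault_in_delay_slot(uint32_t cause)
{
    return (cause & CAUSE_BD_BIT) != 0;
}

/*
 * Address of the instruction that actually faulted.  With BD clear this
 * is EPC itself; with BD set, EPC holds the address of the branch and the
 * faulting instruction is the one right after it.
 */
uint32_t faulting_insn_addr(uint32_t epc, uint32_t cause)
{
    return fault_in_delay_slot(cause) ? epc + 4u : epc;
}
```

Working out where to resume in the delay-slot case is the harder part, as
the branch has to be decoded, which is what compute_return_epc() does in
the kernel.
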
+ Maciej W. Rozycki, Technical University of Gdansk, Poland +
+ e-mail: email@example.com, PGP key available +