> Ptrace has performance and/or reliability problems when used to
> sandbox threaded applications due to potential race conditions when
> inspecting system call arguments. We hope that we can avoid this
> problem with seccomp.
ptrace certainly has performance issues. I take it the only "reliability
problems" you are talking about are MT races with modifications to user
memory that is relevant to a system call. (Is there something else?)
That is not a "ptrace problem" per se at all. It's an intrinsic problem
with any method based on "generic" syscall interception, if the filtering
and enforcement decisions depend on examining user memory. By the same
token, no such method has a "reliability problem" if the filtering checks
only examine the registers (or other thread-synchronous state).
In the sense that I mean, seccomp is "generic syscall interception" too.
(That is, the checks/enforcement are "around" the call, rather than inside
it with direct atomicity controls binding the checks and uses together.)
The only reason seccomp does not have this "reliability problem" is that
its filtering is trivial and depends only on registers (in fact, only on
one register, the syscall number).
If you want to do checks that depend on shared or volatile state, then
syscall interception is really not the proper mechanism for you. (Likely
examples include user memory, e.g. for file names in open calls, or ioctl
struct contents, etc., fd tables or filesystem details, etc.) For that
you need mechanisms that look at stable kernel copies of user data that
are what the syscall will actually use, such as is done by audit, LSM, etc.
If you only have checks confined to thread-synchronous state such as the
user registers, then you don't have any "reliability problem" regardless
of the the particular syscall interception mechanism you use. (ptrace has
many problems for this or any other purpose, but this is not one of them.)
That's unless you are referring to some other "reliability problem" that
I'm not aware of. (And I'll leave aside the "is it registers or is it
user memory?" issue on ia64 as irrelevant, since, you know, it's ia64.)
If syscall interception is indeed an appropriate mechanism for your needs
and you want something tailored more specifically to your exact use in
future kernels, a module doing this would be easy to implement using the
utrace API. (That might be a "compelling use" of utrace by virtue of the
Midas brand name effect, if nothing else. ;-)