On Fri, Jun 13, 2014 at 2:37 PM, Alexei Starovoitov <firstname.lastname@example.org> wrote:
> On Fri, Jun 13, 2014 at 2:25 PM, Andy Lutomirski <email@example.com> wrote:
>> On Fri, Jun 13, 2014 at 2:22 PM, Alexei Starovoitov <firstname.lastname@example.org> wrote:
>>> On Tue, Jun 10, 2014 at 8:25 PM, Kees Cook <email@example.com> wrote:
>>>> This adds the new "seccomp" syscall with both an "operation" and "flags"
>>>> parameter for future expansion. The third argument is a pointer value,
>>>> used with the SECCOMP_SET_MODE_FILTER operation. Currently, flags must
>>>> be 0. This is functionally equivalent to prctl(PR_SET_SECCOMP, ...).
>>>> Signed-off-by: Kees Cook <firstname.lastname@example.org>
>>>> Cc: email@example.com
>>>> arch/x86/syscalls/syscall_32.tbl | 1 +
>>>> arch/x86/syscalls/syscall_64.tbl | 1 +
>>>> include/linux/syscalls.h | 2 ++
>>>> include/uapi/asm-generic/unistd.h | 4 ++-
>>>> include/uapi/linux/seccomp.h | 4 +++
>>>> kernel/seccomp.c | 63
>>>> kernel/sys_ni.c | 3 ++
>>>> 7 files changed, 69 insertions(+), 9 deletions(-)
>>>> diff --git a/arch/x86/syscalls/syscall_32.tbl
>>>> index d6b867921612..7527eac24122 100644
>>>> --- a/arch/x86/syscalls/syscall_32.tbl
>>>> +++ b/arch/x86/syscalls/syscall_32.tbl
>>>> @@ -360,3 +360,4 @@
>>>> 351 i386 sched_setattr sys_sched_setattr
>>>> 352 i386 sched_getattr sys_sched_getattr
>>>> 353 i386 renameat2 sys_renameat2
>>>> +354 i386 seccomp sys_seccomp
>>>> diff --git a/arch/x86/syscalls/syscall_64.tbl
>>>> index ec255a1646d2..16272a6c12b7 100644
>>>> --- a/arch/x86/syscalls/syscall_64.tbl
>>>> +++ b/arch/x86/syscalls/syscall_64.tbl
>>>> @@ -323,6 +323,7 @@
>>>> 314 common sched_setattr sys_sched_setattr
>>>> 315 common sched_getattr sys_sched_getattr
>>>> 316 common renameat2 sys_renameat2
>>>> +317 common seccomp sys_seccomp
>>>> # x32-specific system call numbers start at 512 to avoid cache impact
>>>> diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
>>>> index b0881a0ed322..1713977ee26f 100644
>>>> --- a/include/linux/syscalls.h
>>>> +++ b/include/linux/syscalls.h
>>>> @@ -866,4 +866,6 @@ asmlinkage long sys_process_vm_writev(pid_t pid,
>>>> asmlinkage long sys_kcmp(pid_t pid1, pid_t pid2, int type,
>>>> unsigned long idx1, unsigned long idx2);
>>>> asmlinkage long sys_finit_module(int fd, const char __user *uargs, int
>>>> +asmlinkage long sys_seccomp(unsigned int op, unsigned int flags,
>>>> + const char __user *uargs);
>>> It looks odd to add 'flags' argument to syscall that is not even used.
>>> I don't think it will be extensible this way.
>>> 'uargs' is used only in 2nd command as well and it's not 'char __user *'
>>> but rather 'struct sock_fprog __user *'
>>> I think it makes more sense to define only first argument as 'int op' and
>>> rest as variable length array.
>>> Something like:
>>> long sys_seccomp(unsigned int op, struct nlattr *attrs, int len);
>>> then different commands can interpret 'attrs' differently.
>>> if op == mode_strict, then attrs == NULL, len == 0
>>> if op == mode_filter, then attrs->nla_type == seccomp_bpf_filter
>>> and nla_data(attrs) is 'struct sock_fprog'
>> Eww. If the operation doesn't imply the type, then I think we've
>> totally screwed up.
>>> If we decide to add new types of filters or new commands, the syscall
>>> won't need to change. New commands can be added preserving backward
>>> compatibility.
>>> The basic TLV concept has been around forever in netlink world. imo makes
>>> sense to use it with new syscalls. Passing 'struct xxx' into syscalls
>>> is a thing of the past. TLV style is more extensible. Fields of structures
>>> can become optional in the future, new fields added, etc.
>>> 'struct nlattr' brings the same benefits to kernel api as protobuf did
>>> to user land.
>> I see no reason to bring nl_attr into this.
>> Admittedly, I've never dealt with nl_attr, but everything
>> netlink-related I've ever been involved in has involved some sort of
>> API atrocity.
> netlink has a lot of legacy and there is genetlink which is not pretty
> either because of extra socket creation, binding, dealing with packet
> loss issues, but the key concept of variable length encoding is sound.
> Right now seccomp has two commands and they already don't fit
> into single syscall neatly. Are you saying there should be two syscalls
> here? What about another seccomp related command? Another syscall?
> imo all seccomp related commands need to be mux/demux-ed under
> one syscall. What is the way to mux/demux potentially very different
> commands under one syscall? I cannot think of anything better than
> TLV style. 'struct nlattr' is what we have today and I think it works fine.
> I'm not suggesting to bring the whole netlink into the picture, but rather
> TLV style of encoding different arguments for different commands.
I'm unconvinced. These are simple commands, and I think the interface
should be simple. Syscalls are cheap.
As an example, the interface could be:
int seccomp_add_filter(const struct sock_fprog *filter, unsigned int flags);
The "tsync" operation would be seccomp_add_filter(NULL,
SECCOMP_ADD_FILTER_TSYNC) -- it's equivalent to adding an
always-accept filter and syncing threads.
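
To make the semantics concrete, here is a rough userspace model of that
interface. It is only a sketch, not kernel code: struct demo_fprog,
struct demo_state and demo_seccomp_add_filter() are hypothetical stand-ins,
and the numeric value of SECCOMP_ADD_FILTER_TSYNC is an assumption. It only
shows how a NULL filter plus the TSYNC flag could be accepted while unknown
flag bits are rejected.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's struct sock_fprog. */
struct demo_fprog {
    unsigned short len;   /* number of BPF instructions */
    const void *filter;   /* instruction array, opaque in this model */
};

/* Flag name from the mail above; the value is an assumption. */
#define SECCOMP_ADD_FILTER_TSYNC 0x1u

/* Toy bookkeeping standing in for per-process kernel state. */
struct demo_state {
    int filters_installed;
    int threads_synced;
};

/* Returns 0 on success or a negative errno, as a kernel handler would. */
static int demo_seccomp_add_filter(struct demo_state *st,
                                   const struct demo_fprog *filter,
                                   unsigned int flags)
{
    if (flags & ~SECCOMP_ADD_FILTER_TSYNC)
        return -EINVAL;   /* unknown flag bits are refused */

    if (filter == NULL && !(flags & SECCOMP_ADD_FILTER_TSYNC))
        return -EINVAL;   /* no filter and no flag: nothing to do */

    if (filter != NULL) {
        if (filter->len == 0 || filter->filter == NULL)
            return -EINVAL;   /* an empty program is not a filter */
        st->filters_installed++;
    }

    /*
     * With a NULL filter this behaves like an always-accept filter:
     * nothing new is installed, the threads are just synchronized.
     */
    if (flags & SECCOMP_ADD_FILTER_TSYNC)
        st->threads_synced = 1;

    return 0;
}
```

In this model, demo_seccomp_add_filter(&st, NULL, SECCOMP_ADD_FILTER_TSYNC)
succeeds and only sets threads_synced, while passing NULL with flags == 0
fails with -EINVAL.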
But, frankly, this kind of stuff should probably be "do operation X".
IIUC nl_attr is more like "do something, with these tags and values",
which results in oddities like whatever should happen if more than one
tag is set.
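
To illustrate the kind of oddity I mean, here is a small TLV walker. It is
a sketch under stated assumptions: struct demo_attr only imitates the
length/type header and 4-byte alignment of struct nlattr, the tag names are
made up, and rejecting repeated tags is one arbitrary policy out of several
(first wins and last wins are equally plausible). Every TLV consumer has to
pick one.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical tags; not part of any real seccomp ABI. */
enum { DEMO_ATTR_FILTER = 1, DEMO_ATTR_TSYNC = 2, DEMO_ATTR_MAX = 2 };

/* Header modeled loosely on struct nlattr; len includes the header. */
struct demo_attr {
    uint16_t len;
    uint16_t type;
};

#define DEMO_ALIGN(n) (((n) + 3u) & ~3u)

/* Append one attribute at offset off and return the next aligned offset. */
static size_t demo_put(unsigned char *buf, size_t off, uint16_t type,
                       const void *data, uint16_t data_len)
{
    struct demo_attr hdr;

    hdr.len = (uint16_t)(sizeof(struct demo_attr) + data_len);
    hdr.type = type;
    memcpy(buf + off, &hdr, sizeof(hdr));
    if (data_len)
        memcpy(buf + off + sizeof(hdr), data, data_len);
    return off + DEMO_ALIGN(hdr.len);
}

/*
 * Walk the buffer and record a pointer to each tag's payload.
 * Truncated headers, unknown tags and repeated tags all yield -EINVAL.
 */
static int demo_parse(const unsigned char *buf, size_t size,
                      const unsigned char *payload[DEMO_ATTR_MAX + 1])
{
    size_t off = 0;
    int i;

    for (i = 0; i <= DEMO_ATTR_MAX; i++)
        payload[i] = NULL;

    while (off < size) {
        struct demo_attr hdr;

        if (size - off < sizeof(hdr))
            return -EINVAL;   /* truncated header */
        memcpy(&hdr, buf + off, sizeof(hdr));
        if (hdr.len < sizeof(hdr) || hdr.len > size - off)
            return -EINVAL;   /* length field out of bounds */
        if (hdr.type == 0 || hdr.type > DEMO_ATTR_MAX)
            return -EINVAL;   /* unknown tag */
        if (payload[hdr.type] != NULL)
            return -EINVAL;   /* same tag twice: a policy choice */
        payload[hdr.type] = buf + off + sizeof(hdr);
        off += DEMO_ALIGN(hdr.len);
    }
    return 0;
}
```

With a fixed argument list none of these cases can be expressed in the
first place, which is the simplicity I am arguing for.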
AMA Capital Management, LLC