On Fri, Jun 13, 2014 at 2:25 PM, Andy Lutomirski <email@example.com> wrote:
> On Fri, Jun 13, 2014 at 2:22 PM, Alexei Starovoitov <firstname.lastname@example.org> wrote:
>> On Tue, Jun 10, 2014 at 8:25 PM, Kees Cook <email@example.com> wrote:
>>> This adds the new "seccomp" syscall with both an "operation" and "flags"
>>> parameter for future expansion. The third argument is a pointer value,
>>> used with the SECCOMP_SET_MODE_FILTER operation. Currently, flags must
>>> be 0. This is functionally equivalent to prctl(PR_SET_SECCOMP, ...).
>>> Signed-off-by: Kees Cook <firstname.lastname@example.org>
>>> Cc: email@example.com
>>> arch/x86/syscalls/syscall_32.tbl | 1 +
>>> arch/x86/syscalls/syscall_64.tbl | 1 +
>>> include/linux/syscalls.h | 2 ++
>>> include/uapi/asm-generic/unistd.h | 4 ++-
>>> include/uapi/linux/seccomp.h | 4 +++
>>> kernel/seccomp.c | 63
>>> kernel/sys_ni.c | 3 ++
>>> 7 files changed, 69 insertions(+), 9 deletions(-)
>>> diff --git a/arch/x86/syscalls/syscall_32.tbl
>>> index d6b867921612..7527eac24122 100644
>>> --- a/arch/x86/syscalls/syscall_32.tbl
>>> +++ b/arch/x86/syscalls/syscall_32.tbl
>>> @@ -360,3 +360,4 @@
>>> 351 i386 sched_setattr sys_sched_setattr
>>> 352 i386 sched_getattr sys_sched_getattr
>>> 353 i386 renameat2 sys_renameat2
>>> +354 i386 seccomp sys_seccomp
>>> diff --git a/arch/x86/syscalls/syscall_64.tbl
>>> index ec255a1646d2..16272a6c12b7 100644
>>> --- a/arch/x86/syscalls/syscall_64.tbl
>>> +++ b/arch/x86/syscalls/syscall_64.tbl
>>> @@ -323,6 +323,7 @@
>>> 314 common sched_setattr sys_sched_setattr
>>> 315 common sched_getattr sys_sched_getattr
>>> 316 common renameat2 sys_renameat2
>>> +317 common seccomp sys_seccomp
>>> # x32-specific system call numbers start at 512 to avoid cache impact
>>> diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
>>> index b0881a0ed322..1713977ee26f 100644
>>> --- a/include/linux/syscalls.h
>>> +++ b/include/linux/syscalls.h
>>> @@ -866,4 +866,6 @@ asmlinkage long sys_process_vm_writev(pid_t pid,
>>> asmlinkage long sys_kcmp(pid_t pid1, pid_t pid2, int type,
>>> unsigned long idx1, unsigned long idx2);
>>> asmlinkage long sys_finit_module(int fd, const char __user *uargs, int
>>> +asmlinkage long sys_seccomp(unsigned int op, unsigned int flags,
>>> + const char __user *uargs);
>> It looks odd to add a 'flags' argument to the syscall when it is not even used.
>> I don't think it will be extensible this way.
>> 'uargs' is used only in 2nd command as well and it's not 'char __user *'
>> but rather 'struct sock_fprog __user *'
>> I think it makes more sense to define only first argument as 'int op' and the
>> rest as variable length array.
>> Something like:
>> long sys_seccomp(unsigned int op, struct nlattr *attrs, int len);
>> then different commands can interpret 'attrs' differently.
>> if op == mode_strict, then attrs == NULL, len == 0
>> if op == mode_filter, then attrs->nla_type == seccomp_bpf_filter
>> and nla_data(attrs) is 'struct sock_fprog'
> Eww. If the operation doesn't imply the type, then I think we've
> totally screwed up.
>> If we decide to add new types of filters or new commands, the syscall
>> won't need to change. New commands can be added preserving backward
>> compatibility.
>> The basic TLV concept has been around forever in netlink world. imo it makes
>> sense to use it with new syscalls. Passing 'struct xxx' into syscalls is a
>> thing of the past. TLV style is more extensible. Fields of structures can
>> become optional in the future, new fields added, etc.
>> 'struct nlattr' brings the same benefits to kernel api as protobuf did
>> to user land.
> I see no reason to bring nl_attr into this.
> Admittedly, I've never dealt with nl_attr, but everything
> netlink-related I've ever been involved in has involved some sort of
> API atrocity.
netlink has a lot of legacy and there is genetlink which is not pretty
either because of extra socket creation, binding, dealing with packet
loss issues, but the key concept of variable length encoding is sound.
Right now seccomp has two commands and they already don't fit
into a single syscall neatly. Are you saying there should be two syscalls
here? What about another seccomp related command? Another syscall?
imo all seccomp related commands need to be mux/demux-ed under
one syscall. What is the way to mux/demux potentially very different
commands under one syscall? I cannot think of anything better than
TLV style. 'struct nlattr' is what we have today and I think it works fine.
I'm not suggesting bringing the whole of netlink into the picture, but rather
the TLV style of encoding different arguments for different commands.
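
To make the TLV proposal concrete, here is a minimal userspace sketch of how one entry point could demux commands whose arguments are encoded as type-length-value attributes. Everything in it is an assumption for illustration: `tlv_hdr` only mirrors the length-then-type shape of 'struct nlattr', and `OP_STRICT`, `OP_FILTER`, `ATTR_BPF_FILTER`, `tlv_put`, `tlv_find` and `seccomp_demux` are hypothetical names, not part of the patch or of any kernel API. The filter payload is treated as opaque bytes rather than a real 'struct sock_fprog'.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical command and attribute numbers, for illustration only. */
enum { OP_STRICT = 0, OP_FILTER = 1 };
enum { ATTR_BPF_FILTER = 1 };

/* Same shape as struct nlattr: length (header + payload), then type. */
struct tlv_hdr {
	uint16_t len;
	uint16_t type;
};

/* Attributes are padded to a 4-byte boundary, as netlink does. */
static size_t tlv_align(size_t n)
{
	return (n + 3) & ~(size_t)3;
}

/* Append one attribute to buf; returns bytes used, or 0 if it does not fit. */
static size_t tlv_put(unsigned char *buf, size_t cap, uint16_t type,
		      const void *data, size_t dlen)
{
	struct tlv_hdr h;
	size_t total;

	assert(buf != NULL);
	if (dlen > UINT16_MAX - sizeof(h))
		return 0;
	total = tlv_align(sizeof(h) + dlen);
	if (total > cap)
		return 0;
	h.len = (uint16_t)(sizeof(h) + dlen);
	h.type = type;
	memset(buf, 0, total);			/* zero the padding bytes */
	memcpy(buf, &h, sizeof(h));
	if (dlen)
		memcpy(buf + sizeof(h), data, dlen);
	return total;
}

/* Find the first attribute of the given type; NULL if absent or malformed. */
static const void *tlv_find(const unsigned char *buf, size_t len,
			    uint16_t type, size_t *dlen)
{
	size_t off = 0;

	while (off < len && len - off >= sizeof(struct tlv_hdr)) {
		struct tlv_hdr h;

		memcpy(&h, buf + off, sizeof(h));
		if (h.len < sizeof(h) || h.len > len - off)
			return NULL;		/* truncated or bogus length */
		if (h.type == type) {
			*dlen = h.len - sizeof(h);
			return buf + off + sizeof(h);
		}
		off += tlv_align(h.len);	/* skip attributes we don't know */
	}
	return NULL;
}

/* One entry point; each command interprets the attribute buffer itself. */
static long seccomp_demux(unsigned int op, const unsigned char *attrs,
			  size_t len)
{
	size_t dlen;

	switch (op) {
	case OP_STRICT:
		/* Strict mode takes no arguments at all. */
		return (attrs == NULL && len == 0) ? 0 : -EINVAL;
	case OP_FILTER:
		if (attrs == NULL ||
		    tlv_find(attrs, len, ATTR_BPF_FILTER, &dlen) == NULL)
			return -EINVAL;
		/* Stand-in for installing the filter: report payload size. */
		return (long)dlen;
	default:
		return -EINVAL;
	}
}
```

In this sketch unknown attribute types are skipped, not rejected, which is what would let a newer userspace pass extra attributes to an older parser. Whether silently ignoring unknown attributes is acceptable for a security interface like seccomp is a separate design question.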