
[PATCH v7 1/1] man-pages: seccomp.2: document syscall

To: "Michael Kerrisk (man-pages)" <>
Subject: [PATCH v7 1/1] man-pages: seccomp.2: document syscall
From: Kees Cook <>
Date: Mon, 23 Jun 2014 15:01:50 -0700
Cc:, Andy Lutomirski <>, Alexei Starovoitov <>, Andrew Morton <>, Daniel Borkmann <>, Oleg Nesterov <>, Will Drewry <>, Julien Tinnes <>, David Drysdale <>,,,,,,
In-reply-to: <>
List-archive: <>
List-help: <>
List-id: linux-mips <>
List-owner: <>
List-post: <>
List-software: Ecartis version 1.0.0
List-subscribe: <>
List-unsubscribe: <>
Organization: Chromium
Original-recipient: rfc822;
Resent-date: Mon, 23 Jun 2014 16:29:08 -0700
Resent-from: Kees Cook <>
Resent-message-id: <>
Resent-to: "Michael Kerrisk (man-pages)" <>,, Andy Lutomirski <>, Alexei Starovoitov <>, Andrew Morton <>, Daniel Borkmann <>, Oleg Nesterov <>, Will Drewry <>, Julien Tinnes <>, David Drysdale <>,,,,,,,

Combines documentation from prctl(2) and the in-kernel seccomp_filter.txt
with new details specific to the new syscall.

Signed-off-by: Kees Cook <>
---
 man2/seccomp.2 |  333 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 333 insertions(+)
 create mode 100644 man2/seccomp.2

diff --git a/man2/seccomp.2 b/man2/seccomp.2
new file mode 100644
index 0000000..de7fbf7
--- /dev/null
+++ b/man2/seccomp.2
@@ -0,0 +1,333 @@
+.\" Copyright (C) 2014 Kees Cook <>
+.\" and Copyright (C) 2012 Will Drewry <>
+.\" and Copyright (C) 2008 Michael Kerrisk <>
+.\" Permission is granted to make and distribute verbatim copies of this
+.\" manual provided the copyright notice and this permission notice are
+.\" preserved on all copies.
+.\" Permission is granted to copy and distribute modified versions of this
+.\" manual under the conditions for verbatim copying, provided that the
+.\" entire resulting derived work is distributed under the terms of a
+.\" permission notice identical to this one.
+.\" Since the Linux kernel and libraries are constantly changing, this
+.\" manual page may be incorrect or out-of-date.  The author(s) assume no
+.\" responsibility for errors or omissions, or for damages resulting from
+.\" the use of the information contained herein.  The author(s) may not
+.\" have taken the same level of care in the production of this manual,
+.\" which is licensed free of charge, as they might when working
+.\" professionally.
+.\" Formatted or processed versions of this manual, if unaccompanied by
+.\" the source, must acknowledge the copyright and authors of this work.
+.TH SECCOMP 2 2014-06-23 "Linux" "Linux Programmer's Manual"
+.SH NAME
+seccomp \- operate on Secure Computing state of the process
+.SH SYNOPSIS
+.nf
+.B #include <linux/seccomp.h>
+.B #include <linux/filter.h>
+.B #include <linux/audit.h>
+.B #include <linux/signal.h>
+.B #include <sys/ptrace.h>
+
+.BI "int seccomp(unsigned int " operation ", unsigned int " flags ,
+.BI "            unsigned char *" args );
+.fi
+.SH DESCRIPTION
+The
+.BR seccomp ()
+system call operates on the Secure Computing (seccomp) state of the
+current process.
+.PP
+Currently, Linux supports the following
+.IR operation
+values:
+.TP
+.B SECCOMP_SET_MODE_STRICT
+The only system calls that the thread is permitted to make are
+.BR read (2),
+.BR write (2),
+.BR _exit (2)
+(but not
+.BR exit_group (2)),
+and
+.BR sigreturn (2).
+Other system calls result in the delivery of a
+.B SIGKILL
+signal.
+Strict secure computing mode is useful for number-crunching
+applications that may need to execute untrusted byte code, perhaps
+obtained by reading from a pipe or socket.
+.IP
+This operation is available only if the kernel is configured with
+.B CONFIG_SECCOMP
+enabled.
+.IP
+The value of
+.IR flags
+must be 0, and
+.IR args
+must be NULL.
+.IP
+This operation is functionally identical to calling
+.IR "prctl(PR_SET_SECCOMP,\ SECCOMP_MODE_STRICT)" .
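A minimal sketch of strict mode follows. It assumes glibc, which has no seccomp() wrapper, so the call goes through syscall(2) with SYS_seccomp. It also assumes kernel headers recent enough to define SECCOMP_SET_MODE_STRICT. The helper name run_strict_child() is hypothetical. The experiment runs in a forked child so the caller keeps its own seccomp state. The child leaves through the raw exit system call, because glibc's _exit() uses exit_group(2), which strict mode does not permit.

```c
#define _GNU_SOURCE
#include <linux/seccomp.h>   /* SECCOMP_SET_MODE_STRICT */
#include <signal.h>          /* SIGKILL, for callers inspecting the status */
#include <sys/syscall.h>     /* SYS_seccomp, SYS_write, SYS_getpid, SYS_exit */
#include <sys/types.h>
#include <sys/wait.h>        /* waitpid(), WIFEXITED(), WTERMSIG() */
#include <unistd.h>

/* Hypothetical helper: fork a child, put it into strict mode, and return
 * the child's wait status (or -1 if fork/waitpid fails). When
 * call_forbidden is nonzero, the child also calls getpid(2), which strict
 * mode does not allow. */
int run_strict_child(int call_forbidden)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        /* For strict mode, flags must be 0 and args must be NULL. */
        if (syscall(SYS_seccomp, SECCOMP_SET_MODE_STRICT, 0, NULL) != 0)
            _exit(2);   /* strict mode not entered, so exit_group is still fine */

        /* write(2) is on the allowed list (zero-length write to stdout). */
        syscall(SYS_write, STDOUT_FILENO, "", 0);

        if (call_forbidden)
            syscall(SYS_getpid);   /* not allowed: the kernel sends SIGKILL */

        /* glibc's _exit() invokes exit_group(2), which strict mode rejects,
         * so use the raw exit(2) system call; it does not return. */
        syscall(SYS_exit, 0);
    }

    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return status;
}
```

A caller would expect run_strict_child(0) to report a normal exit with status 0. It would expect run_strict_child(1) to report termination by SIGKILL.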
+.TP
+.B SECCOMP_SET_MODE_FILTER
+The system calls allowed are defined by a pointer to a Berkeley Packet
+Filter (BPF) passed via
+.IR args .
+This argument is a pointer to a
+.IR "struct\ sock_fprog" ;
+it can be designed to filter arbitrary system calls and system call
+arguments.
+If the filter is invalid, the call fails, returning
+.B EINVAL
+in
+.IR errno .
+.IP
+If
+.BR fork (2),
+.BR clone (2),
+or
+.BR execve (2)
+are allowed by the filter, any child processes will be constrained to
+the same filters and system calls as the parent.
+.IP
+Prior to using this operation, the process must call
+.IR "prctl(PR_SET_NO_NEW_PRIVS,\ 1)"
+or run with
+.B CAP_SYS_ADMIN
+privileges in its namespace.
+If neither is true, the call fails with
+.B EACCES
+in
+.IR errno .
+This requirement ensures that filter programs cannot be applied to child
+processes with greater privileges than the process that installed them.
+.IP
+Additionally, if
+.BR prctl (2)
+or
+.BR seccomp (2)
+is allowed by the attached filter, additional filters may be layered on.
+This increases evaluation time, but allows further reduction of
+the attack surface during execution of a process.
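+.IP
+This operation is available only if the kernel is configured with
+.B CONFIG_SECCOMP_FILTER
+enabled.
+.IP
+When
+.IR flags
+is 0, this operation is functionally identical to calling
+.IR "prctl(PR_SET_SECCOMP,\ SECCOMP_MODE_FILTER,\ args)" .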
+.IP
+The recognized
+.IR flags
+are:
+.RS
+.TP
+.B SECCOMP_FILTER_FLAG_TSYNC
+When adding a new filter, synchronize all other threads of the current
+process to the same seccomp filter tree.
+If any thread cannot do this,
+the call does not attach the new seccomp filter, and fails, returning
+the first thread ID found that cannot synchronize.
+Synchronization will fail if another thread is in
+.B SECCOMP_MODE_STRICT
+or if it has attached new seccomp filters to itself, diverging from the
+calling thread's filter tree.
+.RE
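The sketch below shows the usual order of operations for SECCOMP_SET_MODE_FILTER when the caller lacks CAP_SYS_ADMIN: set no_new_privs first, then install the filter. It assumes glibc, with SYS_seccomp taken from <sys/syscall.h> because there is no libc wrapper. filter_mode_in_child() is a hypothetical helper. The one-instruction allow-everything program exists only to make the installation observable through prctl(PR_GET_SECCOMP), which reports 2 for filter mode.

```c
#define _GNU_SOURCE
#include <linux/filter.h>    /* struct sock_filter, struct sock_fprog, BPF_STMT */
#include <linux/seccomp.h>   /* SECCOMP_SET_MODE_FILTER, SECCOMP_RET_ALLOW */
#include <sys/prctl.h>       /* prctl(), PR_SET_NO_NEW_PRIVS, PR_GET_SECCOMP */
#include <sys/syscall.h>     /* SYS_seccomp */
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: in a child process, set no_new_privs, install an
 * allow-everything filter, and exit with the mode PR_GET_SECCOMP reports.
 * Returns that mode, 100/101 if a setup step failed in the child, or -1
 * if fork/waitpid fails. */
int filter_mode_in_child(void)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        struct sock_filter allow_all[] = {
            BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
        };
        struct sock_fprog prog = {
            .len = sizeof(allow_all) / sizeof(allow_all[0]),
            .filter = allow_all,
        };

        /* Without CAP_SYS_ADMIN, skipping this makes seccomp() fail with EACCES. */
        if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) != 0)
            _exit(100);
        /* flags == 0: the same effect as prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog). */
        if (syscall(SYS_seccomp, SECCOMP_SET_MODE_FILTER, 0, &prog) != 0)
            _exit(101);
        _exit(prctl(PR_GET_SECCOMP, 0, 0, 0, 0));
    }

    int status;
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```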
+.SS Filters
+When adding filters via
+.BR SECCOMP_SET_MODE_FILTER ,
+.IR args
+points to a filter program:
+.PP
+.in +4n
+.nf
+struct sock_fprog {
+    unsigned short      len;    /* Number of filter blocks */
+    struct sock_filter *filter;
+};
+.fi
+.in
+.PP
+Each program must contain one or more BPF instructions:
+.PP
+.in +4n
+.nf
+struct sock_filter {    /* Filter block */
+    __u16   code;       /* Actual filter code */
+    __u8    jt;         /* Jump true */
+    __u8    jf;         /* Jump false */
+    __u32   k;          /* Generic multiuse field */
+};
+.fi
+.in
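The sketch below illustrates the two structures. It builds a four-instruction program with the BPF_STMT() and BPF_JUMP() macros from <linux/filter.h> and wraps it in a struct sock_fprog. The program and the example_prog() helper are hypothetical. The filter makes getppid(2) fail with EPERM and allows everything else. For brevity it omits the architecture check that a real filter should perform on the arch field of struct seccomp_data. Nothing is installed; the code only builds the data structures.

```c
#include <errno.h>           /* EPERM */
#include <linux/filter.h>    /* struct sock_filter, struct sock_fprog, BPF_* macros */
#include <linux/seccomp.h>   /* SECCOMP_RET_* */
#include <sys/syscall.h>     /* __NR_getppid */

/* BPF_STMT(code, k) fills in code and k and leaves jt = jf = 0.
 * BPF_JUMP(code, k, jt, jf) also sets the jump offsets, counted as the
 * number of instructions to skip after the current one.
 * NOTE: arch check omitted for brevity; real filters should check it. */
static struct sock_filter example_insns[] = {
    /* A <- 32-bit word at offset 0 of struct seccomp_data (the nr field). */
    BPF_STMT(BPF_LD | BPF_W | BPF_ABS, 0),
    /* If A == __NR_getppid fall through (jt = 0), else skip one (jf = 1). */
    BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, __NR_getppid, 0, 1),
    BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ERRNO | EPERM),
    BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
};

/* Hypothetical helper: wrap the instruction array in the struct that
 * seccomp(SECCOMP_SET_MODE_FILTER, ...) expects as args. */
struct sock_fprog example_prog(void)
{
    struct sock_fprog prog = {
        .len = sizeof(example_insns) / sizeof(example_insns[0]),
        .filter = example_insns,
    };
    return prog;
}
```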
+.PP
+When executing the instructions, the BPF program operates on the
+system call information made available via:
+.PP
+.in +4n
+.nf
+struct seccomp_data {
+    int nr;                     /* system call number */
+    __u32 arch;                 /* AUDIT_ARCH_* value */
+    __u64 instruction_pointer;  /* CPU instruction pointer */
+    __u64 args[6];              /* up to 6 system call arguments */
+};
+.fi
+.in
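Filters address struct seccomp_data by byte offset, so a sketch of the usual offset arithmetic follows. The helper names are hypothetical. The offsets follow from the structure definition: an int and a __u32, followed by naturally aligned __u64 fields. arg_low_offset() assumes GCC or Clang, which predefine the __BYTE_ORDER__ macros.

```c
#include <linux/seccomp.h>   /* struct seccomp_data */
#include <linux/types.h>     /* __u32, __u64 */
#include <stddef.h>          /* offsetof, size_t */

/* BPF_LD | BPF_W | BPF_ABS loads take a byte offset into struct
 * seccomp_data, so filters normally compute offsets with offsetof(). */
size_t nr_offset(void)   { return offsetof(struct seccomp_data, nr); }
size_t arch_offset(void) { return offsetof(struct seccomp_data, arch); }
size_t ip_offset(void)   { return offsetof(struct seccomp_data, instruction_pointer); }

/* Hypothetical helper: offset of the 32-bit word holding the low-order
 * half of args[i]. A BPF_W load reads only 32 bits, so checking a full
 * 64-bit argument takes two loads, and which half comes first in memory
 * depends on the host byte order. */
size_t arg_low_offset(unsigned int i)
{
    size_t base = offsetof(struct seccomp_data, args) + i * sizeof(__u64);
#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
    return base;
#else
    return base + sizeof(__u32);
#endif
}
```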
+.PP
+A seccomp filter may return any of the following values.
+If multiple filters exist, the return value for the evaluation of a given
+system call will always use the highest-precedence value.
+(For example,
+.B SECCOMP_RET_KILL
+will always take precedence.)
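The precedence rule can be modeled in userspace. The sketch below is a hypothetical helper, not kernel code. It assumes that higher precedence corresponds to a smaller value of the SECCOMP_RET_ACTION bits. This holds for the action values listed in this page: SECCOMP_RET_KILL is 0 and SECCOMP_RET_ALLOW is the largest. rets[0] is taken to be the result of the most recently installed filter. Ties therefore keep its data, matching the SECCOMP_RET_DATA rule stated later in this page.

```c
#include <linux/seccomp.h>   /* SECCOMP_RET_* */
#include <stddef.h>          /* size_t */
#include <stdint.h>          /* uint32_t */

/* Hypothetical model: rets[0] comes from the newest filter, rets[n-1]
 * from the oldest. The smallest SECCOMP_RET_ACTION value wins; because
 * the comparison is strict, an equal action from an older filter never
 * replaces the newer one, so the newest filter's data is kept. */
uint32_t combine_filter_results(const uint32_t *rets, size_t n)
{
    uint32_t result = SECCOMP_RET_ALLOW;   /* no filters: allow */
    for (size_t i = 0; i < n; i++) {
        if ((rets[i] & SECCOMP_RET_ACTION) < (result & SECCOMP_RET_ACTION))
            result = rets[i];
    }
    return result;
}
```

For example, a SECCOMP_RET_KILL from any filter beats SECCOMP_RET_ERRNO and SECCOMP_RET_ALLOW. Two SECCOMP_RET_ERRNO results keep the data of the newer one.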
+.PP
+In precedence order, they are:
+.TP
+.B SECCOMP_RET_KILL
+Results in the task exiting immediately without executing the
+system call.
+The exit status of the task (status & 0x7f) will be
+.BR SIGSYS ,
+not
+.BR SIGKILL .
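A minimal sketch of SECCOMP_RET_KILL follows. It assumes glibc on x86-64, AArch64, or i386; the hypothetical HOST_AUDIT_ARCH macro selects the matching AUDIT_ARCH_* value. getppid(2) is an arbitrary victim system call. The filter first checks seccomp_data.arch, then kills on getppid(2) and allows everything else. kill_on_getppid() is a hypothetical helper that runs this in a child and returns the child's wait status.

```c
#define _GNU_SOURCE
#include <linux/audit.h>     /* AUDIT_ARCH_* */
#include <linux/filter.h>
#include <linux/seccomp.h>
#include <signal.h>          /* SIGSYS, for callers inspecting the status */
#include <stddef.h>          /* offsetof */
#include <sys/prctl.h>
#include <sys/syscall.h>     /* SYS_seccomp, SYS_getppid, __NR_getppid */
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical macro: the AUDIT_ARCH_* value for the build target. */
#if defined(__x86_64__)
# define HOST_AUDIT_ARCH AUDIT_ARCH_X86_64
#elif defined(__aarch64__)
# define HOST_AUDIT_ARCH AUDIT_ARCH_AARCH64
#elif defined(__i386__)
# define HOST_AUDIT_ARCH AUDIT_ARCH_I386
#else
# error "HOST_AUDIT_ARCH: add the AUDIT_ARCH_* value for this architecture"
#endif

/* Hypothetical helper: returns the wait status of a child that installs
 * a kill-on-getppid filter and then calls getppid(2); -1 on fork/wait
 * failure. */
int kill_on_getppid(void)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        struct sock_filter insns[] = {
            /* Kill if the call does not use the ABI this sketch was built for. */
            BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, arch)),
            BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, HOST_AUDIT_ARCH, 1, 0),
            BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_KILL),
            /* Kill on getppid(2), allow everything else. */
            BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, nr)),
            BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, __NR_getppid, 0, 1),
            BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_KILL),
            BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
        };
        struct sock_fprog prog = {
            .len = sizeof(insns) / sizeof(insns[0]),
            .filter = insns,
        };

        if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) != 0 ||
            syscall(SYS_seccomp, SECCOMP_SET_MODE_FILTER, 0, &prog) != 0)
            _exit(100);
        syscall(SYS_getppid);   /* the filter terminates the task here */
        _exit(0);               /* not reached */
    }

    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return status;
}
```

A caller would check WIFSIGNALED(status) and WTERMSIG(status) == SIGSYS.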
+.TP
+.B SECCOMP_RET_TRAP
+Results in the kernel sending a
+.B SIGSYS
+signal to the triggering task without executing the system call.
+.IR siginfo\->si_call_addr
+will show the address of the system call instruction, and
+.IR siginfo\->si_syscall
+and
+.IR siginfo\->si_arch
+will indicate which system call was attempted.
+The program counter will be as though the system call happened
+(i.e., it will not point to the system call instruction).
+The return value register will contain an architecture-dependent value;
+if resuming execution, set it to something sensible.
+(The architecture dependency is because replacing it with
+.B ENOSYS
+could overwrite some useful information.)
+.IP
+The
+.B SECCOMP_RET_DATA
+portion of the return value will be passed as
+.IR si_errno .
+.IP
+.B SIGSYS
+triggered by seccomp will have a
+.IR si_code
+of
+.BR SYS_SECCOMP .
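The following sketch shows what a SIGSYS handler might check after SECCOMP_RET_TRAP. It assumes glibc exposes the si_syscall and si_arch field macros with _GNU_SOURCE. It defines SYS_SECCOMP as 1 only if the C library does not. HOST_AUDIT_ARCH and trap_on_getppid() are hypothetical, and the data value 42 is arbitrary. The handler does not rewrite the return-value register, so the value syscall() returns is architecture-dependent and is ignored.

```c
#define _GNU_SOURCE
#include <linux/audit.h>     /* AUDIT_ARCH_* */
#include <linux/filter.h>
#include <linux/seccomp.h>
#include <signal.h>          /* sigaction(), siginfo_t, SIGSYS */
#include <stddef.h>          /* offsetof */
#include <string.h>          /* memset */
#include <sys/prctl.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#ifndef SYS_SECCOMP
# define SYS_SECCOMP 1       /* si_code for a seccomp-generated SIGSYS */
#endif

/* Hypothetical macro: the AUDIT_ARCH_* value for the build target. */
#if defined(__x86_64__)
# define HOST_AUDIT_ARCH AUDIT_ARCH_X86_64
#elif defined(__aarch64__)
# define HOST_AUDIT_ARCH AUDIT_ARCH_AARCH64
#elif defined(__i386__)
# define HOST_AUDIT_ARCH AUDIT_ARCH_I386
#else
# error "HOST_AUDIT_ARCH: add the AUDIT_ARCH_* value for this architecture"
#endif

static volatile sig_atomic_t trap_seen_ok;

/* Record whether the siginfo fields match what the filter below returns
 * (SECCOMP_RET_TRAP with SECCOMP_RET_DATA = 42 for getppid). */
static void on_sigsys(int sig, siginfo_t *info, void *ucontext)
{
    (void)ucontext;
    trap_seen_ok = sig == SIGSYS && info->si_code == SYS_SECCOMP &&
                   info->si_errno == 42 && info->si_syscall == __NR_getppid &&
                   info->si_arch == HOST_AUDIT_ARCH;
}

/* Hypothetical helper: 1 if the child's handler saw the expected fields,
 * 0 if not, 100 on setup failure in the child, -1 on fork/wait failure. */
int trap_on_getppid(void)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        struct sigaction sa;
        memset(&sa, 0, sizeof(sa));
        sa.sa_sigaction = on_sigsys;
        sa.sa_flags = SA_SIGINFO;
        sigemptyset(&sa.sa_mask);

        struct sock_filter insns[] = {
            BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, arch)),
            BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, HOST_AUDIT_ARCH, 1, 0),
            BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_KILL),
            BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, nr)),
            BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, __NR_getppid, 0, 1),
            BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_TRAP | 42),
            BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
        };
        struct sock_fprog prog = { .len = sizeof(insns) / sizeof(insns[0]), .filter = insns };

        if (sigaction(SIGSYS, &sa, NULL) != 0 ||
            prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) != 0 ||
            syscall(SYS_seccomp, SECCOMP_SET_MODE_FILTER, 0, &prog) != 0)
            _exit(100);

        /* getppid(2) is not executed; the handler runs before syscall() returns. */
        syscall(SYS_getppid);
        _exit(trap_seen_ok ? 1 : 0);
    }

    int status;
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```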
+.TP
+.B SECCOMP_RET_ERRNO
+Results in the lower 16 bits of the return value being passed
+to userland as the
+.IR errno
+value without executing the system call.
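SECCOMP_RET_ERRNO makes a call fail without killing the caller. In this sketch, the filter returns SECCOMP_RET_ERRNO | EPERM, and the child reports the errno value it observed through its exit status. The sketch assumes glibc. errno_from_filtered_getppid() and HOST_AUDIT_ARCH are hypothetical, and getppid(2) is chosen arbitrarily.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <linux/audit.h>
#include <linux/filter.h>
#include <linux/seccomp.h>
#include <stddef.h>          /* offsetof */
#include <sys/prctl.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical macro: the AUDIT_ARCH_* value for the build target. */
#if defined(__x86_64__)
# define HOST_AUDIT_ARCH AUDIT_ARCH_X86_64
#elif defined(__aarch64__)
# define HOST_AUDIT_ARCH AUDIT_ARCH_AARCH64
#elif defined(__i386__)
# define HOST_AUDIT_ARCH AUDIT_ARCH_I386
#else
# error "HOST_AUDIT_ARCH: add the AUDIT_ARCH_* value for this architecture"
#endif

/* Hypothetical helper: returns the errno a filtered getppid(2) fails with
 * in a child (0 if it unexpectedly succeeded, 255 on setup failure, -1 on
 * fork/wait failure). */
int errno_from_filtered_getppid(void)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        struct sock_filter insns[] = {
            BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, arch)),
            BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, HOST_AUDIT_ARCH, 1, 0),
            BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_KILL),
            BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, nr)),
            BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, __NR_getppid, 0, 1),
            /* The SECCOMP_RET_DATA bits become the errno value. */
            BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ERRNO | (EPERM & SECCOMP_RET_DATA)),
            BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
        };
        struct sock_fprog prog = { .len = sizeof(insns) / sizeof(insns[0]), .filter = insns };

        if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) != 0 ||
            syscall(SYS_seccomp, SECCOMP_SET_MODE_FILTER, 0, &prog) != 0)
            _exit(255);
        long r = syscall(SYS_getppid);   /* the kernel does not execute it */
        _exit(r == -1 ? errno : 0);
    }

    int status;
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```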
+.TP
+.B SECCOMP_RET_TRACE
+When returned, this value will cause the kernel to attempt to
+notify a ptrace()-based tracer prior to executing the system
+call.
+If there is no tracer present,
+.B ENOSYS
+is returned to userland and the system call is not executed.
+.IP
+A tracer will be notified if it requests
+.B PTRACE_O_TRACESECCOMP
+using
+.IR ptrace(PTRACE_SETOPTIONS) .
+The tracer will be notified of a
+.B PTRACE_EVENT_SECCOMP
+and the
+.B SECCOMP_RET_DATA
+portion of the BPF program return value will be available to the tracer
+via
+.BR PTRACE_GETEVENTMSG .
+.IP
+The tracer can skip the system call by changing the system call number
+to \-1.
+Alternatively, the tracer can change the system call
+requested by changing the system call number to a valid one.
+If the tracer asks to skip the system call, then the system call will
+appear to return the value that the tracer puts in the return value
+register.
+.IP
+The seccomp check will not be run again after the tracer is
+notified.
+(This means that seccomp-based sandboxes MUST NOT
+allow use of ptrace, even of other sandboxed processes, without
+extreme care; ptracers can use this mechanism to escape.)
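A sketch of the no-tracer case for SECCOMP_RET_TRACE follows. It assumes glibc and uses the hypothetical HOST_AUDIT_ARCH macro and errno_from_trace_without_tracer() helper, with getppid(2) as the filtered call. Nothing has requested PTRACE_O_TRACESECCOMP, so the call should fail with ENOSYS. The tracer side (PTRACE_EVENT_SECCOMP and PTRACE_GETEVENTMSG) is not exercised here. The arbitrary data value 7 would only be visible to a tracer.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <linux/audit.h>
#include <linux/filter.h>
#include <linux/seccomp.h>
#include <stddef.h>          /* offsetof */
#include <sys/prctl.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical macro: the AUDIT_ARCH_* value for the build target. */
#if defined(__x86_64__)
# define HOST_AUDIT_ARCH AUDIT_ARCH_X86_64
#elif defined(__aarch64__)
# define HOST_AUDIT_ARCH AUDIT_ARCH_AARCH64
#elif defined(__i386__)
# define HOST_AUDIT_ARCH AUDIT_ARCH_I386
#else
# error "HOST_AUDIT_ARCH: add the AUDIT_ARCH_* value for this architecture"
#endif

/* Hypothetical helper: in an untraced child, return the errno that a
 * getppid(2) call sees when the filter answers SECCOMP_RET_TRACE (0 if it
 * unexpectedly succeeded, 255 on setup failure, -1 on fork/wait failure). */
int errno_from_trace_without_tracer(void)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        struct sock_filter insns[] = {
            BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, arch)),
            BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, HOST_AUDIT_ARCH, 1, 0),
            BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_KILL),
            BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, nr)),
            BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, __NR_getppid, 0, 1),
            /* Data 7 would reach a tracer via PTRACE_GETEVENTMSG. */
            BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_TRACE | 7),
            BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
        };
        struct sock_fprog prog = { .len = sizeof(insns) / sizeof(insns[0]), .filter = insns };

        if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) != 0 ||
            syscall(SYS_seccomp, SECCOMP_SET_MODE_FILTER, 0, &prog) != 0)
            _exit(255);
        long r = syscall(SYS_getppid);   /* no tracer: expect -1 with ENOSYS */
        _exit(r == -1 ? errno : 0);
    }

    int status;
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```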
+.TP
+.B SECCOMP_RET_ALLOW
+Results in the system call being executed.
+.PP
+Precedence is determined only using the
+.B SECCOMP_RET_ACTION
+mask.
+When multiple filters return values of the same precedence,
+only the
+.B SECCOMP_RET_DATA
+from the most recently installed filter will be returned.
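Layering and the tie-breaking rule can be observed directly. This sketch installs two filters that both return SECCOMP_RET_ERRNO for getppid(2): the older with EPERM, the newer with EACCES. By the rule above, the errno seen should come from the newer filter. add_getppid_filter(), errno_from_layered_filters(), and HOST_AUDIT_ARCH are hypothetical, and glibc is assumed.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <linux/audit.h>
#include <linux/filter.h>
#include <linux/seccomp.h>
#include <linux/types.h>     /* __u32 */
#include <stddef.h>          /* offsetof */
#include <sys/prctl.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical macro: the AUDIT_ARCH_* value for the build target. */
#if defined(__x86_64__)
# define HOST_AUDIT_ARCH AUDIT_ARCH_X86_64
#elif defined(__aarch64__)
# define HOST_AUDIT_ARCH AUDIT_ARCH_AARCH64
#elif defined(__i386__)
# define HOST_AUDIT_ARCH AUDIT_ARCH_I386
#else
# error "HOST_AUDIT_ARCH: add the AUDIT_ARCH_* value for this architecture"
#endif

/* Install one more filter that returns `action` for getppid(2) and
 * allows everything else; returns seccomp()'s result. */
static long add_getppid_filter(__u32 action)
{
    struct sock_filter insns[] = {
        BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, arch)),
        BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, HOST_AUDIT_ARCH, 1, 0),
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_KILL),
        BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, nr)),
        BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, __NR_getppid, 0, 1),
        BPF_STMT(BPF_RET | BPF_K, action),
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
    };
    struct sock_fprog prog = { .len = sizeof(insns) / sizeof(insns[0]), .filter = insns };
    return syscall(SYS_seccomp, SECCOMP_SET_MODE_FILTER, 0, &prog);
}

/* Hypothetical helper: returns the errno that getppid(2) fails with in a
 * child after both filters are layered (255 on setup failure, -1 on
 * fork/wait failure). */
int errno_from_layered_filters(void)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) != 0 ||
            add_getppid_filter(SECCOMP_RET_ERRNO | EPERM) != 0 ||    /* older */
            add_getppid_filter(SECCOMP_RET_ERRNO | EACCES) != 0)     /* newer */
            _exit(255);
        long r = syscall(SYS_getppid);
        _exit(r == -1 ? errno : 0);
    }

    int status;
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```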
+.SH RETURN VALUE
+On success,
+.BR seccomp ()
+returns 0.
+On error, if
+.B SECCOMP_FILTER_FLAG_TSYNC
+was used, the return value is the ID of the thread that caused the
+synchronization failure.
+On other errors, \-1 is returned, and
+.IR errno
+is set to indicate the cause of the error.
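The TSYNC return convention is easiest to see with two threads. The sketch assumes glibc with pthreads; link with -pthread on glibc versions older than 2.34. A second thread attaches a filter to itself with flags 0, diverging from the main thread's filter tree. The main thread's SECCOMP_FILTER_FLAG_TSYNC call should then fail by returning that thread's ID rather than -1. Everything runs in a forked child so the caller's filter state is left alone. All helper names are hypothetical.

```c
#define _GNU_SOURCE
#include <linux/filter.h>
#include <linux/seccomp.h>   /* SECCOMP_FILTER_FLAG_TSYNC */
#include <pthread.h>
#include <sys/prctl.h>
#include <sys/syscall.h>     /* SYS_seccomp, SYS_gettid */
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static int ready_fds[2];     /* second thread -> main thread: its TID */
static int release_fds[2];   /* main thread -> second thread: may exit */

/* Install an allow-all filter with the given flags; return seccomp()'s raw result. */
static long add_allow_all(unsigned int flags)
{
    struct sock_filter insns[] = { BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW) };
    struct sock_fprog prog = { .len = 1, .filter = insns };
    return syscall(SYS_seccomp, SECCOMP_SET_MODE_FILTER, flags, &prog);
}

/* Second thread: attach a filter to itself only, publish its TID
 * (or -1 on failure), then stay alive until released. */
static void *diverging_thread(void *unused)
{
    (void)unused;
    pid_t tid = (pid_t)syscall(SYS_gettid);
    if (add_allow_all(0) != 0)
        tid = -1;
    if (write(ready_fds[1], &tid, sizeof(tid)) == (ssize_t)sizeof(tid)) {
        char c;
        ssize_t n = read(release_fds[0], &c, 1);   /* block until released */
        (void)n;
    }
    return NULL;
}

/* Hypothetical helper: 1 if the TSYNC call returned exactly the diverged
 * thread's ID, 0 otherwise, 100 on setup failure, -1 on fork/wait failure. */
int tsync_returns_failing_tid(void)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        pthread_t th;
        pid_t tid;
        if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) != 0 ||
            pipe(ready_fds) != 0 || pipe(release_fds) != 0 ||
            pthread_create(&th, NULL, diverging_thread, NULL) != 0)
            _exit(100);
        if (read(ready_fds[0], &tid, sizeof(tid)) != (ssize_t)sizeof(tid) || tid <= 0)
            _exit(100);

        /* 0: all threads synced; -1: error in errno; > 0: unsyncable TID. */
        long ret = add_allow_all(SECCOMP_FILTER_FLAG_TSYNC);

        ssize_t n = write(release_fds[1], "x", 1);
        (void)n;
        pthread_join(th, NULL);
        _exit(ret == tid ? 1 : 0);
    }

    int status;
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```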
+.SH ERRORS
+.BR seccomp ()
+can fail for the following reasons:
+.TP
+.B EACCES
+The caller did not have the
+.B CAP_SYS_ADMIN
+capability, or had not set
+.IR no_new_privs
+before using
+.BR SECCOMP_SET_MODE_FILTER .
+.TP
+.B EFAULT
+.IR args
+was not a valid address.
+.TP
+.B EINVAL
+.IR operation
+is unknown; or
+.IR flags
+are invalid for the given
+.IR operation .
+.TP
+.B ESRCH
+Another thread caused a failure during thread sync, but its ID could not
+be determined.
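Some of these errors can be provoked harmlessly, because each rejected call leaves the caller's seccomp state unchanged. The sketch below uses a hypothetical seccomp_errno() helper on glibc, calling syscall(2) with SYS_seccomp. 0xdead is an arbitrary unknown operation, and a NULL filter pointer is expected to yield EFAULT. EACCES and ESRCH are not exercised: EACCES depends on the caller's privileges, and ESRCH needs a thread-synchronization race.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <linux/seccomp.h>   /* SECCOMP_SET_MODE_* */
#include <sys/syscall.h>     /* SYS_seccomp */
#include <unistd.h>          /* syscall(), NULL */

/* Hypothetical helper: call seccomp(2) and return the errno value if it
 * returned -1, or 0 otherwise. Only pass arguments that are expected to
 * fail; a successful call would change the caller's seccomp state. */
int seccomp_errno(unsigned int operation, unsigned int flags, void *args)
{
    long ret = syscall(SYS_seccomp, operation, flags, args);
    return ret == -1 ? errno : 0;
}
```

Expected results: an unknown operation and nonzero flags with SECCOMP_SET_MODE_STRICT both give EINVAL. SECCOMP_SET_MODE_FILTER with a NULL args pointer gives EFAULT.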
+.SH VERSIONS
+This system call first appeared in Linux 3.16.
+.\" FIXME Add glibc version
+.SH CONFORMING TO
+This system call is a nonstandard Linux extension.
+.SH NOTES
+.BR seccomp ()
+provides a superset of the functionality provided by the
+.BR prctl (2)
+.B PR_SET_SECCOMP
+operation, which does not support
+.IR flags .
+.SH SEE ALSO
+.BR prctl (2),
+.BR ptrace (2),
+.BR signal (7),
+.BR socket (7)

Kees Cook                                  
