
Re: [PATCH 2/2] x86-64: seccomp: fix 32/64 syscall hole

To: Roland McGrath <>
Subject: Re: [PATCH 2/2] x86-64: seccomp: fix 32/64 syscall hole
From: Markus Gutschke (顧孟勤) <>
Date: Thu, 7 May 2009 01:01:21 -0700
Cc: Ingo Molnar <>, Linus Torvalds <>, Andrew Morton <>
On Thu, May 7, 2009 at 00:03, Roland McGrath <> wrote:
> That is not a "ptrace problem" per se at all.  It's an intrinsic problem
> with any method based on "generic" syscall interception, if the filtering
> and enforcement decisions depend on examining user memory.

Yes, this is indeed the main problem that we are aware of. It can be
avoided by suspending all threads during user memory inspection, but
that's a horrible price to pay. (See below for an alternative
approach that could, in principle, be adapted for use with ptrace.)

> The only reason seccomp does not have this "reliability problem" is that
> its filtering is trivial and depends only on registers (in fact, only on
> one register, the syscall number).

Simplicity is really the beauty of seccomp. It is very easy to verify
that it does the right thing from a security point of view, because
any attempt to call unsafe system calls results in the kernel
terminating the program. This is much preferable to most ptrace
solutions, which are more difficult to audit for correctness.
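
To make the "kernel terminates the program" behavior concrete, here is a
minimal sketch, assuming a Linux kernel built with seccomp support and
headers that provide <linux/seccomp.h>. The helper name
forbidden_syscall_is_killed() is hypothetical and only for illustration.
A child enters seccomp mode, where only read(), write(), exit() and
sigreturn() remain available, then attempts getpid(); the parent
observes how the child died.

```c
#include <linux/seccomp.h>
#include <signal.h>
#include <sys/prctl.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper, for illustration only.
 * Returns  1 if the child was killed with SIGKILL after a forbidden call,
 *          0 if the child survived the forbidden call,
 *         -1 if fork() failed or seccomp could not be enabled.
 */
int forbidden_syscall_is_killed(void)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        /* Child: enter seccomp mode. This cannot be undone. */
        if (prctl(PR_SET_SECCOMP, SECCOMP_MODE_STRICT) != 0)
            _exit(2);             /* seccomp unavailable in this environment */

        /* From here on only read, write, exit and sigreturn are allowed. */
        syscall(SYS_getpid);      /* forbidden: the kernel sends SIGKILL */

        /* Reached only if seccomp was not enforced. Use the raw exit
         * syscall, because _exit() issues exit_group, which is forbidden. */
        syscall(SYS_exit, 3);
    }

    /* Parent: find out how the child terminated. */
    int status;
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    if (WIFSIGNALED(status) && WTERMSIG(status) == SIGKILL)
        return 1;
    if (WIFEXITED(status) && WEXITSTATUS(status) == 2)
        return -1;
    return 0;
}
```

The filtering decision here depends on nothing but the syscall number,
so there is no user memory for another thread to change between check
and use.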

The downside is that the sandbox'd code needs to delegate execution of
most of its system calls to a monitor process. This is slow and rather
awkward. However, due to the magic of clone(), (almost) all system
calls can in fact be serialized, sent to the monitor process, have
their arguments safely inspected, and then be executed on behalf of the
sandbox'd process. Details are tedious, but we believe they are
solvable with current kernel APIs.
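
The delegation idea can be sketched roughly as follows. This is not our
actual implementation: the wire format (struct access_request), the
policy (read checks on paths under /tmp/ only), and all function names
are assumptions made up for this example, and it delegates only
access() to keep things short. The point it shows is that the request
is copied by value to the monitor, so the monitor inspects its own
private copy and nothing in the sandbox can modify the arguments between
the check and the call. The sandboxed side uses only read() and write()
on an already open socket.

```c
#include <errno.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical wire format: one fixed-size message per delegated call. */
struct access_request {
    char path[256];   /* sent by value; the monitor checks its own copy */
    int  mode;        /* e.g. R_OK */
};

/* Hypothetical policy: read checks on paths under /tmp/, no "..". */
static int policy_allows(const struct access_request *req)
{
    if (memchr(req->path, '\0', sizeof req->path) == NULL)
        return 0;                         /* untrusted input: must be NUL-terminated */
    if (req->mode != R_OK)
        return 0;
    if (strncmp(req->path, "/tmp/", 5) != 0)
        return 0;
    return strstr(req->path, "..") == NULL;
}

/* Monitor side: receive one request, check it, execute it, reply. */
static void monitor_serve_one(int fd)
{
    struct access_request req;
    long result;

    /* A production version would loop on short reads. */
    if (read(fd, &req, sizeof req) != (ssize_t)sizeof req)
        return;
    if (!policy_allows(&req))
        result = -EPERM;
    else
        result = access(req.path, req.mode) == 0 ? 0 : -errno;
    if (write(fd, &result, sizeof result) != (ssize_t)sizeof result)
        return;                           /* peer went away; nothing to do */
}

/* Sandboxed side: serialize the call, then wait for the result. */
static long delegated_access(int fd, const char *path, int mode)
{
    struct access_request req;
    long result;

    memset(&req, 0, sizeof req);
    strncpy(req.path, path, sizeof req.path - 1);
    req.mode = mode;
    if (write(fd, &req, sizeof req) != (ssize_t)sizeof req)
        return -EIO;
    if (read(fd, &result, sizeof result) != (ssize_t)sizeof result)
        return -EIO;
    return result;
}

/*
 * Demo driver: the child plays the sandboxed process, the parent the
 * monitor. Returns 0 if the call was allowed and succeeded, 1 if the
 * policy denied it, 2 on any other failure, -1 on setup errors.
 */
long run_delegation_demo(const char *path)
{
    int sv[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0)
        return -1;

    pid_t pid = fork();
    if (pid < 0) {
        close(sv[0]);
        close(sv[1]);
        return -1;
    }
    if (pid == 0) {
        close(sv[0]);
        long r = delegated_access(sv[1], path, R_OK);
        _exit(r == 0 ? 0 : (r == -EPERM ? 1 : 2));
    }

    close(sv[1]);
    monitor_serve_one(sv[0]);
    close(sv[0]);

    int status;
    if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

For example, run_delegation_demo("/etc/passwd") is refused by this
policy and returns 1. Calls that return a file descriptor, such as
open(), would additionally need the descriptor passed back to the
sandbox (for instance with SCM_RIGHTS), which this sketch leaves out.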

The other issue is performance. For system calls that are known to be
safe, we would rather not pay the penalty of redirecting them. A
kernel patch that made seccomp more efficient for these system calls
would be very welcome, and we will post such a patch for discussion.

> If you want to do checks that depend on shared or volatile state, then
> syscall interception is really not the proper mechanism for you.

We agree that syscall interception is a poor abstraction level for a
sandbox. But in the short term, we need to work with the APIs that are
available in today's kernels. And we believe that seccomp is one of
the more promising APIs that are currently available to us.

