Sandboxing Linux Applications with Seccomp Filters

By Noman Mohammad

Why a single bad call can crash your whole server

Let me tell you about the day everything went wrong. My friend runs a small web shop on an Ubuntu box. One rogue Node plugin made a single syscall. One bad permission later, his server joined a botnet.

All this could’ve been stopped with a tiny Linux feature called Seccomp.

What actually is Seccomp?

Think of a bouncer at a club. Seccomp stands between your app and the operating system. It checks every syscall.

  • Need read()? Fine, go in.
  • Want reboot(), mount(), or ptrace()? Nope. Denied.

A basic lockdown takes only a few lines of C. Crazy.

Two flavors: strict vs. filter

Strict mode is nuclear. Only the base four syscalls survive, and anything else gets the process killed on the spot:

- read
- write
- exit
- sigreturn
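
To watch strict mode bite, here is a minimal sketch that uses the raw `prctl()` interface rather than libseccomp. The helper name `strict_mode_demo` is made up for this example, and it forks a child so the calling process stays unrestricted. It assumes a Linux kernel with seccomp enabled; some sandboxes do not offer strict mode at all, which the helper reports as -1.

```c
#define _GNU_SOURCE
#include <linux/seccomp.h>
#include <signal.h>
#include <sys/prctl.h>
#include <sys/syscall.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: forks a child, switches it to strict mode, and lets
 * it make one syscall outside the allowed four.
 * Returns the signal that ended the child (SIGKILL is expected),
 * -1 if fork/waitpid failed or strict mode is unavailable,
 * and 0 if the child somehow exited normally. */
int strict_mode_demo(void) {
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        if (prctl(PR_SET_SECCOMP, SECCOMP_MODE_STRICT) != 0)
            _exit(2);                 /* strict mode not available here */
        static const char msg[] = "write() is still allowed\n";
        if (write(STDOUT_FILENO, msg, sizeof msg - 1) < 0)
            syscall(SYS_exit, 1);     /* plain exit is on the allowed list */
        syscall(SYS_getpid);          /* not on the list: kernel sends SIGKILL */
        syscall(SYS_exit, 0);         /* never reached */
    }
    int status = 0;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    if (WIFSIGNALED(status))
        return WTERMSIG(status);
    return (WIFEXITED(status) && WEXITSTATUS(status) == 2) ? -1 : 0;
}
```

Note that the child uses `syscall(SYS_exit, ...)` after entering strict mode: glibc's `_exit()` calls `exit_group`, which is not one of the four.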

Filter mode lets you write tiny rules. A pick-and-choose list, written with an old packet-filter language called BPF.
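
For a feel of what those rules look like underneath, here is a rough sketch of a four-instruction classic BPF filter loaded with `prctl()`. The helper names `deny_getppid` and `errno_seen_after_filter` are invented for this example, and `getppid` is just a harmless syscall to block. The sketch skips the architecture check that a production filter should do first, so treat it as an illustration, not a template.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <linux/filter.h>
#include <linux/seccomp.h>
#include <stddef.h>
#include <sys/prctl.h>
#include <sys/syscall.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: installs a filter in the calling thread that makes
 * getppid() fail with EPERM and allows every other syscall.
 * Returns 0 on success, -1 on failure. */
int deny_getppid(void) {
    struct sock_filter rules[] = {
        /* Load the syscall number from the seccomp_data the kernel hands us. */
        BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, nr)),
        /* getppid? Continue to the next instruction. Anything else: skip one. */
        BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, SYS_getppid, 0, 1),
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ERRNO | (EPERM & SECCOMP_RET_DATA)),
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
    };
    struct sock_fprog prog = {
        .len = (unsigned short)(sizeof rules / sizeof rules[0]),
        .filter = rules,
    };
    /* Unprivileged processes must set no_new_privs before loading a filter. */
    if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) != 0)
        return -1;
    return prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog);
}

/* Hypothetical helper: applies the filter in a forked child so the caller
 * stays unfiltered. Returns the errno the child got from getppid()
 * (0 if the call succeeded), or -1 if setup failed. */
int errno_seen_after_filter(void) {
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        if (deny_getppid() != 0)
            _exit(255);
        errno = 0;
        long rc = syscall(SYS_getppid);
        _exit(rc == -1 ? errno : 0);
    }
    int status = 0;
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
        return -1;
    int code = WEXITSTATUS(status);
    return code == 255 ? -1 : code;
}
```

The jump offsets are the fiddly part: the two trailing numbers in `BPF_JUMP` say how many instructions to skip when the comparison is true and when it is false.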

I tried strict mode once. The SSH daemon promptly died because it couldn’t open log files. Filter mode is the sweet spot.

The 5-minute setup with libseccomp

No one wants to write raw BPF. Grab libseccomp and you’ll have a filter—even if you barely C.

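#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <seccomp.h>

int main(void) {
    // Allow everything by default, then deny openat() with EPERM
    scmp_filter_ctx ctx = seccomp_init(SCMP_ACT_ALLOW);
    if (ctx == NULL) return 1;
    seccomp_rule_add(ctx, SCMP_ACT_ERRNO(EPERM),
                     SCMP_SYS(openat), 0);
    if (seccomp_load(ctx) < 0) { seccomp_release(ctx); return 1; }
    seccomp_release(ctx);  // the loaded filter stays active in the kernel
    // From here on, every openat() fails with EPERM
    if (openat(AT_FDCWD, "/tmp/hello", O_CREAT | O_WRONLY, 0644) < 0)
        perror("openat");
    return 0;
}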

Build and run it:

gcc demo.c -lseccomp && ./a.out

You’ll get a friendly “Operation not permitted.” Boom. Locked.

Seccomp and containers: a match made in heaven

Docker ships with a default profile that blocks around 44 of the 300+ syscalls. I didn’t know this until a Rust build failed inside a container because it needed `fstatat64`. Adding a custom profile fixed it in ten seconds.

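Create `allow.json`. The profile below is cut down to show the format; a container that can actually start needs many more calls on the list (`execve`, `mmap`, `exit_group`, and so on), so in practice copy Docker's default profile and add the missing call to it: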

{
  "defaultAction": "SCMP_ACT_ERRNO",
  "syscalls": [
    {
      "names": ["read", "write", "fstatat64"],
      "action": "SCMP_ACT_ALLOW"
    }
  ]
}

Then point Docker at it:

docker run --security-opt seccomp=allow.json my-rust-image

My 3-step checklist

Before typing a single seccomp line:

  1. Arm yourself with strace

    `strace -f -o calls.log ./app`

    Read it like a grocery list.
  2. Tune, don’t ban everything

    If you kill a syscall your web framework needs, the whole thing dies on startup. Been there.
  3. Test on staging first

    I once denied mkdir() by accident. CI passed, users saw 500 errors.
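
Reading the log by eye gets old fast, so here is a small sketch of how step 1 could be automated: a helper that pulls the syscall name out of one line of strace output. The function name `syscall_name_from_strace` is hypothetical, and the parsing assumes the common line shapes (`name(args) = ret`, an optional `[pid N]` prefix, or a leading PID column in `-o` files). Other strace options may produce lines it skips.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copies the syscall name from one strace line into out.
 * Returns 1 on success, 0 if the line is not a syscall entry (signal notices,
 * "+++ exited" lines, "<... resumed>" continuations) or out is too small. */
int syscall_name_from_strace(const char *line, char *out, size_t out_size) {
    const char *p = line;

    /* Skip the "[pid 1234] " prefix that -f prints on the terminal. */
    if (strncmp(p, "[pid", 4) == 0) {
        p = strchr(p, ']');
        if (p == NULL)
            return 0;
        p++;
    }

    /* Skip blanks and the bare PID column used in -o log files. */
    while (*p == ' ' || isdigit((unsigned char)*p))
        p++;

    /* A syscall entry is an identifier immediately followed by '('. */
    size_t len = 0;
    while (isalnum((unsigned char)p[len]) || p[len] == '_')
        len++;
    if (len == 0 || p[len] != '(' || len >= out_size)
        return 0;

    memcpy(out, p, len);
    out[len] = '\0';
    return 1;
}
```

Feed it every line of `calls.log`, collect the unique names, and you have a first draft of an allowlist to prune by hand.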

Common gotchas

  • ARM vs x86: syscall numbers differ between architectures. Let libseccomp resolve names for you: `seccomp_syscall_resolve_name("openat")`.
  • Threads: a filter applies to the thread that installs it, and threads created afterwards inherit it. Apply it before you spawn workers.
  • Over-restriction: if the app needs `socket()` and `connect()` to reach a DNS server, let it. Security is useless if the app won’t run.
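
The first gotcha is easy to check without libseccomp, because the kernel headers already carry the right number for whatever you compile for. A tiny sketch, with the made-up helper names `openat_number` and `describe_openat`:

```c
#include <stdio.h>
#include <sys/syscall.h>

/* Hypothetical helper: the syscall number this build uses for openat(). */
long openat_number(void) {
    return SYS_openat;
}

/* Hypothetical helper: writes a one-line summary into buf and returns the
 * number of characters snprintf() would have written. */
int describe_openat(char *buf, size_t size) {
    return snprintf(buf, size, "openat is syscall %ld in this build",
                    openat_number());
}
```

On x86_64 the number is 257 and on aarch64 it is 56, which is why a filter that hardcodes numbers from one machine can silently block the wrong call on another.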

Too lazy for C?

Use Firejail. One-liner:

firejail --seccomp firefox

Done. Firefox runs with a hardened profile under your current X session.

Wrapping up

Seccomp isn’t magic. It’s one wall in a layered defense. Combine it with namespaces, cgroups, and capabilities and you get a fortress.

Need more? The official kernel docs helped me understand BPF jump offsets. And Docker’s guide gives real-world profiles you can tweak today.

Give it ten minutes. Your server—and whoever depends on it—will sleep better tonight.