How to Use Linux perf trace for Deep System Call Analysis

By Noman Mohammad

“Why is my server crawling?” – the question that keeps me up at night

I was on-call last July when the alerts started. Page-load times tripled, CPU looked fine, memory looked fine, but users were furious. After two hours of educated guesses—restart services, tweak nginx, scroll through logs—I still had no clue.

Sound familiar?

In my experience, most first “diagnoses” turn out to be flat-out wrong, and every extra minute of downtime costs real money (think a year’s coffee budget gone in sixty seconds). The old go-to, strace, is like using a fire hose on a houseplant: it stops the process on every syscall via ptrace, so the overhead is massive and the timings you see aren’t the timings your app actually experiences.

There is a better way, and it’s already on your machine.

Meet perf trace—quiet, low-drama, high-insight

perf trace is part of the perf family that ships with modern Linux kernels. Think of it as a stethoscope that doesn’t wake the patient.

Installation in 30 seconds

sudo apt install linux-tools-generic linux-tools-$(uname -r)   # Ubuntu (on Debian: sudo apt install linux-perf)
sudo dnf install perf                                          # Fedora / RHEL

Done. Type perf --version to confirm you’re set.

The commands you’ll actually use

  • Trace a single command
    perf trace ls -la
    See every syscall ls fires and how long each one takes.
  • Attach to a running process
    sudo perf trace -p 1234
    Replace 1234 with the PID that’s misbehaving.
  • System-wide snapshot
    sudo perf trace -a
    Warning: floods the screen—filter first.

Always run with sudo when you’re looking beyond your own processes.
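
If you only need aggregate numbers instead of one line per call, the summary mode is worth knowing. The -s / --summary flag prints a per-syscall table of counts and min/avg/max latencies once the traced command exits:

sudo perf trace -s ls -la

It’s the quickest way to answer “which syscall is this thing spending its time in?” without reading a single trace line.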

Stop drowning in noise—filter like a pro

Raw output is useless if you can’t read it. Narrow the view:

perf trace -e 'syscalls:sys_enter_open*,syscalls:sys_exit_open*' -p 1234

This zeroes in on every open call your app makes (plain syscall names work too, e.g. -e open,openat). perf trace already prints how long each call took; add --duration with a threshold in milliseconds to show only the slow ones:

perf trace --duration 5.0 -p 1234

Now anything over 5 ms jumps out like a sore thumb, because it’s the only thing printed.
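
The flags compose, so on a busy box you can narrow to one family of calls, keep only the slow ones, and park the output in a file to read later (the PID, syscall list, and filename below are just examples):

sudo perf trace -e 'open*,read,write' --duration 5.0 -o slow-calls.txt -p 1234

The -o / --output flag also keeps your terminal readable while the trace runs.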

Real story—debugging that July outage

Back to my midnight crisis. I attached perf trace to the web-app PID and filtered for file reads:

sudo perf trace -e read --duration 5.0 -p 1122

One line hit me in the face:

1122 read(fd: 42</var/cache/slow.db-journal>, buf: 0x…, count: 4096) = 4096 <8.2 ms>

Eight milliseconds for a read? That’s forever. A quick ls showed the journal file had ballooned to 3 GB. Truncating it and moving the cache to SSD fixed the issue in minutes—not hours.

Without perf trace, I’d still be guessing.

Level-up tricks that save even more time

  • Record first, analyze later
    sudo perf record -e 'sched:*' -a -- sleep 10
    sudo perf report
    Great for catching elusive latency spikes.
  • Export the raw events for post-processing
    perf script -F comm,pid,time,event,trace -i perf.data > trace.txt
    Recent perf versions can also emit JSON directly: perf data convert --to-json trace.json
  • Install debug symbols
    sudo apt install linux-image-$(uname -r)-dbgsym   # on Ubuntu this needs the ddebs debug-symbol repo enabled
    Suddenly cryptic addresses turn into readable function names.
  • Increase buffer on noisy servers
    sudo perf trace --mmap-pages 8192 -p 1234
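
You can also combine these ideas into a capture-now, inspect-later workflow. A rough sketch (the ten-second window is arbitrary, and replay details vary a bit between perf versions): record the raw syscall tracepoints system-wide, then feed the resulting perf.data back into perf trace for a summary.

sudo perf record -e 'raw_syscalls:sys_enter,raw_syscalls:sys_exit' -a -- sleep 10
sudo perf trace -i perf.data -s

Nothing is printed while recording, so the overhead on the live box stays low, and you can dig through the data later at your own pace.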

Quick comparison cheat-sheet

Tool         Overhead    Best for
perf trace   Low         Production, system-wide, latency hunting
strace       High        Quick one-off checks on dev boxes
ftrace       Ultra-low   Kernel developers, kernel internals
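
If you want to feel the overhead gap for yourself, here is a rough, unscientific check: run a deliberately syscall-heavy toy workload untraced, under strace, and under perf trace, and compare the wall-clock times.

time dd if=/dev/zero of=/dev/null bs=1 count=200000
time strace -o /dev/null dd if=/dev/zero of=/dev/null bs=1 count=200000
time sudo perf trace -o /dev/null dd if=/dev/zero of=/dev/null bs=1 count=200000

With bs=1 every byte becomes its own read/write pair, so the strace run is usually dramatically slower, while the perf trace run stays close to the untraced baseline.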

Common gotchas and how to dodge them

  • Permission denied?
    sudo setcap cap_perfmon+ep /usr/bin/perf
  • Nothing but hex addresses?
    Install debug packages for your kernel and apps.
  • Container confusion?
    Run perf trace on the host and point it at the host-side PID of the containerized process (see the snippet below).
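
A quick way to find that host-side PID for a Docker container (the container name my-app here is just an example):

PID=$(docker inspect --format '{{.State.Pid}}' my-app)
sudo perf trace -p "$PID"

From there, every filtering trick above works exactly the same way.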

Still curious? Read the manuals

The official docs are short and sweet: start with man perf-trace and man perf-record, or the perf wiki at perf.wiki.kernel.org.

Next time your system feels sluggish, skip the guesswork. Fire up perf trace, filter for the calls that matter, and watch the real culprit appear in seconds.

You’ll sleep better. I promise.
