
Using eBPF (bcc/bpftrace) for Deep Linux Performance Analysis


By Noman Mohammad


Why eBPF Feels Like Having X-Ray Vision For Your Linux Box

Ever watched a server grind to a crawl and thought, *what on Earth is it doing in there?*
Old tools like top or strace give you the **what**, but rarely the **why**.
I hit this wall last year when a customer’s database started stalling every few minutes.
perf said *kernel time – 78 %*. Nice, but where inside the kernel?
A friend nudged me toward eBPF. Two hours later I was staring at the exact line of kernel code that held a spin-lock too long. **Problem fixed before dinner.**

eBPF is basically a tiny, super-fast virtual machine that lives inside the Linux kernel.
It lets us drop little probes in there **while the machine is running**.
Two easy ways to talk to it are:

  • BCC – a big toolbox of ready-made commands, with Python front ends and C for the in-kernel part.
  • bpftrace – a mini scripting language for “explain this weird blip **now**”.

Traditional vs. eBPF – A 60-second Comparison

Classic profiler
Collect stack traces → dump 5 MB/s to disk → crunch for ten minutes → maybe find the bottleneck.

eBPF one-liner
Count how many times every process hits a slow path function, **live in RAM**, zero disk IO. Ctrl-C to print a table, done.

That order-of-magnitude reduction in effort? It changes how you think about debugging.

My 15-minute Start-Up Routine Anytime Something Feels Sluggish

  1. Run sudo biolatency-bpfcc 1
    Shows a histogram of disk latency every second. Quick eyeball test for “disk is thrashing”.
  2. If disk is clean, try:
    sudo execsnoop-bpfcc
    Tells me which new commands just spawned. Often it’s a rogue cron job or healthcheck script.
  3. Still no clue?
    sudo bpftrace -e 'profile:hz:99 { @[kstack] = count(); }'
    Samples every running CPU 99 times a second and prints the hottest kernel stacks.
    When I see a wall of *spinlock* frames sitting under the same file system code path every time, that narrows things down fast. In my case it was an SMR disk firmware bug.

Zero restarts, zero downtime, answers in under a minute.
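If the histogram from step 1 looks cryptic at first, it helps to know that the rows are power-of-two buckets (0 -> 1, 2 -> 3, 4 -> 7, and so on). Here is a minimal Python sketch of that bucketing so you can see where a given latency lands. The sample latencies are made up for illustration, and the rendering only approximates what the real tool prints.

```python
from collections import Counter


def log2_bucket(value):
    """Return the (low, high) power-of-two bucket that holds an integer value."""
    if value <= 1:
        return (0, 1)
    low = 1 << (value.bit_length() - 1)
    return (low, 2 * low - 1)


def latency_histogram(samples_usecs):
    """Count samples per bucket and return them sorted by bucket start."""
    counts = Counter(log2_bucket(v) for v in samples_usecs)
    return sorted(counts.items())


def render(hist, width=40):
    """Format rows roughly like 'low -> high : count |****|'."""
    peak = max((count for _, count in hist), default=0)
    lines = []
    for (low, high), count in hist:
        bar = "*" * (count * width // peak) if peak else ""
        lines.append(f"{low:>8} -> {high:<8} : {count:<6} |{bar:<{width}}|")
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical latencies in microseconds: mostly fast, two slow outliers.
    samples = [90, 110, 120, 130, 200, 210, 9000, 15000]
    print(render(latency_histogram(samples)))
```

A healthy disk shows one tight cluster of rows. A second cluster far down the table, like the two outliers here, is the pattern worth chasing.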

Two Mini Case Studies (Copy-paste to Try)

Case 1 – Finding the Chatty Container

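sudo bpftrace -e 'tracepoint:syscalls:sys_enter_openat
                 { @[comm, cgroup] = count(); }'

bpftrace has no container_id builtin, so I key on cgroup instead: the cgroup v2 ID of the calling task, which is typically different for each container. To narrow it down to one container, add a filter that compares cgroup against cgroupid() called with that container's cgroup directory. Within 30 seconds I spot the log spitter that opens /var/log/debug.log 42,000 times an hour.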
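bpftrace identifies a task's container by its numeric cgroup ID (the cgroup builtin), which is not very readable on its own. Below is a rough Python sketch for turning such a number back into a cgroup path. It assumes cgroup v2 is mounted at /sys/fs/cgroup and that the ID matches the inode number of the cgroup directory, which is my understanding for common 64-bit setups but worth verifying on your own box. The function names are mine, not part of any tool.

```python
import os


def cgroup_ids(root="/sys/fs/cgroup"):
    """Map each directory's inode number to its path, walking down from root."""
    mapping = {}
    for dirpath, _dirnames, _filenames in os.walk(root):
        try:
            mapping[os.stat(dirpath).st_ino] = dirpath
        except OSError:
            # Cgroups come and go; skip any that vanished mid-walk.
            continue
    return mapping


def describe(cgroup_id, mapping):
    """Return the cgroup path for an ID, or a placeholder if it is unknown."""
    return mapping.get(cgroup_id, f"unknown cgroup {cgroup_id}")


if __name__ == "__main__":
    ids = cgroup_ids()
    for cgid, path in sorted(ids.items())[:10]:
        print(cgid, path)
```

With a container runtime, the path usually contains the container ID, so a lookup like describe(some_id, cgroup_ids()) is often enough to name the culprit.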

Case 2 – Unexplained TCP Retransmits

sudo bpftrace -e 'kprobe:tcp_retransmit_skb {
  @retransmits[comm] = count();
  @total = count();
}'

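A single Go binary accounts for 5 % of all retransmits. Turns out the dev forgot to enable GSO offloading. Fixing that cut latency by 25 ms at the 95th percentile. One caveat: retransmits often fire from timer context, so comm can be whatever task happened to be on the CPU. Treat the per-process split as a hint, not proof.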
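Working out each process's share of the retransmits by hand gets old quickly. Here is a small Python sketch that parses the map dump bpftrace prints on Ctrl-C (lines such as @retransmits[comm]: N and @total: N) and turns it into percentages. The sample text in the example is invented for illustration, and the parser only handles simple integer-valued maps.

```python
import re

# Matches '@name[key]: 123' as well as scalar maps such as '@total: 123'.
MAP_LINE = re.compile(r"^@(?P<name>\w*)(?:\[(?P<key>.*)\])?:\s+(?P<value>-?\d+)$")


def parse_maps(text):
    """Parse bpftrace map output into {map_name: {key_or_None: count}}."""
    maps = {}
    for line in text.splitlines():
        match = MAP_LINE.match(line.strip())
        if match:
            maps.setdefault(match["name"], {})[match["key"]] = int(match["value"])
    return maps


def shares(maps, per_key="retransmits", total="total"):
    """Return each key's percentage of the total, largest first."""
    grand_total = maps[total][None]
    ranked = sorted(maps[per_key].items(), key=lambda item: -item[1])
    return {key: 100.0 * count / grand_total for key, count in ranked}


if __name__ == "__main__":
    # Invented sample output, not a real capture.
    sample = """
@retransmits[curl]: 15
@retransmits[api-server]: 5
@retransmits[nginx]: 80

@total: 100
"""
    for comm, pct in shares(parse_maps(sample)).items():
        print(f"{comm:<16} {pct:5.1f} %")
```

Redirect the bpftrace output to a file, feed the file's contents to parse_maps, and you get a ranked table instead of a raw dump.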

BCC or bpftrace – Which to Reach For?

**BCC** when I need a reusable, ready-made tool. Example: I always keep tcptop aliased so I can see, by connection, who is chewing bandwidth in real time.

**bpftrace** when the problem is new, weird, and small. One Friday I randomly traced brk syscalls inside elasticsearch to prove the JVM wasn’t resizing the heap after all – it was a transparent hugepage compaction issue instead.

Quick Safety and Setup Notes

Kernel check?
uname -r
If it’s 5.x or newer, you’re golden. On 4.x you want roughly 4.9 or later for most of these tools; anything older may miss features or need distro backports.
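If you look after a fleet, the same check is easy to script. This is a minimal Python sketch that parses a uname -r style string and compares it with a minimum version. The 5.0 default mirrors my rule of thumb above and is not an official requirement; the function names are my own.

```python
import platform
import re


def kernel_version(release):
    """Extract (major, minor) from a release string such as '5.15.0-91-generic'."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if not match:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    return (int(match.group(1)), int(match.group(2)))


def kernel_at_least(release, minimum=(5, 0)):
    """True if the release is at or above the minimum (major, minor)."""
    return kernel_version(release) >= minimum


if __name__ == "__main__":
    release = platform.release()  # same string that uname -r prints on Linux
    try:
        verdict = "looks fine" if kernel_at_least(release) else "may need backports"
    except ValueError:
        verdict = "could not be parsed"
    print(f"{release}: {verdict}")
```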

Install chain (Ubuntu/Debian one-liner)
sudo apt-get install bpfcc-tools bpftrace linux-headers-$(uname -r)

That’s it. No recompilation, no kernel modules.

Cheat-Sheet of My Top 5 Tools

  • opensnoop-bpfcc – see every file open call in real time
  • biolatency-bpfcc -D – disk latency histograms, one per disk (add -Q to include time spent queued in the OS)
  • execsnoop-bpfcc – catch short-lived processes
  • tcplife-bpfcc – lifespan and traffic of each TCP flow
  • profile-bpfcc -F 99 -adf – system-wide stack samples in folded format, ready to feed a flame graph

Pin those to an alias, and you’ve got a portable MRI for almost any Linux box.
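The folded output from profile (the -f flag) is one line per unique stack: frames joined by semicolons, then a space and a sample count. When I don't feel like rendering a flame graph, a few lines of Python give me the hottest leaf functions. This is only a sketch; the frame names in the example are invented, and it treats every frame as an opaque string.

```python
from collections import Counter


def parse_folded(text):
    """Parse folded stacks into a list of (frames, count) pairs."""
    stacks = []
    for line in text.splitlines():
        line = line.strip()
        stack, _, count = line.rpartition(" ")
        if not stack or not count.isdigit():
            continue  # skip blank or malformed lines
        stacks.append((stack.split(";"), int(count)))
    return stacks


def hottest_leaves(stacks, top=5):
    """Sum sample counts by the innermost frame of each stack."""
    totals = Counter()
    for frames, count in stacks:
        totals[frames[-1]] += count
    return totals.most_common(top)


if __name__ == "__main__":
    # Invented sample, not a real capture.
    sample = """
db;entry_[k];do_write_[k];spin_lock_[k] 70
db;entry_[k];do_read_[k];spin_lock_[k] 20
sshd;main;read 10
"""
    for frame, count in hottest_leaves(parse_folded(sample)):
        print(f"{count:>6}  {frame}")
```

Summing by leaf frame answers "where is the CPU right now". Summing by the first frame instead would answer "which process is burning it".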

One Last Thought

eBPF isn’t some next-gen magic only kernel hackers should touch.
It’s more like strace got supercharged and moved to kernel mode.
The first time you find a 3-line script that saves you a 2-hour outage, you’ll never **not** have eBPF in your back pocket.


Go grab one command, run it, and see what surprises your server has to show you tonight.
