1. Why eBPF Feels Like Having X-Ray Vision For Your Linux Box
2. Traditional vs. eBPF – A 60-second Comparison
3. My 15-minute Start-Up Routine Anytime Something Feels Sluggish
4. Two Mini Case Studies (Copy-paste to Try)
5. BCC or bpftrace – Which to Reach For?
6. Quick Safety and Setup Notes
7. Cheat-Sheet of My Top 5 Tools
8. One Last Thought
Why eBPF Feels Like Having X-Ray Vision For Your Linux Box
Ever watched a server grind to a crawl and thought, *what on Earth is it doing in there?*
Old tools like top or strace give you the **what**, but rarely the **why**.
I hit this wall last year when a customer’s database started stalling every few minutes.
perf said *kernel time – 78 %*. Nice, but where inside the kernel?
A friend nudged me toward eBPF. Two hours later I was staring at the exact line of kernel code that held a spin-lock too long. **Problem fixed before dinner.**
eBPF is basically a tiny, super-fast virtual machine that lives inside the Linux kernel.
It lets us drop little probes in there **while the machine is running**.
Two easy ways to talk to it are:
- BCC – big toolbox written in Python/C: loads of ready-made commands.
- bpftrace – mini scripting language for “explain this weird blip **now**”.
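If you've never run one, here is about the smallest useful bpftrace invocation I know – a sketch with nothing customer-specific in it. It counts syscalls per process until you hit Ctrl-C, then prints a table:

```
# Count every syscall, keyed by process name; Ctrl-C prints the totals.
sudo bpftrace -e 'tracepoint:raw_syscalls:sys_enter { @[comm] = count(); }'
```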
Traditional vs. eBPF – A 60-second Comparison
**Classic profiler**
Collect stack traces → dump 5 MB/s to disk → crunch for ten minutes → maybe find the bottleneck.
**eBPF one-liner**
Count how many times every process hits a slow path function, **live in RAM**, zero disk IO. Ctrl-C to print a table, done.
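For a concrete flavour, here's a sketch of that kind of one-liner – vfs_fsync is just a stand-in for whichever slow path you actually suspect:

```
# Count how often each process enters a suspected slow-path function,
# aggregated entirely in kernel memory; Ctrl-C prints the table.
sudo bpftrace -e 'kprobe:vfs_fsync { @[comm] = count(); }'
```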
That order-of-magnitude reduction in effort? It changes how you think about debugging.
My 15-minute Start-Up Routine Anytime Something Feels Sluggish
- Run:
  ```
  sudo biolatency-bpfcc 1
  ```
  Shows a histogram of disk latency every second. A quick eyeball test for "the disk is thrashing".
- If the disk is clean, try:
  ```
  sudo execsnoop-bpfcc
  ```
  Tells me which new commands just spawned. Often it's a rogue cron job or healthcheck script.
- Still no clue?
  ```
  sudo bpftrace -e 'profile:hz:99 { @[kstack] = count(); }'
  ```
  Samples every running CPU 99 times a second and prints the hottest kernel stacks.
I see a wall of *spinlock*, notice it's the same filesystem code line every time – an SMR disk firmware bug.
Zero restarts, zero downtime, answers in under a minute.
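When the stack map is too noisy to eyeball, I sometimes cap the run and only print the hottest entries. A minimal sketch – the 10-second window and top-20 cut-off are arbitrary choices, not anything canonical:

```
# Sample kernel stacks for 10 seconds, print only the 20 hottest, then exit.
sudo bpftrace -e 'profile:hz:99 { @[kstack] = count(); }
interval:s:10 { print(@, 20); clear(@); exit(); }'
```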
Two Mini Case Studies (Copy-paste to Try)
Case 1 – Finding the Chatty Container
```
sudo bpftrace -e 'tracepoint:syscalls:sys_enter_openat /cgroup == $1/
{ @[comm, str(args->filename)] = count(); }' <container-cgroup-id>
```
I feed the container's cgroup ID into the filter. Within 30 seconds I spot the log spitter that opens /var/log/debug.log 42,000 times an hour.
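The cgroup ID the filter expects is just the inode number of the container's cgroup directory. Where that directory lives depends on your runtime and cgroup layout; on a cgroup-v2 host running Docker with the systemd driver it's roughly this (the path is an assumption – adjust for your setup):

```
# Hypothetical lookup: cgroup ID == inode number of the container's cgroup directory.
stat -c %i /sys/fs/cgroup/system.slice/docker-<full-container-id>.scope
```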
Case 2 – Unexplained TCP Retransmits
```
sudo bpftrace -e 'kprobe:tcp_retransmit_skb {
  @retransmits[comm] = count();
  @total = count();
}'
```
A single Go binary accounts for 5% of all retransmits. Turns out the dev had forgotten to enable GSO offload. Fixing that cut latency by 25 ms at the 95th percentile.
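When I also need to know *where* those retransmits are going, I extend the probe to pull the peer address out of the socket. A sketch that assumes a reasonably recent kernel with BTF, so the struct sock members resolve without extra headers, and IPv4 traffic:

```
# Count retransmits per process and remote IPv4 address.
sudo bpftrace -e 'kprobe:tcp_retransmit_skb {
  $sk = (struct sock *)arg0;
  @[comm, ntop($sk->__sk_common.skc_daddr)] = count();
}'
```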
BCC or bpftrace – Which to Reach For?
- **BCC** when I need a reusable, one-binary tool. Example: I always keep tcptop aliased so I can see, by connection, who is chewing bandwidth in real time.
- **bpftrace** when the problem is new, weird, and small. One Friday I randomly traced brk syscalls inside Elasticsearch to prove the JVM wasn't resizing the heap after all – it was a transparent hugepage compaction issue instead.
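For the curious, that Friday experiment was basically a one-liner along these lines – the comm name is an assumption, so match whatever your JVM shows up as in top:

```
# Count brk() calls made by the JVM; a silent map means the heap isn't being resized.
sudo bpftrace -e 'tracepoint:syscalls:sys_enter_brk /comm == "java"/ { @[comm, pid] = count(); }'
```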
Quick Safety and Setup Notes
Kernel check:
```
uname -r
```
If it's 5.x or newer, you're golden. 4.x may need backports.
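If you want to double-check the kernel was built with the relevant knobs, the config usually lives next to the boot image – the exact path varies by distro, so treat this as a sketch:

```
# These should all be =y on an eBPF-capable kernel.
grep -E 'CONFIG_BPF=|CONFIG_BPF_SYSCALL=|CONFIG_BPF_EVENTS=' /boot/config-$(uname -r)
```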
Install chain (Ubuntu/Debian one-liner):
```
sudo apt-get install bpfcc-tools linux-headers-$(uname -r)
```
That’s it. No recompilation, no kernel modules.
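bpftrace itself ships as a separate package on recent Ubuntu/Debian releases (an assumption about your distro version), and a one-line smoke test tells you the whole stack works:

```
sudo apt-get install bpftrace
# Should print "hello" and exit as soon as the probe attaches.
sudo bpftrace -e 'BEGIN { printf("hello\n"); exit(); }'
```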
Cheat-Sheet of My Top 5 Tools
- opensnoop-bpfcc – see every file open call in real time
- biolatency-bpfcc -Q – disk I/O latency histogram, including time the request sat queued in the kernel
- execsnoop-bpfcc – catch short-lived processes
- tcplife-bpfcc – lifespan and traffic of each TCP flow
- profile-bpfcc -F 99 -adf – whole-system stack samples in folded format, ready for a flame graph
Pin those behind aliases, and you’ve got a portable MRI for almost any Linux box.
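And when profile-bpfcc's folded output needs to become a picture, the usual route is Brendan Gregg's FlameGraph scripts. A sketch of the pipeline – the 30-second capture is an arbitrary choice:

```
# Capture 30 s of folded stacks, then render an SVG flame graph.
git clone https://github.com/brendangregg/FlameGraph
sudo profile-bpfcc -adf -F 99 30 > out.folded
./FlameGraph/flamegraph.pl out.folded > profile.svg
```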
One Last Thought
eBPF isn’t some next-gen magic only kernel hackers should touch.
It’s more like strace got supercharged and moved to kernel mode.
The first time you find a 3-line script that saves you a 2-hour outage, you’ll never **not** have eBPF in your back pocket.
Useful links:
- BCC Documentation
- bpftrace Reference Guide
- Brendan Gregg’s eBPF Blog (the textbook on real-world tricks)
- eBPF.io
Go grab one command, run it, and see what surprises your server has to show you tonight.







