- 1 Watching Your Disk Like a Hawk (Without the Guesswork)
- 2 The Real Price of Blind I/O Debugging
- 3 Meet eBPF: Your New X-Ray Glasses
- 4 Install the Toolchain in 30 Seconds
- 5 Five Commands That End the Guessing Game
- 6 Real Walk-Through: Finding the Naughty Backup Job
- 7 Write Your Own 15-Line Detector
- 8 Quick Wins You Can Try Today
- 9 Frequently Asked Questions
- 10 Your Next Five Minutes
Watching Your Disk Like a Hawk (Without the Guesswork)
Ever sit in front of a slow server and wonder, “Who’s hammering the disk?”
You run iostat. It shows 100 % util. Great.
That’s like your mechanic saying, “Your engine is hot” and then walking away.
Most built-in tools give you averages, not the story.
They hide the exact process, the exact file, the exact moment things went sideways.
The result? Late nights, angry users, and a lot of swearing at dashboards.
The Real Price of Blind I/O Debugging
Last month a friend’s e-commerce site froze during a flash sale.
Revenue dipped 30 % in 15 minutes.
Their dashboards all looked green.
Turned out a logging cron job was dumping megabytes into a temp file every second.
Classic “death by a thousand tiny writes.”
That pain could have been avoided with real-time, per-event data.
Not averages. Not summaries. The raw, unfiltered truth.
Meet eBPF: Your New X-Ray Glasses
eBPF lets you slip a tiny, verified program into the running Linux kernel.
It watches events as they happen: disk reads, writes, even the moment a process opens a file.
Think of it as a security camera inside the kernel.
Zero reboots. Zero kernel recompiles.
bcc is the toolbox that ships with ready-made eBPF programs.
You just run them like any other command.
Install the Toolchain in 30 Seconds
- Ubuntu / Debian
sudo apt update && sudo apt install bpfcc-tools linux-headers-$(uname -r)
- Fedora / RHEL
sudo dnf install bcc-tools
- Arch
sudo pacman -S bcc bcc-tools python-bcc
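Type sudo biosnoop-bpfcc. (The -bpfcc suffix is a Debian/Ubuntu convention. On Fedora, RHEL, and Arch the tools usually live in /usr/share/bcc/tools, so run sudo /usr/share/bcc/tools/biosnoop instead.)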
If you see live disk events, you’re ready.
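Not sure which name your distro picked? Here is a small sketch that looks for a tool under both conventions. It assumes default packaging (the -bpfcc suffix on PATH for Debian/Ubuntu, /usr/share/bcc/tools elsewhere), and the helper name find_bcc_tool is made up for this example.

```python
import os
import shutil

# Assumption: Fedora/RHEL and Arch packages install the tools here by default.
BCC_TOOLS_DIR = "/usr/share/bcc/tools"


def find_bcc_tool(name, tools_dir=BCC_TOOLS_DIR):
    """Return the path of a bcc tool such as 'biosnoop', or None if not found."""
    # Debian/Ubuntu put the tools on PATH with a -bpfcc suffix; try that first,
    # then the bare name in case the tools directory is already on PATH.
    for candidate in (name + "-bpfcc", name):
        path = shutil.which(candidate)
        if path:
            return path
    # Fall back to the tools directory, which is often not on PATH.
    path = os.path.join(tools_dir, name)
    if os.path.isfile(path) and os.access(path, os.X_OK):
        return path
    return None


if __name__ == "__main__":
    for tool in ("biosnoop", "biotop", "filetop", "biolatency", "funccount"):
        print(tool, "->", find_bcc_tool(tool) or "not found")
```

If every tool comes back "not found", the package is probably missing or installed somewhere non-standard.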
Five Commands That End the Guessing Game
- biosnoop – Who touched the disk last?
TIME(s)  COMM    PID  DISK  T  BYTES  LAT(ms)
0.004    nginx   987  sda   R  4096   0.9
0.007    mysqld  234  sdb   W  65536  4.2
Spot the 4 ms write? That’s your first clue.
- biotop – Top I/O hogs, updated every 2 s.
sudo biotop-bpfcc 2
- filetop – Which files are getting beaten up?
PID  COMM    READ_KB  WRITE_KB  FILE
234  mysqld  0        8192      /var/lib/mysql/ibdata1
987  nginx   256      0         /var/log/access.log
- biolatency – Histogram of latency.
sudo biolatency-bpfcc 5
Shows if 90 % of reads finish in under 1 ms while the tail takes 100 ms.
That tail is where your users cry.
- funccount – Count any kernel function.
sudo funccount-bpfcc 'ext4_file_write_iter'
Great for proving “yes, that new code path is being called 10 k times per second.”
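Once you have captured some biosnoop output, you rarely want to eyeball it. Here is a minimal sketch that filters for slow requests. It assumes the simplified seven-column layout shown above (real biosnoop output may carry extra columns such as SECTOR, depending on version) and command names without spaces. BioEvent, parse_biosnoop, and slow_events are names invented for this example.

```python
from typing import NamedTuple


class BioEvent(NamedTuple):
    time_s: float
    comm: str
    pid: int
    disk: str
    rw: str
    nbytes: int
    lat_ms: float


def parse_biosnoop(lines):
    """Parse rows in the seven-column layout used in this article."""
    events = []
    for line in lines:
        fields = line.split()
        # Skip blanks, the header row, and anything with an unexpected shape.
        if len(fields) != 7 or fields[0].startswith("TIME"):
            continue
        try:
            events.append(BioEvent(
                float(fields[0]), fields[1], int(fields[2]),
                fields[3], fields[4], int(fields[5]), float(fields[6]),
            ))
        except ValueError:
            continue  # not a data row after all
    return events


def slow_events(events, threshold_ms):
    """Keep only the requests that took at least threshold_ms."""
    return [e for e in events if e.lat_ms >= threshold_ms]


SAMPLE = """\
TIME(s)  COMM    PID  DISK  T  BYTES  LAT(ms)
0.004    nginx   987  sda   R  4096   0.9
0.007    mysqld  234  sdb   W  65536  4.2
"""

if __name__ == "__main__":
    for e in slow_events(parse_biosnoop(SAMPLE.splitlines()), threshold_ms=4.0):
        print(f"{e.comm} (pid {e.pid}) {e.rw} {e.nbytes} bytes on {e.disk}: {e.lat_ms} ms")
```

With the two sample rows and a 4 ms threshold, only the mysqld write is reported.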
Real Walk-Through: Finding the Naughty Backup Job
Picture this: web pages load like molasses.
Step 1 – run biotop.
It shows rsync writing 120 MB/s.
Step 2 – run filetop.
It shows /backups/daily.tar is the target.
Step 3 – run biolatency.
90 % of writes are under 2 ms, but 5 % spike to 200 ms.
That tail latency is caused by the USB external drive used for backups.
Solution? Move the backup to off-peak hours.
Problem solved in 3 commands and 2 minutes.
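Why did the dashboards miss it? Averages flatten the tail. The sketch below uses synthetic latencies shaped like the walk-through (90 % fast, 5 % at 200 ms). The remaining 5 % at 10 ms is invented to fill the gap, and none of these are real measurements.

```python
def percentile(samples, pct):
    """Nearest-rank percentile: smallest sample with at least pct% at or below it."""
    ordered = sorted(samples)
    # Integer ceiling of pct * n / 100, avoiding floating-point rounding.
    rank = max(1, (pct * len(ordered) + 99) // 100)
    return ordered[rank - 1]


# Synthetic write latencies in milliseconds (illustrative only).
latencies_ms = [1.0] * 90 + [10.0] * 5 + [200.0] * 5

mean_ms = sum(latencies_ms) / len(latencies_ms)

if __name__ == "__main__":
    print(f"mean: {mean_ms:.1f} ms")
    for pct in (50, 90, 95, 99):
        print(f"p{pct}: {percentile(latencies_ms, pct):.1f} ms")
```

The mean lands at 11.4 ms, which looks tolerable on a graph, while p99 sits at 200 ms. That gap is exactly what a latency histogram makes visible.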
Write Your Own 15-Line Detector
Need something special? Python + bcc makes it stupid-simple:
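from bcc import BPF
# eBPF program (C): runs each time a block I/O request is issued to a device
program = """
TRACEPOINT_PROBE(block, block_rq_issue) {
    u32 pid = bpf_get_current_pid_tgid() >> 32;  // upper 32 bits hold the process ID
    bpf_trace_printk("PID %d issued I/O\\n", pid);
    return 0;
}
"""
b = BPF(text=program)  # compile the program and attach the tracepoint
print("Tracing... Ctrl-C to stop.")
while True:
    try:
        # trace_fields() returns (task, pid, cpu, flags, ts, msg)
        (_, _, _, _, ts, msg) = b.trace_fields()
        print("%.3f %s" % (ts, msg.decode()))
    except KeyboardInterrupt:
        break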
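Run it as root, and every disk request prints a timestamp and the PID that was on the CPU when the request was issued.
That is usually the culprit, though buffered writes are often flushed later by a kernel worker thread, so expect some kworker PIDs too.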
Tweak the probe and you can trace filesystem calls, network packets, or even scheduler events.
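Printing a line per request gets noisy fast. One possible tweak, sketched here rather than taken from any shipped bcc tool, is to count requests per PID inside the kernel with a bcc hash map and read the totals once at the end. The function names trace and top_pids are invented for this example, and the 10-second window is arbitrary.

```python
import time

# eBPF program (C). BPF_HASH and increment() are bcc helpers.
PROGRAM = r"""
BPF_HASH(counts, u32, u64);

TRACEPOINT_PROBE(block, block_rq_issue) {
    u32 pid = bpf_get_current_pid_tgid() >> 32;
    counts.increment(pid);
    return 0;
}
"""


def top_pids(counts, limit=10):
    """Sort a {pid: request_count} dict, busiest first."""
    return sorted(counts.items(), key=lambda kv: kv[1], reverse=True)[:limit]


def trace(seconds=10):
    """Attach the probe, wait, and return {pid: count}. Needs root and bcc."""
    from bcc import BPF  # imported here so top_pids() works without bcc installed
    b = BPF(text=PROGRAM)
    time.sleep(seconds)
    return {k.value: v.value for k, v in b["counts"].items()}


if __name__ == "__main__":
    try:
        results = trace(10)
    except ImportError:
        print("bcc Python bindings not found; install bcc first.")
    else:
        for pid, n in top_pids(results):
            print(f"PID {pid}: {n} requests")
```

Counting in the kernel keeps the overhead low because nothing crosses into user space until the final read. PID 0 or kworker PIDs in the results are requests issued from kernel context, not a bug in the probe.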
Quick Wins You Can Try Today
- Add biotop to your tmux dashboard. One glance shows the top I/O process.
- Schedule biolatency 60 1 in cron (one 60-second sample, then exit). Log the histogram daily to spot creeping latency.
- Hook filetop into your CI pipeline. If any test writes more than 500 MB to /tmp, fail the build.
Frequently Asked Questions
- Is eBPF safe on production boxes?
- Yes, in practice. The kernel verifier checks every eBPF program before it runs and refuses to load anything it cannot prove safe, so a buggy probe gets rejected instead of crashing the box.
- Do I need to recompile the kernel?
- Nope. Install the bcc package and you’re done.
- How much CPU does this burn?
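- Usually very little, often less than 1 % on a busy server, though the cost grows with the number of events you trace. eBPF runs inside the kernel, so there are no expensive context switches per event.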
- Can I trace network I/O too?
- Sure. Tools like tcpconnect and tcplife use the same engine. Same install, new superpowers.
- Is this only for kernel wizards?
- I’m a sysadmin who once thought “kernel” was a swear word.
If you can run top, you can use these tools.
Your Next Five Minutes
Install bcc.
Run sudo biosnoop-bpfcc.
Watch the disk chatter in real time.
I bet you’ll spot something surprising before your coffee cools.
Stop guessing. Start seeing.







