Stop Guessing Where Your Disk is Dying
I once watched a team spend two weeks tuning their MySQL server. They tweaked every buffer, index, and cache they could find. Yet every night at 2:17 a.m. the queries crawled. CPU was fine, RAM was fine, disk usage only 40 %. They were chasing phantoms.
The real culprit? A single nightly backup job that flushed the RAID cache and turned the storage layer into molasses. Classic story, right? The twist: I found the spike in eleven minutes once I pointed biosnoop at the box.
Today I’ll show you the simple setup I used—no PhD in kernel internals required.
Why iotop Lies to You
Old tools like iostat or iotop give you one number: “disk busy 35 %”. That’s like saying “the highway is 35 % full” while ignoring the mile-long pile-up causing the traffic jam.
What we need is a per-operation movie, not a single snapshot. That’s where eBPF comes in. Think of it as strapping a GoPro to every read and write that hits your kernel.
- Near-zero overhead. eBPF probes run sandboxed in the kernel; your app won't notice.
- Per-process view. Spot the exact PID that’s hammering the disk.
- Microsecond resolution. You’ll see the 9 ms spike the old tools round down to zero.
Install It in Three Commands
Ubuntu / Debian
sudo apt install bpfcc-tools linux-headers-$(uname -r)
RHEL / CentOS / Alma
sudo yum install bcc-tools kernel-devel-$(uname -r)
That’s it. If you’re on a recent kernel (4.1+) you’re ready to roll.
Your First Five Minutes
Open two terminal windows.
In the first, run:
sudo biosnoop > io.log
(On Ubuntu/Debian the bpfcc-tools package suffixes the tools, so the command is biosnoop-bpfcc.)
In the second, start your slow job: a backup, a batch import, whatever. Let it run for thirty seconds then kill biosnoop with Ctrl-C.
You now have a file that looks like:
TIME          PID   COMM    DISK  T  SECTOR    BYTES  LAT(ms)
19:02:01.123  892   mysqld  sdb   W  88172664  4096   47.92
19:02:01.124  892   mysqld  sdb   W  88172672  4096   48.11
...
Every row is a single disk operation. PID, disk, latency, timestamp. No guessing.
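If you only need one number per line, you don't even need pandas; a minimal sketch in plain Python, assuming the column layout shown in the sample above (the helper name is mine, not biosnoop's):

```python
def parse_biosnoop_line(line):
    """Split one biosnoop data row into a dict (column order as in the sample)."""
    time, pid, comm, disk, op, sector, nbytes, lat_ms = line.split()
    return {
        'time': time,
        'pid': int(pid),
        'comm': comm,
        'disk': disk,
        'op': op,               # R = read, W = write
        'sector': int(sector),
        'bytes': int(nbytes),
        'lat_ms': float(lat_ms),
    }

row = parse_biosnoop_line("19:02:01.123 892 mysqld sdb W 88172664 4096 47.92")
print(row['comm'], row['lat_ms'])  # → mysqld 47.92
```

Handy for quick one-off filters, e.g. printing only rows with lat_ms above 10.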
Turn Raw Lines into a Picture
Numbers are boring. Colors are fast. Paste this tiny Python script (I keep it as heatmap.py):
import pandas as pd, seaborn as sns, matplotlib.pyplot as plt
# The log is whitespace-separated; sep=r'\s+' replaces the deprecated delim_whitespace=True
df = pd.read_csv('io.log', sep=r'\s+', parse_dates=['TIME'])
# Bucket each operation into a 2-second window so the heatmap stays readable
df['bucket'] = df['TIME'].dt.floor('2s')
# One row per window, one column per disk, mean latency per cell
pivot = df.pivot_table(index='bucket', columns='DISK', values='LAT(ms)', aggfunc='mean')
plt.figure(figsize=(12, 4))
sns.heatmap(pivot, cmap='RdYlBu_r', linewidths=.5)
plt.title('Disk Latency Heatmap (darker = slower)')
plt.tight_layout()
plt.savefig('io_heatmap.png')
Run:
python3 heatmap.py
Open io_heatmap.png. Dark red stripes? That’s pain. Bright blue? All good.
Reading the Pain
- Vertical red streak on sdb at 02:17–02:19? That's your backup.
- Diagonal red line? A sequential scan that's turned random: an index missing its cache.
- Single bright cell? One process doing a huge synchronous write. Probably logging.
Overlay the heatmap with your cron schedule. You’ll see the match in seconds.
Real-World Fix in 11 Minutes
Back to that MySQL story:
- I ran biosnoop for one backup cycle.
- The heatmap lit up sdb exactly at 02:17.
- Latency jumped from 1 ms to 50 ms for the entire window.
- We moved the backup to 04:00 and added ionice -c 3.
- Problem gone. Two weeks of tuning avoided.
Sometimes the fastest optimization is not running the wrong job at the wrong time.
Next Steps
You’re already faster than iostat. To go deeper:
- Filter by PID: sudo biosnoop -p $(pgrep mysqld)
- Live view in the terminal: sudo biolatency -m 1
- Cron it: dump daily logs and auto-mail the heatmap.
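For the cron idea, you don't have to mail a picture; a one-screen text summary is often enough. A sketch (function name is mine) that pulls mean and p99 latency out of the LAT(ms) column of a day's log, assuming the column layout from the sample earlier:

```python
def latency_summary(lines):
    """Return (mean_ms, p99_ms) from biosnoop data rows; LAT(ms) is the last column."""
    lats = sorted(float(line.split()[-1]) for line in lines if line.strip())
    if not lats:
        return (0.0, 0.0)
    mean = sum(lats) / len(lats)
    # Nearest-rank p99: index 99% of the way through the sorted latencies
    p99 = lats[min(len(lats) - 1, int(0.99 * len(lats)))]
    return (mean, p99)

rows = [
    "19:02:01.123 892 mysqld sdb W 88172664 4096 47.92",
    "19:02:01.124 892 mysqld sdb W 88172672 4096 48.11",
]
print(latency_summary(rows))
```

Pipe yesterday's io.log through it in a cron job and mail the two numbers; when p99 jumps an order of magnitude, regenerate the heatmap and dig in.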
Storage mysteries hate sunlight. Shine the eBPF flashlight and they disappear.