
Practical eBPF: Using bpftool to Monitor Per-Process Network Latency on Linux

By Noman Mohammad

Why your 3 AM nightmare keeps happening

Picture this. It’s 3:17 AM. Your phone is buzzing. Slack is on fire. Users are furious. Your dashboards? All green.

Been there. Last winter I spent four hours chasing a “phantom” latency spike that cost my team thousands in lost sales. The culprit? A single Python microservice doing batch uploads every 15 minutes. Traditional tools showed normal traffic patterns. Meanwhile, our checkout flow crawled.

Here’s what they don’t tell you: by one USENIX study’s count, 60% of network latency hides where most tools can’t see it. One bad process. That’s all it takes.

The tools aren’t broken. They’re blind.

I used to love iftop. Thought it was magic. Then I realized…

It shows traffic per interface and per connection. Not per process. It’s like having a city’s traffic report when you need to know which specific driver keeps blocking the bridge.

What happens next is predictable:

  • We restart services randomly
  • Add more servers (expensive guesswork)
  • Watch users leave when 3-second delays become minutes

Sound familiar?

Meet your new best friend: bpftool

Think of eBPF as X-ray vision for your network. bpftool is the remote control.

Best part? It’s maintained in the kernel source tree and packaged by most distros. Free. No vendors. No sales calls.

Let’s set this up in 10 minutes

Step 1: Check your kernel

grep bpf /proc/filesystems

See nodev bpf? You’re golden. If not, grab coffee and update your kernel.

Step 2: Install bpftool

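# Ubuntu (clang and libbpf-dev are needed to compile the program in Step 4)
sudo apt install linux-tools-common linux-tools-$(uname -r) clang libbpf-dev

# Debian
sudo apt install bpftool clang libbpf-dev

# RHEL/CentOS
sudo yum install bpftool clang libbpf-devel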

Step 3: The actual magic

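Generate vmlinux.h from the running kernel’s type info, then create latency.c next to it:

bpftool btf dump file /sys/kernel/btf/vmlinux format c > vmlinux.h

#include "vmlinux.h"          /* kernel types, generated by the bpftool command above */
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_tracing.h>

/* bpf_printk() is a GPL-only helper, so the program must declare a GPL-compatible license. */
char LICENSE[] SEC("license") = "GPL";

/* Start timestamp of each in-flight tcp_sendmsg() call. */
struct {
    __uint(type, BPF_MAP_TYPE_HASH);
    __uint(max_entries, 4096);
    __type(key, u64);   /* full pid_tgid, so threads of one process don't overwrite each other */
    __type(value, u64); /* start time in nanoseconds */
} latency_map SEC(".maps");

SEC("kprobe/tcp_sendmsg")
int BPF_KPROBE(tcp_sendmsg_entry)
{
    u64 id = bpf_get_current_pid_tgid();
    u64 start_time = bpf_ktime_get_ns();
    bpf_map_update_elem(&latency_map, &id, &start_time, BPF_ANY);
    return 0;
}

SEC("kretprobe/tcp_sendmsg")
int BPF_KRETPROBE(tcp_sendmsg_exit)
{
    u64 end_time = bpf_ktime_get_ns();
    u64 id = bpf_get_current_pid_tgid();
    u32 pid = id >> 32;  /* upper 32 bits hold the process ID (tgid) */
    u64 *start_time = bpf_map_lookup_elem(&latency_map, &id);
    if (start_time) {
        u64 latency = end_time - *start_time;
        bpf_printk("PID %d latency: %llu ns", pid, latency);
        bpf_map_delete_elem(&latency_map, &id);
    }
    return 0;
}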

Step 4: Compile and run

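# -g emits the BTF type info that the .maps section needs.
# Use -D__TARGET_ARCH_arm64 instead on ARM machines.
clang -g -O2 -target bpf -D__TARGET_ARCH_x86 -c latency.c -o latency.o

# loadall loads both programs from the object; autoattach hooks them to the
# kprobe and kretprobe named in their SEC() lines. It needs a recent bpftool.
# (bpftool perf only lists existing attachments; it cannot create them.)
sudo bpftool prog loadall latency.o /sys/fs/bpf/latency autoattach

# When you're done, removing the pins detaches everything:
# sudo rm -r /sys/fs/bpf/latency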

Step 5: Watch the culprits

sudo cat /sys/kernel/debug/tracing/trace_pipe

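Each line starts with a trace prefix (task name, CPU, timestamp), followed by the message from bpf_printk:

PID 2341 latency: 125000 ns
PID 5678 latency: 890000 ns

That’s 0.125 ms and 0.89 ms. The number is how long each tcp_sendmsg call took inside the kernel, which climbs when a socket’s send buffer is full and the call has to wait.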
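To do anything with those lines you first have to pull the numbers out of them. Here is a minimal user-space sketch in C, assuming the message format from the bpf_printk call in latency.c. The function names parse_latency_line and ns_to_ms are made up for this example, and the parser ignores whatever prefix the kernel puts in front of the message.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Extract the PID and latency from one trace_pipe line.
 * Assumes the message looks like "PID <pid> latency: <ns> ns".
 * Returns 1 on success, 0 if the line holds no such message.
 */
int parse_latency_line(const char *line, unsigned *pid, unsigned long long *ns)
{
    assert(line != NULL && pid != NULL && ns != NULL);

    const char *p = line;
    unsigned tmp_pid;
    unsigned long long tmp_ns;

    /* Try every "PID " occurrence in case the task name contains one. */
    while ((p = strstr(p, "PID ")) != NULL) {
        if (sscanf(p, "PID %u latency: %llu ns", &tmp_pid, &tmp_ns) == 2) {
            *pid = tmp_pid;
            *ns = tmp_ns;
            return 1;
        }
        p += 4; /* skip past this "PID " and keep looking */
    }
    return 0;
}

/* Convert nanoseconds to milliseconds for display. */
double ns_to_ms(unsigned long long ns)
{
    return (double)ns / 1e6;
}
```

Feed it one line at a time from trace_pipe (fgets on stdin works) and skip any line where it returns 0.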

Making it actually useful

Raw numbers are nice. Context is better.

I pipe this to a simple script that:

  • Maps PIDs to service names
  • Tracks 95th percentile latency
  • Alerts when any service hits 5ms+
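My script isn’t shown here, but the three jobs in that list can be sketched in a few C functions. Treat this as an illustration under assumptions, not the real thing: the names percentile_ns, read_comm and should_alert are invented, the percentile uses the nearest-rank method, and /proc/<pid>/comm (the process name) stands in for a proper service name, which in a container setup you would look up differently.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define ALERT_THRESHOLD_NS 5000000ULL  /* 5 ms, the alert level from the list above */

static int cmp_u64(const void *a, const void *b)
{
    unsigned long long x = *(const unsigned long long *)a;
    unsigned long long y = *(const unsigned long long *)b;
    return (x > y) - (x < y);
}

/* Nearest-rank percentile, pct in 1..100. Sorts the samples in place. */
unsigned long long percentile_ns(unsigned long long *samples, size_t n, unsigned pct)
{
    assert(n > 0 && pct >= 1 && pct <= 100);
    qsort(samples, n, sizeof(samples[0]), cmp_u64);
    size_t rank = (n * pct + 99) / 100;  /* ceil(n * pct / 100) */
    return samples[rank - 1];
}

/* Read the process name from /proc/<pid>/comm. Returns 0 on success, -1 otherwise. */
int read_comm(unsigned pid, char *buf, size_t len)
{
    char path[64];
    snprintf(path, sizeof(path), "/proc/%u/comm", pid);

    FILE *f = fopen(path, "r");
    if (!f)
        return -1;
    if (!fgets(buf, (int)len, f)) {
        fclose(f);
        return -1;
    }
    fclose(f);
    buf[strcspn(buf, "\n")] = '\0';  /* strip the trailing newline */
    return 0;
}

/* True when a p95 latency is at or above the alert threshold. */
int should_alert(unsigned long long p95_ns)
{
    return p95_ns >= ALERT_THRESHOLD_NS;
}
```

Collect samples per PID for a window (say, a minute), call percentile_ns with 95, and page someone only when should_alert says so. Resolve the name as soon as a PID shows up, because short-lived processes vanish from /proc quickly.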

The difference? Instead of guessing, I know exactly which container needs attention.

Real-world wins

Last month, this caught a logging service that spiked to 200ms every 5 minutes. Turned out someone enabled debug mode. Fixed in 30 seconds.

Another time, it exposed a Redis client that wasn’t pooling connections. Saved us from a $12k/month over-provision.

Beyond the basics

Once you’re comfortable:

  • Swap tcp_sendmsg for udp_sendmsg to catch UDP issues
  • Count latencies in a BPF_MAP_TYPE_PERCPU_ARRAY histogram instead of printing every call, for better performance at scale
  • Set latency thresholds to reduce noise
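The threshold and histogram ideas are easy to prototype in plain C before you move them into the BPF program. This is a sketch with made-up names (latency_slot, record_sample, SLOTS), using power-of-two buckets, a common choice for latency histograms. Inside the kretprobe the same check would sit in front of bpf_printk, and the array would become a BPF map.

```c
#include <assert.h>
#include <stddef.h>

#define SLOTS 32  /* bucket i counts latencies in [2^i, 2^(i+1)) ns */

/* Index of the power-of-two bucket for ns: floor(log2(ns)), capped at SLOTS - 1. */
unsigned latency_slot(unsigned long long ns)
{
    unsigned slot = 0;
    while (ns > 1 && slot < SLOTS - 1) {
        ns >>= 1;
        slot++;
    }
    return slot;
}

/* Count one sample unless it is below the noise threshold. Returns 1 if counted. */
int record_sample(unsigned long long hist[SLOTS], unsigned long long ns,
                  unsigned long long threshold_ns)
{
    assert(hist != NULL);
    if (ns < threshold_ns)
        return 0;  /* too fast to care about, drop it */
    hist[latency_slot(ns)]++;
    return 1;
}
```

With a 1 ms threshold, the 0.89 ms sample from earlier is dropped and a 5 ms sample lands in bucket 22. A fixed array of counters is cheap to update and cheap to read, which is the whole point once you’re tracing thousands of sends per second.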

Remember: Every millisecond you save is a millisecond your users don’t wait.

Your turn

Try this on a test server first. Run a few curl commands. Watch the output. Then imagine having this running 24/7.

The 3 AM calls? They become 3 PM coffee breaks.

Questions? Hit me up. I’ve got the scars to prove this works.
