Profiling Linux Applications with perf: A Practical Introduction

By Noman Mohammad

It’s 3 a.m. Your App Just Got Painfully Slow

You pushed the release yesterday. Everything worked on your laptop. Now users in Tokyo say the page takes **ten seconds to load.**

You stare at the logs. Nothing looks off. The CPU meter sits at 97 %. You’re out of coffee. What do you check next?

Skip the wild goose chase. **There’s a tool called perf** that tells you which functions are melting the fan on your server, so you can stop guessing.

The Needle-in-a-Haystack Problem

I once wasted three days adding print statements—only to discover the bug lived on **line seven of the string parser.** One line. Three days. That stings.

Linux perf fixes this. Instead of guessing, you get **a heat map** of where the CPU time goes:

  • The hot function names.
  • How many CPU cycles each one eats.
  • The exact call stack that brought you there.

Turns out **68 % of slowdowns hide in less than 5 % of code,** according to a 2023 NIST study. Let the computer tell you where that 5 % lives.

Install perf in 30 Seconds

On Ubuntu (on Debian, the package is linux-perf instead):

sudo apt install linux-tools-common linux-tools-$(uname -r)

On CentOS or RHEL:

sudo yum install perf

Install it on the host, not inside a container. Inside a container, perf often lacks the permissions it needs and cannot see the rest of the machine.

Find the Slow Spots in Real Time

Open two terminals:

  1. In the first, start your misbehaving app.
  2. In the second, run sudo perf top for a live table of the hottest functions.
  3. Wait ten seconds. If one of your functions sits **at the top, highlighted in red,** that’s your prime suspect.

Example output:

  62.00%  my_app        [.] json_parse_utf8
  19.00%  libc-2.31.so  [.] malloc

JSON parser wins the race to the bottom this time. (We fixed it by switching libraries, saved 300 ms. Users cheered.)
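
perf top is interactive, so it is awkward to paste into a ticket. One option is a plain-text snapshot from a saved recording (perf record, covered next), filtered down to the symbols above a threshold. The sketch below assumes symbol names contain no spaces (true for C, not always for C++). hot_symbols is a made-up helper name, not part of perf, and the demo rows are invented.

```shell
# hot_symbols THRESHOLD
# Reads `perf report --stdio` text on stdin and prints "symbol overhead"
# for every row whose overhead is at least THRESHOLD percent.
hot_symbols() {
  awk -v t="$1" '
    $1 ~ /^[0-9.]+%$/ {            # data rows start with an overhead like 62.00%
      pct = $1
      sub(/%/, "", pct)            # strip the percent sign
      if (pct + 0 >= t + 0) print $NF, pct   # $NF is the symbol column
    }'
}

# Demo on invented rows in the Overhead / Shared Object / Symbol layout.
printf '%s\n' \
  '# Overhead  Shared Object  Symbol' \
  '    62.00%  my_app         [.] json_parse_utf8' \
  '    19.00%  libc-2.31.so   [.] malloc' \
  '     3.10%  my_app         [.] main' | hot_symbols 10
```

On a real recording, something like sudo perf report --stdio --no-children --sort dso,symbol | hot_symbols 10 should give the same shape of output.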

Recording for Deep Dives

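Need more detail? Record a full run. perf keeps sampling until the app exits or you press Ctrl-C, so stop it after about 60 seconds:

sudo perf record -F 99 -g -- ./your_app --run=production-config
sudo perf report

To sample an already-running process for exactly 60 seconds instead:

sudo perf record -F 99 -g -p <pid> -- sleep 60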

What you get:

  • A tree of function call stacks.
  • Percentage of time spent inside each one.
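  • Function and source-line names, if debug symbols are installed.

Tip: -g walks frame pointers by default. If the stacks look short or broken, record with --call-graph dwarf instead. It works on binaries built without frame pointers, at the cost of a much larger perf.data file.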

Make a Flame Graph in Two Commands

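With Brendan Gregg’s FlameGraph scripts cloned into ./FlameGraph (git clone https://github.com/brendangregg/FlameGraph):

sudo perf script | ./FlameGraph/stackcollapse-perf.pl | ./FlameGraph/flamegraph.pl > cpu.svg
xdg-open cpu.svg   # or open the file in any browser

Width is what counts: a wide box is hot code. A tall, narrow tower is just a deep call chain. It only costs you if it is also wide, and then it is a candidate to flatten or cache.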
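
To see where the widths come from, it helps to look at what flamegraph.pl reads. stackcollapse-perf.pl emits one line per unique stack: the frames joined by semicolons, then a space and the number of samples. The rough sketch below sums samples by the last frame on each line, which is the function that was on-CPU when the sample fired. top_leaves is a hypothetical helper and the sample stacks are invented for illustration.

```shell
# top_leaves: read folded stacks ("frame1;frame2;... COUNT") on stdin and
# print "leaf<TAB>samples", highest sample count first.
top_leaves() {
  awk '
    {
      count = $NF                      # sample count is the last field
      stack = $0
      sub(/ [0-9]+$/, "", stack)       # drop the trailing count
      n = split(stack, frames, ";")
      self[frames[n]] += count         # credit the on-CPU (leaf) frame
    }
    END { for (f in self) printf "%s\t%d\n", f, self[f] }
  ' | sort -t "$(printf '\t')" -k2,2nr -k1,1
}

# Invented folded stacks, in the format stackcollapse-perf.pl produces.
printf '%s\n' \
  'my_app;main;handle;json_parse_utf8 62' \
  'my_app;main;handle;json_parse_utf8;malloc 12' \
  'my_app;main;log;malloc 7' \
  'my_app;main 3' | top_leaves
```

On real data, sudo perf script | ./FlameGraph/stackcollapse-perf.pl | top_leaves | head gives a quick text view of the same hot spots the SVG shows.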

Real-World Example: The Cache That Wasn’t

We ran a test on an image resizing server. Flame graph looked like this:

  • 40 % malloc inside JPEG resize.
  • 30 % memcpy after resizing.

The fix: instead of allocating new buffers on every loop iteration, pre-allocate a per-thread buffer pool. **Speed: 4× faster, CPU: 60 % lower**. Tuesday saved.

Three Quick Wins with perf

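  1. Run sudo perf stat -a sleep 5 for a five-second, system-wide summary: cycles, instructions, branch misses, context switches.
  2. Add -e cache-references,cache-misses to see what share of cache lookups miss.
  3. Run perf mem record on your app, then perf mem report, to see which memory accesses miss the most. This needs a CPU that supports memory sampling.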
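
Raw counters are easier to judge as a ratio. perf stat can emit CSV with -x, (value first, event name in the third field), and a few lines of awk turn that into a miss percentage. This is a sketch under those assumptions: miss_ratio is an illustrative name, and the numbers in the demo are made up, not measurements.

```shell
# miss_ratio: read `perf stat -x,` CSV on stdin and print cache misses as a
# percentage of cache references.
miss_ratio() {
  awk -F, '
    $3 ~ /^cache-references/ { refs = $1 }
    $3 ~ /^cache-misses/     { misses = $1 }
    END {
      # Counters the CPU cannot provide show up as text such as
      # "<not supported>", which awk treats as 0 here.
      if (refs + 0 > 0) printf "%.2f\n", 100 * misses / refs
      else print "n/a"
    }'
}

# Made-up counter values in the CSV layout described above.
printf '%s\n' \
  '1000000,,cache-references,5000000000,100.00,,' \
  '250000,,cache-misses,5000000000,100.00,25.000,of all cache refs' | miss_ratio
```

A real run would look something like sudo perf stat -a -x, -e cache-references,cache-misses sleep 5 2>&1 | miss_ratio. perf stat prints its counters on stderr, hence the 2>&1.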

Questions I Get All the Time

“Can perf run in production?”

Yes, if you keep the sampling rate low. At 49 Hz or 99 Hz the overhead is typically under 2 %. Stop recording when you’re done, and keep perf out of CI pipelines.
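
A bit of arithmetic shows why low rates are cheap. At a fixed frequency, the number of samples is bounded by rate × seconds × CPUs, and idle CPUs usually contribute fewer. The odd-looking 49 and 99 are commonly chosen so that sampling does not run in lockstep with timers that tick at round rates such as 100 Hz. max_samples below is an illustrative helper, not a perf feature.

```shell
# max_samples HZ SECONDS CPUS
# Upper bound on samples from `perf record -F HZ` over SECONDS on CPUS CPUs.
max_samples() {
  echo $(( $1 * $2 * $3 ))
}

max_samples 99 60 8    # 99 Hz for 60 s on an 8-CPU box: prints 47520
```

That is under 50,000 samples for a full minute on eight cores, which gives a feel for why the overhead stays small.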

“Do I have to be root?”

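For system-wide profiling, yes. For a single process, perf record -p <pid> works as the user who owns that process, provided kernel.perf_event_paranoid is 2 or lower. Some distributions ship a stricter default; check with sysctl kernel.perf_event_paranoid.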
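
What a non-root user can do depends on the level of the kernel.perf_event_paranoid sysctl. The sketch below maps a level to a short description, following the levels in the kernel documentation. Values above 2 come from distribution patches, so treat that branch as an assumption. paranoid_meaning is a made-up helper name.

```shell
# paranoid_meaning LEVEL
# Describe what an unprivileged user may do at a given
# kernel.perf_event_paranoid level.
paranoid_meaning() {
  if [ "$1" -le -1 ]; then
    echo "no restrictions"
  elif [ "$1" -eq 0 ]; then
    echo "system-wide CPU events allowed, raw tracepoints not"
  elif [ "$1" -eq 1 ]; then
    echo "own processes only, user and kernel samples"
  elif [ "$1" -eq 2 ]; then
    echo "own processes only, user-space samples"
  else
    # Assumption: levels above 2 exist only on some distribution kernels.
    echo "perf unavailable without privileges"
  fi
}

# Report the level of the current machine, if the kernel exposes it.
f=/proc/sys/kernel/perf_event_paranoid
if [ -r "$f" ]; then
  paranoid_meaning "$(cat "$f")"
fi
```

If your security policy allows it, sudo sysctl kernel.perf_event_paranoid=1 relaxes the setting until the next reboot.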

“Does it profile Python, Go, Java?”

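Yes, but perf needs symbols and walkable stacks. For Python, install the interpreter’s debug symbols (python3-dbg on apt); Python 3.12 and later can also expose Python function names when started with -X perf. Java needs -XX:+PreserveFramePointer, plus a symbol map for JIT-compiled code (for example from perf-map-agent). Go keeps frame pointers on amd64 by default, so perf record -g usually works as is; if a tool chokes on compressed DWARF, build with -ldflags=-compressdwarf=false.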

Your Next Five Minutes

Install perf. Run perf top on your slow box right now. Spot **the worst one percent** and fix it. Push the patch. Ping me on x.com with your time saved. I’ll celebrate with you.

Happy hunting.
