eBPF Dive
🧑🚀 published on Fri Mar 27 2026 · 7 min read
Ever wanted to run your own code inside the Linux kernel without writing a kernel module, without rebooting, and without breaking everything? That’s eBPF.
This post is a practical intro. I’ll cover how eBPF programs actually work, walk through a real example, explain the two ways to get data out of the kernel, and show how the userspace side ties it all together. At the end I’ll touch on the network security toolkit I’ve been building on top of this.
What is eBPF
eBPF stands for extended Berkeley Packet Filter. The name is historical baggage. Today it is no longer limited to packet filtering. It’s a virtual machine embedded in the Linux kernel that lets you load and run sandboxed programs at specific hook points, without touching kernel source code.
The killer property is that it’s safe by design. Before your program runs, the kernel passes it through a verifier that checks for things like out-of-bounds memory access, unbounded loops, and null pointer dereferences. If it doesn’t pass, it doesn’t load. No crashes, no kernel panics.
Once verified, the JIT compiler converts the BPF bytecode to native machine code and attaches it to a hook. From that point on, every time that hook fires, your code runs at near-native speed, inside the kernel.
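To make the verifier’s checks a bit more concrete, here is a minimal sketch of the code shape it tends to accept. This is plain userspace C, not a BPF program, and the names MAX_LEN and sum_bounded are made up for illustration. The point is the pattern: a pointer is checked before use, the loop bound is a compile-time constant, and every index is checked against the real length.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_LEN 16  /* hypothetical compile-time cap on loop iterations */

/* Sums at most MAX_LEN bytes of buf. Returns -1 if buf is NULL.
 * The NULL check and the constant loop bound mirror what the verifier
 * needs to be able to prove before it accepts a program. */
static long sum_bounded(const uint8_t *buf, size_t len)
{
    long total = 0;

    if (!buf)                 /* pointer checked before any dereference */
        return -1;

    for (size_t i = 0; i < MAX_LEN; i++) {  /* bound known at compile time */
        if (i >= len)         /* never index past the caller's length */
            break;
        total += buf[i];
    }
    return total;
}
```

Drop the NULL check or replace MAX_LEN with an unchecked runtime value and the equivalent BPF code would typically be rejected at load time.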
Hook points
eBPF programs don’t run on their own. They attach to events. The main ones you’ll use in practice:
- kprobe / kretprobe: fires on entry or return of almost any kernel function (a few are blacklisted or inlined away)
- uprobe / uretprobe: same, but for userspace functions (think SSL_write)
- tracepoint: stable, documented kernel trace events
- XDP: packet processing at the NIC, before the kernel networking stack even sees the packet
- TC hook: traffic control, slightly later in the network path
The hook type you pick determines what context your program gets and what it’s allowed to do.
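In libbpf, the hook type is chosen by the string you pass to SEC(). The table below is a small sketch of that mapping in plain C. The struct and function names are hypothetical; the section strings are the ones I believe current libbpf accepts (older libbpf versions used "classifier" where newer ones use "tc"), so treat it as a cheat sheet rather than a complete list.

```c
#include <stddef.h>
#include <string.h>

struct hook_section {
    const char *hook;     /* hook type as named in the list above */
    const char *section;  /* libbpf SEC() name, or prefix before the target */
};

static const struct hook_section hook_sections[] = {
    { "kprobe",     "kprobe/" },      /* e.g. SEC("kprobe/tcp_sendmsg") */
    { "kretprobe",  "kretprobe/" },
    { "uprobe",     "uprobe/" },
    { "uretprobe",  "uretprobe/" },
    { "tracepoint", "tracepoint/" },  /* followed by category/name */
    { "xdp",        "xdp" },
    { "tc",         "tc" },
};

/* Returns the SEC() string for a hook type, or NULL if it is not in the table. */
static const char *section_for_hook(const char *hook)
{
    size_t n = sizeof(hook_sections) / sizeof(hook_sections[0]);

    for (size_t i = 0; i < n; i++) {
        if (strcmp(hook_sections[i].hook, hook) == 0)
            return hook_sections[i].section;
    }
    return NULL;
}
```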
A real example: tracing execve
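Let’s walk through something concrete. Every time a process executes a binary,
it goes through the execve syscall, which the kernel implements as sys_execve.
We’ll attach a kprobe to it and log the PID, UID, and process name. One caveat:
on kernels built with syscall wrappers (x86-64 since 4.17) the symbol is
actually named __x64_sys_execve, so you may need that name in the SEC() string,
or libbpf’s SEC("ksyscall/execve"), which resolves the right symbol for you.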
The tracer lives in two files: the kernel-side BPF C code, and the userspace loader that loads it and reads the output.
Kernel side
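#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_tracing.h>

// one record per execve call, shared with the userspace loader
struct event {
    __u32 pid;
    __u32 uid;
    char comm[16];
};

// ring buffer map used to stream events to userspace (16 MiB)
struct {
    __uint(type, BPF_MAP_TYPE_RINGBUF);
    __uint(max_entries, 1 << 24);
} events SEC(".maps");

// on x86-64 kernels with syscall wrappers the symbol is __x64_sys_execve;
// SEC("ksyscall/execve") lets libbpf pick the right name
SEC("kprobe/sys_execve")
int trace_execve(struct pt_regs *ctx)
{
    // reserve space in the ring buffer; NULL means there is no room
    struct event *e = bpf_ringbuf_reserve(&events, sizeof(*e), 0);
    if (!e)
        return 0;

    e->pid = bpf_get_current_pid_tgid() >> 32;        // upper 32 bits: tgid (the userspace PID)
    e->uid = bpf_get_current_uid_gid() & 0xffffffff;  // lower 32 bits: uid
    bpf_get_current_comm(&e->comm, sizeof(e->comm));

    bpf_ringbuf_submit(e, 0);
    return 0;
}

char LICENSE[] SEC("license") = "GPL";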
A few things to notice. The function signature takes struct pt_regs *ctx.
That’s the CPU register state at the moment the hook fired. It’s how the kernel
passes context into your program. The SEC() macro tells the compiler which ELF
section to put this symbol in, which is how the loader knows what type of program
it is and where to attach it.
The bpf_ringbuf_reserve / bpf_ringbuf_submit pair is how we ship data to
userspace. More on that below.
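The shift and mask in the program above are easy to skim past, so here is a small userspace sketch of what they do. Both helpers return two 32-bit IDs packed into one 64-bit value. The function names below are made up for illustration; only the bit layout comes from the kernel helpers.

```c
#include <stdint.h>

/* bpf_get_current_pid_tgid() packs: (tgid << 32) | pid.
 * The kernel's "tgid" is what userspace calls the PID;
 * the kernel's "pid" is what userspace calls the thread ID. */
static uint32_t tgid_of(uint64_t pid_tgid)
{
    return (uint32_t)(pid_tgid >> 32);
}

static uint32_t tid_of(uint64_t pid_tgid)
{
    return (uint32_t)pid_tgid;
}

/* bpf_get_current_uid_gid() packs: (gid << 32) | uid. */
static uint32_t uid_of(uint64_t uid_gid)
{
    return (uint32_t)uid_gid;
}

static uint32_t gid_of(uint64_t uid_gid)
{
    return (uint32_t)(uid_gid >> 32);
}
```

So for a single-threaded process the two halves of pid_tgid are equal, and for a multi-threaded one the upper half is the same for every thread.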
How context flows from kernel to eBPF
When a kprobe fires, the kernel saves the full CPU register state into a
pt_regs struct and hands your BPF program a pointer to it. That struct is your
window into what was happening at the exact moment the hook fired.
sys_execve(filename, argv, envp)
|
CPU registers:
rdi = filename ← PT_REGS_PARM1(ctx)
rsi = argv ← PT_REGS_PARM2(ctx)
rdx = envp ← PT_REGS_PARM3(ctx)
|
pt_regs struct
|
your BPF program receives *ctx
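From there you can read the function’s arguments using macros like
PT_REGS_PARM1(ctx), PT_REGS_PARM2(ctx), etc. These are architecture-aware,
so they map to the right registers on x86-64, arm64, and so on. One wrinkle for
syscall entry points specifically: on kernels with syscall wrappers the handler
receives a pointer to the caller’s saved registers as its only argument, so the
syscall’s own arguments sit one level deeper. libbpf’s BPF_KSYSCALL macro
handles that indirection for you.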
One critical rule: if any argument is a pointer into userspace memory, you cannot
dereference it directly. The verifier will reject it. You have to use
bpf_probe_read_user() to safely copy the data into your BPF stack. Kernel
pointers need similar care in a kprobe: you read them with
bpf_probe_read_kernel(), or more conveniently through the CO-RE helpers (more on
that below).
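The register-to-argument mapping from the diagram can be sketched in plain C. This is a userspace model with a cut-down, hypothetical model_regs struct and MODEL_PARM macros standing in for struct pt_regs and PT_REGS_PARM1..3 on x86-64; the real struct has many more fields.

```c
#include <stdint.h>

/* Hypothetical stand-in for the x86-64 registers that carry the first
 * three function arguments. */
struct model_regs {
    uint64_t di;  /* rdi: 1st argument */
    uint64_t si;  /* rsi: 2nd argument */
    uint64_t dx;  /* rdx: 3rd argument */
};

#define MODEL_PARM1(ctx) ((ctx)->di)
#define MODEL_PARM2(ctx) ((ctx)->si)
#define MODEL_PARM3(ctx) ((ctx)->dx)

/* The three arguments of execve(filename, argv, envp), typed again. */
struct execve_args {
    const char *filename;
    const char *const *argv;
    const char *const *envp;
};

/* Pull the raw register values out and cast them back to the types the
 * hooked function declared. Registers only hold integers, so the types
 * are something the handler has to know, not something it is given. */
static struct execve_args read_execve_args(const struct model_regs *ctx)
{
    struct execve_args a = {
        .filename = (const char *)(uintptr_t)MODEL_PARM1(ctx),
        .argv     = (const char *const *)(uintptr_t)MODEL_PARM2(ctx),
        .envp     = (const char *const *)(uintptr_t)MODEL_PARM3(ctx),
    };
    return a;
}
```

In a real BPF program the pointers you get this way are just numbers as far as the verifier is concerned, which is why the copy helpers are needed before you can look at what they point to.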
Reading function arguments directly
The pt_regs approach works, but there’s a cleaner way. The BPF_KPROBE macro
lets you declare your BPF function with the exact same signature as the kernel
function you’re hooking.
Take tcp_sendmsg. In the kernel it looks like this:
int tcp_sendmsg(struct sock *sk, struct msghdr *msg, size_t size);
With BPF_KPROBE you just mirror that signature directly:
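SEC("kprobe/tcp_sendmsg")
int BPF_KPROBE(tcp_sendmsg, struct sock *sk, struct msghdr *msg, size_t size)
{
    // struct tcp_event, fill_event() and EVENT_TX are helpers not shown here;
    // events is a BPF_MAP_TYPE_PERF_EVENT_ARRAY map in this example
    struct tcp_event ev = {};

    if (fill_event(&ev, sk, size, EVENT_TX) < 0)
        return 0;

    // ctx is the hidden pt_regs pointer that BPF_KPROBE keeps available
    bpf_perf_event_output(ctx, &events, BPF_F_CURRENT_CPU, &ev, sizeof(ev));
    return 0;
}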
At the moment tcp_sendmsg is called, those arguments, sk, msg, size, are
live in CPU registers. BPF_KPROBE expands into the boilerplate that reads
them from pt_regs and hands them to your function as proper typed C variables.
So sk is a real struct sock * pointing to the socket that’s sending data.
You can pull fields off it with the kernel read helpers: source and destination
IP, ports, socket state, whatever you need. size tells you exactly how many
bytes are being sent in this call.
This is the core idea: you hook a function, you get its arguments, you inspect the kernel’s own live data structures at the exact moment they’re being used. No guessing at offsets, no manual register math.
The same rule still applies: if an argument points to userspace memory, use
bpf_probe_read_user(). Kernel structs like struct sock * you read through
BPF_CORE_READ(), and it’s worth reading on to see why that works reliably
across kernel versions.
CO-RE: write once, run anywhere
Here’s a problem. Kernel structs like struct sock change between kernel
versions. A field might be at offset 24 in kernel 5.15 and offset 32 in 6.1.
If you hardcode struct offsets at compile time, your BPF program breaks the
moment someone runs it on a different kernel.
CO-RE (Compile Once, Run Everywhere) solves this. It’s a combination of three things working together:
BTF (BPF Type Format) is compact type information the kernel embeds about its
own types: every struct, every field, every offset. Kernels built with
CONFIG_DEBUG_INFO_BTF=y, which includes most current distro kernels, expose it
at /sys/kernel/btf/vmlinux.
vmlinux.h is a single header generated from that BTF data. Instead of
including dozens of kernel headers, you include one file that has the exact type
definitions for the kernel you’re running on.
bpftool btf dump file /sys/kernel/btf/vmlinux format c > vmlinux.h
CO-RE relocations are annotations the compiler embeds in your .o file.
When libbpf loads your program, it reads those annotations, checks the target
kernel’s BTF, and rewrites the field offsets to match. Your program compiled on
kernel 5.15 loads correctly on kernel 6.1 because libbpf fixed the offsets at
load time.
In practice it looks like this:
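#include "vmlinux.h"
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_tracing.h>
#include <bpf/bpf_core_read.h>

SEC("kprobe/tcp_sendmsg")
int BPF_KPROBE(tcp_sendmsg, struct sock *sk, struct msghdr *msg, size_t size)
{
    // BPF_CORE_READ handles the offset relocation automatically
    __u16 family = BPF_CORE_READ(sk, __sk_common.skc_family);
    // skc_dport is in network byte order; convert it with bpf_ntohs()
    // from <bpf/bpf_endian.h> before printing or comparing
    __u16 dport = BPF_CORE_READ(sk, __sk_common.skc_dport);
    // ...
    return 0;
}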
BPF_CORE_READ looks like a normal struct field access but it goes through the
CO-RE machinery. The compiler marks it as a relocatable access, and libbpf
resolves it against the target kernel’s BTF at load time.
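One thing that bites people with skc_dport: the kernel stores it in network byte order (big-endian), so on a little-endian machine the raw value looks wrong until you swap it. In BPF code you would normally reach for bpf_ntohs(); the sketch below shows the same conversion in plain C so you can see what it does. The function name be16_to_host is made up for illustration.

```c
#include <stdint.h>

/* Convert a 16-bit value stored in network byte order (big-endian)
 * to host order. Reading the bytes one at a time makes this work the
 * same way on little-endian and big-endian hosts. */
static uint16_t be16_to_host(uint16_t be)
{
    const uint8_t *p = (const uint8_t *)&be;

    return (uint16_t)((p[0] << 8) | p[1]);
}
```

Port 443 is stored as the bytes 0x01 0xBB; read naively on x86-64 that comes out as 47873, and after the conversion it is 443 again.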
Without CO-RE you’d be shipping pre-compiled BPF objects per kernel version, or
compiling on the target machine. With CO-RE you compile once and distribute a
single .o that works everywhere with BTF support.
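If the relocation step still feels like magic, here is a toy model of it in plain userspace C. Everything in it is hypothetical: sock_v1 and sock_v2 stand in for two kernel versions of the same struct, the field_info tables stand in for BTF, and resolve_relocs plays the role libbpf plays at load time. Real CO-RE works on BPF instructions and is far more involved, but the idea is the same: the program records which field it wants, and the loader fills in where that field lives on this kernel.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Two made-up layouts of the "same" struct on different kernels. */
struct sock_v1 { uint16_t family; uint16_t dport; };
struct sock_v2 { uint64_t new_field; uint16_t family; uint16_t dport; };

/* Tiny stand-in for BTF: field name -> byte offset on one kernel. */
struct field_info { const char *name; size_t offset; };

static const struct field_info btf_v1[] = {
    { "family", offsetof(struct sock_v1, family) },
    { "dport",  offsetof(struct sock_v1, dport)  },
};
static const struct field_info btf_v2[] = {
    { "family", offsetof(struct sock_v2, family) },
    { "dport",  offsetof(struct sock_v2, dport)  },
};

/* A "relocation": the compiled program only remembers the field name. */
struct reloc { const char *field; size_t resolved_offset; };

/* Loader step: look every field up in the target kernel's type info.
 * Returns 0 on success, -1 if a field does not exist on this kernel. */
static int resolve_relocs(struct reloc *relocs, size_t n,
                          const struct field_info *btf, size_t nfields)
{
    for (size_t i = 0; i < n; i++) {
        int found = 0;

        for (size_t j = 0; j < nfields; j++) {
            if (strcmp(relocs[i].field, btf[j].name) == 0) {
                relocs[i].resolved_offset = btf[j].offset;
                found = 1;
                break;
            }
        }
        if (!found)
            return -1;
    }
    return 0;
}

/* Program step: read a 16-bit field at whatever offset was resolved. */
static uint16_t read_u16(const void *obj, const struct reloc *r)
{
    uint16_t v;

    memcpy(&v, (const char *)obj + r->resolved_offset, sizeof(v));
    return v;
}
```

The same relocation list resolved against btf_v1 and then btf_v2 reads the right field from either layout, which is the "compile once" part in miniature.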
Perf buffer vs ring buffer
There are two main ways to stream events from your BPF program to userspace.
Perf buffer
BPF_MAP_TYPE_PERF_EVENT_ARRAY is the original mechanism. It creates one
circular buffer per CPU. Each CPU writes to its own buffer and userspace polls
all of them.
The problem: per-CPU allocation is wasteful. If you allocate 1MB per CPU on a 32-core machine you’ve committed 32MB even if most cores are idle. You also get events out of order across CPUs because you’re merging multiple independent streams.
Ring buffer
BPF_MAP_TYPE_RINGBUF, added in kernel 5.8, uses a single shared buffer. All
CPUs write to it, userspace reads from one place.
The advantages are real:
- one allocation shared across all CPUs
- events keep their order across CPUs: userspace sees them in the order they were reserved
- supports a reserve/commit pattern so you fill the struct in place without an extra copy
- userspace can be notified via epoll instead of busy-polling
Perf buffer:
CPU0 → [buf0] ─┐
CPU1 → [buf1] ─┼─→ userspace merges all
CPU2 → [buf2] ─┘
Ring buffer:
CPU0 ─┐
CPU1 ─┼─→ [shared buf] ─→ userspace
CPU2 ─┘
Unless you’re on a kernel older than 5.8, use the ring buffer.
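The reserve/commit pattern is easier to see in a toy model. The sketch below is a single-threaded userspace approximation with fixed-size records; all names and sizes (model_ringbuf, RB_SLOTS, RB_REC_SIZE) are made up, and the real BPF ring buffer uses variable-size records and lock-free producers. What it does capture is the two properties that matter: you write straight into the buffer, and the consumer never skips past a record that is still being filled.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define RB_SLOTS    4   /* hypothetical capacity, in records */
#define RB_REC_SIZE 32  /* hypothetical fixed record size, in bytes */

struct rb_record {
    int committed;               /* 0 = reserved, still being filled */
    uint8_t data[RB_REC_SIZE];
};

struct model_ringbuf {
    struct rb_record slots[RB_SLOTS];
    size_t head;  /* next slot to reserve (producer side) */
    size_t tail;  /* next slot to consume (consumer side) */
};

/* Reserve a record and return a pointer into the buffer itself,
 * or NULL when the buffer is full. */
static void *rb_reserve(struct model_ringbuf *rb)
{
    struct rb_record *rec;

    if (rb->head - rb->tail == RB_SLOTS)
        return NULL;
    rec = &rb->slots[rb->head % RB_SLOTS];
    rec->committed = 0;
    rb->head++;
    return rec->data;
}

/* Mark a previously reserved record as ready for the consumer. */
static void rb_submit(void *data)
{
    struct rb_record *rec =
        (struct rb_record *)((char *)data - offsetof(struct rb_record, data));

    rec->committed = 1;
}

/* Copy the oldest record into out if it has been submitted.
 * Returns 1 if a record was consumed, 0 otherwise. Stopping at the
 * first unsubmitted record is what preserves reservation order. */
static int rb_consume(struct model_ringbuf *rb, void *out)
{
    struct rb_record *rec;

    if (rb->tail == rb->head)
        return 0;
    rec = &rb->slots[rb->tail % RB_SLOTS];
    if (!rec->committed)
        return 0;
    memcpy(out, rec->data, RB_REC_SIZE);
    rb->tail++;
    return 1;
}
```

If record A is reserved before record B but B is submitted first, the consumer still waits for A, then delivers A and B in that order.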
The skeleton: how userspace ties it together
Compiling your BPF C file with clang gives you a .o ELF object. But you still
need to load it into the kernel, set up the maps, and attach to the hook.
With libbpf, the modern approach is the skeleton pattern. You run:
# compile BPF code to BPF bytecode
clang -O2 -target bpf -g -c execve_trace.bpf.c -o execve_trace.bpf.o
# generate the skeleton header
bpftool gen skeleton execve_trace.bpf.o > execve_trace.skel.h
# compile the userspace loader
gcc execve_trace.c -o execve_trace -lbpf
The skeleton header has a struct representing your entire BPF program, maps, programs, links, and generated functions to manage it. Your userspace code then follows a clean three-step sequence:
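#include <stdio.h>
#include <linux/types.h>
#include <bpf/libbpf.h>
#include "execve_trace.skel.h"

// must match struct event in the BPF program
struct event {
    __u32 pid;
    __u32 uid;
    char comm[16];
};

static int handle_event(void *ctx, void *data, size_t size)
{
    const struct event *e = data;
    printf("pid=%-6u uid=%-6u comm=%s\n", e->pid, e->uid, e->comm);
    return 0;
}

int main(void)
{
    struct ring_buffer *rb = NULL;
    // 1. open: parse the .o, prepare internal state
    struct execve_trace_bpf *skel = execve_trace_bpf__open();
    if (!skel)
        return 1;
    // 2. load: verify and load into kernel via bpf() syscall
    if (execve_trace_bpf__load(skel))
        goto cleanup;
    // 3. attach: create the kprobe and wire the program to it
    if (execve_trace_bpf__attach(skel))
        goto cleanup;
    // consume events from the ring buffer
    rb = ring_buffer__new(
        bpf_map__fd(skel->maps.events),
        handle_event, NULL, NULL
    );
    // poll with a 100 ms timeout until poll reports an error
    while (rb && ring_buffer__poll(rb, 100) >= 0)
        ;
cleanup:
    ring_buffer__free(rb);
    execve_trace_bpf__destroy(skel);
    return 0;
}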
Open, Load, Attach. That’s the whole thing. Under the hood __load creates the
maps and calls bpf(BPF_PROG_LOAD, ...), which triggers the verifier, and
__attach creates the kprobe through perf_event_open() and binds the program to
it, using bpf(BPF_LINK_CREATE, ...) on recent kernels. After that your program
is live.
The skeleton handles all the boilerplate so you’re not manually calling the
bpf() syscall with magic integers and hand-rolled structs.
Before we wrap up, here’s a quick visual to anchor the flow.

Tagged: linux, ebpf, network-security