Stackwhere

Where is the stack usage of my BPF program coming from?

In BPF, every byte of stack usage counts, the verifier limits us to 512 bytes of stack per BPF program or 256 bytes when combining tail calls and BPF-to-BPF functions. You cannot always tell which variables in your sources contribute to stack usage due to compiler optimizations. stackwhere uses DWARF debug info to get information about stack usage from the actual ELF file.

Installation

Release binary

Go to https://github.com/cilium/stackwhere/releases and find the latest stable release.
Download the binary for your OS and CPU architecture.
Unpack the .tar.gz archive and move binary to permanent directory

Example on Linux AMD64

wget https://github.com/cilium/stackwhere/releases/download/v0.1.0/stackwhere_0.1.0_linux_amd64.tar.gz

# Optional, verify checksum
wget https://github.com/cilium/stackwhere/releases/download/v0.1.0/stackwhere_0.1.0_checksums.txt
sha256sum -c stackwhere_0.1.0_checksums.txt --ignore-missing
# Should say: stackwhere_0.1.0_linux_amd64.tar.gz: OK

tar -xzf stackwhere_0.1.0_linux_amd64.tar.gz
sudo mv stackwhere /usr/bin/stackwhere

Go install

Install the latest version by running go install github.com/cilium/stackwhere/cmd/stackwhere@latest.

Usage

Stackwhere uses DWARF debug info, which requires the .o files to be created with the -g compiler option and to not be stripped.

Inspecting collections

The collection (alias coll/c) sub-command lists the peak stack usage for each program / function in the given object file.

$ stackwhere collection {path to .o}
128 bytes - my_example_program
 80 bytes - my_other_program
 20 bytes - small_function

Inspecting programs

The program (alias prog/p) sub-command lists the stack usage of a single program / function.

$ stackwhere program {path to .o} {name of program}
R10-16:
  8 - example_var1 @ /some/path/to/your/sources/bpf.c:12
R10-24:
  16 - example_var2 @ /some/path/to/your/sources/bpf.c:14
R10-40:
  8 - example_var3 @ /some/path/to/your/sources/bpf.c:20
  4 - example_var5 @ /some/path/to/your/sources/bpf.c:41
R10-48:
  8 - example_var4 @ /some/path/to/your/sources/bpf.c:21

This prints each offset into the stack found in the program when attributable to something found in the sources. Stack offsets are always R10 minus some fixed offset, BPF does not have push/pop instructions, so stack size is always predetermined and offsets within it assigned by the compiler. R10 is the stack frame the function.

Stack usage happens in three situations:

A local variable or function argument is larger than 8 bytes, and thus cannot fit in a register value.
All register values are full, variables or intermediate values are spilled onto the stack.
A pointer to a local variable or attribute is passed to a helper function or kfunc.

So local variables and function arguments of inlined functions can appear in the output, be referred to as "variables".

Multiple variables can share the same offset when their lifetimes do not overlap, so stack slots are reused by the compiler when it thinks its safely able to. Stackwhere groups variables at the same offsets together and outputs them from low offsets (the "bottom" of the stack) to high offsets ("top" top of the stack).

Spilling of unamed values, such as intermediate values of an expression does not appear in this output (at this this). Such spilled values can share stack slots with named variables and may use gaps in offsets that are shown.

Tips for reducing stack usage

The limiting factor for a BPF program is the peak memory usage, some ideas to reduce peak memory usage are:

Removing variables. not always possible, but a 1 byte variable can take up a whole 8 byte stack slot.
Shorten variable lifetimes. Use inlined functions or variable scopes to limit the lifetime of a variable. Attempt to order code such that variables can have smaller lifetimes. This promotes stack slot reuse, less variables alive a the same time, smaller peak usage.
Utilize non-stack memory. Avoid storing variables which can be derived from the program context. Use map values instead of copying and holding onto individual field values. Use per-CPU map values or global variables to store large values (that are not used as input to branching logic).
Split programs into tail calls.
Reduce function call depth.

Warning, some of these measures may increase verifier complexity and/or runtime performance, there are always tradeoffs.

Name		Name	Last commit message	Last commit date
Latest commit History 6 Commits
.github/workflows		.github/workflows
cmd/stackwhere		cmd/stackwhere
internal/dwarf		internal/dwarf
testdata		testdata
.gitignore		.gitignore
.golangci.yaml		.golangci.yaml
.goreleaser.yaml		.goreleaser.yaml
CODEOWNERS		CODEOWNERS
CODE_OF_CONDUCT.md		CODE_OF_CONDUCT.md
LICENSE		LICENSE
README.md		README.md
go.mod		go.mod
go.sum		go.sum

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Stackwhere

Installation

Release binary

Example on Linux AMD64

Go install

Usage

Inspecting collections

Inspecting programs

Tips for reducing stack usage

About

Uh oh!

Releases 1

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Stackwhere

Installation

Release binary

Example on Linux AMD64

Go install

Usage

Inspecting collections

Inspecting programs

Tips for reducing stack usage

About

Resources

License

Code of conduct

Uh oh!

Stars

Watchers

Forks

Releases 1

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages