SYNOPSIS
perf report [-i <file> | --input=file]
DESCRIPTION
This command displays the performance counter profile information recorded via perf record.
OPTIONS
- -i
- --input=
-
Input file name. (default: perf.data unless stdin is a fifo)
- -v
- --verbose
-
Be more verbose. (show symbol address, etc)
- -n
- --show-nr-samples
-
Show the number of samples for each symbol
- --showcpuutilization
-
Show sample percentage for different cpu modes.
- -T
- --threads
-
Show per-thread event counters
- -c
- --comms=
-
Only consider symbols in these comms. CSV that understands file://filename entries. This option will affect the percentage of the overhead column. See --percentage for more info.
- --pid=
-
Only show events for given process ID (comma separated list).
- --tid=
-
Only show events for given thread ID (comma separated list).
- -d
- --dsos=
-
Only consider symbols in these dsos. CSV that understands file://filename entries. This option will affect the percentage of the overhead column. See --percentage for more info.
- -S
- --symbols=
-
Only consider these symbols. CSV that understands file://filename entries. This option will affect the percentage of the overhead column. See --percentage for more info.
- --symbol-filter=
-
Only show symbols that match (partially) with this filter.
- -U
- --hide-unresolved
-
Only display entries resolved to a symbol.
- -s
- --sort=
-
Sort histogram entries by the given key(s). Multiple keys can be specified in CSV format. The following sort keys are available: pid, comm, dso, symbol, parent, cpu, srcline, weight, local_weight.
Each key has the following meaning:
-
comm: command (name) of the task which can be read via /proc/<pid>/comm
-
pid: command and tid of the task
-
dso: name of library or module executed at the time of sample
-
symbol: name of function executed at the time of sample
-
parent: name of function matched to the parent regex filter. Unmatched entries are displayed as "[other]".
-
cpu: cpu number the task ran on at the time of sample
-
srcline: filename and line number executed at the time of sample. The DWARF debugging info must be provided.
-
weight: Event specific weight, e.g. memory latency or transaction abort cost. This is the global weight.
-
local_weight: Local weight version of the weight above.
-
transaction: Transaction abort flags.
-
overhead: Overhead percentage of sample
-
overhead_sys: Overhead percentage of sample running in system mode
-
overhead_us: Overhead percentage of sample running in user mode
-
overhead_guest_sys: Overhead percentage of sample running in system mode on guest machine
-
overhead_guest_us: Overhead percentage of sample running in user mode on guest machine
-
sample: Number of samples
-
period: Raw event count of the sample
By default, comm, dso and symbol keys are used. (i.e. --sort comm,dso,symbol)
If the --branch-stack option is used, the following sort keys are also available: dso_from, dso_to, symbol_from, symbol_to, mispredict.
-
dso_from: name of library or module branched from
-
dso_to: name of library or module branched to
-
symbol_from: name of function branched from
-
symbol_to: name of function branched to
-
mispredict: "N" for predicted branch, "Y" for mispredicted branch
-
in_tx: branch in TSX transaction
-
abort: TSX transaction abort.
The default sort keys then change to comm, dso_from, symbol_from, dso_to and symbol_to; see '--branch-stack'.
- -F
- --fields=
-
Specify the output fields. Multiple keys can be specified in CSV format. The following fields are available: overhead, overhead_sys, overhead_us, overhead_children, sample and period. The list can also contain any sort key(s).
By default, every sort key not specified in -F is appended automatically.
If the --mem-mode option is used, the following sort keys are also available (incompatible with --branch-stack): symbol_daddr, dso_daddr, locked, tlb, mem, snoop, dcacheline.
-
symbol_daddr: name of data symbol being executed on at the time of sample
-
dso_daddr: name of library or module containing the data being executed on at the time of sample
-
locked: whether the bus was locked at the time of sample
-
tlb: type of tlb access for the data at the time of sample
-
mem: type of memory access for the data at the time of sample
-
snoop: type of snoop (if any) for the data at the time of sample
-
dcacheline: the cacheline the data address is on at the time of sample
The default sort keys then change to local_weight, mem, sym, dso, symbol_daddr, dso_daddr, snoop, tlb, locked; see '--mem-mode'.
- -p
- --parent=<regex>
-
A regex filter to identify the parent. The parent is a caller of this function and is searched for through the callchain, so callchain information must have been recorded. The pattern is in the extended regex format and defaults to "^sys_|^do_page_fault", see --sort parent.
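The parent lookup described above can be sketched as follows. This is only an illustration, not perf's implementation: the function name `find_parent` and the callee-first list layout are assumptions made for the example, and Python's `re` module stands in for the extended regex matcher. Whether the sampled function itself is also tested is not stated above; this sketch tests callers only.

```python
import re

# Default pattern quoted in the option description.
DEFAULT_PARENT_PATTERN = r"^sys_|^do_page_fault"


def find_parent(callchain, pattern=DEFAULT_PARENT_PATTERN):
    """Return the first caller matching `pattern`, or "[other]".

    `callchain` is a hypothetical list of function names, ordered from
    the sampled function outward to its callers.
    """
    regex = re.compile(pattern)
    for caller in callchain[1:]:  # assumption: skip the sampled function
        if regex.search(caller):
            return caller
    return "[other]"  # label used by the "parent" sort key for non-matches


print(find_parent(["memcpy", "copy_to_user", "sys_read", "entry"]))
print(find_parent(["main_loop", "main"]))
```

The first call finds `sys_read` in the chain; the second has no matching caller and falls back to "[other]".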
- -x
- --exclude-other
-
Only display entries with parent-match.
- -w
- --column-widths=<width[,width…]>
-
Force each column width to the provided list, for large terminal readability. 0 means no limit (default behavior).
- -t
- --field-separator=
-
Use a special separator character and don’t pad with spaces. All occurrences of this separator in symbol names (and other output) are replaced with a . character, which is therefore the only invalid separator.
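The separator substitution can be pictured with a small sketch. The helper `format_row` and the sample row are hypothetical and only mimic the rule stated above (no padding, separator occurrences inside fields replaced by a dot); they are not taken from perf's source.

```python
def format_row(fields, separator):
    """Join fields with `separator`, without padding.

    Any occurrence of the separator inside a field is replaced by "."
    so that the output can be split unambiguously.
    """
    cleaned = [str(field).replace(separator, ".") for field in fields]
    return separator.join(cleaned)


row = ["12.50%", "myapp", "std::map<int,int>::find"]
print(format_row(row, ","))  # prints 12.50%,myapp,std::map<int.int>::find
```

This also shows why "." cannot itself serve as the separator: a dot produced by the substitution would be indistinguishable from a real field boundary.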
- -D
- --dump-raw-trace
-
Dump raw trace in ASCII.
- -g [type,min[,limit],order[,key][,branch]]
- --call-graph
-
Display call chains using type, min percent threshold, optional print limit and order. type can be either:
-
flat: single column, linear exposure of call chains.
-
graph: use a graph tree, displaying absolute overhead rates.
-
fractal: like graph, but displays relative rates. Each branch of the tree is considered as a new profiled object.
order can be either:
-
callee: callee based call graph.
-
caller: inverted caller based call graph.
key can be:
-
function: compare on functions
-
address: compare on individual code addresses
branch can be:
-
branch: include last branch information in callgraph when available. Usually more convenient to use --branch-history for this.
Default: fractal,0.5,callee,function.
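The difference between the graph and fractal types can be illustrated with a little arithmetic. The sketch below assumes that "relative rates" means each node is shown as a share of its parent branch, while graph shows each node as a share of all samples. The tuple layout, the function name `call_graph_rates` and the hit counts are made up for the example.

```python
def call_graph_rates(node, total_hits, mode, parent_hits=None, depth=0):
    """Return (depth, name, percent) rows for a call tree.

    node is a hypothetical (name, hits, children) tuple.
    mode "graph":   percent of all samples (absolute rate).
    mode "fractal": percent of the parent branch (relative rate).
    """
    name, hits, children = node
    if mode == "graph" or parent_hits is None:
        base = total_hits
    else:
        base = parent_hits
    rows = [(depth, name, 100.0 * hits / base)]
    for child in children:
        rows += call_graph_rates(child, total_hits, mode, hits, depth + 1)
    return rows


tree = ("main", 80, [("parse", 40, [("read_token", 10, [])]),
                     ("emit", 40, [])])

for mode in ("graph", "fractal"):
    print(mode)
    for depth, name, percent in call_graph_rates(tree, 100, mode):
        print("  " * depth + f"{percent:.1f}% {name}")
```

With 100 samples in total, `read_token` is reported as 10.0% under graph but as 25.0% under fractal, because there it is measured against the 40 samples of its parent `parse`.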
- --children
-
Accumulate the callchain of children into the parent entry so that they can show up in the output. The output will have a new "Children" column and will be sorted on that column. It requires that callchains were recorded.
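As a rough illustration of the accumulation, the sketch below computes a self percentage and an accumulated percentage per function from a list of callchains. It assumes that the accumulated value counts every sample in which the function appears anywhere in the chain (so it includes the function's own samples). The function name and the sample data are hypothetical.

```python
from collections import Counter


def self_and_children(samples):
    """Return {function: (self_percent, children_percent)}.

    `samples` is a list of callchains, each ordered callee-first, so
    chain[0] is the function that was running when the sample hit.
    """
    self_hits = Counter()
    accumulated_hits = Counter()
    for chain in samples:
        self_hits[chain[0]] += 1
        for func in set(chain):  # count a function once per sample
            accumulated_hits[func] += 1
    total = len(samples)
    return {
        func: (100.0 * self_hits[func] / total,
               100.0 * accumulated_hits[func] / total)
        for func in accumulated_hits
    }


samples = [["bar", "foo", "main"]] * 3 + [["foo", "main"]] * 2
result = self_and_children(samples)
for func in sorted(result, key=lambda f: result[f][1], reverse=True):
    self_pct, children_pct = result[func]
    print(f"{children_pct:6.1f}% {self_pct:6.1f}% {func}")
```

Here `main` never appears as the sampled function, so its self value is 0.0%, yet it sits at the top with 100.0% accumulated, which is the kind of entry this option is meant to surface.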
- --max-stack
-
Set the stack depth limit when parsing the callchain, anything beyond the specified depth will be ignored. This is a trade-off between information loss and faster processing especially for workloads that can have a very long callchain stack.
Default: 127
- -G
- --inverted
-
Alias for inverted caller based call graph.
- --ignore-callees=<regex>
-
Ignore callees of the function(s) matching the given regex. This has the effect of collecting the callers of each such function into one place in the call-graph tree.
- --pretty=<key>
-
Pretty printing style. key: normal, raw
- --stdio
-
Use the stdio interface.
- --tui
-
Use the TUI interface, which is integrated with annotate and allows zooming into DSOs or threads, among other features. Use of --tui requires a tty; if one is not present, as when piping to other commands, the stdio interface is used.
- --gtk
-
Use the GTK2 interface.
- -k
- --vmlinux=<file>
-
vmlinux pathname
- --kallsyms=<file>
-
kallsyms pathname
- -m
- --modules
-
Load module symbols. WARNING: This should only be used with -k and a LIVE kernel.
- -f
- --force
-
Don’t complain, do it.
- --symfs=<directory>
-
Look for files with symbols relative to this directory.
- -C
- --cpu
-
Only report samples for the list of CPUs provided. Multiple CPUs can be provided as a comma-separated list with no space: 0,1. Ranges of CPUs are specified with -: 0-2. Default is to report samples on all CPUs.
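The list syntax can be made concrete with a small parser. This is a sketch of the format as described above, not perf's own parser. The name `parse_cpu_list` is hypothetical, and mixing ranges and single CPUs in one list (for example 0,2-3) is an assumption.

```python
def parse_cpu_list(spec):
    """Expand a CPU list such as "0,1" or "0-2" into sorted CPU numbers."""
    cpus = set()
    for part in spec.split(","):
        if "-" in part:
            first, last = part.split("-", 1)
            cpus.update(range(int(first), int(last) + 1))  # inclusive range
        else:
            cpus.add(int(part))
    return sorted(cpus)


print(parse_cpu_list("0,1"))    # prints [0, 1]
print(parse_cpu_list("0-2"))    # prints [0, 1, 2]
print(parse_cpu_list("0,2-3"))  # prints [0, 2, 3]
```

Because `int()` rejects embedded spaces around a "-", a spec like "0 - 2" raises ValueError here, which loosely mirrors the "no space" requirement.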
- -M
- --disassembler-style=
-
Set disassembler style for objdump.
- --source
-
Interleave source code with assembly code. Enabled by default, disable with --no-source.
- --asm-raw
-
Show raw instruction encoding of assembly instructions.
- --show-total-period
-
Show a column with the sum of periods.
- -I
- --show-info
-
Display extended information about the perf.data file. This adds information which may be very large and thus may clutter the display. It currently includes: cpu and numa topology of the host system.
- -b
- --branch-stack
-
Use the addresses of sampled taken branches instead of the instruction address to build the histograms. To generate meaningful output, the perf.data file must have been obtained using perf record -b or perf record --branch-filter xxx where xxx is a branch filter option. perf report is able to auto-detect whether a perf.data file contains branch stacks and it will automatically switch to the branch view mode, unless --no-branch-stack is used.
- --branch-history
-
Add the addresses of sampled taken branches to the callstack. This makes it possible to examine the path the program took to each sample. The data collection must have used -b (or -j) and -g.
- --objdump=<path>
-
Path to objdump binary.
- --group
-
Show event group information together.
- --demangle
-
Demangle symbol names to human readable form. It’s enabled by default, disable with --no-demangle.
- --demangle-kernel
-
Demangle kernel symbol names to human readable form (for C++ kernels).
- --mem-mode
-
Use the data addresses of samples in addition to instruction addresses to build the histograms. To generate meaningful output, the perf.data file must have been obtained using perf record -d -W and using a special event -e cpu/mem-loads/ or -e cpu/mem-stores/. See perf mem for simpler access.
- --percent-limit
-
Do not show entries which have an overhead under that percent. (Default: 0).
- --percentage
-
Determine how to display the overhead percentage of filtered entries. Filters can be applied by --comms, --dsos and/or --symbols options and Zoom operations on the TUI (thread, dso, etc).
"relative" means the percentage is relative to the filtered entries only, so that the sum of shown entries will always be 100%. "absolute" means the percentage retains its original value before and after the filter is applied.
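The two modes differ only in the denominator, as the following sketch shows. The sample counts and the function name `overhead` are invented for the illustration.

```python
def overhead(samples, keep, mode):
    """Return the overhead percentage of the entries that pass a filter.

    samples: {entry name: sample count} before filtering.
    keep:    set of entry names that pass the filter.
    mode:    "relative" divides by the filtered total,
             "absolute" divides by the unfiltered total.
    """
    shown = {name: hits for name, hits in samples.items() if name in keep}
    if mode == "relative":
        base = sum(shown.values())
    else:
        base = sum(samples.values())
    return {name: 100.0 * hits / base for name, hits in shown.items()}


samples = {"libc.so": 50, "myapp": 30, "[kernel]": 20}
keep = {"libc.so", "myapp"}

print(overhead(samples, keep, "relative"))  # prints {'libc.so': 62.5, 'myapp': 37.5}
print(overhead(samples, keep, "absolute"))  # prints {'libc.so': 50.0, 'myapp': 30.0}
```

In relative mode the shown entries add up to 100%; in absolute mode they add up to 80%, the share of all samples that passed the filter.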
- --header
-
Show header information in the perf.data file. This includes various information like hostname, OS and perf version, cpu/mem info, perf command line, event list and so on. Currently only --stdio output supports this feature.
- --header-only
-
Show only perf.data header (forces --stdio).