Commit bbeb1088, authored by Namhyung Kim, committed by Arnaldo Carvalho de Melo

perf mem: Document new output fields (op, cache, mem, dtlb, snoop)



Update the documentation of the new fields with examples and caveats.

Also update the related documentation for AMD IBS.

Reviewed-by: Ravi Bangoria <ravi.bangoria@amd.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20250610005742.2173050-1-namhyung@kernel.org


Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
parent 11cfaf37
+42 −17
@@ -171,23 +171,48 @@ Below is a simple example of the perf mem tool.
	# perf mem report

A normal perf mem report output provides a detailed memory access profile.
However, it can also be aggregated based on output fields. For example:

	# perf mem report -F mem,sample,snoop
	Samples: 3M of event 'ibs_op//', Event count (approx.): 23524876
	Memory access                                 Samples  Snoop
	N/A                                           1903343  N/A
	L1 hit                                        1056754  N/A
	L2 hit                                          75231  N/A
	L3 hit                                           9496  HitM
	L3 hit                                           2270  N/A
	RAM hit                                          8710  N/A
	Remote node, same socket RAM hit                 3241  N/A
	Remote core, same node Any cache hit             1572  HitM
	Remote core, same node Any cache hit              514  N/A
	Remote node, same socket Any cache hit           1216  HitM
	Remote node, same socket Any cache hit            350  N/A
	Uncached hit                                       18  N/A
New output fields will show related access info together.  For example:

	# perf mem report -F overhead,cache,snoop,comm
	...
	# Samples: 92K of event 'ibs_op//'
	# Total weight : 531104
	#
	#           ---------- Cache -----------  --- Snoop ----
	# Overhead       L1     L2 L1-buf  Other     HitM  Other  Command
	# ........  ............................  ..............  ..........
	#
	    76.07%     5.8%  35.7%   0.0%  34.6%    23.3%  52.8%  cc1
	     5.79%     0.2%   0.0%   0.0%   5.6%     0.1%   5.7%  make
	     5.78%     0.1%   4.4%   0.0%   1.2%     0.5%   5.3%  gcc
	     5.33%     0.3%   3.9%   0.0%   1.1%     0.2%   5.2%  as
	     5.00%     0.1%   3.8%   0.0%   1.0%     0.3%   4.7%  sh
	     1.56%     0.1%   0.1%   0.0%   1.4%     0.6%   0.9%  ld
	     0.28%     0.1%   0.0%   0.0%   0.2%     0.1%   0.2%  pkg-config
	     0.09%     0.0%   0.0%   0.0%   0.1%     0.0%   0.1%  git
	     0.03%     0.0%   0.0%   0.0%   0.0%     0.0%   0.0%  rm
	     ...

Also, it can be aggregated based on various memory access info using the
sort keys.  For example:

	# perf mem report -s mem,snoop
	...
	# Samples: 92K of event 'ibs_op//'
	# Total weight : 531104
	# Sort order   : mem,snoop
	#
	# Overhead       Samples  Memory access                            Snoop
	# ........  ............  .......................................  ............
	#
	    47.99%          1509  L2 hit                                   N/A
	    25.08%           338  core, same node Any cache hit            HitM
	    10.24%         54374  N/A                                      N/A
	     6.77%         35938  L1 hit                                   N/A
	     6.39%           101  core, same node Any cache hit            N/A
	     3.50%            69  RAM hit                                  N/A
	     0.03%           158  LFB/MAB hit                              N/A
	     0.00%             2  Uncached hit                             N/A

Please refer to their man page for more detail.
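
The new fields are described as breaking down sample periods, so within each row of the `-F overhead,cache,snoop,comm` output above, the Cache columns and the Snoop columns should each add up to roughly the Overhead value. The sketch below checks that on three rows copied from that output. The tuple layout and the helper name `group_gaps` are illustrative assumptions and not part of perf; any small gap is presumably rounding in the printed percentages.

```python
# Rows copied from the report above:
# (command, overhead %, [L1, L2, L1-buf, Other], [HitM, Other])
ROWS = [
    ("cc1", 76.07, [5.8, 35.7, 0.0, 34.6], [23.3, 52.8]),
    ("make", 5.79, [0.2, 0.0, 0.0, 5.6], [0.1, 5.7]),
    ("gcc", 5.78, [0.1, 4.4, 0.0, 1.2], [0.5, 5.3]),
]


def group_gaps(rows):
    """Return, per command, how far each column group's sum is from Overhead."""
    gaps = {}
    for comm, overhead, cache, snoop in rows:
        gaps[comm] = (abs(sum(cache) - overhead), abs(sum(snoop) - overhead))
    return gaps


for comm, (cache_gap, snoop_gap) in group_gaps(ROWS).items():
    print(f"{comm}: cache gap {cache_gap:.2f}, snoop gap {snoop_gap:.2f}")
```

For these rows every gap stays well below one percentage point, which is consistent with each column group being a split of the same overhead.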

+50 −0
@@ -119,6 +119,22 @@ REPORT OPTIONS
	And the default sort keys are changed to local_weight, mem, sym, dso,
	symbol_daddr, dso_daddr, snoop, tlb, locked, blocked, local_ins_lat.

-F::
--fields=::
	Specify output field - multiple keys can be specified in CSV format.
	Please see linkperf:perf-report[1] for details.

	In addition to the default fields, 'perf mem report' will provide the
	following fields to break down sample periods.

	- op: operation in the sample instruction (load, store, prefetch, ...)
	- cache: location in CPU cache (L1, L2, ...) where the sample hit
	- mem: location in memory or other places the sample hit
	- dtlb: location in Data TLB (L1, L2) where the sample hit
	- snoop: snoop result for the sampled data access

	Please take a look at the OUTPUT FIELD SELECTION section for caveats.

-T::
--type-profile::
	Show data-type profile result instead of code symbols.  This requires
@@ -156,6 +172,40 @@ but one sample with weight 180 and the other with weight 20:
  90%   [k] memcpy
  10%   [.] strcmp
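
The 90%/10% split in the context lines above can be reproduced by weighting each sample instead of counting it. This is a minimal sketch, assuming the sample with weight 180 belongs to memcpy and the one with weight 20 to strcmp (the pairing is inferred from the percentages); `weighted_overhead` is a hypothetical helper, not a perf function.

```python
def weighted_overhead(samples):
    """Return each symbol's share (in percent) of the total sample weight."""
    total = sum(weight for _, weight in samples)
    shares = {}
    for symbol, weight in samples:
        shares[symbol] = shares.get(symbol, 0.0) + 100.0 * weight / total
    return shares


# Assumed pairing: memcpy carries weight 180 and strcmp carries weight 20
shares = weighted_overhead([("memcpy", 180), ("strcmp", 20)])
print(shares)
```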

OUTPUT FIELD SELECTION
----------------------
"perf mem report" adds a number of new output fields specific to data source
information in the sample.  Some of them have the same name as existing
sort keys ("mem" and "snoop").  Unlike other fields and sort keys, these two
behave differently depending on whether they are used with -F/--fields or
-s/--sort.

Using those two as output fields aggregates all samples together and shows a
breakdown.

  $ perf mem report -F mem,snoop
  ...
  # ------ Memory -------  --- Snoop ----
  #     RAM Uncach  Other     HitM  Other
  # .....................  ..............
  #
       3.5%   0.0%  96.5%    25.1%  74.9%

But using the same names as sort keys will aggregate samples for each type
separately.

  $ perf mem report -s mem,snoop
  # Overhead       Samples  Memory access                            Snoop
  # ........  ............  .......................................  ............
  #
      47.99%          1509  L2 hit                                   N/A
      25.08%           338  core, same node Any cache hit            HitM
      10.24%         54374  N/A                                      N/A
       6.77%         35938  L1 hit                                   N/A
       6.39%           101  core, same node Any cache hit            N/A
       3.50%            69  RAM hit                                  N/A
       0.03%           158  LFB/MAB hit                              N/A
       0.00%             2  Uncached hit                             N/A

SEE ALSO
--------
linkperf:perf-record[1], linkperf:perf-report[1], linkperf:perf-arm-spe[1]
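
The two OUTPUT FIELD SELECTION examples above appear to describe the same data in two shapes: the per-type rows from `-s mem,snoop` collapse into the single breakdown row from `-F mem,snoop`. The sketch below sums the Overhead values of the sort-key rows into the field columns. The string-based mapping in `mem_column` and `snoop_column` is a simplifying assumption for illustration; perf itself classifies samples from the data source information in each sample, not from the printed labels.

```python
from collections import defaultdict

# (overhead %, memory access, snoop) rows copied from the -s mem,snoop output
SORT_ROWS = [
    (47.99, "L2 hit", "N/A"),
    (25.08, "core, same node Any cache hit", "HitM"),
    (10.24, "N/A", "N/A"),
    (6.77, "L1 hit", "N/A"),
    (6.39, "core, same node Any cache hit", "N/A"),
    (3.50, "RAM hit", "N/A"),
    (0.03, "LFB/MAB hit", "N/A"),
    (0.00, "Uncached hit", "N/A"),
]


def mem_column(mem):
    # Hypothetical label-to-column mapping, for illustration only
    if mem == "RAM hit":
        return "RAM"
    if mem == "Uncached hit":
        return "Uncach"
    return "Other"


def snoop_column(snoop):
    # Everything that is not HitM is counted under "Other"
    return "HitM" if snoop == "HitM" else "Other"


def collapse(rows):
    """Sum per-type overheads into the column groups shown by -F mem,snoop."""
    mem = defaultdict(float)
    snoop = defaultdict(float)
    for overhead, mem_label, snoop_label in rows:
        mem[mem_column(mem_label)] += overhead
        snoop[snoop_column(snoop_label)] += overhead
    return dict(mem), dict(snoop)


mem_totals, snoop_totals = collapse(SORT_ROWS)
print({k: round(v, 1) for k, v in mem_totals.items()})
print({k: round(v, 1) for k, v in snoop_totals.items()})
```

Rounded to one decimal place, the totals can be compared directly with the Memory and Snoop columns of the `-F mem,snoop` row.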