Commit f2d64a22 authored by Will Deacon

Merge branch 'for-next/perf' into for-next/core

* for-next/perf: (29 commits)
  perf/dwc_pcie: Fix use of uninitialized variable
  Documentation: hisi-pmu: Add introduction to HiSilicon V3 PMU
  Documentation: hisi-pmu: Fix of minor format error
  drivers/perf: hisi: Add support for L3C PMU v3
  drivers/perf: hisi: Refactor the event configuration of L3C PMU
  drivers/perf: hisi: Extend the field of tt_core
  drivers/perf: hisi: Extract the event filter check of L3C PMU
  drivers/perf: hisi: Simplify the probe process of each L3C PMU version
  drivers/perf: hisi: Export hisi_uncore_pmu_isr()
  drivers/perf: hisi: Relax the event ID check in the framework
  perf: Fujitsu: Add the Uncore PMU driver
  perf/arm-cmn: Fix CMN S3 DTM offset
  perf: arm_spe: Prevent overflow in PERF_IDX2OFF()
  coresight: trbe: Prevent overflow in PERF_IDX2OFF()
  MAINTAINERS: Remove myself from HiSilicon PMU maintainers
  drivers/perf: hisi: Add support for HiSilicon MN PMU driver
  drivers/perf: hisi: Add support for HiSilicon NoC PMU
  perf: arm_pmuv3: Factor out PMCCNTR_EL0 use conditions
  arm64/boot: Enable EL2 requirements for SPE_FEAT_FDS
  arm64/boot: Factor out a macro to check SPE version
  ...
parents 77dfca70 2084660a
+2 −2
@@ -16,8 +16,8 @@ provides the following two features:

- one 64-bit counter for Time Based Analysis (RX/TX data throughput and
  time spent in each low-power LTSSM state) and
- one 32-bit counter per event for Event Counting (error and non-error
  events for a specified lane)

Note: There is no interrupt for counter overflow.
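Because no interrupt signals counter overflow, a tool that samples these counters must read them often enough that each counter wraps at most once between two reads. A minimal sketch of a wrap-safe delta under that assumption follows. It is a generic technique, not code from the dwc_pcie driver, and the helper name ``counter_delta`` is hypothetical.

```python
def counter_delta(prev: int, curr: int, width: int) -> int:
    """Return the number of events counted between two raw reads.

    Assumes the counter wrapped at most once between the reads; the
    modular subtraction then yields the correct delta even if curr < prev.
    """
    mask = (1 << width) - 1
    return (curr - prev) & mask


# A 32-bit Event Counting counter that wrapped between the two reads.
print(counter_delta(0xFFFF_FFF0, 0x0000_0010, 32))  # 32
# The 64-bit Time Based Analysis counter, no wrap.
print(counter_delta(1000, 1500, 64))  # 500
```

If the counter can wrap more than once between reads, no post-processing can recover the lost counts, so the read interval has to be chosen from the expected event rate.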

+110 −0
.. SPDX-License-Identifier: GPL-2.0-only

================================================
Fujitsu Uncore Performance Monitoring Unit (PMU)
================================================

This driver supports the Uncore MAC PMUs and the Uncore PCI PMUs found
in Fujitsu chips.
Each MAC PMU on these chips is exposed as an uncore perf PMU with device name
mac_iod<iod>_mac<mac>_ch<ch>, and each PCI PMU is exposed as an uncore perf
PMU with device name pci_iod<iod>_pci<pci>.
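The naming scheme above can be handled mechanically, for example when scanning the available PMUs. A minimal sketch, assuming the ``<iod>``, ``<mac>``, ``<ch>`` and ``<pci>`` indices are plain decimal numbers; the helper names are hypothetical and not part of the driver.

```python
import re

# Patterns derived from the documented device names:
#   mac_iod<iod>_mac<mac>_ch<ch>  and  pci_iod<iod>_pci<pci>
MAC_PMU_RE = re.compile(r"^mac_iod(\d+)_mac(\d+)_ch(\d+)$")
PCI_PMU_RE = re.compile(r"^pci_iod(\d+)_pci(\d+)$")


def mac_pmu_name(iod: int, mac: int, ch: int) -> str:
    return f"mac_iod{iod}_mac{mac}_ch{ch}"


def pci_pmu_name(iod: int, pci: int) -> str:
    return f"pci_iod{iod}_pci{pci}"


def parse_pmu_name(name: str):
    """Split a PMU name into its kind and indices, or return None."""
    m = MAC_PMU_RE.match(name)
    if m:
        iod, mac, ch = map(int, m.groups())
        return ("mac", {"iod": iod, "mac": mac, "ch": ch})
    m = PCI_PMU_RE.match(name)
    if m:
        iod, pci = map(int, m.groups())
        return ("pci", {"iod": iod, "pci": pci})
    return None  # not a Fujitsu uncore MAC/PCI PMU name


print(parse_pmu_name(mac_pmu_name(0, 1, 2)))  # ('mac', {'iod': 0, 'mac': 1, 'ch': 2})
print(parse_pmu_name(pci_pmu_name(0, 0)))     # ('pci', {'iod': 0, 'pci': 0})
```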

The driver provides a description of its available events and configuration
options in sysfs; see /sys/bus/event_source/devices/mac_iod<iod>_mac<mac>_ch<ch>/
and /sys/bus/event_source/devices/pci_iod<iod>_pci<pci>/.
This driver exports:

- formats, used by perf user space and other tools to configure events
- events, used by perf user space and other tools to create events
  symbolically, e.g.::

    perf stat -a -e mac_iod0_mac0_ch0/event=0x21/ ls
    perf stat -a -e pci_iod0_pci0/event=0x24/ ls

- cpumask, used by perf user space and other tools to know on which CPUs
  to open the events
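A tool can read these exported attributes directly from sysfs. A minimal sketch, assuming each file under ``events/`` holds a config string such as ``event=0x21`` and that ``cpumask`` uses the CPU-list format (e.g. ``0-3,8``) rather than a hex bitmap; both formats, and all helper names, are assumptions for illustration.

```python
import os


def read_pmu_events(pmu_dir: str) -> dict:
    """Map each symbolic event name to its config string, e.g. "event=0x21"."""
    events_dir = os.path.join(pmu_dir, "events")
    events = {}
    for name in sorted(os.listdir(events_dir)):
        with open(os.path.join(events_dir, name)) as f:
            events[name] = f.read().strip()
    return events


def parse_cpulist(text: str) -> list:
    """Expand a CPU list such as "0-3,8" into [0, 1, 2, 3, 8]."""
    cpus = []
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-")
            cpus.extend(range(int(lo), int(hi) + 1))
        else:
            cpus.append(int(part))
    return cpus


def read_pmu_cpus(pmu_dir: str) -> list:
    """Return the CPUs on which events for this PMU should be opened."""
    with open(os.path.join(pmu_dir, "cpumask")) as f:
        return parse_cpulist(f.read())


if __name__ == "__main__":
    pmu = "/sys/bus/event_source/devices/mac_iod0_mac0_ch0"
    if os.path.isdir(pmu):  # only present on systems with this PMU
        print(read_pmu_events(pmu))
        print(read_pmu_cpus(pmu))
```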

This driver supports the following events for MAC:

- cycles
  This event counts MAC cycles at MAC frequency.
- read-count
  This event counts the number of read requests to MAC.
- read-count-request
  This event counts the number of read requests to MAC, including retries.
- read-count-return
  This event counts the number of responses to read requests to MAC.
- read-count-request-pftgt
  This event counts the number of read requests, including retries, with the
  PFTGT flag.
- read-count-request-normal
  This event counts the number of read requests, including retries, without
  the PFTGT flag.
- read-count-return-pftgt-hit
  This event counts the number of responses to read requests which hit the
  PFTGT buffer.
- read-count-return-pftgt-miss
  This event counts the number of responses to read requests which miss the
  PFTGT buffer.
- read-wait
  This event counts outstanding read requests issued by the DDR memory
  controller per cycle.
- write-count
  This event counts the number of write requests to MAC (including zero write,
  full write, partial write, write cancel).
- write-count-write
  This event counts the number of full write requests to MAC (not including
  zero write).
- write-count-pwrite
  This event counts the number of partial write requests to MAC.
- memory-read-count
  This event counts the number of read requests from MAC to memory.
- memory-write-count
  This event counts the number of full write requests from MAC to memory.
- memory-pwrite-count
  This event counts the number of partial write requests from MAC to memory.
- ea-mac
  This event counts energy consumption of MAC.
- ea-memory
  This event counts energy consumption of memory.
- ea-memory-mac-write
  This event counts the number of write requests from MAC to memory.
- ea-ha
  This event counts energy consumption of HA.

'ea' is the abbreviation for 'Energy Analyzer'.

Example of use with perf::

  perf stat -e mac_iod0_mac0_ch0/ea-mac/ ls
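Some MAC events combine naturally into ratios. A minimal sketch, assuming all counts come from the same PMU over the same interval (for example, copied from one ``perf stat`` run). The metric names ``pftgt_hit_ratio`` and ``partial_write_ratio`` are hypothetical; they only restate the event descriptions above as fractions.

```python
def ratio(num: int, den: int) -> float:
    """Return num/den, or 0.0 when the denominator is zero."""
    return num / den if den else 0.0


def mac_metrics(counts: dict) -> dict:
    """Derive simple ratios from raw MAC event counts keyed by event name."""
    hit = counts.get("read-count-return-pftgt-hit", 0)
    miss = counts.get("read-count-return-pftgt-miss", 0)
    return {
        # Share of PFTGT hit/miss read responses that hit the PFTGT buffer.
        "pftgt_hit_ratio": ratio(hit, hit + miss),
        # Share of all write requests (zero, full, partial, cancel) that were partial.
        "partial_write_ratio": ratio(counts.get("write-count-pwrite", 0),
                                     counts.get("write-count", 0)),
    }


# Hypothetical counts, not measured data.
sample = {
    "read-count-return-pftgt-hit": 300,
    "read-count-return-pftgt-miss": 100,
    "write-count": 1000,
    "write-count-pwrite": 250,
}
print(mac_metrics(sample))  # {'pftgt_hit_ratio': 0.75, 'partial_write_ratio': 0.25}
```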

This driver also supports the following events for PCI:

- pci-port0-cycles
  This event counts PCI cycles at PCI frequency in port0.
- pci-port0-read-count
  This event counts read transactions for data transfer in port0.
- pci-port0-read-count-bus
  This event counts read transactions for bus usage in port0.
- pci-port0-write-count
  This event counts write transactions for data transfer in port0.
- pci-port0-write-count-bus
  This event counts write transactions for bus usage in port0.
- pci-port1-cycles
  This event counts PCI cycles at PCI frequency in port1.
- pci-port1-read-count
  This event counts read transactions for data transfer in port1.
- pci-port1-read-count-bus
  This event counts read transactions for bus usage in port1.
- pci-port1-write-count
  This event counts write transactions for data transfer in port1.
- pci-port1-write-count-bus
  This event counts write transactions for bus usage in port1.
- ea-pci
  This event counts energy consumption of PCI.

'ea' is the abbreviation for 'Energy Analyzer'.

Example of use with perf::

  perf stat -e pci_iod0_pci0/ea-pci/ ls
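The per-port PCI events follow a regular ``pci-port<N>-<event>`` pattern, so a full per-port event list can be generated. A minimal sketch that builds a system-wide ``perf stat`` command line using perf's comma-separated ``-e`` syntax; the helper names are hypothetical, and only port0 and port1 are accepted because only those are listed above.

```python
PORT_EVENTS = ("cycles", "read-count", "read-count-bus",
               "write-count", "write-count-bus")


def pci_port_events(iod: int, pci: int, port: int) -> list:
    """Return perf event specifiers for every documented event of one port."""
    if port not in (0, 1):
        raise ValueError("only port0 and port1 are documented")
    pmu = f"pci_iod{iod}_pci{pci}"
    return [f"{pmu}/pci-port{port}-{ev}/" for ev in PORT_EVENTS]


def perf_stat_argv(events: list, *cmd: str) -> list:
    """Build an argv list for a system-wide 'perf stat' over the given events."""
    return ["perf", "stat", "-a", "-e", ",".join(events), *cmd]


print(" ".join(perf_stat_argv(pci_port_events(0, 0, 1), "ls")))
```

The resulting list can be passed unchanged to ``subprocess.run``; ``-a`` is kept because these uncore PMUs count system-wide.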

Because these are uncore PMUs, the driver does not support sampling, so
"perf record" will not work. Per-task perf sessions are not supported either.
+47 −2
@@ -18,9 +18,10 @@ HiSilicon SoC uncore PMU driver
Each device PMU has separate registers for event counting, control and
interrupt, and the PMU driver shall register perf PMU drivers like L3C,
HHA and DDRC etc. The available events and configuration options shall
be described in the sysfs, see::

  /sys/bus/event_source/devices/hisi_sccl{X}_<l3c{Y}/hha{Y}/ddrc{Y}>

The "perf list" command shall list the available events from sysfs.

Each L3C, HHA and DDRC is registered as a separate PMU with perf. The PMU
@@ -112,6 +113,50 @@ uring channel. It is 2 bits. Some important codes are as follows:
- 2'b00: default value, count the events which are sent to both the uring and
  uring_ext channels;

6. ch: NoC PMU supports filtering the event counts of a certain transaction
channel with this option. The currently supported channels are as follows:

- 3'b010: Request channel
- 3'b100: Snoop channel
- 3'b110: Response channel
- 3'b111: Data channel

7. tt_en: If this option is set, NoC PMU counts only transactions that have
tracetag set. See item 2 above for more information about tracetag.
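The ``ch`` codes above are 3-bit binary values and combine with ``tt_en`` in one perf event specifier. A minimal sketch; the PMU name ``hisi_sccl1_noc0_0`` and the raw ``event=0x1`` term are placeholders rather than real NoC events, and the helper name ``noc_event`` is hypothetical.

```python
from typing import Optional

# Channel codes from the list above, written as 3-bit binary literals.
NOC_CHANNELS = {
    "request": 0b010,
    "snoop": 0b100,
    "response": 0b110,
    "data": 0b111,
}


def noc_event(pmu: str, event: str, channel: Optional[str] = None,
              tracetag: bool = False) -> str:
    """Build a perf event specifier using the ch and tt_en options."""
    terms = [event]
    if channel is not None:
        terms.append(f"ch={NOC_CHANNELS[channel]:#x}")
    if tracetag:
        terms.append("tt_en=1")
    return f"{pmu}/{','.join(terms)}/"


# Placeholder PMU and event: count only snoop-channel transactions with tracetag set.
print(noc_event("hisi_sccl1_noc0_0", "event=0x1", channel="snoop", tracetag=True))
# hisi_sccl1_noc0_0/event=0x1,ch=0x4,tt_en=1/
```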

For HiSilicon uncore PMU v3, whose identifier is 0x40, some uncore PMUs are
further divided into parts for finer-grained tracing. Each part has its own
dedicated PMU, and together these PMUs cover event monitoring for a
particular uncore device. Such PMUs are described in sysfs with a slightly
changed name format::

  /sys/bus/event_source/devices/hisi_sccl{X}_<l3c{Y}_{Z}/ddrc{Y}_{Z}/noc{Y}_{Z}>

Z is the sub-id, which distinguishes the PMUs covering different parts of the
same hardware device.
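Because all sub-id PMUs of one device together cover its events, a tool may want to group them, for example to sum counts across parts. A minimal sketch that parses the v3 name format above; the grouping key ``(sccl, type, id)`` and the helper names are assumptions for illustration.

```python
import re

# hisi_sccl{X}_<l3c{Y}_{Z}/ddrc{Y}_{Z}/noc{Y}_{Z}>
V3_PMU_RE = re.compile(r"^hisi_sccl(\d+)_(l3c|ddrc|noc)(\d+)_(\d+)$")


def parse_v3_pmu(name: str):
    """Return the SCCL, device type, device id and sub-id, or None."""
    m = V3_PMU_RE.match(name)
    if not m:
        return None
    sccl, kind, idx, sub = m.groups()
    return {"sccl": int(sccl), "type": kind, "id": int(idx), "sub_id": int(sub)}


def group_by_device(names) -> dict:
    """Group sub-id PMUs that belong to the same uncore device."""
    groups = {}
    for name in names:
        info = parse_v3_pmu(name)
        if info is None:
            continue  # not a v3 sub-id PMU name
        key = (info["sccl"], info["type"], info["id"])
        groups.setdefault(key, []).append(name)
    return groups
```

On a live system the input could be ``os.listdir("/sys/bus/event_source/devices")``.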

Most PMUs with different sub-ids are used identically. In particular, the L3C
PMU provides an ``ext`` option that allows exploration of even finer-grained
L3C statistics. The L3C PMU driver uses this option as a hint of termination
when delivering the perf command to hardware:

- ext=0: Default; can be used with event names.
- ext=1 and ext=2: Must be used with event codes; event names are not supported.

An example perf command is::

  $ perf stat -a -e hisi_sccl0_l3c1_0/rd_spipe/ sleep 5

or::

  $ perf stat -a -e hisi_sccl0_l3c1_0/event=0x1,ext=1/ sleep 5

In the examples above, ``hisi_sccl0_l3c1_0`` identifies the PMU of Super CPU
Cluster 0, L3 cache 1, pipe 0.

The first command counts on the first part of the L3C, since ``ext=0`` is
implied by default. The second command counts event ``0x1`` on another part
of the L3C.
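The ``ext`` rules can be enforced before invoking perf. A minimal client-side sketch that reproduces both example specifiers and rejects event names combined with ``ext=1`` or ``ext=2``. The helper name ``l3c_event`` is hypothetical, and it only accepts the ext values 0 to 2 listed above; the driver itself performs its own checks.

```python
def l3c_event(pmu: str, event, ext: int = 0) -> str:
    """Build an L3C PMU event specifier following the documented ext rules.

    event is either a symbolic name (str) or a numeric event code (int).
    """
    if ext not in (0, 1, 2):
        raise ValueError("ext must be 0, 1 or 2")
    if isinstance(event, str):
        if ext != 0:
            raise ValueError("ext=1 and ext=2 require an event code, not a name")
        return f"{pmu}/{event}/"
    terms = [f"event={event:#x}"]
    if ext:  # ext=0 is the default, so it is left implicit
        terms.append(f"ext={ext}")
    return f"{pmu}/{','.join(terms)}/"


print(l3c_event("hisi_sccl0_l3c1_0", "rd_spipe"))   # hisi_sccl0_l3c1_0/rd_spipe/
print(l3c_event("hisi_sccl0_l3c1_0", 0x1, ext=1))   # hisi_sccl0_l3c1_0/event=0x1,ext=1/
```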

Users can configure IDs to count data coming from a specific CCL/ICL by
setting srcid_cmd & srcid_msk, and data destined for a specific CCL/ICL by
setting tgtid_cmd & tgtid_msk. A set bit in srcid_msk/tgtid_msk means the PMU will not
+1 −0
@@ -29,3 +29,4 @@ Performance monitor support
   cxl
   ampere_cspmu
   mrvl-pem-pmu
   fujitsu_uncore_pmu
+11 −0
@@ -466,6 +466,17 @@ Before jumping into the kernel, the following conditions must be met:
    - HDFGWTR2_EL2.nPMICFILTR_EL0 (bit 3) must be initialised to 0b1.
    - HDFGWTR2_EL2.nPMUACR_EL1 (bit 4) must be initialised to 0b1.

  For CPUs with SPE data source filtering (FEAT_SPE_FDS):

  - If EL3 is present:

    - MDCR_EL3.EnPMS3 (bit 42) must be initialised to 0b1.

  - If the kernel is entered at EL1 and EL2 is present:

    - HDFGRTR2_EL2.nPMSDSFR_EL1 (bit 19) must be initialised to 0b1.
    - HDFGWTR2_EL2.nPMSDSFR_EL1 (bit 19) must be initialised to 0b1.

  For CPUs with Memory Copy and Memory Set instructions (FEAT_MOPS):

  - If the kernel is entered at EL1 and EL2 is present: