Commit 41755299 authored by Linus Torvalds's avatar Linus Torvalds
Browse files
Pull powerpc updates from Madhavan Srinivasan:

 - powerpc support for BPF arena and arena atomics

 - Patches to switch to msi parent domain (per-device MSI domains)

 - Add a lock contention tracepoint in the queued spinlock slowpath

 - Fixes for underflow in pseries/powernv msi and pci paths

 - Switch from legacy-of-mm-gpiochip dependency to platform driver

 - Fixes for handling TLB misses

 - Introduce support for powerpc papr-hvpipe

 - Add vpa-dtl PMU driver for pseries platform

 - Misc fixes and cleanups

Thanks to Aboorva Devarajan, Aditya Bodkhe, Andrew Donnellan, Athira
Rajeev, Cédric Le Goater, Christophe Leroy, Erhard Furtner, Gautam
Menghani, Geert Uytterhoeven, Haren Myneni, Hari Bathini, Joe Lawrence,
Kajol Jain, Kienan Stewart, Linus Walleij, Mahesh Salgaonkar, Nam Cao,
Nicolas Schier, Nysal Jan K.A., Ritesh Harjani (IBM), Ruben Wauters,
Saket Kumar Bhaskar, Shashank MS, Shrikanth Hegde, Tejas Manhas, Thomas
Gleixner, Thomas Huth, Thorsten Blum, Tyrel Datwyler, and Venkat Rao
Bagalkote.

* tag 'powerpc-6.18-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (49 commits)
  powerpc/pseries: Define __u{8,32} types in papr_hvpipe_hdr struct
  genirq/msi: Remove msi_post_free()
  powerpc/perf/vpa-dtl: Add documentation for VPA dispatch trace log PMU
  powerpc/perf/vpa-dtl: Handle the writing of perf record when aux wake up is needed
  powerpc/perf/vpa-dtl: Add support to capture DTL data in aux buffer
  powerpc/perf/vpa-dtl: Add support to setup and free aux buffer for capturing DTL data
  docs: ABI: sysfs-bus-event_source-devices-vpa-dtl: Document sysfs event format entries for vpa_dtl pmu
  powerpc/vpa_dtl: Add interface to expose vpa dtl counters via perf
  powerpc/time: Expose boot_tb via accessor
  powerpc/32: Remove PAGE_KERNEL_TEXT to fix startup failure
  powerpc/fprobe: fix updated fprobe for function-graph tracer
  powerpc/ftrace: support CONFIG_FUNCTION_GRAPH_RETVAL
  powerpc64/modules: replace stub allocation sentinel with an explicit counter
  powerpc64/modules: correctly iterate over stubs in setup_ftrace_ool_stubs
  powerpc/ftrace: ensure ftrace record ops are always set for NOPs
  powerpc/603: Really copy kernel PGD entries into all PGDIRs
  powerpc/8xx: Remove left-over instruction and comments in DataStoreTLBMiss handler
  powerpc/pseries: HVPIPE changes to support migration
  powerpc/pseries: Enable hvpipe with ibm,set-system-parameter RTAS
  powerpc/pseries: Enable HVPIPE event message interrupt
  ...
parents 9cc220a4 ef104054
Loading
Loading
Loading
Loading
+25 −0
Original line number Diff line number Diff line
What:           /sys/bus/event_source/devices/vpa_dtl/format
Date:           February 2025
Contact:        Linux on PowerPC Developer List <linuxppc-dev at lists.ozlabs.org>
Description:    Read-only. Attribute group to describe the magic bits
                that go into perf_event_attr.config for a particular pmu.
                (See ABI/testing/sysfs-bus-event_source-devices-format).

                Each attribute under this group defines a bit range of the
                perf_event_attr.config. Supported attribute are listed
                below::

				event  = "config:0-7"  - event ID

		For example::

		  dtl_cede = "event=0x1"

What:           /sys/bus/event_source/devices/vpa_dtl/events
Date:           February 2025
Contact:        Linux on PowerPC Developer List <linuxppc-dev at lists.ozlabs.org>
Description:	(RO) Attribute group to describe performance monitoring events
                for the Virtual Processor Dispatch Trace Log. Each attribute in
		this group describes a single performance monitoring event
		supported by vpa_dtl pmu.  The name of the file is the name of
		the event (See ABI/testing/sysfs-bus-event_source-devices-events).
+1 −0
Original line number Diff line number Diff line
@@ -37,6 +37,7 @@ powerpc
    vas-api
    vcpudispatch_stats
    vmemmap_dedup
    vpa-dtl

    features

+156 −0
Original line number Diff line number Diff line
.. SPDX-License-Identifier: GPL-2.0
.. _vpa-dtl:

===================================
DTL (Dispatch Trace Log)
===================================

Athira Rajeev, 19 April 2025

.. contents::
    :depth: 3


Basic overview
==============

The pseries Shared Processor Logical Partition(SPLPAR) machines can
retrieve a log of dispatch and preempt events from the hypervisor
using data from Disptach Trace Log(DTL) buffer. With this information,
user can retrieve when and why each dispatch & preempt has occurred.
The vpa-dtl PMU exposes the Virtual Processor Area(VPA) DTL counters
via perf.

Infrastructure used
===================

The VPA DTL PMU counters do not interrupt on overflow or generate any
PMI interrupts. Therefore, hrtimer is used to poll the DTL data. The timer
nterval can be provided by user via sample_period field in nano seconds.
vpa dtl pmu has one hrtimer added per vpa-dtl pmu thread. DTL (Dispatch
Trace Log) contains information about dispatch/preempt, enqueue time etc.
We directly copy the DTL buffer data as part of auxiliary buffer and it
will be processed later. This will avoid time taken to create samples
in the kernel space. The PMU driver collecting Dispatch Trace Log (DTL)
entries makes use of AUX support in perf infrastructure. On the tools side,
this data is made available as PERF_RECORD_AUXTRACE records.

To correlate each DTL entry with other events across CPU's, an auxtrace_queue
is created for each CPU. Each auxtrace queue has a array/list of auxtrace buffers.
All auxtrace queues is maintained in auxtrace heap. The queues are sorted
based on timestamp. When the different PERF_RECORD_XX records are processed,
compare the timestamp of perf record with timestamp of top element in the
auxtrace heap so that DTL events can be co-related with other events
Process the auxtrace queue if the timestamp of element from heap is
lower than timestamp from entry in perf record. Sometimes it could happen that
one buffer is only partially processed. if the timestamp of occurrence of
another event is more than currently processed element in the queue, it will
move on to next perf record. So keep track of position of buffer to continue
processing next time. Update the timestamp of the auxtrace heap with the timestamp
of last processed entry from the auxtrace buffer.

This infrastructure ensures dispatch trace log entries can be correlated
and presented along with other events like sched.

vpa-dtl PMU example usage
=========================

.. code-block:: sh

  # ls /sys/devices/vpa_dtl/
  events  format  perf_event_mux_interval_ms  power  subsystem  type  uevent


To capture the DTL data using perf record:
.. code-block:: sh

  # ./perf record -a -e sched:\*,vpa_dtl/dtl_all/ -c 1000000000 sleep 1

The result can be interpreted using perf record. Snippet of perf report -D

.. code-block:: sh

  # ./perf report -D

There are different PERF_RECORD_XX records. In that records corresponding to
auxtrace buffers includes:

1. PERF_RECORD_AUX
   Conveys that new data is available in AUX area

2. PERF_RECORD_AUXTRACE_INFO
   Describes offset and size of auxtrace data in the buffers

3. PERF_RECORD_AUXTRACE
   This is the record that defines the auxtrace data which here in case of
   vpa-dtl pmu is dispatch trace log data.

Snippet from perf report -D showing the PERF_RECORD_AUXTRACE dump

.. code-block:: sh

0 0 0x39b10 [0x30]: PERF_RECORD_AUXTRACE size: 0x690  offset: 0  ref: 0  idx: 0  tid: -1  cpu: 0
.
. ... VPA DTL PMU data: size 1680 bytes, entries is 35
.  00000000: boot_tb: 21349649546353231, tb_freq: 512000000
.  00000030: dispatch_reason:decrementer interrupt, preempt_reason:H_CEDE, enqueue_to_dispatch_time:7064, ready_to_enqueue_time:187, waiting_to_ready_time:6611773
.  00000060: dispatch_reason:priv doorbell, preempt_reason:H_CEDE, enqueue_to_dispatch_time:146, ready_to_enqueue_time:0, waiting_to_ready_time:15359437
.  00000090: dispatch_reason:decrementer interrupt, preempt_reason:H_CEDE, enqueue_to_dispatch_time:4868, ready_to_enqueue_time:232, waiting_to_ready_time:5100709
.  000000c0: dispatch_reason:priv doorbell, preempt_reason:H_CEDE, enqueue_to_dispatch_time:179, ready_to_enqueue_time:0, waiting_to_ready_time:30714243
.  000000f0: dispatch_reason:priv doorbell, preempt_reason:H_CEDE, enqueue_to_dispatch_time:197, ready_to_enqueue_time:0, waiting_to_ready_time:15350648
.  00000120: dispatch_reason:priv doorbell, preempt_reason:H_CEDE, enqueue_to_dispatch_time:213, ready_to_enqueue_time:0, waiting_to_ready_time:15353446
.  00000150: dispatch_reason:priv doorbell, preempt_reason:H_CEDE, enqueue_to_dispatch_time:212, ready_to_enqueue_time:0, waiting_to_ready_time:15355126
.  00000180: dispatch_reason:decrementer interrupt, preempt_reason:H_CEDE, enqueue_to_dispatch_time:6368, ready_to_enqueue_time:164, waiting_to_ready_time:5104665

Above is representation of dtl entry of below format:

struct dtl_entry {
        u8      dispatch_reason;
        u8      preempt_reason;
        u16     processor_id;
        u32     enqueue_to_dispatch_time;
        u32     ready_to_enqueue_time;
        u32     waiting_to_ready_time;
        u64     timebase;
        u64     fault_addr;
        u64     srr0;
        u64     srr1;

};

First two fields represent the dispatch reason and preempt reason. The post
processing of PERF_RECORD_AUXTRACE records will translate to meaningful data
for user to consume.

Visualize the dispatch trace log entries with perf report
=========================================================

.. code-block:: sh

  # ./perf record -a -e sched:*,vpa_dtl/dtl_all/ -c 1000000000 sleep 1
  [ perf record: Woken up 1 times to write data ]
  [ perf record: Captured and wrote 0.300 MB perf.data ]

  # ./perf report
  # Samples: 321  of event 'vpa-dtl'
  # Event count (approx.): 321
  #
  # Children      Self  Command  Shared Object      Symbol
  # ........  ........  .......  .................  ..............................
  #
     100.00%   100.00%  swapper  [kernel.kallsyms]  [k] plpar_hcall_norets_notrace

Visualize the dispatch trace log entries with perf script
=========================================================

.. code-block:: sh

   # ./perf script
     migration/9      67 [009] 105373.359903:                     sched:sched_waking: comm=perf pid=13418 prio=120 target_cpu=009
     migration/9      67 [009] 105373.359904:               sched:sched_migrate_task: comm=perf pid=13418 prio=120 orig_cpu=9 dest_cpu=10
     migration/9      67 [009] 105373.359907:               sched:sched_stat_runtime: comm=migration/9 pid=67 runtime=4050 [ns]
     migration/9      67 [009] 105373.359908:                     sched:sched_switch: prev_comm=migration/9 prev_pid=67 prev_prio=0 prev_state=S ==> next_comm=swapper/9 next_pid=0 next_prio=120
            :256     256 [016] 105373.359913:                                vpa-dtl: timebase: 21403600706628832 dispatch_reason:decrementer interrupt, preempt_reason:H_CEDE, enqueue_to_dispatch_time:4854,                        ready_to_enqueue_time:139, waiting_to_ready_time:511842115 c0000000000fcd28 plpar_hcall_norets_notrace+0x18 ([kernel.kallsyms])
            :256     256 [017] 105373.360012:                                vpa-dtl: timebase: 21403600706679454 dispatch_reason:priv doorbell, preempt_reason:H_CEDE, enqueue_to_dispatch_time:236,                         ready_to_enqueue_time:0, waiting_to_ready_time:133864583 c0000000000fcd28 plpar_hcall_norets_notrace+0x18 ([kernel.kallsyms])
            perf   13418 [010] 105373.360048:               sched:sched_stat_runtime: comm=perf pid=13418 runtime=139748 [ns]
            perf   13418 [010] 105373.360052:                     sched:sched_waking: comm=migration/10 pid=72 prio=0 target_cpu=010
+2 −0
Original line number Diff line number Diff line
@@ -374,6 +374,8 @@ Code Seq# Include File Comments
                                                                       <mailto:linuxppc-dev@lists.ozlabs.org>
0xB2  08     arch/powerpc/include/uapi/asm/papr-physical-attestation.h powerpc/pseries Physical Attestation API
                                                                       <mailto:linuxppc-dev@lists.ozlabs.org>
0xB2  09     arch/powerpc/include/uapi/asm/papr-hvpipe.h               powerpc/pseries HVPIPE API
                                                                       <mailto:linuxppc-dev@lists.ozlabs.org>
0xB3  00     linux/mmc/ioctl.h
0xB4  00-0F  linux/gpio.h                                              <mailto:linux-gpio@vger.kernel.org>
0xB5  00-0F  uapi/linux/rpmsg.h                                        <mailto:linux-remoteproc@vger.kernel.org>
+3 −1
Original line number Diff line number Diff line
@@ -243,12 +243,14 @@ config PPC
	select HAVE_EFFICIENT_UNALIGNED_ACCESS
	select HAVE_GUP_FAST
	select HAVE_FTRACE_GRAPH_FUNC
	select HAVE_FTRACE_REGS_HAVING_PT_REGS
	select HAVE_FUNCTION_ARG_ACCESS_API
	select HAVE_FUNCTION_DESCRIPTORS	if PPC64_ELF_ABI_V1
	select HAVE_FUNCTION_ERROR_INJECTION
	select HAVE_FUNCTION_GRAPH_FREGS
	select HAVE_FUNCTION_GRAPH_TRACER
	select HAVE_FUNCTION_TRACER		if !COMPILE_TEST && (PPC64 || (PPC32 && CC_IS_GCC))
	select HAVE_GCC_PLUGINS			if GCC_VERSION >= 50200   # plugin support on gcc <= 5.1 is buggy on PPC
	select HAVE_GCC_PLUGINS
	select HAVE_GENERIC_VDSO
	select HAVE_HARDLOCKUP_DETECTOR_ARCH	if PPC_BOOK3S_64 && SMP
	select HAVE_HARDLOCKUP_DETECTOR_PERF	if PERF_EVENTS && HAVE_PERF_EVENTS_NMI
Loading