Commit e4bf304f authored by Linus Torvalds

Merge tag 'trace-ringbuffer-v7.1' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace

Pull ring-buffer updates from Steven Rostedt:

 - Add remote buffers for pKVM

   pKVM has a hypervisor component that is used to protect the guest
   from the host kernel. This hypervisor is a black box to the kernel as
   the kernel is to user space. The remote buffers are used to have a
   memory mapping between the hypervisor and the kernel where the kernel may
   send commands to enable tracing within the hypervisor. Then the
   kernel will read this memory mapping just like user space can read
   the memory mapped ring buffer of the kernel tracing system.

   Since the hypervisor only has a single context, it doesn't need to
   worry about races between normal context, interrupt context and NMIs
   like the kernel does. The ring buffer it uses doesn't need to be as
   complex. The remote buffers are a simple version of the ring buffer
   that works in a single context. They are still per-CPU and use sub
   buffers. The data layout is the same as the kernel's ring buffer to
   share the same parsing.

   Currently, only ARM64 implements pKVM, but there's work to implement
   it also in x86. The remote buffer code is separated out from the ARM
   implementation so that it can be used in the future by x86.

   The ARM64 updates for pKVM are in the ARM/KVM tree, which merged in
   the remote buffers from this tree.

 - Make the backup instance non-reusable

   The backup instance is a copy of the persistent ring buffer so that
   the persistent ring buffer could start recording again without using
   the data from the previous boot. The backup isn't for normal tracing.
   It is made read-only, and after it is consumed, it is automatically
   removed.

 - Have backup copy persistent instance before it starts recording

   To allow the persistent ring buffer to start recording from the
   kernel command line commands, move the copy of the backup instance to
   before the command line options start recording.

 - Report header_page overwrite field as "char" and not "int"

   The rust parser of the header_page file was triggering a warning when
   it defined the overwrite variable as "int" but it was only a single
   byte in size.

 - Fix memory barriers for the trace_buffer CPU mask

   When a CPU comes online, the bit is set to allow readers to know that
   the CPU buffer is allocated. The bit is set after the allocation is
   done, and a smp_wmb() is performed after the allocation and before
   the setting of the bit. A matching smp_rmb() is not added to every
   reader, because once a buffer is created for a CPU it is not deleted
   if that CPU goes offline, so this allocation is almost always done at
   boot up before any readers exist.

   In the unlikely case where a CPU comes online for the first time
   after the system boot has finished, send an IPI to all CPUs to force
   the smp_rmb() for each CPU.

 - Show clock function being used in debugging ring buffer data

   When the ring buffer checks are enabled and the ring buffer detects
   an inconsistency in the times of the events, print out the clock
   being used when the error occurred. There was a very hard to hit bug
   that would happen every so often and it ended up being only triggered
   when the jiffies clock was being used. If the error report had shown
   the clock being used, it would have been much easier to find the
   problem (an internal function was being traced, which caused the
   clock accounting to go off).
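
As an illustration of the ordering problem in the cpumask item above, here is a minimal userspace sketch using C11 atomics. It is not the kernel code: `publish_cpu_buffer()`, `lookup_cpu_buffer()`, `struct cpu_buffer` and `NR_CPUS` are hypothetical names, and the C11 fences only stand in for smp_wmb() and smp_rmb(). The sketch shows the conventional pairing with a read barrier in every reader, which is the cost the merged change avoids by relying on boot-time allocation and an IPI for the rare late-online case.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

#define NR_CPUS 8			/* hypothetical, for the sketch only */

struct cpu_buffer {			/* hypothetical per-CPU buffer */
	int nr_pages;
};

static struct cpu_buffer *buffers[NR_CPUS];
static atomic_ulong cpumask;		/* bit N set: buffers[N] is usable */

/* Writer side: allocate and initialise first, set the bit last. */
static int publish_cpu_buffer(int cpu, int nr_pages)
{
	struct cpu_buffer *buf;

	assert(cpu >= 0 && cpu < NR_CPUS);
	buf = malloc(sizeof(*buf));
	if (!buf)
		return -1;
	buf->nr_pages = nr_pages;
	buffers[cpu] = buf;

	/* Stands in for smp_wmb(): the stores above become visible
	 * before the bit does. */
	atomic_thread_fence(memory_order_release);
	atomic_fetch_or_explicit(&cpumask, 1UL << cpu, memory_order_relaxed);
	return 0;
}

/* Reader side: test the bit, then order the buffer read after it. */
static struct cpu_buffer *lookup_cpu_buffer(int cpu)
{
	unsigned long mask;

	assert(cpu >= 0 && cpu < NR_CPUS);
	mask = atomic_load_explicit(&cpumask, memory_order_relaxed);
	if (!(mask & (1UL << cpu)))
		return NULL;

	/* Stands in for smp_rmb(): this is the per-reader barrier that
	 * the merged change does not add. */
	atomic_thread_fence(memory_order_acquire);
	return buffers[cpu];
}
```

A reader that sees the bit set is then guaranteed to see the initialised buffer. Without the reader-side fence, that guarantee has to come from elsewhere, which is the role of the IPI described above.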

* tag 'trace-ringbuffer-v7.1' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: (26 commits)
  ring-buffer: Prevent off-by-one array access in ring_buffer_desc_page()
  ring-buffer: Report header_page overwrite as char
  tracing: Allow backup to save persistent ring buffer before it starts
  tracing/Documentation: Add a section about backup instance
  tracing: Remove the backup instance automatically after read
  tracing: Make the backup instance non-reusable
  ring-buffer: Enforce read ordering of trace_buffer cpumask and buffers
  ring-buffer: Show what clock function is used on timestamp errors
  tracing: Check for undefined symbols in simple_ring_buffer
  tracing: load/unload page callbacks for simple_ring_buffer
  Documentation: tracing: Add tracing remotes
  tracing: selftests: Add trace remote tests
  tracing: Add a trace remote module for testing
  tracing: Introduce simple_ring_buffer
  ring-buffer: Export buffer_data_page and macros
  tracing: Add helpers to create trace remote events
  tracing: Add events/ root files to trace remotes
  tracing: Add events to trace remotes
  tracing: Add init callback to trace remotes
  tracing: Add non-consuming read to trace remotes
  ...
parents 15218296 6170922f
+19 −0
@@ -159,3 +159,22 @@ If setting it from the kernel command line, it is recommended to also
disable tracing with the "traceoff" flag, and enable tracing after boot up.
Otherwise the trace from the most recent boot will be mixed with the trace
from the previous boot, and may make it confusing to read.

Using a backup instance for keeping previous boot data
------------------------------------------------------

It is also possible to record trace data at system boot time by specifying
events with the persistent ring buffer, but in this case the data before the
reboot will be lost before it can be read. This problem can be solved by a
backup instance. From the kernel command line::

  reserve_mem=12M:4096:trace trace_instance=boot_map@trace,sched,irq trace_instance=backup=boot_map

On boot up, the previous data in the "boot_map" is copied to the "backup"
instance, and the "sched:*" and "irq:*" events for the current boot are traced
in the "boot_map". Thus the user can read the previous boot data from the "backup"
instance without stopping the trace.

Note that this "backup" instance is read-only, and will be removed automatically
if you clear the trace data or read out all trace data from the "trace_pipe"
or the "trace_pipe_raw" files.
+11 −0
@@ -92,6 +92,17 @@ interactions.
   user_events
   uprobetracer

Remote Tracing
--------------

This section covers the framework to read compatible ring-buffers, written by
entities outside of the kernel (most likely firmware or hypervisor).

.. toctree::
   :maxdepth: 1

   remotes

Additional Resources
--------------------

+66 −0
.. SPDX-License-Identifier: GPL-2.0

===============
Tracing Remotes
===============

:Author: Vincent Donnefort <vdonnefort@google.com>

Overview
========
Firmware and hypervisors are black boxes to the kernel. Having a way to see what
they are doing can be useful to debug both. This is where remote tracing buffers
come in. A remote tracing buffer is a ring buffer written by the firmware or
hypervisor into memory that is memory mapped to the host kernel. This is similar
to how user space memory maps the kernel ring buffer but in this case the kernel
is acting like user space and the firmware or hypervisor is the "kernel" side.
With a trace remote ring buffer, the firmware and hypervisor can record events
that the host kernel can see and expose to user space.

Register a remote
=================
A remote must provide a set of callbacks `struct trace_remote_callbacks` whose
description can be found below. Those callbacks allow Tracefs to enable and
disable tracing and events, to load and unload a tracing buffer (a set of
ring-buffers) and to swap a reader page with the head page, which enables
consuming reads.

.. kernel-doc:: include/linux/trace_remote.h

Once registered, an instance will appear for this remote in the Tracefs
directory **remotes/**. Buffers can then be read using the usual Tracefs files
**trace_pipe** and **trace**.
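
The reader-page swap mentioned above can be pictured with a small userspace model. This is only a sketch under simplifying assumptions, not the kernel or hypervisor implementation: `struct toy_ring`, `ring_init()`, `ring_write()` and `ring_swap_reader_page()` are hypothetical names, events are plain integers, and everything runs in a single context. The idea it shows is that the reader owns one spare page outside the ring and trades it for the oldest (head) page, so the writer never shares a page with the reader.

```c
#include <assert.h>
#include <stddef.h>

#define NR_PAGES	4	/* pages in the ring (hypothetical) */
#define PAGE_EVENTS	4	/* events per page (hypothetical) */

struct sub_buffer {
	unsigned int	count;
	int		events[PAGE_EVENTS];
};

struct toy_ring {
	struct sub_buffer	storage[NR_PAGES + 1];	/* ring + reader page */
	struct sub_buffer	*ring[NR_PAGES];
	struct sub_buffer	*reader;	/* page owned by the reader */
	unsigned int		head;		/* oldest page in the ring */
	unsigned int		tail;		/* page being written */
};

static void ring_init(struct toy_ring *rb)
{
	unsigned int i;

	for (i = 0; i < NR_PAGES; i++) {
		rb->storage[i].count = 0;
		rb->ring[i] = &rb->storage[i];
	}
	rb->storage[NR_PAGES].count = 0;
	rb->reader = &rb->storage[NR_PAGES];
	rb->head = rb->tail = 0;
}

/* Single-context writer: when the ring is full, overwrite the oldest page. */
static void ring_write(struct toy_ring *rb, int event)
{
	struct sub_buffer *page = rb->ring[rb->tail];

	if (page->count == PAGE_EVENTS) {
		rb->tail = (rb->tail + 1) % NR_PAGES;
		if (rb->tail == rb->head)	/* caught up with the head */
			rb->head = (rb->head + 1) % NR_PAGES;
		page = rb->ring[rb->tail];
		page->count = 0;
	}
	assert(page->count < PAGE_EVENTS);
	page->events[page->count++] = event;
}

/* Consuming read: trade the reader's spare page for the head page. */
static struct sub_buffer *ring_swap_reader_page(struct toy_ring *rb)
{
	struct sub_buffer *spare = rb->reader;

	spare->count = 0;
	rb->reader = rb->ring[rb->head];
	rb->ring[rb->head] = spare;
	/* If the writer was on the head page, it continues on the empty
	 * spare page in the same slot, so the head does not move. */
	if (rb->head != rb->tail)
		rb->head = (rb->head + 1) % NR_PAGES;
	return rb->reader;
}
```

After ten writes with these sizes, three successive swaps would hand back pages holding four, four and two events, and a fourth swap returns an empty page.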

Declare a remote event
======================
Macros are provided to ease the declaration of remote events, in a similar
fashion to in-kernel events. A declaration must provide an ID, a description of
the event arguments and how to print the event:

.. code-block:: c

	REMOTE_EVENT(foo, EVENT_FOO_ID,
		RE_STRUCT(
			re_field(u64, bar)
		),
		RE_PRINTK("bar=%lld", __entry->bar)
	);

Then those events must be declared in a C file with the following:

.. code-block:: c

	#define REMOTE_EVENT_INCLUDE_FILE foo_events.h
	#include <trace/define_remote_events.h>

This will provide a `struct remote_event remote_event_foo` that can be given to
`trace_remote_register`.

Registered events appear in the remote directory under **events/**.

Simple ring-buffer
==================
A simple implementation for a ring-buffer writer can be found in
kernel/trace/simple_ring_buffer.c.

.. kernel-doc:: include/linux/simple_ring_buffer.h
+1 −0
@@ -664,6 +664,7 @@ struct dentry *tracefs_create_file(const char *name, umode_t mode,
	fsnotify_create(d_inode(dentry->d_parent), dentry);
	return tracefs_end_creating(dentry);
}
EXPORT_SYMBOL_GPL(tracefs_create_file);

static struct dentry *__create_dir(const char *name, struct dentry *parent,
				   const struct inode_operations *ops)
+58 −0
@@ -251,4 +251,62 @@ int ring_buffer_map(struct trace_buffer *buffer, int cpu,
void ring_buffer_map_dup(struct trace_buffer *buffer, int cpu);
int ring_buffer_unmap(struct trace_buffer *buffer, int cpu);
int ring_buffer_map_get_reader(struct trace_buffer *buffer, int cpu);

struct ring_buffer_desc {
	int		cpu;
	unsigned int	nr_page_va; /* excludes the meta page */
	unsigned long	meta_va;
	unsigned long	page_va[] __counted_by(nr_page_va);
};

struct trace_buffer_desc {
	int		nr_cpus;
	size_t		struct_len;
	char		__data[]; /* list of ring_buffer_desc */
};

static inline struct ring_buffer_desc *__next_ring_buffer_desc(struct ring_buffer_desc *desc)
{
	size_t len = struct_size(desc, page_va, desc->nr_page_va);

	return (struct ring_buffer_desc *)((void *)desc + len);
}

static inline struct ring_buffer_desc *__first_ring_buffer_desc(struct trace_buffer_desc *desc)
{
	return (struct ring_buffer_desc *)(&desc->__data[0]);
}

static inline size_t trace_buffer_desc_size(size_t buffer_size, unsigned int nr_cpus)
{
	unsigned int nr_pages = max(DIV_ROUND_UP(buffer_size, PAGE_SIZE), 2UL) + 1;
	struct ring_buffer_desc *rbdesc;

	return size_add(offsetof(struct trace_buffer_desc, __data),
			size_mul(nr_cpus, struct_size(rbdesc, page_va, nr_pages)));
}

#define for_each_ring_buffer_desc(__pdesc, __cpu, __trace_pdesc)		\
	for (__pdesc = __first_ring_buffer_desc(__trace_pdesc), __cpu = 0;	\
	     (__cpu) < (__trace_pdesc)->nr_cpus;				\
	     (__cpu)++, __pdesc = __next_ring_buffer_desc(__pdesc))

struct ring_buffer_remote {
	struct trace_buffer_desc	*desc;
	int				(*swap_reader_page)(unsigned int cpu, void *priv);
	int				(*reset)(unsigned int cpu, void *priv);
	void				*priv;
};

int ring_buffer_poll_remote(struct trace_buffer *buffer, int cpu);

struct trace_buffer *
__ring_buffer_alloc_remote(struct ring_buffer_remote *remote,
			   struct lock_class_key *key);

#define ring_buffer_alloc_remote(remote)			\
({								\
	static struct lock_class_key __key;			\
	__ring_buffer_alloc_remote(remote, &__key);		\
})
#endif /* _LINUX_RING_BUFFER_H */
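
The descriptor layout added above packs one variable-length `ring_buffer_desc` per CPU back to back inside `trace_buffer_desc`. The following userspace sketch models that layout to show how the sizes and the walk fit together. It is an approximation for illustration: `struct rb_desc`, `struct tb_desc` and every function name are hypothetical stand-ins, `SKETCH_PAGE_SIZE` is an assumed 4096, and plain arithmetic replaces the overflow-checked `struct_size()`, `size_add()` and `size_mul()` helpers. The bounds check in `find_rb_desc()` is an addition of this sketch, not something taken from the header.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define SKETCH_PAGE_SIZE 4096UL		/* assumption for the sketch */

struct rb_desc {			/* models struct ring_buffer_desc */
	int		cpu;
	unsigned int	nr_page_va;
	unsigned long	meta_va;
	unsigned long	page_va[];
};

struct tb_desc {			/* models struct trace_buffer_desc */
	int		nr_cpus;
	size_t		struct_len;
	char		data[];		/* packed list of rb_desc */
};

/* Mirrors the "max(DIV_ROUND_UP(size, PAGE_SIZE), 2UL) + 1" rounding. */
static unsigned int tb_nr_pages(size_t buffer_size)
{
	size_t pages = (buffer_size + SKETCH_PAGE_SIZE - 1) / SKETCH_PAGE_SIZE;

	if (pages < 2)
		pages = 2;
	return (unsigned int)pages + 1;
}

static size_t rb_desc_size(unsigned int nr_pages)
{
	return sizeof(struct rb_desc) + (size_t)nr_pages * sizeof(unsigned long);
}

static struct rb_desc *first_rb_desc(struct tb_desc *desc)
{
	return (struct rb_desc *)desc->data;
}

/* Each entry's own page count gives the offset of the next entry. */
static struct rb_desc *next_rb_desc(struct rb_desc *rb)
{
	return (struct rb_desc *)((char *)rb + rb_desc_size(rb->nr_page_va));
}

static struct tb_desc *tb_desc_alloc(int nr_cpus, unsigned int nr_pages)
{
	size_t len = offsetof(struct tb_desc, data) +
		     (size_t)nr_cpus * rb_desc_size(nr_pages);
	struct tb_desc *desc;
	struct rb_desc *rb;
	int cpu;

	assert(nr_cpus > 0);
	desc = calloc(1, len);
	if (!desc)
		return NULL;
	desc->nr_cpus = nr_cpus;
	desc->struct_len = len;
	rb = first_rb_desc(desc);
	for (cpu = 0; cpu < nr_cpus; cpu++) {
		rb->cpu = cpu;
		rb->nr_page_va = nr_pages;
		rb = next_rb_desc(rb);
	}
	return desc;
}

/* Walk the packed list, refusing to step outside struct_len. */
static struct rb_desc *find_rb_desc(struct tb_desc *desc, int cpu)
{
	char *end = (char *)desc + desc->struct_len;
	struct rb_desc *rb = first_rb_desc(desc);
	int i;

	for (i = 0; i < desc->nr_cpus; i++) {
		if ((char *)rb + sizeof(*rb) > end ||
		    (char *)rb + rb_desc_size(rb->nr_page_va) > end)
			return NULL;
		if (rb->cpu == cpu)
			return rb;
		rb = next_rb_desc(rb);
	}
	return NULL;
}
```

Because each entry's length depends on its own `nr_page_va`, the list can only be walked sequentially. Checking each step against `struct_len` is one way to stay safe when the descriptor is filled in by a less trusted party.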