Commit ad584d73 authored by Linus Torvalds
Pull tracing updates from Steven Rostedt:
 "Main user visible change:

   - User events can now have "multi formats"

     The current user events have a single format. If another event is
     created with a different format, it will fail to be created. That
     is, once an event name is used, it cannot be used again with a
     different format. This can cause issues if a library is using an
     event and updates its format. An application using the older format
     will prevent an application using the new library from registering
     its event.

     A task could also DoS another application if it knows the event
     names and creates events with different formats.

     The multi-format event is in a different namespace from the
     single-format event. The event name and its format together form
     the unique identifier. This allows two different applications to
     use the same user event name but with different payloads.

   - Added support for ftrace_dump_on_oops to dump instances, not just
     the main top-level tracing buffer.

  Other changes:

   - Add eventfs_root_inode

     Only the root inode has a static dentry (one that never goes away),
     which it stores upon creation. There's no reason for the thousands
     of other eventfs inodes to carry a pointer that never gets set in
     their descriptors. Create an eventfs_root_inode descriptor that
     holds an eventfs_inode descriptor and a dentry pointer; only the
     root inode uses it.

   - Added WARN_ON()s in eventfs

     There are some conditionals remaining in eventfs that should never
     be hit. Instead of removing them, wrap them in WARN_ON() to make
     sure that they are never hit.

   - Have saved_cmdlines allocation also include the map_cmdline_to_pid
     array

     The saved_cmdlines structure allocates a large amount of data to
     hold its mappings. It has three arrays, two of which are already
     part of that allocation: map_pid_to_cmdline[] and saved_cmdlines[].
     More memory can be saved by also including the map_cmdline_to_pid[]
     array.

   - Restructure __string() and __assign_str() macros used in
     TRACE_EVENT()

     Dynamic strings in TRACE_EVENT() are declared with:

         __string(name, source)

     And assigned with:

         __assign_str(name, source)

     In the tracepoint callback of the event, __string() is used to get
     the size needed to allocate on the ring buffer, and __assign_str()
     is used to copy the string into the ring buffer. The TRACE_EVENT()
     macro logic creates a helper structure, filled in by __string(),
     that holds the string length and its position in the ring buffer.

     Several trace events call a function to create the string to save.
     That function is executed twice: once for __string() and again for
     __assign_str(). There's no reason for this. The helper structure
     can also save the string used in __string() and simply copy it in
     __assign_str() (it already has the length, too).

     Because the structure stores the source string for the assignment,
     the second argument to __assign_str() is no longer needed.

     That argument will be removed in the next merge window. For now,
     add a warning if the source string given to __string() differs from
     the source string given to __assign_str(), since the source passed
     to __assign_str() isn't even used and will be going away.

   - Added checks to make sure that the source of __string() is also the
     source of __assign_str() so that it can be safely removed in the
     next merge window.

     Included fixes that the above check found.

   - Other minor clean ups and fixes"
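The eventfs_root_inode change follows a common kernel pattern: embed the small descriptor that every inode needs inside a larger, root-only structure, and recover the outer structure with container_of() when it is needed. The C sketch below models that layout in user space under stated assumptions. The member lists, the is_root flag, and get_root_inode() are hypothetical simplifications, not the real eventfs definitions, and struct dentry is only an opaque placeholder.

```c
#include <assert.h>
#include <stddef.h>

/* User-space stand-in for the kernel's container_of() macro. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Opaque placeholder; only pointers to it are stored. */
struct dentry;

/* Hypothetical, heavily simplified descriptor shared by every eventfs inode. */
struct eventfs_inode {
	const char	*name;
	unsigned int	is_root:1;	/* assumption: set only for the root */
};

/* Only the root wraps the common descriptor and adds the static dentry. */
struct eventfs_root_inode {
	struct eventfs_inode	ei;
	struct dentry		*events_dir;
};

/* Map a descriptor back to its root wrapper, or NULL for ordinary inodes. */
struct eventfs_root_inode *get_root_inode(struct eventfs_inode *ei)
{
	assert(ei != NULL);
	if (!ei->is_root)
		return NULL;
	return container_of(ei, struct eventfs_root_inode, ei);
}
```

With this layout, each ordinary inode carries only struct eventfs_inode. The extra dentry pointer exists once, in the root wrapper.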

* tag 'trace-v6.9-2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: (34 commits)
  tracing: Add __string_src() helper to help compilers not to get confused
  tracing: Use strcmp() in __assign_str() WARN_ON() check
  tracepoints: Use WARN() and not WARN_ON() for warnings
  tracing: Use div64_u64() instead of do_div()
  tracing: Support to dump instance traces by ftrace_dump_on_oops
  tracing: Remove second parameter to __assign_rel_str()
  tracing: Add warning if string in __assign_str() does not match __string()
  tracing: Add __string_len() example
  tracing: Remove __assign_str_len()
  ftrace: Fix most kernel-doc warnings
  tracing: Decrement the snapshot if the snapshot trigger fails to register
  tracing: Fix snapshot counter going between two tracers that use it
  tracing: Use EVENT_NULL_STR macro instead of open coding "(null)"
  tracing: Use ? : shortcut in trace macros
  tracing: Do not calculate strlen() twice for __string() fields
  tracing: Rework __assign_str() and __string() to not duplicate getting the string
  cxl/trace: Properly initialize cxl_poison region name
  net: hns3: tracing: fix hclgevf trace event strings
  drm/i915: Add missing ; to __assign_str() macros in tracepoint code
  NFSD: Fix nfsd_clid_class use of __string_len() macro
  ...
parents 2cb5c868 7604256c
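The saved_cmdlines item in the pull message is about placing several arrays in one allocation. The sketch below is a rough user-space model, not the tracer's actual code. It allocates a header, a map_cmdline_to_pid-style int array, and the comm strings with a single calloc(), then sets pointers into the tail of that block. The struct layout, COMM_LEN, save_cmdline(), and the -1 "unused" convention are assumptions for illustration. The map_pid_to_cmdline[] array is left out for brevity.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define COMM_LEN 16	/* assumption: plays the role of TASK_COMM_LEN */

/* Hypothetical header; both arrays live in the same allocation after it. */
struct cmdlines_buffer {
	unsigned int	num;			/* number of slots */
	int		*map_cmdline_to_pid;	/* num ints, right after the header */
	char		*saved_cmdlines;	/* num * COMM_LEN bytes, after the map */
};

struct cmdlines_buffer *alloc_cmdlines(unsigned int num)
{
	size_t size = sizeof(struct cmdlines_buffer) +
		      (size_t)num * sizeof(int) + (size_t)num * COMM_LEN;
	struct cmdlines_buffer *s = calloc(1, size);	/* one zeroed block */

	if (!s)
		return NULL;
	s->num = num;
	s->map_cmdline_to_pid = (int *)(s + 1);
	s->saved_cmdlines = (char *)(s->map_cmdline_to_pid + num);
	/* All bytes 0xff yields -1 in every int: "no pid" in this sketch. */
	memset(s->map_cmdline_to_pid, 0xff, (size_t)num * sizeof(int));
	return s;
}

/* Record that slot idx holds the comm of pid. */
void save_cmdline(struct cmdlines_buffer *s, unsigned int idx, int pid,
		  const char *comm)
{
	char *dst;

	assert(idx < s->num);
	dst = &s->saved_cmdlines[(size_t)idx * COMM_LEN];
	s->map_cmdline_to_pid[idx] = pid;
	strncpy(dst, comm, COMM_LEN - 1);
	dst[COMM_LEN - 1] = '\0';
}
```

Because everything lives in one block, a single free() releases the header and both arrays together.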
+21 −5
@@ -1572,12 +1572,28 @@
			The above will cause the "foo" tracing instance to trigger
			a snapshot at the end of boot up.

-	ftrace_dump_on_oops[=orig_cpu]
+	ftrace_dump_on_oops[=2(orig_cpu) | =<instance>][,<instance> |
+			  ,<instance>=2(orig_cpu)]
			[FTRACE] will dump the trace buffers on oops.
-			If no parameter is passed, ftrace will dump
-			buffers of all CPUs, but if you pass orig_cpu, it will
-			dump only the buffer of the CPU that triggered the
-			oops.
+			If no parameter is passed, ftrace will dump global
+			buffers of all CPUs, if you pass 2 or orig_cpu, it
+			will dump only the buffer of the CPU that triggered
+			the oops, or the specific instance will be dumped if
+			its name is passed. Multiple instance dump is also
+			supported, and instances are separated by commas. Each
+			instance supports only dump on CPU that triggered the
+			oops by passing 2 or orig_cpu to it.
+
+			ftrace_dump_on_oops=foo=orig_cpu
+
+			The above will dump only the buffer of "foo" instance
+			on CPU that triggered the oops.
+
+			ftrace_dump_on_oops,foo,bar=orig_cpu
+
+			The above will dump global buffer on all CPUs, the
+			buffer of "foo" instance on all CPUs and the buffer
+			of "bar" instance on CPU that triggered the oops.

	ftrace_filter=[function-list]
			[FTRACE] Limit the functions traced by the function
+24 −6
@@ -296,12 +296,30 @@ kernel panic). This will output the contents of the ftrace buffers to
the console.  This is very useful for capturing traces that lead to
crashes and outputting them to a serial console.

-= ===================================================
+======================= ===========================================
+0                       Disabled (default).
+1                       Dump buffers of all CPUs.
-2 Dump the buffer of the CPU that triggered the oops.
-= ===================================================
+2(orig_cpu)             Dump the buffer of the CPU that triggered the
+                        oops.
+<instance>              Dump the specific instance buffer on all CPUs.
+<instance>=2(orig_cpu)  Dump the specific instance buffer on the CPU
+                        that triggered the oops.
+======================= ===========================================
+
+Multiple instance dump is also supported, and instances are separated
+by commas. If global buffer also needs to be dumped, please specify
+the dump mode (1/2/orig_cpu) first for global buffer.
+
+So for example to dump "foo" and "bar" instance buffer on all CPUs,
+user can::
+
+  echo "foo,bar" > /proc/sys/kernel/ftrace_dump_on_oops
+
+To dump global buffer and "foo" instance buffer on all
+CPUs along with the "bar" instance buffer on CPU that triggered the
+oops, user can::
+
+  echo "1,foo,bar=2" > /proc/sys/kernel/ftrace_dump_on_oops

ftrace_enabled, stack_tracer_enabled
====================================
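The sysctl value format documented above is an optional global mode (1, 2, or orig_cpu) followed by comma-separated instance names, each optionally suffixed with =2 or =orig_cpu. A short tokenizing loop can parse it. The following is a hypothetical user-space sketch, not the kernel's parser. parse_dump_spec(), the dump_mode enum, and the MAX_INSTANCES limit are inventions for illustration. Rejecting any instance suffix other than 2/orig_cpu is this sketch's own choice. The sketch handles only the sysctl string form, as in the echo examples above, and does not model the boot-parameter spellings.

```c
#define _POSIX_C_SOURCE 200809L	/* for strtok_r() */
#include <assert.h>
#include <string.h>

#define MAX_INSTANCES	8	/* arbitrary limit for this sketch */
#define NAME_LEN	32

enum dump_mode { DUMP_NONE = 0, DUMP_ALL = 1, DUMP_ORIG = 2 };

struct dump_target {
	char		name[NAME_LEN];
	enum dump_mode	mode;
};

struct dump_spec {
	enum dump_mode		global;		/* mode for the top-level buffer */
	int			nr_instances;
	struct dump_target	inst[MAX_INSTANCES];
};

/* "0" -> none, "1" -> all CPUs, "2"/"orig_cpu" -> oops CPU; -1 if not a mode. */
static int parse_mode(const char *s)
{
	if (strcmp(s, "0") == 0)
		return DUMP_NONE;
	if (strcmp(s, "1") == 0)
		return DUMP_ALL;
	if (strcmp(s, "2") == 0 || strcmp(s, "orig_cpu") == 0)
		return DUMP_ORIG;
	return -1;
}

/* Parse a value such as "1,foo,bar=2". Returns 0 on success, -1 on error. */
int parse_dump_spec(const char *str, struct dump_spec *spec)
{
	char buf[256];
	char *tok, *save = NULL;
	int first = 1;

	assert(str != NULL && spec != NULL);
	memset(spec, 0, sizeof(*spec));
	if (strlen(str) >= sizeof(buf))
		return -1;
	strcpy(buf, str);

	for (tok = strtok_r(buf, ",", &save); tok != NULL;
	     tok = strtok_r(NULL, ",", &save), first = 0) {
		char *eq = strchr(tok, '=');
		int mode = DUMP_ALL;

		/* Only the first token may set the global buffer's mode. */
		if (first && parse_mode(tok) >= 0) {
			spec->global = parse_mode(tok);
			continue;
		}
		/* "<instance>=2" or "<instance>=orig_cpu"; other suffixes rejected. */
		if (eq != NULL) {
			*eq = '\0';
			mode = parse_mode(eq + 1);
			if (mode != DUMP_ORIG)
				return -1;
		}
		if (*tok == '\0' || strlen(tok) >= NAME_LEN ||
		    spec->nr_instances == MAX_INSTANCES)
			return -1;
		strcpy(spec->inst[spec->nr_instances].name, tok);
		spec->inst[spec->nr_instances].mode = mode;
		spec->nr_instances++;
	}
	return 0;
}
```

With this sketch, "1,foo,bar=2" yields a global mode of DUMP_ALL, "foo" on all CPUs, and "bar" on the oops CPU only. "foo,bar" leaves the global buffer undumped.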
+26 −1
@@ -92,6 +92,24 @@ The following flags are currently supported.
  process closes or unregisters the event. Requires CAP_PERFMON otherwise
  -EPERM is returned.

++ USER_EVENT_REG_MULTI_FORMAT: The event can contain multiple formats. This
+  allows programs to prevent themselves from being blocked when their event
+  format changes and they wish to use the same name. When this flag is used the
+  tracepoint name will be in the new format of "name.unique_id" vs the older
+  format of "name". A tracepoint will be created for each unique pair of name
+  and format. This means if several processes use the same name and format,
+  they will use the same tracepoint. If yet another process uses the same name,
+  but a different format than the other processes, it will use a different
+  tracepoint with a new unique id. Recording programs need to scan tracefs for
+  the various different formats of the event name they are interested in
+  recording. The system name of the tracepoint will also use "user_events_multi"
+  instead of "user_events". This prevents single-format event names conflicting
+  with any multi-format event names within tracefs. The unique_id is output as
+  a hex string. Recording programs should ensure the tracepoint name starts with
+  the event name they registered and has a suffix that starts with . and only
+  has hex characters. For example to find all versions of the event "test" you
+  can use the regex "^test\.[0-9a-fA-F]+$".
+
Upon successful registration the following is set.

+ write_index: The index to use for this file descriptor that represents this
@@ -106,6 +124,9 @@ or perf record -e user_events:[name] when attaching/recording.
**NOTE:** The event subsystem name by default is "user_events". Callers should
not assume it will always be "user_events". Operators reserve the right in the
future to change the subsystem name per-process to accommodate event isolation.
+In addition if the USER_EVENT_REG_MULTI_FORMAT flag is used the tracepoint name
+will have a unique id appended to it and the system name will be
+"user_events_multi" as described above.

Command Format
^^^^^^^^^^^^^^
@@ -156,7 +177,11 @@ to request deletes than the one used for registration due to this.
to the event. If programs do not want auto-delete, they must use the
USER_EVENT_REG_PERSIST flag when registering the event. Once that flag is used
the event exists until DIAG_IOCSDEL is invoked. Both register and delete of an
-event that persists requires CAP_PERFMON, otherwise -EPERM is returned.
+event that persists requires CAP_PERFMON, otherwise -EPERM is returned. When
+there are multiple formats of the same event name, all events with the same
+name will be attempted to be deleted. If only a specific version is wanted to
+be deleted then the /sys/kernel/tracing/dynamic_events file should be used for
+that specific format of the event.

Unregistering
-------------
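The multi-format naming rule documented above says the tracepoint name is "name.unique_id" with a hex unique_id, so recording programs can check it without a regex library. The sketch below is equivalent to the documented pattern "^test\.[0-9a-fA-F]+$" with "test" replaced by an arbitrary event name. The function name is hypothetical. Enumerating the candidate names (for example, the directory entries under tracefs's events/user_events_multi/) is assumed to be the caller's job.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/*
 * Returns true if tp_name is a multi-format tracepoint for event_name,
 * i.e. it matches ^<event_name>\.[0-9a-fA-F]+$ .
 */
bool is_multi_format_of(const char *tp_name, const char *event_name)
{
	size_t len;
	const char *p;

	assert(tp_name != NULL && event_name != NULL);
	len = strlen(event_name);

	/* The tracepoint name must start with the registered event name... */
	if (strncmp(tp_name, event_name, len) != 0)
		return false;
	/* ...followed by a '.' separator... */
	p = tp_name + len;
	if (*p++ != '.')
		return false;
	/* ...and a non-empty suffix made only of hex digits. */
	if (*p == '\0')
		return false;
	for (; *p != '\0'; p++)
		if (!isxdigit((unsigned char)*p))
			return false;
	return true;
}
```

Matching by hand also avoids escaping regex metacharacters that might appear in an event name, and it rejects a longer name such as "testing.1f" that only shares a prefix.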
+7 −7
@@ -646,18 +646,18 @@ u64 cxl_trace_hpa(struct cxl_region *cxlr, struct cxl_memdev *memdev, u64 dpa);

TRACE_EVENT(cxl_poison,

-	TP_PROTO(struct cxl_memdev *cxlmd, struct cxl_region *region,
+	TP_PROTO(struct cxl_memdev *cxlmd, struct cxl_region *cxlr,
		 const struct cxl_poison_record *record, u8 flags,
		 __le64 overflow_ts, enum cxl_poison_trace_type trace_type),

-	TP_ARGS(cxlmd, region, record, flags, overflow_ts, trace_type),
+	TP_ARGS(cxlmd, cxlr, record, flags, overflow_ts, trace_type),

	TP_STRUCT__entry(
		__string(memdev, dev_name(&cxlmd->dev))
		__string(host, dev_name(cxlmd->dev.parent))
		__field(u64, serial)
		__field(u8, trace_type)
-		__string(region, region)
+		__string(region, cxlr ? dev_name(&cxlr->dev) : "")
		__field(u64, overflow_ts)
		__field(u64, hpa)
		__field(u64, dpa)
@@ -677,10 +677,10 @@ TRACE_EVENT(cxl_poison,
		__entry->source = cxl_poison_record_source(record);
		__entry->trace_type = trace_type;
		__entry->flags = flags;
-		if (region) {
-			__assign_str(region, dev_name(&region->dev));
-			memcpy(__entry->uuid, &region->params.uuid, 16);
-			__entry->hpa = cxl_trace_hpa(region, cxlmd,
+		if (cxlr) {
+			__assign_str(region, dev_name(&cxlr->dev));
+			memcpy(__entry->uuid, &cxlr->params.uuid, 16);
+			__entry->hpa = cxl_trace_hpa(cxlr, cxlmd,
						     __entry->dpa);
		} else {
			__assign_str(region, "");
+3 −3
@@ -411,7 +411,7 @@ TRACE_EVENT(intel_fbc_activate,
			   struct intel_crtc *crtc = intel_crtc_for_pipe(to_i915(plane->base.dev),
									 plane->pipe);
			   __assign_str(dev, __dev_name_kms(plane));
-			   __assign_str(name, plane->base.name)
+			   __assign_str(name, plane->base.name);
			   __entry->pipe = crtc->pipe;
			   __entry->frame = intel_crtc_get_vblank_counter(crtc);
			   __entry->scanline = intel_get_crtc_scanline(crtc);
@@ -438,7 +438,7 @@ TRACE_EVENT(intel_fbc_deactivate,
			   struct intel_crtc *crtc = intel_crtc_for_pipe(to_i915(plane->base.dev),
									 plane->pipe);
			   __assign_str(dev, __dev_name_kms(plane));
-			   __assign_str(name, plane->base.name)
+			   __assign_str(name, plane->base.name);
			   __entry->pipe = crtc->pipe;
			   __entry->frame = intel_crtc_get_vblank_counter(crtc);
			   __entry->scanline = intel_get_crtc_scanline(crtc);
@@ -465,7 +465,7 @@ TRACE_EVENT(intel_fbc_nuke,
			   struct intel_crtc *crtc = intel_crtc_for_pipe(to_i915(plane->base.dev),
									 plane->pipe);
			   __assign_str(dev, __dev_name_kms(plane));
-			   __assign_str(name, plane->base.name)
+			   __assign_str(name, plane->base.name);
			   __entry->pipe = crtc->pipe;
			   __entry->frame = intel_crtc_get_vblank_counter(crtc);
			   __entry->scanline = intel_get_crtc_scanline(crtc);
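The cxl_poison and i915 hunks above touch __string()/__assign_str() call sites. In the cxl_poison case, the old __string(region, region) passed a struct cxl_region pointer where a string source was expected. The sketch below models the rework described in the pull message in plain C, so the mechanism can be tried outside TRACE_EVENT(). A hypothetical struct str_loc stands in for the per-field helper structure. string_size() plays the part of __string(): it evaluates the source once, substitutes EVENT_NULL_STR for NULL, and caches the pointer and length. assign_str() plays the part of __assign_str(): it copies the cached string and counts a warning when its now-redundant second argument does not match. These names and the warning counter are illustrative assumptions. The real macros live in the kernel's trace event headers and differ in detail.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define EVENT_NULL_STR "(null)"

/* Hypothetical stand-in for the helper structure that __string() fills in. */
struct str_loc {
	const char	*src;	/* source evaluated once, at sizing time */
	size_t		len;	/* strlen(src) + 1, also computed once */
};

static int warnings;	/* number of source mismatches reported */

/* __string() role: evaluate the source once and cache pointer and length. */
size_t string_size(struct str_loc *loc, const char *src)
{
	loc->src = src ? src : EVENT_NULL_STR;
	loc->len = strlen(loc->src) + 1;
	return loc->len;	/* bytes to reserve in the event record */
}

/* __assign_str() role: copy the cached string into the reserved space. */
void assign_str(const struct str_loc *loc, char *dst, const char *src)
{
	const char *check = src ? src : EVENT_NULL_STR;

	assert(loc->src != NULL);	/* string_size() must run first */
	/* Transitional check: the redundant argument should match the cache. */
	if (strcmp(check, loc->src) != 0) {
		fprintf(stderr, "WARNING: __assign_str() source differs from __string()\n");
		warnings++;
	}
	memcpy(dst, loc->src, loc->len);	/* no second evaluation or strlen() */
}
```

Because the copy always comes from the cached pointer, a mismatched second argument changes nothing in the output. It only triggers the warning, which is what makes removing that argument safe later.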