Commit 01732755 authored by Linus Torvalds
Pull probes updates from Masami Hiramatsu:
 "x86 kprobes:

   - Use bool for the return type of some functions instead of 0 and 1

   - Prohibit probing on INT/UD. This prevents users from putting a
     kprobe on INTn/INT1/INT3/INTO and UD0/UD1/UD2 because these are
     used for special purposes in the kernel

   - Boost Grp instructions. Because a few percent of kernel
     instructions are Grp 2/3/4/5 and those are safe to be executed
     without ip register fixup, allow those to be boosted (direct
     execution on the trampoline buffer with a JMP)

  tracing:

   - Add function argument access from return events (kretprobe and
     fprobe). This allows users to compare how a data structure field
     changes after a function executes. With BTF, return events also
     accept function argument access by name.

   - Fix a wrong comment (using "Kretprobe" in fprobe)

   - Clean up the big probe argument parser function by splitting it
     into three parts: type parser, post-processing function, and main
     parser

   - Cleanup to set nr_args field when initializing trace_probe instead
     of counting it up while parsing

   - Cleanup a redundant #else block from tracefs/README source code

   - Update selftests to check entry argument access from return probes

   - Documentation update about entry argument access from return
     probes"

* tag 'probes-v6.9' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
  Documentation: tracing: Add entry argument access at function exit
  selftests/ftrace: Add test cases for entry args at function exit
  tracing/probes: Support $argN in return probe (kprobe and fprobe)
  tracing: Remove redundant #else block for BTF args from README
  tracing/probes: cleanup: Set trace_probe::nr_args at trace_probe_init
  tracing/probes: Cleanup probe argument parser
  tracing/fprobe-event: cleanup: Fix a wrong comment in fprobe event
  x86/kprobes: Boost more instructions from grp2/3/4/5
  x86/kprobes: Prohibit kprobing on INT and UD
  x86/kprobes: Refactor can_{probe,boost} return type to bool
parents c0a614e8 e8c32f24
+31 −0
@@ -70,6 +70,14 @@ Synopsis of fprobe-events

For the details of TYPE, see :ref:`kprobetrace documentation <kprobetrace_types>`.

Function arguments at exit
--------------------------
Function arguments can be accessed at an exit probe using the $arg<N> fetcharg.
This is useful for recording the function parameters and the return value at
once, and for tracing changes in structure fields (for debugging whether a
function correctly updates the given data structure).
See the :ref:`sample<fprobetrace_exit_args_sample>` below for how it works.

BTF arguments
-------------
BTF (BPF Type Format) argument allows user to trace function and tracepoint
@@ -218,3 +226,26 @@ traceprobe event, you can trace that field as below.
           <idle>-0       [000] d..3.  5606.690317: sched_switch: (__probestub_sched_switch+0x4/0x10) comm="kworker/0:1" usage=1 start_time=137000000
      kworker/0:1-14      [000] d..3.  5606.690339: sched_switch: (__probestub_sched_switch+0x4/0x10) comm="swapper/0" usage=2 start_time=0
           <idle>-0       [000] d..3.  5606.692368: sched_switch: (__probestub_sched_switch+0x4/0x10) comm="kworker/0:1" usage=1 start_time=137000000

.. _fprobetrace_exit_args_sample:

The return probe allows us to access the results of functions that return an
error code and pass their results back via a function parameter, such as a
structure-initialization function.

For example, vfs_open() will link the file structure to the inode and update
mode. You can trace those changes with a return probe.
::

 # echo 'f vfs_open mode=file->f_mode:x32 inode=file->f_inode:x64' >> dynamic_events
 # echo 'f vfs_open%%return mode=file->f_mode:x32 inode=file->f_inode:x64' >> dynamic_events
 # echo 1 > events/fprobes/enable
 # cat trace
              sh-131     [006] ...1.  1945.714346: vfs_open__entry: (vfs_open+0x4/0x40) mode=0x2 inode=0x0
              sh-131     [006] ...1.  1945.714358: vfs_open__exit: (do_open+0x274/0x3d0 <- vfs_open) mode=0x4d801e inode=0xffff888008470168
             cat-143     [007] ...1.  1945.717949: vfs_open__entry: (vfs_open+0x4/0x40) mode=0x1 inode=0x0
             cat-143     [007] ...1.  1945.717956: vfs_open__exit: (do_open+0x274/0x3d0 <- vfs_open) mode=0x4a801d inode=0xffff888005f78d28
             cat-143     [007] ...1.  1945.720616: vfs_open__entry: (vfs_open+0x4/0x40) mode=0x1 inode=0x0
             cat-143     [007] ...1.  1945.728263: vfs_open__exit: (do_open+0x274/0x3d0 <- vfs_open) mode=0xa800d inode=0xffff888004ada8d8

You can see that `file::f_mode` and `file::f_inode` are updated in `vfs_open()`.
+9 −0
@@ -70,6 +70,15 @@ Synopsis of kprobe_events
  (\*3) this is useful for fetching a field of data structures.
  (\*4) "u" means user-space dereference. See :ref:`user_mem_access`.

Function arguments at kretprobe
-------------------------------
Function arguments can be accessed at a kretprobe using the $arg<N> fetcharg.
This is useful for recording the function parameters and the return value at
once, and for tracing changes in structure fields (for debugging whether a
function correctly updates the given data structure).
See the :ref:`sample<fprobetrace_exit_args_sample>` in fprobe event for how
it works.

.. _kprobetrace_types:

Types
+1 −1
@@ -78,7 +78,7 @@
#endif

/* Ensure if the instruction can be boostable */
extern int can_boost(struct insn *insn, void *orig_addr);
extern bool can_boost(struct insn *insn, void *orig_addr);
/* Recover instruction if given address is probed */
extern unsigned long recover_probed_instruction(kprobe_opcode_t *buf,
					 unsigned long addr);
+68 −30
@@ -137,14 +137,14 @@ NOKPROBE_SYMBOL(synthesize_relcall);
 * Returns non-zero if INSN is boostable.
 * RIP relative instructions are adjusted at copying time in 64 bits mode
 */
int can_boost(struct insn *insn, void *addr)
bool can_boost(struct insn *insn, void *addr)
{
	kprobe_opcode_t opcode;
	insn_byte_t prefix;
	int i;

	if (search_exception_tables((unsigned long)addr))
		return 0;	/* Page fault may occur on this address. */
		return false;	/* Page fault may occur on this address. */

	/* 2nd-byte opcode */
	if (insn->opcode.nbytes == 2)
@@ -152,7 +152,7 @@ int can_boost(struct insn *insn, void *addr)
				(unsigned long *)twobyte_is_boostable);

	if (insn->opcode.nbytes != 1)
		return 0;
		return false;

	for_each_insn_prefix(insn, i, prefix) {
		insn_attr_t attr;
@@ -160,7 +160,7 @@ int can_boost(struct insn *insn, void *addr)
		attr = inat_get_opcode_attribute(prefix);
		/* Can't boost Address-size override prefix and CS override prefix */
		if (prefix == 0x2e || inat_is_address_size_prefix(attr))
			return 0;
			return false;
	}

	opcode = insn->opcode.bytes[0];
@@ -169,24 +169,35 @@ int can_boost(struct insn *insn, void *addr)
	case 0x62:		/* bound */
	case 0x70 ... 0x7f:	/* Conditional jumps */
	case 0x9a:		/* Call far */
	case 0xc0 ... 0xc1:	/* Grp2 */
	case 0xcc ... 0xce:	/* software exceptions */
	case 0xd0 ... 0xd3:	/* Grp2 */
	case 0xd6:		/* (UD) */
	case 0xd8 ... 0xdf:	/* ESC */
	case 0xe0 ... 0xe3:	/* LOOP*, JCXZ */
	case 0xe8 ... 0xe9:	/* near Call, JMP */
	case 0xeb:		/* Short JMP */
	case 0xf0 ... 0xf4:	/* LOCK/REP, HLT */
		/* ... are not boostable */
		return false;
	case 0xc0 ... 0xc1:	/* Grp2 */
	case 0xd0 ... 0xd3:	/* Grp2 */
		/*
		 * AMD uses nnn == 110 as SHL/SAL, but Intel makes it reserved.
		 */
		return X86_MODRM_REG(insn->modrm.bytes[0]) != 0b110;
	case 0xf6 ... 0xf7:	/* Grp3 */
		/* AMD uses nnn == 001 as TEST, but Intel makes it reserved. */
		return X86_MODRM_REG(insn->modrm.bytes[0]) != 0b001;
	case 0xfe:		/* Grp4 */
		/* ... are not boostable */
		return 0;
		/* Only INC and DEC are boostable */
		return X86_MODRM_REG(insn->modrm.bytes[0]) == 0b000 ||
		       X86_MODRM_REG(insn->modrm.bytes[0]) == 0b001;
	case 0xff:		/* Grp5 */
		/* Only indirect jmp is boostable */
		return X86_MODRM_REG(insn->modrm.bytes[0]) == 4;
		/* Only INC, DEC, and indirect JMP are boostable */
		return X86_MODRM_REG(insn->modrm.bytes[0]) == 0b000 ||
		       X86_MODRM_REG(insn->modrm.bytes[0]) == 0b001 ||
		       X86_MODRM_REG(insn->modrm.bytes[0]) == 0b100;
	default:
		return 1;
		return true;
	}
}
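
The Grp2/3/4/5 cases in can_boost() all key off the 3-bit reg (nnn) field of
the ModRM byte. The following is a minimal user-space sketch of that decision,
not kernel code. The names modrm_reg() and grp_insn_is_boostable() are
hypothetical and introduced only for illustration, and opcodes outside the Grp
ranges are simply reported as not boostable here instead of going through the
rest of the switch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Extract the 3-bit reg/nnn field (bits 5:3) of a ModRM byte. */
static inline unsigned int modrm_reg(uint8_t modrm)
{
	return (modrm >> 3) & 0x7;
}

/*
 * Hypothetical user-space mirror of the Grp2/3/4/5 cases in can_boost().
 * Returns true when the opcode/ModRM pair would be treated as boostable.
 * Opcodes outside Grp2-5 are out of scope for this sketch and return false.
 */
static bool grp_insn_is_boostable(uint8_t opcode, uint8_t modrm)
{
	unsigned int reg = modrm_reg(modrm);

	switch (opcode) {
	case 0xc0: case 0xc1:
	case 0xd0: case 0xd1: case 0xd2: case 0xd3:	/* Grp2 */
		return reg != 6;	/* nnn == 110 is reserved on Intel */
	case 0xf6: case 0xf7:				/* Grp3 */
		return reg != 1;	/* nnn == 001 is reserved on Intel */
	case 0xfe:					/* Grp4 */
		return reg == 0 || reg == 1;	/* INC, DEC */
	case 0xff:					/* Grp5 */
		return reg == 0 || reg == 1 || reg == 4; /* INC, DEC, JMP */
	default:
		return false;
	}
}
```

For instance, the byte pair ff e0 (an indirect JMP through a register) has
reg == 4 and is reported as boostable, while ff d0 (an indirect CALL) has
reg == 2 and is not.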

@@ -252,21 +263,40 @@ unsigned long recover_probed_instruction(kprobe_opcode_t *buf, unsigned long add
	return __recover_probed_insn(buf, addr);
}

/* Check if paddr is at an instruction boundary */
static int can_probe(unsigned long paddr)
/* Check if insn is INT or UD */
static inline bool is_exception_insn(struct insn *insn)
{
	/* UD uses 0f escape */
	if (insn->opcode.bytes[0] == 0x0f) {
		/* UD0 / UD1 / UD2 */
		return insn->opcode.bytes[1] == 0xff ||
		       insn->opcode.bytes[1] == 0xb9 ||
		       insn->opcode.bytes[1] == 0x0b;
	}

	/* INT3 / INT n / INTO / INT1 */
	return insn->opcode.bytes[0] == 0xcc ||
	       insn->opcode.bytes[0] == 0xcd ||
	       insn->opcode.bytes[0] == 0xce ||
	       insn->opcode.bytes[0] == 0xf1;
}
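
As a rough illustration of the opcode check in is_exception_insn(), the sketch
below applies the same byte comparisons to raw opcode bytes in user space. It
assumes prefixes have already been stripped, and opcode_is_int_or_ud() is a
hypothetical name that does not exist in the kernel; the real function works
on a decoded struct insn.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: classify the first two opcode bytes the way
 * is_exception_insn() does. b1 is only examined when b0 is the 0x0f
 * escape byte, so callers may pass 0 for one-byte opcodes.
 */
static bool opcode_is_int_or_ud(uint8_t b0, uint8_t b1)
{
	/* UD0 / UD1 / UD2 live behind the 0x0f escape */
	if (b0 == 0x0f)
		return b1 == 0xff || b1 == 0xb9 || b1 == 0x0b;

	/* INT3 / INT n / INTO / INT1 */
	return b0 == 0xcc || b0 == 0xcd || b0 == 0xce || b0 == 0xf1;
}
```

With this sketch, 0xcc (INT3) and 0x0f 0x0b (UD2) are classified as exception
instructions, while 0x90 (NOP) and 0x0f 0x05 (SYSCALL) are not.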

/*
 * Check if paddr is at an instruction boundary and that instruction can
 * be probed
 */
static bool can_probe(unsigned long paddr)
{
	unsigned long addr, __addr, offset = 0;
	struct insn insn;
	kprobe_opcode_t buf[MAX_INSN_SIZE];

	if (!kallsyms_lookup_size_offset(paddr, NULL, &offset))
		return 0;
		return false;

	/* Decode instructions */
	addr = paddr - offset;
	while (addr < paddr) {
		int ret;

		/*
		 * Check if the instruction has been modified by another
		 * kprobe, in which case we replace the breakpoint by the
@@ -277,11 +307,10 @@ static int can_probe(unsigned long paddr)
		 */
		__addr = recover_probed_instruction(buf, addr);
		if (!__addr)
			return 0;
			return false;

		ret = insn_decode_kernel(&insn, (void *)__addr);
		if (ret < 0)
			return 0;
		if (insn_decode_kernel(&insn, (void *)__addr) < 0)
			return false;

#ifdef CONFIG_KGDB
		/*
@@ -290,10 +319,26 @@ static int can_probe(unsigned long paddr)
		 */
		if (insn.opcode.bytes[0] == INT3_INSN_OPCODE &&
		    kgdb_has_hit_break(addr))
			return 0;
			return false;
#endif
		addr += insn.length;
	}

	/* Check if paddr is at an instruction boundary */
	if (addr != paddr)
		return false;

	__addr = recover_probed_instruction(buf, addr);
	if (!__addr)
		return false;

	if (insn_decode_kernel(&insn, (void *)__addr) < 0)
		return false;

	/* INT and UD are special and should not be kprobed */
	if (is_exception_insn(&insn))
		return false;

	if (IS_ENABLED(CONFIG_CFI_CLANG)) {
		/*
		 * The compiler generates the following instruction sequence
@@ -308,13 +353,6 @@ static int can_probe(unsigned long paddr)
		 * Also, these movl and addl are used for showing expected
		 * type. So those must not be touched.
		 */
		__addr = recover_probed_instruction(buf, addr);
		if (!__addr)
			return 0;

		if (insn_decode_kernel(&insn, (void *)__addr) < 0)
			return 0;

		if (insn.opcode.value == 0xBA)
			offset = 12;
		else if (insn.opcode.value == 0x3)
@@ -324,11 +362,11 @@ static int can_probe(unsigned long paddr)

		/* This movl/addl is used for decoding CFI. */
		if (is_cfi_trap(addr + offset))
			return 0;
			return false;
	}

out:
	return (addr == paddr);
	return true;
}
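
The boundary check in can_probe() decodes instructions from the start of the
function and only accepts paddr if the walk lands on it exactly. The toy model
below sketches that idea with a precomputed table of instruction lengths
standing in for the real decoder; offset_is_insn_boundary() and the length
table are assumptions made for illustration, not kernel interfaces.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Toy model of the boundary walk in can_probe(): insn_len[] holds the
 * lengths of n consecutive instructions from the start of a function.
 * Returns true only if stepping instruction by instruction lands exactly
 * on offset. The end of the last instruction also counts as a boundary
 * in this model.
 */
static bool offset_is_insn_boundary(const unsigned char *insn_len, size_t n,
				    size_t offset)
{
	size_t pos = 0, i = 0;

	/* Advance one whole instruction at a time until reaching offset. */
	while (pos < offset && i < n)
		pos += insn_len[i++];

	/* Overshooting means offset points into the middle of an insn. */
	return pos == offset;
}
```

With instruction lengths 1, 3, 5 and 2, the boundaries fall at offsets 0, 1, 4,
9 and 11, so offset 4 is accepted and offset 2 (inside the second instruction)
is rejected.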

/* If x86 supports IBT (ENDBR) it must be skipped. */
+2 −3
@@ -5747,16 +5747,15 @@ static const char readme_msg[] =
	"\t     args: <name>=fetcharg[:type]\n"
	"\t fetcharg: (%<register>|$<efield>), @<address>, @<symbol>[+|-<offset>],\n"
#ifdef CONFIG_HAVE_FUNCTION_ARG_ACCESS_API
#ifdef CONFIG_PROBE_EVENTS_BTF_ARGS
	"\t           $stack<index>, $stack, $retval, $comm, $arg<N>,\n"
#ifdef CONFIG_PROBE_EVENTS_BTF_ARGS
	"\t           <argname>[->field[->field|.field...]],\n"
#else
	"\t           $stack<index>, $stack, $retval, $comm, $arg<N>,\n"
#endif
#else
	"\t           $stack<index>, $stack, $retval, $comm,\n"
#endif
	"\t           +|-[u]<offset>(<fetcharg>), \\imm-value, \\\"imm-string\"\n"
	"\t     kernel return probes support: $retval, $arg<N>, $comm\n"
	"\t     type: s8/16/32/64, u8/16/32/64, x8/16/32/64, char, string, symbol,\n"
	"\t           b<bit-width>@<bit-offset>/<container-size>, ustring,\n"
	"\t           symstr, <type>\\[<array-size>\\]\n"