Commit 2516fde1 authored by Vinicius Costa Gomes's avatar Vinicius Costa Gomes Committed by John Johansen
Browse files

apparmor: Optimize retrieving current task secid

When running will-it-scale[1] open2_process testcase, in a system with a
large number of cores, a bottleneck in retrieving the current task
secid was detected:

27.73% ima_file_check;do_open (inlined);path_openat;do_filp_open;do_sys_openat2;__x64_sys_openat;do_syscall_x64 (inlined);do_syscall_64;entry_SYSCALL_64_after_hwframe (inlined);__libc_open64 (inlined)
    27.72%     0.01%  [kernel.vmlinux]      [k] security_current_getsecid_subj             -      -
27.71% security_current_getsecid_subj;ima_file_check;do_open (inlined);path_openat;do_filp_open;do_sys_openat2;__x64_sys_openat;do_syscall_x64 (inlined);do_syscall_64;entry_SYSCALL_64_after_hwframe (inlined);__libc_open64 (inlined)
    27.71%    27.68%  [kernel.vmlinux]      [k] apparmor_current_getsecid_subj             -      -
19.94% __refcount_add (inlined);__refcount_inc (inlined);refcount_inc (inlined);kref_get (inlined);aa_get_label (inlined);aa_get_label (inlined);aa_get_current_label (inlined);apparmor_current_getsecid_subj;security_current_getsecid_subj;ima_file_check;do_open (inlined);path_openat;do_filp_open;do_sys_openat2;__x64_sys_openat;do_syscall_x64 (inlined);do_syscall_64;entry_SYSCALL_64_after_hwframe (inlined);__libc_open64 (inlined)
7.72% __refcount_sub_and_test (inlined);__refcount_dec_and_test (inlined);refcount_dec_and_test (inlined);kref_put (inlined);aa_put_label (inlined);aa_put_label (inlined);apparmor_current_getsecid_subj;security_current_getsecid_subj;ima_file_check;do_open (inlined);path_openat;do_filp_open;do_sys_openat2;__x64_sys_openat;do_syscall_x64 (inlined);do_syscall_64;entry_SYSCALL_64_after_hwframe (inlined);__libc_open64 (inlined)

A large amount of time was spent in the refcount.

The most common case is that the current task label is available, and
no need to take references for that one. That is exactly what the
critical section helpers do, make use of them.

New perf output:

39.12% vfs_open;path_openat;do_filp_open;do_sys_openat2;__x64_sys_openat;do_syscall_64;entry_SYSCALL_64_after_hwframe;__libc_open64 (inlined)
    39.07%     0.13%  [kernel.vmlinux]          [k] do_dentry_open                                                               -      -
39.05% do_dentry_open;vfs_open;path_openat;do_filp_open;do_sys_openat2;__x64_sys_openat;do_syscall_64;entry_SYSCALL_64_after_hwframe;__libc_open64 (inlined)
    38.71%     0.01%  [kernel.vmlinux]          [k] security_file_open                                                           -      -
38.70% security_file_open;do_dentry_open;vfs_open;path_openat;do_filp_open;do_sys_openat2;__x64_sys_openat;do_syscall_64;entry_SYSCALL_64_after_hwframe;__libc_open64 (inlined)
    38.65%    38.60%  [kernel.vmlinux]          [k] apparmor_file_open                                                           -      -
38.65% apparmor_file_open;security_file_open;do_dentry_open;vfs_open;path_openat;do_filp_open;do_sys_openat2;__x64_sys_openat;do_syscall_64;entry_SYSCALL_64_after_hwframe;__libc_open64 (inlined)

The result is a throughput improvement of around 20% across the board
on the open2 testcase. On more realistic workloads the impact should
be much less.

[1] https://github.com/antonblanchard/will-it-scale



Signed-off-by: default avatarVinicius Costa Gomes <vinicius.gomes@intel.com>
Signed-off-by: default avatarJohn Johansen <john.johansen@canonical.com>
parent fee5304a
Loading
Loading
Loading
Loading
+2 −2
Original line number Diff line number Diff line
@@ -766,9 +766,9 @@ static void apparmor_bprm_committed_creds(struct linux_binprm *bprm)

static void apparmor_current_getsecid_subj(u32 *secid)
{
	struct aa_label *label = aa_get_current_label();
	struct aa_label *label = __begin_current_label_crit_section();
	*secid = label->secid;
	aa_put_label(label);
	__end_current_label_crit_section(label);
}

static void apparmor_task_getsecid_obj(struct task_struct *p, u32 *secid)