Commit 2839f393 authored by Frederic Weisbecker's avatar Frederic Weisbecker Committed by Ingo Molnar
Browse files

perf/core: Fix put_ctx() ordering



So there are three situations:

* If perf_event_free_task() has removed all the children from the parent list
  before perf_event_release_kernel() got a chance to even iterate them, then
  it's all good as there is no get_ctx() pending.

* If perf_event_release_kernel() iterates a child event, but it gets freed
  meanwhile by perf_event_free_task() while the mutexes are temporarily
  unlocked, it's all good because while locking again the ctx mutex,
  perf_event_release_kernel() observes TASK_TOMBSTONE.

* But if perf_event_release_kernel() frees the child event before
  perf_event_free_task() got a chance, we may face this scenario:

    perf_event_release_kernel()                                  perf_event_free_task()
    --------------------------                                   ------------------------
    mutex_lock(&event->child_mutex)
    get_ctx(child->ctx)
    mutex_unlock(&event->child_mutex)

    mutex_lock(ctx->mutex)
    mutex_lock(&event->child_mutex)
    perf_remove_from_context(child)
    mutex_unlock(&event->child_mutex)
    mutex_unlock(ctx->mutex)

                                                                 // This lock acquires ctx->refcount == 2
                                                                 // visibility
                                                                 mutex_lock(ctx->mutex)
                                                                 ctx->task = TASK_TOMBSTONE
                                                                 mutex_unlock(ctx->mutex)

                                                                 wait_var_event()
                                                                     // enters prepare_to_wait() since
                                                                     // ctx->refcount == 2
                                                                     // is guaranteed to be seen
                                                                     set_current_state(TASK_INTERRUPTIBLE)
                                                                     smp_mb()
                                                                     if (ctx->refcount != 1)
                                                                         schedule()
    put_ctx()
       // NOT fully ordered! Only RELEASE semantics
       refcount_dec_and_test()
           atomic_fetch_sub_release()
       // So TASK_TOMBSTONE is not guaranteed to be seen
       if (ctx->task == TASK_TOMBSTONE)
           wake_up_var()

Basically it's a broken store buffer:

    perf_event_release_kernel()                                  perf_event_free_task()
    --------------------------                                   ------------------------
    ctx->task = TASK_TOMBSTONE                                   smp_store_release(&ctx->refcount, ctx->refcount - 1)
    smp_mb()
    READ_ONCE(ctx->refcount)                                     READ_ONCE(ctx->task)

So we need a smp_mb__after_atomic() before looking at ctx->task.

Fixes: 59f3aa4a ("perf: Simplify perf_event_free_task() wait")
Signed-off-by: default avatarFrederic Weisbecker <frederic@kernel.org>
Signed-off-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
Link: https://lkml.kernel.org/r/Z_ZvmEhjkAhplCBE@localhost.localdomain
parent f6938a56
Loading
Loading
Loading
Loading
+4 −3
Original line number Diff line number Diff line
@@ -1271,8 +1271,9 @@ static void put_ctx(struct perf_event_context *ctx)
		if (ctx->task && ctx->task != TASK_TOMBSTONE)
			put_task_struct(ctx->task);
		call_rcu(&ctx->rcu_head, free_ctx);
	} else if (ctx->task == TASK_TOMBSTONE) {
		smp_mb(); /* pairs with wait_var_event() */
	} else {
		smp_mb__after_atomic(); /* pairs with wait_var_event() */
		if (ctx->task == TASK_TOMBSTONE)
			wake_up_var(&ctx->refcount);
	}
}