Commit 7a96a84b authored by Anna-Maria Behnsen's avatar Anna-Maria Behnsen Committed by Thomas Gleixner
Browse files

timers/migration: Return early on deactivation



Commit 4b6f4c5a ("timer/migration: Remove buggy early return on
deactivation") removed the logic to return early in tmigr_update_events()
on deactivation. With this the problem with a not properly updated first
global event in a hierarchy containing only a single group was fixed.

But when having a look at this code path with a hierarchy with more than a
single level, now unnecessary work is done (example is partially copied
from the message of the commit mentioned above):

                            [GRP1:0]
                         migrator = GRP0:0
                         active   = GRP0:0
                         nextevt  = T0:0i, T0:1
                         /              \
              [GRP0:0]                  [GRP0:1]
           migrator = 0              migrator = NONE
           active   = 0              active   = NONE
           nextevt  = T0i, T1        nextevt  = T2
           /         \                /         \
          0 (T0i)     1 (T1)         2 (T2)      3
      active         idle            idle       idle

0) CPU 0 is active thus its event is ignored (the letter 'i') and so are
upper levels' events. CPU 1 is idle and has the timer T1 enqueued.
CPU 2 also has a timer. The expiry order is T0 (ignored) < T1 < T2

                            [GRP1:0]
                         migrator = GRP0:0
                         active   = GRP0:0
                         nextevt  = T0:0i, T0:1
                         /              \
              [GRP0:0]                  [GRP0:1]
           migrator = NONE           migrator = NONE
           active   = NONE           active   = NONE
           nextevt  = T1             nextevt  = T2
           /         \                /         \
          0 (T0i)     1 (T1)         2 (T2)      3
        idle         idle            idle         idle

1) CPU 0 goes idle without global event queued. Therefore KTIME_MAX is
pushed as its next expiry and its own event kept as "ignore". Without this
early return the following steps happen in tmigr_update_events() when
child = null and group = GRP0:0 :

  lock(GRP0:0->lock);
  timerqueue_del(GRP0:0, T0i);
  unlock(GRP0:0->lock);


                            [GRP1:0]
                         migrator = NONE
                         active   = NONE
                         nextevt  = T0:0, T0:1
                         /              \
              [GRP0:0]                  [GRP0:1]
           migrator = NONE           migrator = NONE
           active   = NONE           active   = NONE
           nextevt  = T1             nextevt  = T2
           /         \                /         \
          0 (T0i)     1 (T1)         2 (T2)      3
        idle         idle            idle         idle

2) The change now propagates up to the top. Then tmigr_update_events()
updates the group event of GRP0:0 and executes the following steps
(child = GRP0:0 and group = GRP0:0):

  lock(GRP0:0->lock);
  lock(GRP1:0->lock);
  evt = tmigr_next_groupevt(GRP0:0); -> this removes the ignored events
					in GRP0:0
  ... update GRP1:0 group event and timerqueue ...
  unlock(GRP1:0->lock);
  unlock(GRP0:0->lock);

So the dance in 1) with locking the GRP0:0->lock and removing the T0i from
the timerqueue is redundand as this is done nevertheless in 2) when
tmigr_next_groupevt(GRP0:0) is executed.

Revert commit 4b6f4c5a ("timer/migration: Remove buggy early return on
deactivation") and add a condition into return path to skip the return
only, when hierarchy contains a single group. Adapt comments accordingly.

Fixes: 4b6f4c5a ("timer/migration: Remove buggy early return on deactivation")
Signed-off-by: default avatarAnna-Maria Behnsen <anna-maria@linutronix.de>
Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
Reviewed-by: default avatarFrederic Weisbecker <frederic@kernel.org>
Link: https://lore.kernel.org/r/87cyr49on2.fsf@somnus
parent 61f7fdf8
Loading
Loading
Loading
Loading
+27 −0
Original line number Diff line number Diff line
@@ -751,6 +751,33 @@ bool tmigr_update_events(struct tmigr_group *group, struct tmigr_group *child,

		first_childevt = evt = data->evt;

		/*
		 * Walking the hierarchy is required in any case when a
		 * remote expiry was done before. This ensures to not lose
		 * already queued events in non active groups (see section
		 * "Required event and timerqueue update after a remote
		 * expiry" in the documentation at the top).
		 *
		 * The two call sites which are executed without a remote expiry
		 * before, are not prevented from propagating changes through
		 * the hierarchy by the return:
		 *  - When entering this path by tmigr_new_timer(), @evt->ignore
		 *    is never set.
		 *  - tmigr_inactive_up() takes care of the propagation by
		 *    itself and ignores the return value. But an immediate
		 *    return is possible if there is a parent, sparing group
		 *    locking at this level, because the upper walking call to
		 *    the parent will take care about removing this event from
		 *    within the group and update next_expiry accordingly.
		 *
		 * However if there is no parent, ie: the hierarchy has only a
		 * single level so @group is the top level group, make sure the
		 * first event information of the group is updated properly and
		 * also handled properly, so skip this fast return path.
		 */
		if (evt->ignore && !remote && group->parent)
			return true;

		raw_spin_lock(&group->lock);

		childstate.state = 0;