Commit 2dd0ebcd authored by Ravikiran G Thirumalai, committed by Linus Torvalds

[PATCH] Avoid taking global tasklist_lock for single threaded process at getrusage()

In getrusage(), avoid taking the global tasklist_lock when possible, that is,
when the target is the calling process and it is single threaded.  Avoiding
tasklist_lock wherever possible helps NUMA boxes (and possibly large SMPs),
since it is a single lock shared by every CPU.  Thanks to Oleg Nesterov for
review and suggestions.
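User space reaches this code through the ordinary getrusage(2) system call. In kernels of this era, sys_getrusage() passes current as p, so on a kernel carrying this patch a call from a single-threaded process can take the lock-free branch for both RUSAGE_SELF and RUSAGE_CHILDREN. The sketch below is only an illustrative caller (the helper name user_usec() is hypothetical); it cannot observe from user space whether tasklist_lock was taken.

```c
#include <sys/resource.h>
#include <sys/time.h>

/* Hypothetical helper: user CPU time in microseconds for `who`
 * (RUSAGE_SELF or RUSAGE_CHILDREN) as seen by the calling process.
 * Returns -1 if getrusage() fails, e.g. EINVAL for an unknown `who`. */
static long user_usec(int who)
{
    struct rusage ru;

    if (getrusage(who, &ru) != 0)
        return -1;
    return (long)ru.ru_utime.tv_sec * 1000000L + (long)ru.ru_utime.tv_usec;
}
```

Called from a program that never creates threads, both user_usec(RUSAGE_SELF) and user_usec(RUSAGE_CHILDREN) end up in k_getrusage() with p == current and an empty thread group.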

Signed-off-by: Nippun Goel <nippung@calsoftinc.com>
Signed-off-by: Ravikiran Thirumalai <kiran@scalex86.org>
Signed-off-by: Shai Fultheim <shai@scalex86.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
parent 0c9e63fd
+34 −8
@@ -1677,9 +1677,6 @@ asmlinkage long sys_setrlimit(unsigned int resource, struct rlimit __user *rlim)
  * a lot simpler!  (Which we're not doing right now because we're not
  * measuring them yet).
  *
- * This expects to be called with tasklist_lock read-locked or better,
- * and the siglock not locked.  It may momentarily take the siglock.
- *
  * When sampling multiple threads for RUSAGE_SELF, under SMP we might have
  * races with threads incrementing their own counters.  But since word
  * reads are atomic, we either get new values or old values and we don't
@@ -1687,6 +1684,25 @@ asmlinkage long sys_setrlimit(unsigned int resource, struct rlimit __user *rlim)
  * the c* fields from p->signal from races with exit.c updating those
  * fields when reaping, so a sample either gets all the additions of a
  * given child after it's reaped, or none so this sample is before reaping.
+ *
+ * tasklist_lock locking optimisation:
+ * If we are current and single threaded, we do not need to take the tasklist
+ * lock or the siglock.  No one else can take our signal_struct away,
+ * no one else can reap the children to update signal->c* counters, and
+ * no one else can race with the signal-> fields.
+ * If we do not take the tasklist_lock, the signal-> fields could be read
+ * out of order while another thread was just exiting. So we place a
+ * read memory barrier when we avoid the lock.  On the writer side,
+ * write memory barrier is implied in  __exit_signal as __exit_signal releases
+ * the siglock spinlock after updating the signal-> fields.
+ *
+ * We don't really need the siglock when we access the non c* fields
+ * of the signal_struct (for RUSAGE_SELF) even in multithreaded
+ * case, since we take the tasklist lock for read and the non c* signal->
+ * fields are updated only in __exit_signal, which is called with
+ * tasklist_lock taken for write, hence these two threads cannot execute
+ * concurrently.
+ *
  */
 
 static void k_getrusage(struct task_struct *p, int who, struct rusage *r)
@@ -1694,13 +1710,23 @@ static void k_getrusage(struct task_struct *p, int who, struct rusage *r)
 	struct task_struct *t;
 	unsigned long flags;
 	cputime_t utime, stime;
+	int need_lock = 0;
 
 	memset((char *) r, 0, sizeof *r);
+	utime = stime = cputime_zero;
 
-	if (unlikely(!p->signal))
-		return;
+	if (p != current || !thread_group_empty(p))
+		need_lock = 1;
 
-	utime = stime = cputime_zero;
+	if (need_lock) {
+		read_lock(&tasklist_lock);
+		if (unlikely(!p->signal)) {
+			read_unlock(&tasklist_lock);
+			return;
+		}
+	} else
+		/* See locking comments above */
+		smp_rmb();
 
 	switch (who) {
 		case RUSAGE_BOTH:
@@ -1740,6 +1766,8 @@ static void k_getrusage(struct task_struct *p, int who, struct rusage *r)
 			BUG();
 	}
 
+	if (need_lock)
+		read_unlock(&tasklist_lock);
 	cputime_to_timeval(utime, &r->ru_utime);
 	cputime_to_timeval(stime, &r->ru_stime);
 }
@@ -1747,9 +1775,7 @@ static void k_getrusage(struct task_struct *p, int who, struct rusage *r)
 int getrusage(struct task_struct *p, int who, struct rusage __user *ru)
 {
 	struct rusage r;
-	read_lock(&tasklist_lock);
 	k_getrusage(p, who, &r);
-	read_unlock(&tasklist_lock);
 	return copy_to_user(ru, &r, sizeof(r)) ? -EFAULT : 0;
 }