Name: Futex Wake Priority Status: Tested on 2.6.13-rc6-git1 Signed-off-by: Rusty Russell To do true priority ordering on futexes is hard, but I wonder: can we simply shove those who care about priority in the queue in priority order? That means little penalty for high-prio tasks, no penalty for those who don't care (they're already appended), and little code change. Issues: 1) We don't reorder when priority changed, but that seems overkill anyway, 2) A low-prio thread can get in before the high-prio thread which is woken gets to the lock. The latter is a narrow race which needs userspace help to solve, and frankly we don't get along like we used to. The problem will also be ameliorated by priority inheritance. People who want tickbox features like this one also want priority inheritence. And those kind of people really dig the word "ameliorated". Index: linux-2.6.13-rc6-git1-Misc/include/linux/futex.h =================================================================== --- linux-2.6.13-rc6-git1-Misc.orig/include/linux/futex.h 2005-02-04 18:47:19.000000000 +1100 +++ linux-2.6.13-rc6-git1-Misc/include/linux/futex.h 2005-08-10 19:15:19.000000000 +1000 @@ -9,6 +9,7 @@ #define FUTEX_FD (2) #define FUTEX_REQUEUE (3) #define FUTEX_CMP_REQUEUE (4) +#define FUTEX_WAIT_PRIO (5) long do_futex(unsigned long uaddr, int op, int val, unsigned long timeout, unsigned long uaddr2, int val2, Index: linux-2.6.13-rc6-git1-Misc/kernel/futex.c =================================================================== --- linux-2.6.13-rc6-git1-Misc.orig/kernel/futex.c 2005-08-10 19:14:36.000000000 +1000 +++ linux-2.6.13-rc6-git1-Misc/kernel/futex.c 2005-08-10 19:39:39.000000000 +1000 @@ -446,6 +446,23 @@ spin_unlock(&bh->lock); } +static inline void __queue_me_prio(struct futex_q *q, + struct futex_hash_bucket *bh) +{ + struct futex_q *i; + struct task_struct *tsk; + + list_for_each_entry(i, &bh->chain, list) { + tsk = list_entry(i->waiters.task_list.next, + wait_queue_t, task_list)->private; + /* Higher is lower priority, like nice(2). */ + if (current->static_prio < tsk->prio) + break; + } + list_add_tail(&q->list, &i->list); + spin_unlock(&bh->lock); +} + static inline void queue_unlock(struct futex_q *q, struct futex_hash_bucket *bh) { @@ -504,7 +521,7 @@ return ret; } -static int futex_wait(unsigned long uaddr, int val, unsigned long time) +static int futex_wait(unsigned long uaddr, int val, unsigned long time, int pr) { DECLARE_WAITQUEUE(wait, current); int ret, curval; @@ -564,7 +581,10 @@ } /* Only actually queue if *uaddr contained val. */ - __queue_me(&q, bh); + if (pr) + __queue_me_prio(&q, bh); + else + __queue_me(&q, bh); /* * Now the futex is queued and we have checked the data, we @@ -725,7 +745,10 @@ switch (op) { case FUTEX_WAIT: - ret = futex_wait(uaddr, val, timeout); + ret = futex_wait(uaddr, val, timeout, 0); + break; + case FUTEX_WAIT_PRIO: + ret = futex_wait(uaddr, val, timeout, 1); break; case FUTEX_WAKE: ret = futex_wake(uaddr, val);