# Sorting a list by a priority list in Python
TL;DR: jump to the solution.
Let's say you want to sort some list according the order another list, how would you do that in Python?
An example would be to sort a list of foods according a list of your favourite foods, ordered from your most liked to least liked foods:
>>> my_favourite_foods = ['Coffee', 'Chocolate', 'Bagels', 'Lasagne']
>>> available_foods = ['Coffee', 'Beetroot', 'Bagels', 'Burgers', 'Chocolate']
>>> sort_by_priority_list(values=available_foods, priority=my_favourite_foods)
['Coffee', 'Chocolate', 'Bagels', 'Beetroot', 'Burgers']
2
3
4
I had to implement a function like this for one of our projects today. Another dev sent me this function to get me started:
def sort_by_priority_list(values, priority):
priority_dict = dict(
zip(
priority,
range(len(priority)),
),
)
priority_getter = priority_dict.get # dict.get(key)
return sorted(values, key=priority_getter)
2
3
4
5
6
7
8
9
How does this function work?
It starts off by creating a dictionary for which the keys are the items in priority
and the values are the indices of the items in priority
. For the above example, it would look something like this:
>>> priority_dict
{'Coffee': 0, 'Chocolate': 1, 'Bagels': 2, 'Lasagne': 3}
2
Another to way create priority_dict
would be:
priority_dict = {k: i for i, k in enumerate(priority)}
Next up it creates a function called priority_getter
like this:
priority_getter = priority_dict.get # dict.get(key)
Another way to define priority_getter
would be:
def priority_getter(key):
return priority_dict.get(key)
2
priority_getter
gets the value of priority_dict
at a given key.
Why do we need priority_getter
?
At the last line of our function we use it to return a sorted list of our input values:
sorted(values, key=priority_getter)
Here, instead of sorting values
by the items in the values
, we sort it by calling
priority_getter
on each item. In other words, instead of comparing two items as usual like item1 > item2
, sorted
will now compare them using priority_getter(item1) > priority_getter(item2)
- which amounts to: priority_dict.get(item1) > priority_dict.get(item2)
.
There's one issue with this solution though: it fails when there are any items in values
that aren't in priority
:
>>> sort_by_priority_list(values=[1,2,3], priority=[1,2])
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-23-3d6470298442> in <module>()
----> 1 sort_by_priority_list([1,2,3], [1,2])
<ipython-input-20-b8e90b47f443> in sort_by_priority_list(values, priority)
7 )
8 priority_getter = priority_dict.get # dict.get(key)
----> 9 return sorted(values, key=priority_getter)
TypeError: '<' not supported between instances of 'NoneType' and 'int'
2
3
4
5
6
7
8
9
10
11
12
How can we fix this?
First we need to figure out why we get this error.
Remember that for every item in values
, we try to get a value from priority_dict
using priority_dict.get(value)
. dict.get
returns None
if the dict
doesn't contain the given key. We can pass a default value to it as an additional argument to make it return the default value instead of None
:
def priority_getter(value):
return priority_dict.get(value, len(values))
2
We set the default value to len(values)
because we want the value of keys that aren't in our priority
list to be higher (i.e. a lesser priority) than any of the items in our priority
list. Why? So that any items in values
that aren't in priority
will always be put at the back of the resulting list.
Instead of modifying priority_getter
, we also have to option of using a defaultdict
(opens new window). When we create a defaultdict
, we have to pass a function which returns a default value as its first argument. This function is called the default_factory
.
priority_dict = defaultdict(
lambda: len(priority), zip(priority, range(len(priority)),),
)
2
3
If we use defaultdict
, we need to change priority_getter = priority_dict.get
to use __getitem__
instead of get
because defaultdict
's default_factory
function only gets called when a key isn't found when __getitem__
is called. defaultdict.get
would return None
in same way that it would return None
for a normal dict
.
# Final Solution
This is what our final solution looks like when we use zip
and defaultdict
:
from collections import defaultdict
from typing import Iterable, List, TypeVar
T = TypeVar("T")
def sort_by_priority_list(values: Iterable[T], priority: List[T]) -> List[T]:
"""
Sorts an iterable according to a list of priority items.
Usage:
>>> sort_by_priority_list(values=[1,2,2,3], priority=[2,3,1])
[2, 2, 3, 1]
>>> sort_by_priority_list(values=set([1,2,3]), priority=[2,3])
[2, 3, 1]
"""
priority_dict = defaultdict(
lambda: len(priority), zip(priority, range(len(priority)),),
)
priority_getter = priority_dict.__getitem__ # dict.get(key)
return sorted(values, key=priority_getter)
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
And here it is without using zip
and defaultdict
:
from typing import Iterable, List, TypeVar
T = TypeVar("T")
def sort_by_priority_list(values: Iterable[T], priority: List[T]) -> List[T]:
"""
Sorts an iterable according to a list of priority items.
Usage:
>>> sort_by_priority_list(values=[1,2,2,3], priority=[2,3,1])
[2, 2, 3, 1]
>>> sort_by_priority_list(values=set([1,2,3]), priority=[2,3])
[2, 3, 1]
"""
priority_dict = {k: i for i, k in enumerate(priority)}
def priority_getter(value):
return priority_dict.get(value, len(values))
return sorted(values, key=priority_getter)
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
I think the solution without zip
and defaultdict
is a bit easier to read and understand. It is also slightly faster than the first solution for large inputs.
Newsletter
If you'd like to subscribe to my blog, please enter your details below. You can unsubscribe at any time.