It seems it’s been awhile since I’ve written a Python article, but the series has been fairly successful. So, I figured I dive back in with an article on how to sort a list of strings in Python. Let’s get to it!
Table of Contents
Problem Introduction
Recently, I discovered a bug in my Sample Programs Wiki Generator code which caused the output wiki to occasionally display a list of strings in the wrong order. The expected list looked something like:
[A, B, C, ...., X, Y, Z]
For whatever reason, the list was instead scrambled:
[H, G, A, ..., Q, B, C]
As I dug into the code a bit, I discovered the following line of code:
alphabetical_list = os.listdir(self.repo.source_dir)
As we can see, we’re banking on the OS library to produce a list of directories in alphabetical order. I guess that’s not always the case. To be sure, I took a peek at the os.listdir
documentation, and it did not disappoint:
Return a list containing the names of the entries in the directory given by path. The list is in arbitrary order, and does not include the special entries
'.'
and'..'
even if they are present in the directory.
Naturally, I decided I wanted to sort this list to avoid any future issues. In this article, we’ll take a look at a few ways to sort a list of strings in Python.
Solutions
When it comes to sorting, there’s no shortage of solutions. In this section, we’ll cover three of my favorite ways to sort a list of strings in Python.
Sort a List of Strings in Python by Brute Force
As always, we can try to implement our own sorting method. For simplicity, we’ll leverage selection sort:
my_list = [7, 10, -3, 5] size = len(my_list) for i in range(size): min_index = i for j in range(i + 1, size): if my_list[j] < my_list[min_index]: min_index = j temp = my_list[i] my_list[i] = my_list[min_index] my_list[min_index] = temp print(my_list)
It works by comparing the characters of each string directly from their ASCII values in Python 2 or their Unicode values in Python 3. Don’t believe me? Try it out yourself:
"hello" > "the" # returns false "the" > "hello" # returns true
The boolean operators work on strings directly in Python, so we don’t have to worry about writing our own loops to perform the comparison.
Naturally, this solution has its drawbacks. For instance, sorting is almost meaningless for Non-English character sets. In addition, with this method, we’d be performing a case-sensitive sort, so a list like ["abs", "Apple", "apple"]
will look something like ['Apple', 'abs', 'apple']
after sorting.
Notice how two of the words are exactly the same but separated in the list. We’d need to use something like the casefold
function for better results.
Sort a List of Strings in Python Using the Sort Function
Why sort by hand when we can leverage the high-level power of python? Naturally, python has a built in sort functionality that works by accepting a list and sorting it in place. Let’s see what it does for a list of strings:
my_list = ["leaf", "cherry", "Fish"] my_list.sort() print(my_list) # prints ["Fish", "cherry", "leaf"]
As we can see, using the predefined sort function, we get the same case-sensitive sorting issue as before. If that’s not problem, feel free to use this solution.
Luckily, sort has a special parameter called key which we can use to specify the ordering:
my_list = ["leaf", "cherry", "Fish"] my_list.sort(key=str.casefold) print(my_list) # prints ["cherry", "Fish", "leaf"]
In the next section, we’ll discuss this key parameter more deeply.
Sort a List of Strings in Python Using the Sorted Function
While lists have their own sort functionality, Python exposes the sort functionality with a separate function called sorted which accepts an iterable. In other words, this new function allows us to sort any collection for which we can obtain an iterable—not just lists. The only difference is that the sorted functionality doesn’t perform a sort in place, so we’ll have to save the result back into our variable. Let’s try it:
my_list = ["leaf", "cherry", "Fish"] my_list = sorted(my_list) print(my_list) # prints ["Fish", "cherry", "leaf"]
Here we can see that we get the same problem as the previous two implementations. So, how do we fix it? Well, fortunately, we’re able to pass a key to the sorted function which defines how to sort the iterable. Take a look:
my_list = ["leaf", "cherry", "Fish"] my_list = sorted(my_list, key=str.casefold) print(my_list) # prints ["cherry", "Fish", "leaf"]
Here we’ve defined a key which leverages the casefold function from earlier. Feel free to read up on Python’s documentation to learn more about how it works. But to summarize, it’s basically a more aggressive lowercase function which can handle many different character sets.
Of course, there are other keys we can leverage such as cmp_to_key(locale.strcoll)
which works for the current locale. If you have any keys you’d recommend, let us know in the comments. As it turns out, manipulating strings isn’t always easy. I learned that the hard way when I started the Reverse a String in Every Language series.
Sort a List of Strings in Python in Descending Order
At this point, we’re able to sort properly, but let’s take things a step further. Let’s sort the list backwards. In other words, the word that normally comes last alphabetically will come first:
my_list = ["leaf", "cherry", "fish"] my_list = sorted(my_list, key=str.casefold, reverse=True) print(my_list) # prints ["leaf", "fish", "cherry"]
Fortunately, the python developers thought ahead and added this functionality right into the sorted method. Using the reverse keyword, we can specify which direction sorting should occur.
And with that, we have everything we need to know to begin sorting.
Performance
To test the performance of each solution, we’ll want to set them up in strings:
setup = """ import locale from functools import cmp_to_key my_list = ["leaf", "cherry", "fish"] """ brute_force = """ size = len(my_list) for i in range(size): for j in range(size): if my_list[i] < my_list[j]: temp = my_list[i] my_list[i] = my_list[j] my_list[j] = temp """ generic_sort = """ my_list.sort() """ case_fold_sort = """ my_list.sort(key=str.casefold) """ generic_sorted = """ my_list = sorted(my_list) """ case_fold_sorted = """ my_list = sorted(my_list, key=str.casefold) """ locale_sorted = """ my_list = sorted(my_list, key=cmp_to_key(locale.strcoll)) """ reverse_case_fold_sorted = """ my_list = sorted(my_list, key=str.casefold, reverse=True) """
Next, we can test each solution using the timeit
library:
>>> import timeit >>> min(timeit.repeat(stmt=brute_force, setup=setup, repeat=10)) 2.4897978000003604 >>> min(timeit.repeat(stmt=generic_sort, setup=setup, repeat=10)) 0.08845160000009855 >>> min(timeit.repeat(stmt=case_fold_sort, setup=setup, repeat=10)) 0.40834640000002764 >>> min(timeit.repeat(stmt=generic_sorted, setup=setup, repeat=10)) 0.1804069999998319 >>> min(timeit.repeat(stmt=case_fold_sorted, setup=setup, repeat=10)) 0.5034002000002147 >>> min(timeit.repeat(stmt=locale_sorted, setup=setup, repeat=10)) 1.0272592000001168 >>> min(timeit.repeat(stmt=reverse_case_fold_sorted, setup=setup, repeat=10)) 0.5373070999999072
And, there we have it! Apparently, the generic sort method is quite fast. If you’re comfortable with the natural ordering of strings, that’s definitely the way to go.
Of course, don’t try to write your own sorting algorithm! Look how slow our brute force implementation is in comparison to all other solutions. We’re talking two orders of magnitude slower than the built in sort
method. Now, that’s slow.
A Little Recap
At this point, we’ve covered several ways to sort a list of strings. Let’s take another look:
my_list = ["leaf", "cherry", "fish"] # Brute force method using bubble sort my_list = ["leaf", "cherry", "fish"] size = len(my_list) for i in range(size): for j in range(size): if my_list[i] < my_list[j]: temp = my_list[i] my_list[i] = my_list[j] my_list[j] = temp # Generic list sort *fastest* my_list.sort() # Casefold list sort my_list.sort(key=str.casefold) # Generic list sorted my_list = sorted(my_list) # Custom list sort using casefold (>= Python 3.3) my_list = sorted(my_list, key=str.casefold) # Custom list sort using current locale import locale from functools import cmp_to_key my_list = sorted(my_list, key=cmp_to_key(locale.strcoll)) # Custom reverse list sort using casefold (>= Python 3.3) my_list = sorted(my_list, key=str.casefold, reverse=True)
And, that’s it! I hope you enjoyed this article, and perhaps you even found it useful. If so, why not become a member? That way, you’ll always be up to date with the latest The Renegade Coder content.
Once again, you can also support the site by making purchases on Amazon through the following affiliate links:
- Python Crash Course: A Hands-On, Project-Based Introduction to Programming by Eric Matthes
- Python Programming: A Smart Approach For Absolute Beginners by Steve Manson
While I haven’t personally used these resources, I can say that I put quite a bit of research into finding products that I believe will benefit you.
While you’re here, check out some of these other Python articles:
As always, thanks for taking time to support the site. Also, special thanks to all my patrons who continue to support my work. See ya next time!
Recent Posts
While creating some of the other early articles in this series, I had a realization: something even more fundamental than loops and if statements is the condition. As a result, I figured we could...
Today, we're expanding our concept map with the concept of loops in Python! Unless you're a complete beginner, you probably know a thing or two about loops, but maybe I can teach you something new.