Last Updated on
It seems it’s been awhile since I’ve written a Python article, but the series has been fairly successful. So, I figured I dive back in with an article on how to sort a list of strings in Python. Let’s get to it!
Table of Contents
Recently, I discovered a bug in my Sample Programs Wiki Generator code which caused the output wiki to occasionally display a list of strings in the wrong order. The expected list looked something like:
[A, B, C, ...., X, Y, Z]
For whatever reason, the list was instead scrambled:
[H, G, A, ..., Q, B, C]
As I dug into the code a bit, I discovered the following line of code:
alphabetical_list = os.listdir(self.repo.source_dir)
As we can see, we’re banking on the OS library to produce a list of directories in alphabetical order. I guess that’s not always the case. To be sure, I took a peek at the
os.listdir documentation, and it did not disappoint:
Return a list containing the names of the entries in the directory given by path. The list is in arbitrary order, and does not include the special entries
'..'even if they are present in the directory.
Naturally, I decided I wanted to sort this list to avoid any future issues. In this article, we’ll take a look at a few ways to sort a list of strings in Python.
When it comes to sorting, there’s no shortage of solutions. In this section, we’ll cover three of my favorite ways to sort a list of strings in Python.
Sort a List of Strings in Python by Brute Force
As always, we can try to implement our own sorting method. For simplicity, we’ll leverage selection sort:
my_list = [7, 10, -3, 5] size = len(my_list) for i in range(size): min_index = i for j in range(i + 1, size): if my_list[j] < my_list[min_index]: min_index = j temp = my_list[i] my_list[i] = my_list[min_index] my_list[min_index] = temp print(my_list)
It works by comparing the characters of each string directly from their ASCII values in Python 2 or their Unicode values in Python 3. Don’t believe me? Try it out yourself:
"hello" > "the" # returns false "the" > "hello" # returns true
The boolean operators work on strings directly in Python, so we don’t have to worry about writing our own loops to perform the comparison.
Naturally, this solution has its drawbacks. For instance, sorting is almost meaningless for Non-English character sets. In addition, with this method, we’d be performing a case-sensitive sort, so a list like
["abs", "Apple", "apple"] will look something like
['Apple', 'abs', 'apple'] after sorting.
Notice how two of the words are exactly the same but separated in the list. We’d need to use something like the
casefold function for better results.
Sort a List of Strings in Python Using the Sort Function
Why sort by hand when we can leverage the high-level power of python? Naturally, python has a built in sort functionality that works by accepting a list and sorting it in place. Let’s see what it does for a list of strings:
my_list = ["leaf", "cherry", "Fish"] my_list.sort() print(my_list) # prints ["Fish", "cherry", "leaf"]
As we can see, using the predefined sort function, we get the same case-sensitive sorting issue as before. If that’s not problem, feel free to use this solution.
Luckily, sort has a special parameter called key which we can use to specify the ordering:
my_list = ["leaf", "cherry", "Fish"] my_list.sort(key=str.casefold) print(my_list) # prints ["cherry", "Fish", "leaf"]
In the next section, we’ll discuss this key parameter more deeply.
Sort a List of Strings in Python Using the Sorted Function
While lists have their own sort functionality, Python exposes the sort functionality with a separate function called sorted which accepts an iterable. In other words, this new function allows us to sort any collection for which we can obtain an iterable—not just lists. The only difference is that the sorted functionality doesn’t perform a sort in place, so we’ll have to save the result back into our variable. Let’s try it:
my_list = ["leaf", "cherry", "Fish"] my_list = sorted(my_list) print(my_list) # prints ["Fish", "cherry", "leaf"]
Here we can see that we get the same problem as the previous two implementations. So, how do we fix it? Well, fortunately, we’re able to pass a key to the sorted function which defines how to sort the iterable. Take a look:
my_list = ["leaf", "cherry", "Fish"] my_list = sorted(my_list, key=str.casefold) print(my_list) # prints ["cherry", "Fish", "leaf"]
Here we’ve defined a key which leverages the casefold function from earlier. Feel free to read up on Python’s documentation to learn more about how it works. But to summarize, it’s basically a more aggressive lowercase function which can handle many different character sets.
Of course, there are other keys we can leverage such as
cmp_to_key(locale.strcoll) which works for the current locale. If you have any keys you’d recommend, let us know in the comments. As it turns out, manipulating strings isn’t always easy. I learned that the hard way when I started the Reverse a String in Every Language series.
Sort a List of Strings in Python in Descending Order
At this point, we’re able to sort properly, but let’s take things a step further. Let’s sort the list backwards. In other words, the word that normally comes last alphabetically will come first:
my_list = ["leaf", "cherry", "fish"] my_list = sorted(my_list, key=str.casefold, reverse=True) print(my_list) # prints ["leaf", "fish", "cherry"]
Fortunately, the python developers thought ahead and added this functionality right into the sorted method. Using the reverse keyword, we can specify which direction sorting should occur.
And with that, we have everything we need to know to begin sorting.
A Little Recap
At this point, we’ve covered several ways to sort a list of strings. Let’s take another look:
my_list = ["leaf", "cherry", "fish"] # Brute force method using bubble sort my_list = ["leaf", "cherry", "fish"] size = len(my_list) for i in range(size): for j in range(size): if my_list[i] < my_list[j]: temp = my_list[i] my_list[i] = my_list[j] my_list[j] = temp # Generic list sort my_list.sort() # Casefold list sort my_list.sort(key=str.casefold) # Custom list sort using casefold (>= Python 3.3) my_list = sorted(my_list, key=str.casefold) # Custom list sort using current locale my_list = sorted(my_list, key=cmp_to_key(locale.strcoll)) # Custom reverse list sort using casefold (>= Python 3.3) my_list = sorted(my_list, key=str.casefold, reverse=True)
And, that’s it! I hope you enjoyed this article, and perhaps you even found it useful. If so, why not become a member? That way, you’ll always be up to date with the latest The Renegade Coder content.