Roll Your Own Uppercase Function in Python

Today, I’m kicking off a new series of educational Python articles that focuses on reverse engineering common Python functions. To start the series, I figured we’d take a look at an implementation of an uppercase function similar to upper(). Let’s see how we do!

Problem Description
Thought Process
Testing
Solution
Why Not Roll Your Own?

Problem Description

Recently, I wrote an article on how to capitalize a string in Python, and I had an idea. What if I put together a series of articles on implementing existing Python functionality? This would allow me to teach a bit of my thought process while also giving me an endless supply of articles to write, so I decided to give it a go.

To kick off this series, I thought it would be fun to explore a method closely related to capitalization: upper(). If you’re not familiar with this method, here’s the official method description:

Return a copy of the string with all the cased characters converted to uppercase. Note that s.upper().isupper() might be False if s contains uncased characters or if the Unicode category of the resulting character(s) is not “Lu” (Letter, uppercase), but e.g. “Lt” (Letter, titlecase).

The uppercasing algorithm used is described in section 3.13 of the Unicode Standard.
Source: Python Documentation
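The Unicode details in that description matter. Here is a quick look, using only the built-in str.upper(), at a few cases that a simple a-to-z mapping will not handle. The sample strings are my own choices for illustration.

```python
# Behaviors of the built-in str.upper() that go beyond a simple a-z mapping.

# Non-ASCII cased letters are converted too.
print("café".upper())            # CAFÉ

# Some characters expand: the German sharp s becomes two letters.
print("straße".upper())          # STRASSE
print(len("ß"), len("ß".upper()))  # 1 2

# A string with no cased characters comes back unchanged, but isupper()
# is still False because nothing in it is cased.
print("123".upper())             # 123
print("123".upper().isupper())   # False
```

A homemade version limited to a through z will leave characters like é and ß untouched.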

Ultimately, our goal today is to write our own upper() function in line with the description above. That said, like most of my work regarding strings, I’m going to simplify things considerably. Here are the lowercase and uppercase character sets we’ll be working with today:

lowercase = "abcdefghijklmnopqrstuvwxyz"
uppercase = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"

Any function we develop today should then behave as follows:

>>> upper("example")
'EXAMPLE'
>>> upper("123abc")
'123ABC'
>>> upper("HOWDY")
'HOWDY'
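For reference, the standard library can already pair up two equal-length character sets like the ones above: str.maketrans() builds a table mapping each character of the first string to the character at the same position in the second, and str.translate() applies it. This built-in shortcut is shown only to illustrate how the two sets line up index for index. It is not the approach this article builds.

```python
lowercase = "abcdefghijklmnopqrstuvwxyz"
uppercase = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"

# Map each character in lowercase to the character at the same index in uppercase.
table = str.maketrans(lowercase, uppercase)

# Characters without an entry in the table (digits, spaces, capitals) pass through unchanged.
print("example".translate(table))  # EXAMPLE
print("123abc".translate(table))   # 123ABC
print("HOWDY".translate(table))    # HOWDY
```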

In the next section, we’ll talk about the thought process I’d use to solve this problem.

Thought Process

For me, when it comes to writing up a solution, I like to think about the expected behavior in terms of a black box. In other words, we don’t really know how upper() works, but we do know two things: input and expected output.

Input: a string
Output: a string with all cased characters converted to uppercase

Or if you’d like it in Python format, here’s what the function definition might look like in a file called roll_your_own.py:

def upper(string):
  pass

Ultimately, we need to figure out how to transform the input into the expected output. In this case, the transformation probably involves finding all the lowercase letters and converting them to uppercase characters.

What else do we know? Well, we know strings are immutable, so we’ll need to build a new string to return. In addition, the transformation isn’t just a matter of converting lowercase letters to uppercase letters. We’ll also need to distinguish lowercase letters from every other character.

Based on this information, there are probably going to be a few steps:

Identify characters that need to be transformed
Convert them
Add them to a new string
Return the result

Perhaps the most straightforward way to do this would be to scan each character in the string and add it to a new string. Of course, we don’t want to simply duplicate the string, so if the current character is lowercase, we convert it before adding it to the new string.

Testing

Now, there are a lot of ways to implement the solution we came up with, and probably dozens of ways that use different steps entirely. Regardless of the solution we choose, we’ll want to make sure that it’s valid. To do that, we should write a few tests.

Personally, I’ve followed the same crude testing scheme since my first programming course in 2012: first, middle, last, zero, one, many. In our case, this simple testing scheme basically breaks down as follows:

First: a lowercase character appears as the first character in the string
Middle: a lowercase character appears somewhere in the middle of the string
Last: a lowercase character appears as the last character in the string
Zero: an empty string
One: a string of one character
Many: a string of many characters

Obviously, this list is not exhaustive, but it’s a great start.
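Before writing any unittest code, it can help to jot the six cases down as plain (input, expected) pairs. The sketch below uses the same inputs as the tests that follow and checks them against the built-in str.upper(), which works as a reference answer here because every input is plain ASCII. The name CASES is just an illustrative choice.

```python
# (input, expected output) pairs for first, middle, last, zero, one, many.
CASES = [
  ("aPPLE", "APPLE"),                          # first
  ("ApPLe", "APPLE"),                          # middle (also ends in a lowercase letter)
  ("APPLe", "APPLE"),                          # last
  ("", ""),                                    # zero
  ("a", "A"),                                  # one
  ("how now brown cow", "HOW NOW BROWN COW"),  # many
]

# The built-in method serves as a reference answer for plain ASCII input.
for text, expected in CASES:
  assert text.upper() == expected, (text, expected)

print(f"{len(CASES)} cases agree with str.upper()")
```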

For completeness, I’ll share how I’d write those tests as well. Assuming the example file from before (i.e., roll_your_own.py), we can create a test file in the same folder called test.py. The test file should look as follows:

import unittest
import importlib

roll_your_own = importlib.import_module("roll_your_own")

class TestUpper(unittest.TestCase):

  def test_upper_first(self):
    self.assertEqual(
      roll_your_own.upper("aPPLE"), 
      "APPLE", 
      "Failed to uppercase 'a' in 'aPPLE'"
    )

  def test_upper_middle(self):
    self.assertEqual(
      roll_your_own.upper("ApPLe"),
      "APPLE",
      "Failed to uppercase 'p' in 'ApPLe'"
    )

  def test_upper_last(self):
    self.assertEqual(
      roll_your_own.upper("APPLe"), 
      "APPLE", 
      "Failed to uppercase 'e' in 'APPLe'"
    )

  def test_upper_zero(self):
    self.assertEqual(
      roll_your_own.upper(""), 
      "", 
      "Failed to return empty string unchanged"
    )

  def test_upper_one(self):
    self.assertEqual(
      roll_your_own.upper("a"), 
      "A", 
      "Failed to uppercase a single letter"
    )

  def test_upper_many(self):
    self.assertEqual(
      roll_your_own.upper("how now brown cow"), 
      "HOW NOW BROWN COW", 
      "Failed to uppercase many letters"
    )

if __name__ == '__main__':
  unittest.main()

To be sure the testing works, we should see something like the following when we run it. Since upper() is still a stub that returns None, all six tests should fail:

FFFFFF
======================================================================
FAIL: test_upper_first (__main__.TestUpper)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "C:\Users\jerem\Downloads\test\test.py", line 9, in test_upper_first
    self.assertEqual(roll_your_own.upper("aPPLE"), "APPLE", "Failed to uppercase 'a' in 'aPPLE'")
AssertionError: None != 'APPLE' : Failed to uppercase 'a' in 'aPPLE'

======================================================================
FAIL: test_upper_last (__main__.TestUpper)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "C:\Users\jerem\Downloads\test\test.py", line 15, in test_upper_last
    self.assertEqual(roll_your_own.upper("APPLe"), "APPLE", "Failed to uppercase 'e' in 'APPLe'")
AssertionError: None != 'APPLE' : Failed to uppercase 'e' in 'APPLe'

======================================================================
FAIL: test_upper_many (__main__.TestUpper)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "C:\Users\jerem\Downloads\test\test.py", line 24, in test_upper_many
    self.assertEqual(roll_your_own.upper("how now brown cow"), "HOW NOW BROWN COW", "Failed to uppercase many letters")
AssertionError: None != 'HOW NOW BROWN COW' : Failed to uppercase many letters

======================================================================
FAIL: test_upper_middle (__main__.TestUpper)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "C:\Users\jerem\Downloads\test\test.py", line 12, in test_upper_middle
    self.assertEqual(roll_your_own.upper("ApPLe"), "APPLE", "Failed to uppercase 'p' in 'ApPLE'")
AssertionError: None != 'APPLE' : Failed to uppercase 'p' in 'ApPLE'

======================================================================
FAIL: test_upper_one (__main__.TestUpper)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "C:\Users\jerem\Downloads\test\test.py", line 21, in test_upper_one
    self.assertEqual(roll_your_own.upper("a"), "A", "Failed to uppercase a single letter")
AssertionError: None != 'A' : Failed to uppercase a single letter

======================================================================
FAIL: test_upper_zero (__main__.TestUpper)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "C:\Users\jerem\Downloads\test\test.py", line 18, in test_upper_zero
    self.assertEqual(roll_your_own.upper(""), "", "Failed to return empty string unchanged")
AssertionError: None != '' : Failed to return empty string unchanged

----------------------------------------------------------------------
Ran 6 tests in 0.013s

FAILED (failures=6)

With that out of the way, let’s go ahead and write ourselves a solution!

Solution

As I mentioned above, my general approach to uppercasing a string will be as follows:

Identify characters that need to be transformed
Convert them
Add them to a new string
Return the result

Let’s tackle each step one at a time.

Identify Lowercase Characters

To identify lowercase characters, we’re going to need some sort of mechanism for retrieving each character. There are a couple of ways to do this, but they basically fall into two camps: recursion and iteration. Here’s an example of each:

Iteration

def upper(string):
  result = ""
  for character in string:
    result += character
  return result

Recursion

def upper(string):
  if string:
    return string[0] + upper(string[1:])
  return string

Both of these examples have the same behavior: they create a copy of the original string. It’s up to you to decide which approach you’ll take, but I’m fond of the iterative approach.
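One practical reason to favor iteration: the recursive version makes one function call per character, so a long enough string runs into Python’s recursion limit (1000 by default in CPython). Each string[1:] slice also copies the rest of the string, so the work grows quadratically with length. The sketch below uses a hypothetical copy_recursive that mirrors the recursive example above.

```python
import sys

def copy_recursive(string):
  # Same shape as the recursive example: one call per character.
  if string:
    return string[0] + copy_recursive(string[1:])
  return string

print(sys.getrecursionlimit())  # typically 1000

try:
  copy_recursive("a" * 10_000)
except RecursionError:
  print("RecursionError: string too long for the recursive approach")
```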

Now that we have a way of retrieving each character from the string, we need some way to check if it’s lowercase. If you read my capitalization article, then you know there are several ways to do this. Personally, I like using the ordinal values of each character to identify characters in the range of all lowercase values (i.e. 97 – 122). To do that, we need an if statement:

def upper(string):
  result = ""
  for character in string:
    if 97 <= ord(character) <= 122:
      pass
    result += character
  return result
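If the magic numbers 97 and 122 feel arbitrary, they are simply the ordinal values of 'a' and 'z'. The check below uses the standard library’s string.ascii_lowercase as a reference and confirms that the ordinal range test picks out exactly the same characters as a membership test across all 128 ASCII code points. Both helper names are hypothetical.

```python
import string

# The bounds in the if statement are the code points of 'a' and 'z'.
print(ord("a"), ord("z"))  # 97 122
print(ord("A"), ord("Z"))  # 65 90

def is_lowercase_ordinal(character):
  # Range check on the code point, as in the loop above.
  return 97 <= ord(character) <= 122

def is_lowercase_membership(character):
  # Membership check against the 26 ASCII lowercase letters.
  return character in string.ascii_lowercase

# Both checks agree for every single ASCII character.
ascii_chars = [chr(i) for i in range(128)]
print(all(is_lowercase_ordinal(c) == is_lowercase_membership(c) for c in ascii_chars))  # True
```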

Alternatively, it’s entirely possible to search a string that has all of the lowercase letters of the alphabet:

def upper(string):
  lowercase = 'abcdefghijklmnopqrstuvwxyz'
  result = ""
  for character in string:
    if character in lowercase:
      pass
    result += character
  return result

Personally, I think the string of characters is a bit ugly, but I’d argue the code is more readable due to the lack of magic numbers. That said, we’ll stick with the ordinal value solution for now.
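For comparison, here is one way the string-search version could be finished without any ordinals. It assumes a matching uppercase string: because the two strings line up index for index, a letter’s position in one gives its partner in the other. The name upper_by_lookup is hypothetical, chosen so it doesn’t clash with our upper().

```python
def upper_by_lookup(string):
  lowercase = "abcdefghijklmnopqrstuvwxyz"
  uppercase = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
  result = ""
  for character in string:
    if character in lowercase:
      # The same index in the uppercase string holds the matching capital letter.
      result += uppercase[lowercase.index(character)]
    else:
      result += character
  return result

print(upper_by_lookup("123abc"))  # 123ABC
```

Note that index() scans the string on every lookup, which is harmless for a 26-letter alphabet.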

Convert Lowercase Characters to Uppercase

Now that we’ve managed to identify all of the lowercase characters, we’ll need some conversion logic. Since we’re using the ordinal values, we’ll need some sort of mapping from lowercase to uppercase.

Luckily, all of the lowercase values can be found in the range of 97 to 122, while all of the uppercase values can be found in the range of 65 to 90. As it turns out, the two ranges are offset by exactly 32. In other words, we can take the ordinal value of any lowercase letter and subtract 32 from it to obtain its uppercase counterpart. Here’s what that looks like in code:

def upper(string):
  result = ""
  for character in string:
    if 97 <= ord(character) <= 122:
      uppercase = ord(character) - 32 
    result += character
  return result

And if you’re like me and hate to see duplicate code, you might pull out the call to ord():

def upper(string):
  result = ""
  for character in string:
    ordinal = ord(character) - 32
    if 65 <= ordinal <= 90:
      pass
    result += character
  return result

Here, we compute the shift ahead of time and save it in a variable. If the shifted value falls in the range of the uppercase letters, we know we had a lowercase letter. For now, we don’t do anything with the value. That’s the next step!
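The offset of 32 that this step relies on is not a coincidence. The snippet below checks it for all 26 letter pairs and shows that 32 is a single bit (0b100000, or 0x20). As an aside, not the approach used in the rest of this article, that means clearing that bit turns an ASCII lowercase letter into its uppercase partner.

```python
lowercase = "abcdefghijklmnopqrstuvwxyz"
uppercase = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"

# Every lowercase letter sits exactly 32 code points above its uppercase partner.
offsets = {ord(low) - ord(up) for low, up in zip(lowercase, uppercase)}
print(offsets)  # {32}

# 32 is a single bit.
print(bin(32))  # 0b100000

# For a-z, subtracting 32 and clearing bit 0x20 give the same result.
print(all(chr(ord(c) - 32) == chr(ord(c) & ~0x20) for c in lowercase))  # True
```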

Add Updated Characters to a New String

At this point, the bulk of the steps are complete. All that is left is to construct the new string. There are several ways to do this, but I’ll stick to the straightforward if/else:

def upper(string):
  result = ""
  for character in string:
    ordinal = ord(character) - 32
    if 65 <= ordinal <= 90:
      result += chr(ordinal)
    else:
      result += character
  return result

Now, this solution technically works. Here’s what happens when we run our tests:

......
----------------------------------------------------------------------
Ran 6 tests in 0.012s

OK

However, there are a few quality-of-life updates we should probably make. For example, it’s generally bad practice to concatenate strings in a loop: because strings are immutable, each += may build a brand new string. Instead, let’s try converting our string to a list, so we can leverage the join() method:

def upper(string):
  characters = list(string)
  for index, character in enumerate(characters):
    ordinal = ord(character) - 32
    if 65 <= ordinal <= 90:
      characters[index] = chr(ordinal)
  return ''.join(characters)

Personally, I like this solution a bit more because it allows us to modify the list of characters in place (the string itself is still immutable). In addition, we got rid of a branch as well as concatenation in a loop.

That said, even after all this work, I think there’s another possible solution. Rather than iterating explicitly, what if we took advantage of one of the functional features of Python: map()? That way, we could apply our conversion logic more concisely:

def upper(string):
  return "".join(map(lambda c: chr(ord(c) - 32) if 97 <= ord(c) <= 122 else c, string))

Granted, a lot of Python folks prefer list comprehensions. That said, both are fairly unreadable given our ordinal logic, so it’s probably best to stick with the previous solution. Otherwise, I think we’re done here!
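For readers who prefer comprehensions, one way to keep a one-liner readable is to move the ordinal logic into a small named helper and feed it a generator expression. The helper name upper_char is hypothetical, and the conversion logic is the same as in the map() version.

```python
def upper_char(character):
  # Shift a-z down by 32 code points; leave every other character alone.
  if 97 <= ord(character) <= 122:
    return chr(ord(character) - 32)
  return character

def upper(string):
  # Generator expression instead of map(); join() builds the final string once.
  return "".join(upper_char(character) for character in string)

print(upper("how now brown cow"))  # HOW NOW BROWN COW
```

With the helper in place, "".join(map(upper_char, string)) reads just as clearly, so the choice between map() and a comprehension becomes mostly a matter of taste.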

Why Not Roll Your Own?

The purpose of these roll your own articles is threefold:

First, they allow me to take some time to practice my Python, and it’s fun trying to reverse engineer common Python functions and methods.

Second, they allow me to demonstrate the thought process of an experienced programmer to newer programmers.

Finally, they give me yet another way for folks in the community to contribute. If you’d like to share your own solution to this problem, head on over to Twitter and share your solution with #RenegadePython. Alternatively, I’m happy to check out your solutions in our Discord.

As always, I appreciate you taking the time to check out the site. If you’d like to help support The Renegade Coder, head on over to my list of ways to grow the site.

Once again, thanks for checking out the site! I hope to see you again soon.

Roll Your Own Python (3 Articles)—Series Navigation


Roll Your Own Python is my latest Python series inspired by one of my best friends, @VirtualFlatCAD. In this series, we try to implement built-in Python functions like min() and len(). You can check out the full set of solutions in the GitHub repository.
