To kick off this series on data structures, we’re going to cover something a bit theoretical known as big O notation.
Table of Contents
- From Basics to Data Structures
- What Are Data Structures?
- What Is Big O Notation?
- Big O Examples
- Breaking Down Big O
- Measuring Big O
- Comparing Algorithms
- Implications of Big O
From Basics to Data Structures
Long time no see! It seems like it’s been a little while since we chatted about Java on The Renegade Coder. In fact, the last lesson was the closing tutorial for the Java Basics series: Review of the Java Basics Series. That lesson revisited all the topics that we covered throughout that series like class structure, looping, and control flow.
At this point, it would probably make sense to start tackling more advanced Java topics like inheritance and polymorphism. Instead, we’re going to pivot to something a bit more theoretical. Don’t worry though! These topics will help when we double back to some more advanced Java topics. Instead, we’re going to start tackling data structures by getting a better understanding of Big O notation.
What Are Data Structures?
If we recall all the way back to the Java Basics Review tutorial, then we’ll remember that we built a test grading program. To get the program to work, we had to actually introduce a new concept: the array.
The array allowed us to store a list of tests which we would grade together. This was quite powerful because it gave us the ability to store multiple tests without giving each of them their own field. We just created a single field that could store as many tests as we wanted.
That storage mechanism is known as a data structure. In other words, a data structure is a way to organize data.
What Is Big O Notation?
Fortunately, our array is not the only way to organize data. We could have used a linked list, or perhaps a tree, or even a hash table. Don’t worry if some of those terms are brand new. We’ll be covering them in detail as this series moves forward.
With all these options, how do we know which one to choose? The key is to understand each data structure at a fundamental level. For instance, how much time does it take to insert a new element in the data structure? How long does it take to search for an element in the data structure? Do those times change as the data structure grows? If so, does that have a positive or negative impact on our design?
Definition
In essence, these types of questions lead to a concept known as Big O or Big O notation. Big O is often used to describe the asymptotic upper bound of performance or complexity for a given function. In other words, Big O can be used as an estimate of performance or complexity for a given algorithm.
With that said, big O has nothing to do with best, average, or worst case performance or complexity. However, it can describe an algorithm in any of those situations. If that seems confusing, don’t worry. Math terminology can be difficult to grasp. I recommend reading up on the formal big O definition, so you’ll at least be more comfortable with the math.
At any rate, let’s dive into something a bit more practical.
Explanation
By knowing Big O for different features of a data structure, we’re able to make decisions pretty quickly. But, what is Big O notation? It’s a measurement that is usually shown as follows:
O(N log(N))
Uh oh! Looks like we’ll have to brush up on our math skills a bit. What we’re looking at above is the asymptotic upper bound of some function which has some parameter N. In algorithms, N is typically the size of the input set.
For example, if we wanted to sort a list of size 10, then N would be 10. In other words, Big O tells us how much time or space an algorithm could take given the size of the data set.
However, Big O is almost never used in plug’n chug fashion. Instead, it’s used to describe the performance or complexity of an algorithm as the size of the data set tends toward infinity. After all, as software developers we care about scalability. We want to be able to choose the right data structure for the job the first time. Otherwise, we could see our design grinding to a halt over time.
Big O Examples
Perhaps the best way to get an understanding of Big O is to share some coding examples. That way we’ll get an idea of some real world applications. To kick it off, we’ll start with O(1).
O(1) Example
Given some best, worst, or average case scenario, O(1) refers to an algorithm that will execute in no worse than constant time or space proportional to the size of the data set. For example:
public int getFirstElement(int[] myList) { return myList[0]; }
In this example, we pull out the first element of an array. Because each element in an array has a fixed size, we can access any of them in constant time. To do so, we multiply the size of an element by the index we want to access and add that product to the memory address of the first element:
memory_address_of(element_11) = memory_address_of(element_0) + size_of_element * index_of(element_11)
This method works to give us the first element of an array in constant time.
O(N) Example
Given some best, worst, or average case scenario, O(N) refers to an algorithm that executes in no worse than linear time or space proportional to the size of the data set. In other words, the execution time or space increases linearly with the size of the data set. For example:
public int sumSet(int[] values) { int sum = 0; for (int i = 0; i < values.length; i++) { sum += value[i]; } return sum; }
In this case, the loop has to iterate over all elements of the data set to produce the sum. As the data set size increases, the time to compute the sum will increase linearly.
O(N²) Example
Given some best, worst, or average case scenario, O(N²) refers to an algorithm that executes in time or space proportional to the square of the size of the data set. In other words, if we had a data set which had 4 elements, it would take 16 iterations to complete the algorithm. As we can see, this issue scales pretty rapidly.
For an example of O(N²), let’s take a stab at a sorting algorithm. In particular, we’ll implement bubble sort. Bubble sort is generally a bad sorting algorithm, but we’ll see how that plays out much later in the series.
public static void bubbleSort(int[] numberList) { int n = numberList.length; int temp = 0; for (int i = 0; i < n; i++) { for (int j = 1; j < (n - i); j++) { if (numberList[j - 1] > numberList[j]) { temp = numberList[j - 1]; numberList[j - 1] = numberList[j]; numberList[j] = temp; } } } }
Here we can see that the bubble sort algorithm uses a nested loop. In particular, we’ll see that the number of iterations over the data set is i * j
. A nested loop is usually a red flag that demonstrates we have an O(N²) algorithm (not a universal truth, but we’ll see that later).
But What About Space?
As stated several times already, Big O is an asymptotic upper bound measurement of performance for a particular algorithm. We’ve primarily looked at examples of performance in terms of time, but Big O can also be used to measure space complexity. In other words, Big O can be used to measure the impact of an algorithm on memory.
For instance, an algorithm with O(N²) space complexity would require space proportional to the square of the input data set. By space, we mean physical memory locations. For the O(N²) algorithm with a input data size of 10, we would need to allocate 100 physical locations in memory. Sometimes using memory allows us to reduce redundant comparisons and calculations which drives down the runtime of an algorithm.
Breaking Down Big O
Now that we have a better understanding of Big O, let’s see the actual impact that it can have on an algorithm. The following Wolfram Alpha widget should help put algorithm performance into perspective a bit. Use the function lines to write equations like 1, x, and x². Then extend the x axis out to get a better idea of the impact of these growth rates as the size of the data set increases.
If we treat the x-axis as if it were the size of the data set, we can quickly see the impact a bad algorithm can have on execution time or space. For example, just take a look at the difference between O(N) and O(N²). By the time the input data size hits two, the O(N²) algorithm starts taking twice as much time or space as the O(N) algorithm.
Of course, at a small scale Big O is hardly relevant. That’s partly due to the speed of modern processors, but it’s also due to the fact that algorithm overhead can have more of in impact on runtime than the actual algorithm. For example, perhaps an O(N) algorithm caches some calculations before it executes. In the long term, it beats out an O(N²) algorithm every time. However, at a small scale the caching might add enough overhead to the O(N) algorithm that the O(N²) algorithm actually has the edge. Keep that in mind as we continue.
Measuring Big O
In order to actually be able to apply Big O, we’ll need to be able to measure it for a given a algorithm. By now we should understand that the expression inside the parentheses is the actual Big O measurement. In other words, we’ll need to be able to look at a code snippet and determine the expression that describes the worst case performance of that function.
A Couple Notes
Before we begin analyzing any algorithms, we need to cover a few key aspects of Big O. First, when measuring Big O, we only care about the term with the greatest order. For example:
f(x) = x² + 3x - 17
This function could very well describe the worst case performance of an algorithm. However, the term with the greatest order is x². Therefore, the Big O of this algorithm is O(N²).
Second, constants are also ignored when measuring Big O. For example:
f(x) = 5x² + 9
With this function, we might think that the 5 is significant because it’s appended to the term with the largest order. Naturally, we would report that the Big O for this algorithm is O(5N²). The truth is we don’t care about that constant because Big O is simply measuring the growth rate of a function as it tends toward infinity. Therefore, we would also declare this algorithm as O(N²).
However, now we have a bit of a predicament. Both of the algorithms in this section are rated as O(N²), yet these algorithms will certainly have different runtimes. After all, we’re always dealing with finite data sets. Therefore, the original functions have to carry some weight during runtime.
That brings us to the final point. Big O only matters for very large data sets, and even then it’s only practical when choosing between two algorithms with different Big O measurements. Otherwise, it comes down to running the algorithms. After all, theory is nice, but hard evidence is better.
Big O Measurement Strategies
Measuring Big O is as easy as tracing through the code and assigning each operation a Big O measurement. From there, we combine our measurements into an expression which we ultimately reduce to the largest order term. In other words, we just need to isolate the bottleneck, and we’ll have our answer.
O(1) Example
To be thorough, let’s go back and actually evaluate our examples by hand. To start, let’s trace through our O(1) algorithm:
public int getFirstElement(int[] myList) { return myList[0]; }
If we were to call this method, the first thing that would happen is we would evaluate myList[0]
. As stated before, random access to an array is a constant time operation. Therefore, this operation receives a constant time rating of O(1). Since the method exits, we have our answer.
O(N) Example
Now let’s complicate things a bit more using the O(N) algorithm:
public int sumSet(int[] values) { int sum = 0; for (int i = 0; i < values.length; i++) { sum += value[i]; } return sum; }
If we drop into this method, we first complete a variable assignment which is a constant time operation or O(1). Next we enter our loop which begins with another variable assignment. At this point, our overall performance looks something like O(1) + O(1)
.
Next, we’ll run a constant time comparison. However, this is a part of the loop. As a result, we need to figure out how many times the loop iterates. In this case, an array of size 50 would cause 50 iterations while an array of size 300 would cause 300 iterations. This relationship is linear, so the loop as a whole operates at O(N). Inside the loop, we have 4 constant time operations: a comparison, an array lookup, an addition, and an increment. These four operations occur every time the loop executes, so we’ll want to use multiplication. Overall, the algorithm’s performance can modeled using the following expression:
2O(1) + O(N) * 4O(1)
Here we can isolate the bottleneck pretty easily. Since the largest order term is O(N), we can go ahead and give the algorithm a rating of O(N).
O(N²) Example
Finally, let’s revisit our O(N²) algorithm.
public static void bubbleSort(int[] numberList) { int n = numberList.length; int temp = 0; for (int i = 0; i < n; i++) { for (int j = 1; j < (n - i); j++) { if (numberList[j - 1] > numberList[j]) { temp = numberList[j - 1]; numberList[j - 1] = numberList[j]; numberList[j] = temp; } } } }
Here we have an additional complication – a nested loop. This can make things challenging because we actually have to be careful when we calculate the total number of iterations. In loops with counters, we have to pay attention to who is iterating each counter. Fortunately, both counters in this algorithm are owned by their respective loops. That makes this calculation a lot easier since we only have to pay attention to the loop conditions.
Outer Loop
In this case, we start with three constant time operations. Yes, the length of an array can be accessed in constant time. It’s a fixed value, so Java essentially treats it as a constant which can be retrieved at any time. Next, we drop into our outer loop. Here the loop condition is driven by the length of our data set, so we can go ahead and refer to this operation as O(N).
Inner Loop
Next we drop into the inner loop which also runs for the length of N (or rather N – 1). We can go ahead and ignore the constant value since the trend for the loop is still linear. As a result, the inner loop also has a growth rate of O(N). So what happens in this situation? Let’s go ahead and draw up the equation:
3O(1) + O(N) * (O(N) * 5O(1))
In this case, we can’t exactly say that this algorithm executes in linear time. That’s because the linear terms are multiplied rather than added.
That said, the math isn’t essential here. All we need to do is identify the bottleneck which in this case is clearly the nested loop. If we look at what is really happening, we’re runnning a linear operation a linear number of times. In other words, we run N iterations N times for a total of N² iterations. As a result, we can give this algorithm a rating of O(N²).
Comparing Algorithms
Alright, so now we know what Big O is and how to measure it, but how do we compare algorithms once we’ve made our measurement? At this point, it’s all math. We just need to be able to compare growth rates of various functions. That said, let’s take a look at a couple examples:
O(N) vs. O(N²)
O(N!) vs. O(2^N)
O(N log(N)) vs. O(N √N)
Here we have three examples that should showcase the various ways we can compare algorithms.
O(N) vs. O(N²)
To start, let’s look at one that we should already be able to answer quickly: O(N) vs. O(N²)
With this one, we can intuitively say that N² grows faster than N, but how do we know that? A quick trick is to separate out the terms. For example: O(N) vs. O(N * N)
. Now we can pretty much just cancel duplicate terms and look at what is left. For our example, we end up with an extra N term in O(N²) which grows much faster than constant term that is left in O(N), so the O(N) algorithm is the clear winner.
O(N!) vs. O(2^N)
Now our second example gets a bit more complicated. Here we have a factorial function versus an exponential function. Without knowing offhand which one grows faster, the best way to figure it out is to convert each function to a series, and determine which one grows faster. For instance:
N! = 1 * 2 * 3 * ... * N 2^N = 2 * 2 * 2 * 2 * ... * 2
Now we can see that after the second term the factorial function overtakes the exponential function. In fact, we can even do a little plug’n chug to see when the factorial function outgrows the exponential function.
N = 1 N! = 1 2^N = 2 ------- N = 2 N! = 2 2^N = 4 ------- N = 3 N! = 6 2^N = 8 ------- N = 4 N! = 24 2^N = 16
By the time N = 4, the factorial function has already outgrown the exponential function. In this case, we should snag the algorithm with the exponential growth rate.
O(N log(N)) vs. O(N √N)
Finally, we have our first comparison using logs and square roots. This one combines a couple of tricks from above. First, we’ll note that both functions have a factor of N, so we can go ahead and ignore them. What we really care about is the difference between a square root and a logarithm. The trick here is to recognize that a square root is really just another exponential function where the power is ½. However, that doesn’t mean that a O(√N) is bad. In fact, it’s actually better than O(N). The fact that it is still exponential is what makes it worse than O(log(N)). Let’s actually go ahead and do some plug’n chug to prove it.
N = 1 log(1) = 0 √1 = 1 ------- N = 2 log(2) = 0.30102999566 √2 = 1.41421356237
By the time our data set hits a value of two, the square root function has already taken over. At the end of the day, we’ll take the O(N log(N)) algorithm.
Implications of Big O
Of course, why does Big O matter? Computers today are so fast that we would hardly notice the difference with a small data set. But that’s just the problem! We tend to assume small data sets as we begin a project. By the time the data set is big enough to make an impact on the project, we already opted out of optimization. Over time our data set grows, and we start to experience serious issues. Then we have to go back and identify the bottleneck. Sometimes this is easy. Most of the time it isn’t.
As we move forward through the various data structures, we’ll revisit this concept. In fact, it’ll become pretty important as we play around with the features of each data structure. It’ll also be a main talking point when we get into sorting algorithms. By the end of this series, we should be pretty comfortable talking about algorithm performance and complexity.
If you want to get a head start, I recommend taking a look at the Big O Cheat Sheet. It’s a great reference if you’re ever looking for a one-stop-shop of all the different data structures and their associated performances. It won’t be super helpful right away, but it’s a nice tool to have handy.
Recent Posts
While creating some of the other early articles in this series, I had a realization: something even more fundamental than loops and if statements is the condition. As a result, I figured we could...
Today, we're expanding our concept map with the concept of loops in Python! Unless you're a complete beginner, you probably know a thing or two about loops, but maybe I can teach you something new.