Explain Like I’m Five: Context-Free Grammars

Explain Like I'm Five: Context-Free Grammars Featured Image

This week I was doing a bit of internal link building when I rediscovered a few articles attempting to explain concepts to kids. Given that I have my own kid now, I figured I might as well make it a full-fledged series. To kick things off, I also figured I’d start with a topic I’ll be teaching in a couple weeks: context-free grammars.

Table of Contents

What’s a Grammar?

If you spent any time in school, you have almost certainly been exposed to the concept of grammar. As far as I know, every natural language has one that we more or less respect.

When it comes to English, grammar typically defines the structure of sentences. For example, when we first learn how to form sentences, we often learn about nouns (i.e., people, places, and things). Then, we might explore pronouns, which we use in place of nouns. In other words, rather than referring to myself by my name (e.g., Jeremy likes talking about grammar), I use the pronoun “I”. Add verbs into the mix, and we can start building basic sentences.

As it turns it, programming languages are very similar; they, too, have a grammar. In other words, every programming language has a structure defined for it, called a syntax. Under this grammar, there are various syntactic rules. For example, much like English has nouns, adjectives, and verbs, programming languages might have literals, labels, and operators.

Keeping the analogy going, the small pieces of syntax in English, like nouns and verbs, build up to bigger elements like sentences and paragraphs. In a programming language, we might combine literals, such as numbers, and operators, such as addition and subtraction, to create an expression. We might then take an expression and place it next to a declaration to create an assignment statement. At this point, we might even notice the parallels more clearly as programming languages literally borrow words like “expression”, “statement”, “term”, and “declaration” directly from natural language.

What Does a Grammar Look Like?

In the world of programming languages, we somewhat eliminate the ambiguity you see in natural languages by defining a very explicit grammar. There are different types of grammars for programming languages, but the most basic kind is called a context-free grammar. It works by defining a series of rules that can be used to combine smaller structures to create larger ones.

Of course, it probably helps to look at an example, so let’s say we wanted to write a grammar for basic arithmetic. There are things we probably already know about arithmetic that we can use to start crafting rules. For instance, we probably know that arithmetic is composed of numbers, so we should define what a number is. How would we do that? Well, we go back to this idea of layering structures on top of each other, so what’s the smallest element of a number? I would argue it’s a digit, and I would define that as follows:

<digit> -> 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9

There are a couple things I want us to focus on in this rule. First, we define something called digit, which we have placed in angle brackets to signify that it can still be broken down. If you’re curious, we call that a nonterminal symbol. Then, the definition reads as a list of possible elements that we can substitute in for digit. In other words, digit can be any number from 0 to 9. Again, if you’re curious, any symbol not in angle brackets is considered a terminal symbol.

Hopefully, at this point, you get a feel for what we need to do next. To define a number, it must either be a single digit or a series of digits. Here’s what that might look like:

<number> -> <digit><number> | <digit>
<digit>  -> 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9

In this case, we have recursively defined the idea of a number by stating that it’s either a single digit or a digit followed by another number. We can also simplify the grammar by taking more of an iterative approach:

<number> -> <digit>{<digit>}
<digit>  -> 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9

In this case, the braces indicate that we can have zero or more of the symbols inside. In other words, we define a number as one or more digits. That’s probably a bit more intuitive, and it gets rid of issues related to recursion.

At this point, I would encourage you to continue building the grammar. For example, you probably already know that adding two numbers together makes a sum, so make a rule called sum. It will probably look something like this: <sum> -> <number> + <number>.

What Makes the Grammar Context-Free?

Unfortunately, almost no programming languages are actually context-free. In other words, typically the rules set in place by a context-free grammar are not enough to know if a program is valid. Here’s a simple example using Python:

x + 1

In this example, we have an expression which is syntactically valid. Much like our grammar for arithmetic above (though, we didn’t bother with the concept of variables), there is probably a set of rules you can apply to this expression and confirm that it’s valid Python code.

However, there’s a major problem: x does not exist. Therefore, the program is not valid. Unfortunately, there is no way to write a context-free grammar that would have caught this error. Instead, you would need something like an attribute grammar to allow you to carry information like variable definitions around the grammar. Or put another way, you would need a rule alongside the arithmetic rule that states that the variable must be defined before it can be used.

Programming Languages Are a Lot of Fun

Hopefully, that was a quick crash course into a little bit of programming language theory. Also, I hope I did a decent job at keeping the material approachable. I’ll be exploring this same topic in a couple weeks with my students, and this should provide them with a little extra material.

All that is to say that programming languages in general are a lot of fun! I’ve been dabbling in them since around 2018 as they’ve become a fun hobby of mine. Nowhere else is that more true than in the Sample Programs repoOpens in a new tab.. These days I’m more of a maintainer than a contributor as the project has vastly outgrown me, but I’d encourage you to dabble in your own programming language exploration of there. And if nothing else, come meet some of the major contributors.

Beyond that, I have to pat myself on the back. This article is set to go out a full three days ahead of schedule, so I’m going spend the rest of the time with my kid!

As always, if you liked this, here are some similar posts:

Likewise, here are some intro to programming resources (#ad):

And finally, you can find all sorts of other resources like our Discord and Patreon here. Otherwise, take care!

ELI5 (3 Articles)—Series Navigation

While I’m not sure of the origins of the phrase “explain like I’m five,” it’s become quite the common phrase curtesy of Reddit and The Office. I am, of course, shamelessly stealing the idea to create a fun coding filled series. Let’s have some fun!

Jeremy Grifski

Jeremy grew up in a small town where he enjoyed playing soccer and video games, practicing taekwondo, and trading Pokémon cards. Once out of the nest, he pursued a Bachelors in Computer Engineering with a minor in Game Design. After college, he spent about two years writing software for a major engineering company. Then, he earned a master's in Computer Science and Engineering. Today, he pursues a PhD in Engineering Education in order to ultimately land a teaching gig. In his spare time, Jeremy enjoys spending time with his wife, playing Overwatch and Phantasy Star Online 2, practicing trombone, watching Penguins hockey, and traveling the world.

Recent Posts