Who Gets to Decide What Is and Isn’t a Programming Language?

One of the most annoying discussions that crops up in tech circles all the time is whether something is or isn’t a programming language. For example, you’ve probably heard someone say “of course HTML isn’t a programming language” before rattling off some silly rationale. In fact, I just saw the following comment today:

SQL:1999 is turing complete, so yes it’s a programming language.

Now, I raise this example because these sorts of discussions always include matter-of-fact statements like this one. Yet, no one can seem to agree on whether something is a programming language.

Obviously, this has bothered me for quite some time. After all, I once wrote an article about this exact subject where I gave a fairly lenient definition for the term “programming language.” As a result, I don’t intend to rehash the discussion around what I think a programming language is. Instead, I want to take a moment to sort of critique the common arguments I see.

Programming Language Classification Strategies
If You Must Draw a Line
What’s the Point?

Programming Language Classification Strategies

To start, I want to pose a thought experiment. Imagine that we’re presented with a real world syntax. It could be C, HTML, Python, etc. How would we decide if it’s a programming language or not?

Option A: Create a List of Criteria

Like most people, we might first decide to list off some criteria. For example, maybe we’d use the argument from above and ask whether the language is Turing Complete. Alternatively, we might list off certain language features like control flow as evidence. Here’s an example:

A programming language must be:

Turing Complete
Applicable to the real world
Compiled or interpreted
Currently relevant
General purpose

Over time, what we’ll find is that any set of criteria will yield some false positives (i.e. languages we believe shouldn’t be included but are) and false negatives (i.e. languages we believe should be included but aren’t).

Unfortunately, the more we refine our criteria, the more ridiculous they become. For instance, we may find that CSS meets our criteria because it’s Turing Complete, but we don’t believe it belongs because that wasn’t the “intent of the language.” So, we end up adding some weird “real world applicability” criterion. Suddenly, we can no longer include any esoteric languages.
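To make the problem concrete, here’s a minimal Python sketch of a strict checklist. The names `CRITERIA`, `ASSUMED_MET`, and `passes_all` are made up for this example, and the answers filled in for each language are placeholders I picked for illustration, not settled facts. That’s sort of the point: whoever fills in the table decides the outcome.

```python
# The five criteria from the list above, as plain strings.
CRITERIA = (
    "turing_complete",
    "real_world",
    "compiled_or_interpreted",
    "currently_relevant",
    "general_purpose",
)

# ASSUMPTION: these answers are placeholders chosen for illustration.
# Someone else could defensibly fill in every row differently.
ASSUMED_MET = {
    "Python": set(CRITERIA),
    "CSS": set(CRITERIA) - {"general_purpose"},
    "Brainfuck": {"turing_complete", "compiled_or_interpreted"},
}


def passes_all(name, criteria=CRITERIA):
    """Strict checklist: the language must meet every listed criterion."""
    return all(criterion in ASSUMED_MET[name] for criterion in criteria)


if __name__ == "__main__":
    for name in ASSUMED_MET:
        strict = passes_all(name)
        lenient = passes_all(name, criteria=("turing_complete",))
        print(f"{name}: all five={strict}, Turing Complete only={lenient}")
```

With these placeholder answers, only Python survives the full checklist, while all three languages pass when Turing completeness is the only rule. Changing a single entry in `ASSUMED_MET` flips a verdict.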

Option B: Create a List of Possible Criteria

At this point, we grow frustrated as the complexity of criteria grows. That’s when we have a great idea: what if we define a set of criteria where the language only has to pass x% of the rules to pass?

For example, let’s say we have five rules to classify something as a programming language. As long as a language meets at least four of them, we say it’s a programming language. As a result, languages that may have failed the “Turing Complete” criterion but passed all the others could be included in the set. Here’s what that might look like:

A programming language must meet at least four (4) of the following five (5) criteria:

Turing Complete
Applicable to the real world
Compiled or interpreted
Currently relevant
General purpose

Unfortunately, this type of system leads to significantly more false positives, and we can’t have that! Heaven forbid a garbage tier language like CSS or HTML makes its way into the list, right? So, we’re stumped.
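The “four out of five” rule is easy to sketch in Python. The function name `passes_threshold` and the `almost` language below are hypothetical; `almost` stands in for any language that fails only the Turing Complete criterion.

```python
CRITERIA = frozenset({
    "turing_complete",
    "real_world",
    "compiled_or_interpreted",
    "currently_relevant",
    "general_purpose",
})


def passes_threshold(met, required=4):
    """Return True when at least `required` of the five criteria are met."""
    met = set(met)
    unknown = met - CRITERIA
    if unknown:
        raise ValueError(f"unknown criteria: {sorted(unknown)}")
    return len(met) >= required


# Hypothetical language that fails only the Turing Complete criterion.
almost = CRITERIA - {"turing_complete"}

if __name__ == "__main__":
    print(passes_threshold(almost))              # four of five: accepted
    print(passes_threshold(almost, required=5))  # strict checklist: rejected
```

Notice that the threshold itself is one more arbitrary knob. Nudging `required` from 4 to 5 brings back the strict checklist and all of its problems.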

Option C: Create a List of Scoring Criteria

If we’re particularly frustrated at this point, we could agree that perhaps languages should appear on a spectrum. For example, we could put together a list of criteria which allows us to score a language. In other words, a score would indicate to what degree a language would be considered a programming language. Naturally, higher scores would be better.

Again, however, we run into this issue of criteria. Who gets to define them? How do we ensure that all criteria carry equal weight? If that’s not possible, who gets to decide the weights?
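Here’s a rough Python sketch of what a scoring scheme might look like and why the weights matter so much. Everything in it (`score`, `EQUAL_WEIGHTS`, `TURING_HEAVY`, and the `almost` language that misses only Turing completeness) is an assumption for the sake of the example.

```python
def score(met, weights):
    """Return the weighted share of criteria met, from 0.0 to 1.0."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    earned = sum(w for criterion, w in weights.items() if criterion in met)
    return earned / total


# Two hypothetical weightings of the same five criteria.
EQUAL_WEIGHTS = {
    "turing_complete": 1,
    "real_world": 1,
    "compiled_or_interpreted": 1,
    "currently_relevant": 1,
    "general_purpose": 1,
}
TURING_HEAVY = {**EQUAL_WEIGHTS, "turing_complete": 6}

# Hypothetical language that meets everything except Turing completeness.
almost = set(EQUAL_WEIGHTS) - {"turing_complete"}

if __name__ == "__main__":
    print(score(almost, EQUAL_WEIGHTS))  # 4 of 5 points
    print(score(almost, TURING_HEAVY))   # 4 of 10 points
```

The same language scores 4/5 under equal weights and 4/10 once Turing completeness is weighted more heavily. Nothing about the language changed; only the opinion of whoever picked the weights did.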

To make matters worse, defining programming languages on a spectrum does nothing for the folks who want to gatekeep languages. After all, a spectrum works under the assumption that all languages are programming languages—unless there’s some sort of exclusion criteria.

Rant aside, it’s clear that I often find these sorts of discussions frustrating because people can use whatever criteria they want to include or exclude a language. Oftentimes, this process of exclusion is then used to gatekeep an entire group of developers. Certainly, you’ve heard someone say that people who program in “x” aren’t real programmers. Hell, here’s an example from Reddit. Luckily, there’s another option.

Option D: Do Nothing

One thing people neglect to do in these language discussions is realize that a language is just a grammar. In other words, the language itself doesn’t do anything. What we’re really arguing about is a language’s semantics (i.e. what the syntax actually means). Oddly enough, semantics depend entirely on us.

The argument I’m trying to make here is that languages derive their meaning from the tools that make sense of them. In other words, nothing is stopping me from writing an interpreter for Python that swaps the semantics of addition and subtraction.
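To show how little effort that takes, here’s a small sketch built on Python’s standard `ast` module. It isn’t a full interpreter: it rewrites the syntax tree of a single expression and hands the result back to CPython, and it only touches binary `+` and `-` (not `+=` or `-=`). The names `SwapAddSub` and `eval_swapped` are my own.

```python
import ast


class SwapAddSub(ast.NodeTransformer):
    """Rewrite every binary + into - and every binary - into +."""

    def visit_BinOp(self, node):
        self.generic_visit(node)  # transform nested expressions first
        if isinstance(node.op, ast.Add):
            node.op = ast.Sub()
        elif isinstance(node.op, ast.Sub):
            node.op = ast.Add()
        return node


def eval_swapped(expression):
    """Evaluate a Python expression under the swapped semantics."""
    tree = ast.parse(expression, mode="eval")
    tree = ast.fix_missing_locations(SwapAddSub().visit(tree))
    return eval(compile(tree, "<swapped>", "eval"), {})


if __name__ == "__main__":
    print(eval_swapped("2 + 3"))      # evaluated as 2 - 3
    print(eval_swapped("10 - 4"))     # evaluated as 10 + 4
    print(eval_swapped("1 + 2 - 3"))  # evaluated as (1 - 2) + 3
```

The source text `2 + 3` is identical in both worlds. Only the tool reading it changed, and with it the meaning.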

Ultimately, what we’re arguing over is the community-defined semantics of a language. Sometimes these semantics aren’t well defined, which leads to weird cases where the same statement is treated differently by two different compilers. To make matters worse, these semantics (and even sometimes the syntax itself) change over time.

As a result, if we create a set of programming language criteria, we may find that some languages follow it today but not tomorrow. Hell, some criteria might change between implementations of the compiler/interpreter.

So, here’s what I propose: stop obsessing over this topic. I see no tangible benefit to classifying languages in this way.

If You Must Draw a Line

However, if for some reason you feel the need to classify a language, ask yourself the following question:

Can a program (i.e. anything that can be converted into machine code) be written in this language?

For some languages, this question is easy to answer. Clearly, I can write some code in Python that can be executed by a computer.

However, what about a language like HTML? Can you write a program in that? Of course! Just because a language doesn’t provide the semantics for instructions doesn’t mean there’s no way to derive them (see: declarative programming). If I give a carpenter my requirements for a house, do I need to give them the steps to build one? Of course not.
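As a toy illustration of deriving instructions from a declarative description, here’s a sketch that uses Python’s standard `html.parser` module to turn markup into a list of steps. The instruction strings and the names `InstructionDeriver` and `derive_instructions` are invented for this example; a real browser does vastly more than this.

```python
from html.parser import HTMLParser


class InstructionDeriver(HTMLParser):
    """Collect a made-up list of rendering steps while parsing markup."""

    def __init__(self):
        super().__init__()
        self.instructions = []

    def handle_starttag(self, tag, attrs):
        self.instructions.append(f"open <{tag}>")

    def handle_data(self, data):
        text = data.strip()
        if text:  # skip whitespace-only runs between tags
            self.instructions.append(f"draw text {text!r}")

    def handle_endtag(self, tag):
        self.instructions.append(f"close <{tag}>")


def derive_instructions(markup):
    """Return the steps derived from a declarative HTML snippet."""
    parser = InstructionDeriver()
    parser.feed(markup)
    parser.close()  # flush any buffered text
    return parser.instructions


if __name__ == "__main__":
    for step in derive_instructions("<h1>Hello</h1><p>World</p>"):
        print(step)
```

The markup never says how to do anything, yet the tool still ends up with an ordered list of things to do. That’s the carpenter reading the requirements.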

The only time this sort of distinction gets blurry is when we start talking about languages that don’t have community-defined semantics. For example, file formats like text, JSON, and XML are all meant for storing data. As far as I know, we have not agreed on any universal semantics for these languages.

That said, I’ve definitely seen folks use data files to drive logic in programs. For example, many desktop applications include some sort of user configuration file. Is that not similar to how HTML is used to construct web pages? Sure, the semantics depend on the application, but I’d argue that that distinction is sort of meaningless. After all, the entire point of a data file is to drive logic.
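Here’s a minimal sketch of that idea in Python. The JSON layout (`steps`, `op`, `value`) is a schema I made up on the spot, which is exactly the point: the application, not the file format, decides what the data means.

```python
import json

# Hypothetical configuration file contents; the schema is invented here.
CONFIG_TEXT = """
{
  "steps": [
    {"op": "add", "value": 5},
    {"op": "multiply", "value": 3}
  ]
}
"""

# The application-defined semantics: what each "op" in the data means.
OPERATIONS = {
    "add": lambda total, value: total + value,
    "multiply": lambda total, value: total * value,
}


def run(config_text, start=0):
    """Interpret the data file as a sequence of instructions."""
    config = json.loads(config_text)
    total = start
    for step in config["steps"]:
        total = OPERATIONS[step["op"]](total, step["value"])
    return total


if __name__ == "__main__":
    print(run(CONFIG_TEXT))  # (0 + 5) * 3
```

Swap the functions in `OPERATIONS` and the very same file “means” something else, just like the Python example with addition and subtraction.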

If you read my other article, you already know how this ends. We can essentially argue that anything is a programming language because it’s possible for any data to be interpreted as a set of instructions. After all, isn’t that what machine learning is all about—finding patterns in data?

What’s the Point?

Funnily enough, as I stretch the definition of programming language, I find the distinction no more nonsensical than when we started. Making the argument that YAML is a programming language feels just as arbitrary as claiming MATLAB isn’t.

In other words, who gets to decide what is and isn’t a programming language? Certainly, I shouldn’t, but neither should anyone else. All it serves to do is alienate a group of people from the community. Don’t believe me? Take a look at the way people talk about programming languages on Twitter:

https://twitter.com/Der_Pesse/status/1233295916607643648?s=20

https://twitter.com/Taylorsaurian/status/1054859506012680192?s=20

One thing I find really sad about all these tweets is how close in time they are. The oldest tweet in the thread is from 2018; the rest are from 2020. It’s truly sad when these sorts of discussions are so frequent that I can’t even find old examples. That said, I don’t recommend digging into any of the threads in this list. You’ll get really frustrated.

Instead, I encourage you to find ways to lift up folks in the community. For every discussion around the classification of SQL, there should be a discussion around how we can address some of the core issues in our field. Here are just a few:

Gender and Ethnic Diversity (e.g. is the tech industry doing enough to address gaps in diversity?)
Accountability and Oversight (e.g. should tech companies have unlimited access to our data?)
Defense Technology (e.g. should tech play a role in the military?)

If you know me, you know one of my biggest interests is in elitism and gatekeeping in tech. So naturally, these discussions around what a “real” programming language is or isn’t really push all the wrong buttons for me. I suppose that’s why I’m planning to do a whole dissertation on this subject.

At any rate, thanks again for listening to my rant. As always, if you liked this article and want to read more like it, head on over to my list of ways to grow the site. There you’ll find my newsletter as well as some other goodies.

Otherwise, thanks for stopping by! Take care.