To kick off the new year, I decided to do some website maintenance. In particular, I finally decided to tackle the mess that was my tagging system. Now, instead of rocking 400+ tags, I have just 53 of them. Let’s talk about what I learned and how you can clean up WordPress tags too.
In WordPress, there are various ways of organizing content. For instance, there are categories, which can be used to set up a hierarchy of content. Let’s say we started a pop culture blog, and we decided that our three main topics would be music, movies, and podcasts. Those three topics would become our website’s categories.
If we had enough content, we might even break those categories into subcategories. For example, perhaps we could break up movies into genres (e.g. horror, comedy, drama, etc.). These new categories would nest under the core category, movies. Then, we might include these categories along our main navigation bar. If a user were to hover over movies, they’d also see our three subcategories.
Outside of navigation, categories may also show up in the post URL. In my case, I used categories as an indicator in the URL of what kind of content you’ll be reading. For example, this post is filed under meta because it focuses primarily on maintenance of my website—even though I’m structuring this article like a how-to guide.
All of this information about categories is important because it contrasts significantly with another content organization technique known as tagging. Unlike with categories, multiple tags can be used to link together similar types of content. For example, our pop culture blog might have the following two articles:
- Rihanna Knocks it out of the Park in Ocean’s 8
- Top 10 Rihanna Songs of the Decade
One of these articles would be categorized under movies while the other would be categorized under music. That said, there’s clearly a thread linking these two articles together: Rihanna. If we want our users to be able to enjoy all content related to Rihanna, we need some way to group that content. This is where tags come in.
If we keep going with the scenario from above, we should probably create a tag to string together all the Rihanna related content. To do that, we can go right into our editor for one of the articles, type “Rihanna”, and hit enter. That will create the tag and apply it to that post only:
If we want to add that tag to all Rihanna related content, we’ll have to take advantage of the bulk edit feature under “All Posts”:
With this window, we can add our tag to the “Tags” box and click “Update”.
Naturally, we can do this for any tag we like. On a pop culture blog, we might have tags for various celebrities and artists like Justin Bieber, Lady Gaga, and more. Likewise, we might have tags for studios, record labels, directors, movies, Netflix series, etc.
If we don’t plan out our content, we may find ourselves adding these sorts of tags any time we write an article. For example, maybe we write an article about the latest Godzilla movie and give it the following tags:
- Millie Bobby Brown
- Kyle Chandler
- Warner Bros. Pictures
To us, all these tags make sense because they serve as metadata for our content. In other words, users might be interested in other movies in English, so we figured it was important to include that tag. In fact, we can use this sort of argument for all our tags.
Unfortunately, the sporadic tagging argument doesn’t really hold up. In reality, if we just tag every article with our main keywords, we’re likely to watch our list of tags explode. For instance, let’s say we add five tags to every article for 100 articles. At best, we collect 5 tags—albeit mostly useless. At worst, we’re sitting on 500 unique tags—also useless.
While the sheer number of tags is an issue, there are other issues as well. For example, let’s say we only have 20 tags, but 15 of them are unique. In other words, those tags only link to one article each. What sort of value is that serving our users? Some might say none. I’d argue it’s actually worse than having no tags at all. After all, how annoying would it be to stumble upon a tag that doesn’t bring us to at least one other post?
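To put a number on this problem, here is a minimal Python sketch that counts how many tags link to only one post. The `posts` dictionary is made-up sample data (the Godzilla title is hypothetical), and the function names are my own. It's a rough illustration, not something WordPress provides out of the box.

```python
from collections import Counter

# Hypothetical sample data: post title -> tags applied to that post.
posts = {
    "Rihanna Knocks it out of the Park in Ocean's 8": ["Rihanna", "Comedy"],
    "Top 10 Rihanna Songs of the Decade": ["Rihanna", "Pop"],
    "Godzilla Review": ["Millie Bobby Brown", "Kyle Chandler"],
}


def tag_usage(posts):
    """Count how many posts each tag is attached to."""
    return Counter(tag for tags in posts.values() for tag in set(tags))


def single_use_tags(posts):
    """Return the tags that link to exactly one post, sorted by name."""
    return sorted(tag for tag, count in tag_usage(posts).items() if count == 1)


usage = tag_usage(posts)
lonely = single_use_tags(posts)
print(f"{len(lonely)} of {len(usage)} tags link to only one post: {lonely}")
```

In this sample, only the Rihanna tag connects more than one post. Every other tag is a dead end for the reader.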
Beyond the maintenance and usability issues, there are actually other tagging related issues. For instance, tags can be redundant, overlap, or share similar scope. If we have two tags, “Film” and “Films”, how do we know which one to use? How do we know which way to interpret the tag “Film”? Is it like camera film or a movie? What about tags that are synonyms like “Film” and “Movie”?
Finally, there’s one major issue with tags that I alluded to already: they create tag archive pages. Just like categories, every new tag generates a new archive page which can be used to share all articles with the same tag. If we have 500 tags and only 100 articles, we have five times as many archive pages as articles, and almost none of them add any value. That’s not a recipe for success.
WordPress Tagging Strategies
Clearly, all sorts of issues can come up when tagging, and it can be hard to manage this issue once it spirals out of control. That’s why it’s important to have a basic tagging strategy. In this section, we’ll take a look at a few things we can do to manage our tags better.
One tagging strategy that is becoming increasingly common in the WordPress space is to eliminate tags altogether. Since they’re such a pain to begin with, why even bother? After all, what’s the real benefit of tagging?
For most people, this is probably the way to go. Instead of managing tags, just don’t use them at all. That way, there are no tag archive pages to manage, and there’s no potential negative hit to SEO.
Of course, if we choose not to use tags, we lose out on the potential benefit that tag archives could have on our users. For example, let’s say our users want to learn more about Rihanna. Without a tag page, we’d have to be more deliberate in our internal linking strategy. In other words, we’d have to make sure to link new Rihanna content in older articles which can be a tiring but rewarding process. Instead, we might opt for tags.
Track Keywords in a Spreadsheet
One of the challenges with tagging is that you can’t just start tagging without a plan. In other words, it’s a good idea to take an inventory of your content and track keywords in a spreadsheet. That way, you can get an idea of what articles are related. Then, you can come up with tags.
To continue with our pop culture blog example, I might list out every blog post in a spreadsheet. For instance, we already mentioned the two Rihanna related articles:
- Rihanna Knocks it out of the Park in Ocean’s 8
- Top 10 Rihanna Songs of the Decade
Place these in separate rows and add a few columns for metadata:
|Title|Category|Keywords|
|---|---|---|
|Rihanna Knocks it out of the Park in Ocean’s 8|Movies|Rihanna, Comedy|
|Top 10 Rihanna Songs of the Decade|Music|Rihanna, Pop|
Then, we might write a small function which totals each keyword and places the results in a separate table:
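If spreadsheet functions aren't your thing, the same tally can be sketched in a few lines of Python. This is only a rough stand-in under a couple of assumptions: the rows are typed in by hand as (title, category, keywords) tuples, and keywords are separated by commas. The name `total_keywords` is my own.

```python
from collections import Counter

# Rows mirror the spreadsheet above: (title, category, keywords).
rows = [
    ("Rihanna Knocks it out of the Park in Ocean's 8", "Movies", "Rihanna, Comedy"),
    ("Top 10 Rihanna Songs of the Decade", "Music", "Rihanna, Pop"),
]


def total_keywords(rows):
    """Tally how many rows mention each keyword."""
    totals = Counter()
    for _title, _category, keywords in rows:
        for keyword in keywords.split(","):
            keyword = keyword.strip()
            if keyword:  # Skip blanks left by trailing commas.
                totals[keyword] += 1
    return totals


for keyword, count in total_keywords(rows).most_common():
    print(f"{keyword}: {count}")
```

For the two rows above, that tallies Rihanna twice and Comedy and Pop once each.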
From this table, we can start to see what could make for a good tag. With a blog this small, we might not want to create any. However, as we build up our catalog, we can start to see trends in our content. If we find the right niche, the tags should pretty much write themselves.
Plan Out Niche
Instead of letting tags come out organically, we might try planning out our niche first. In other words, we could decide our first 30 articles based on our vision of the future site. Then, we can anticipate what some of our tags will be.
Again, using our pop culture blog example, we might decide that we want to focus on Eminem related content. If that’s the case, we can start picking our keywords before we write. Naturally, those keywords can flow right into tags.
In this case, it’s probably still a good idea to draft a spreadsheet in the same form as above—even if the content is tentative. At least that way, we know we aren’t constructing tags in vain.
WordPress Tag Cleaning Strategies
Note: Before you go through this process, understand that it’s not necessarily linear. For example, you may find duplicate tags in the process of merging smaller tags into more general topics. As a result, you should think about doing several passes of your tags for each of the following steps. Eventually, you’ll be able to reduce your overall tag count to something manageable.
Unfortunately, most of you are probably here because you didn’t do any planning. Luckily for you, I was in this exact situation at the start of the year. In fact, I probably had over 400 tags with about half of them unique.
Assess the Situation
If you don’t believe me, here’s my coverage according to Google Search Console:
Essentially, this is a graph of every known page on my website over the past three months. As you can see, Google has indexed roughly 958 pages on a blog with only about 330 articles. As of January first, there were 675 pages indexed that were also submitted in my sitemap:
If you take a look at what content is being indexed, you’ll see a huge list of URLs. With a quick Google search, I was able to write the following Excel formula which computes the number of URLs that include “/tag/”:
According to the formula, there were 261 tag pages. Next, I ran the same formula on the indexed URLs that weren’t submitted. In that case, the formula returned 109. Finally, I decided to run the same formula on the list of URLs that weren’t indexed. That returned 335!
At that point, I realized my mistake. Apparently, all pages (including tag pages) have RSS feeds. Naturally, these feeds are not submitted or indexed, so I had to find a way to exclude them from my count. To do that, I ran a similar formula that didn’t count anything including “/feed/”:
=COUNTIFS(A:A, "*/tag/*", A:A, "<>*/feed/*")
My thought behind this formula was to count everything that included “/tag/” and ignore everything that included “/feed/”. This time, it returned 96. If we add the three totals together (261 + 109 + 96), we get 466, which seems right.
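For anyone who would rather script the count than use Excel, here is a small Python sketch of the same filter. It assumes the URLs have already been exported into a list. The example.com addresses are placeholders I made up, and `count_tag_pages` is a hypothetical helper name.

```python
def count_tag_pages(urls):
    """Count URLs that contain "/tag/" but not "/feed/"."""
    return sum(1 for url in urls if "/tag/" in url and "/feed/" not in url)


# Hypothetical URLs standing in for an exported Search Console list.
urls = [
    "https://example.com/tag/python/",
    "https://example.com/tag/python/feed/",
    "https://example.com/meta/some-article/",
    "https://example.com/tag/java/",
]

print(count_tag_pages(urls))
```

One difference to keep in mind: the Excel wildcard match ignores case, while Python's `in` check is case-sensitive.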
Naturally, you don’t have to go through this much work to figure out how many tags you have. You can go straight to the tags page and check for yourself. In my case, I started writing this piece after I had already finished, so my tag page looks like this:
At any rate, let’s talk about how we can actually go about achieving this kind of reduction.
One way we can begin to reduce our number of tags is to consolidate duplicate tags. In other words, we need a way to identify tags that are similar and merge them.
On the first pass, we might try sorting the tags alphabetically. That way, we can catch plural and singular forms of the same term. In addition, we might be able to catch some misspellings or synonyms. When we do, we need to find each article with the incorrect tag and replace that tag with the correct one.
To do this, we can take advantage of the bulk edit tool. First, we’ll identify a tag that we want to replace, go to posts, and search for that tag. Then, we’ll select all articles with that tag and bulk add the correct tag. At that point, it’s just a matter of deleting the incorrect tag.
Feel free to follow this process for a little while. As you grow more acquainted with your massive catalog of tags, you’ll find that it becomes easier to spot duplicates.
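If eyeballing an alphabetized list gets tiring, a script can flag likely duplicates for us. The sketch below uses Python's built-in `difflib` to compare spellings. The sample tags and the 0.8 threshold are assumptions of mine, so treat the results as suggestions to review rather than a list to merge blindly.

```python
from difflib import SequenceMatcher
from itertools import combinations


def similar_tag_pairs(tags, threshold=0.8):
    """Return pairs of tags whose lowercase spellings are nearly identical."""
    pairs = []
    for first, second in combinations(sorted(tags, key=str.lower), 2):
        ratio = SequenceMatcher(None, first.lower(), second.lower()).ratio()
        if ratio >= threshold:
            pairs.append((first, second))
    return pairs


# Hypothetical tags pulled from the examples in this article.
tags = ["Film", "Films", "Movie", "Housing", "Hosing", "Python"]
for first, second in similar_tag_pairs(tags):
    print(f"Possible duplicates: {first} / {second}")
```

This catches plurals and misspellings because they look alike on the page. It won't catch synonyms like “Film” and “Movie”, so those still need a human pass.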
After passing over our tags a few times, we should be able to easily identify tags that are useless. For example, there are probably dozens of tags that were created for one-off topics. We can delete them outright.
Before we get too delete happy, it might be a good idea at this point to take a look at the articles with junk tags. Are these old articles that aren’t really relevant to our readers, or are they still related to the rest of our content? If the answer is the latter, then we should probably apply appropriate tags while removing the junk ones.
As this junk removal process continues, we should begin to see our network of connected posts grow. In other words, each junk tag that gets removed is replaced by zero or more proper tags. As a result, old content finds its way in front of new eyes through appropriate tagging.
Over time, we’ll get more comfortable with the existing tags. This will set us up well for the next step.
For me, the process of removing junk and duplicate tags was great, but it wasn’t helping me deal with my main issue: most of my tags were still extremely valid. For instance, I had tags for just about every piece of the software development life cycle (e.g. pull requests, continuous integration, deployment, version control, etc.).
Unfortunately, these tags were so narrow that there may only have been two or three posts that even mentioned the topic. As a result, I decided to merge many of them into my more general Software Development tag.
For our pop culture blog example, that means going through our tags and determining what the general themes and ideas are. For instance, if our blog focuses on Rihanna, we might have tags down to song titles (e.g. Umbrella, Shut Up and Drive, etc.). Clearly, we could abstract them a bit into album titles. Then, if we end up writing 10+ articles on Umbrella, we could reintroduce the tag.
As this merging process continues, we’ll start to notice more overall trends for the site. These trends are important for understanding the identity of our own site. In the end, it becomes a cool meta-analysis of our own writing.
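Before merging anything in WordPress, it can help to write the plan down as a simple lookup. Here is a minimal Python sketch of that idea. The `MERGE_MAP` contents are a hypothetical plan loosely based on my software development example, and `merge_tags` is just an illustrative helper, not a WordPress feature.

```python
# Hypothetical mapping from narrow tags to the broader tag that absorbs them.
MERGE_MAP = {
    "Pull Requests": "Software Development",
    "Continuous Integration": "Software Development",
    "Deployment": "Software Development",
}


def merge_tags(post_tags, merge_map):
    """Swap narrow tags for their broad tag, dropping any repeats."""
    merged = []
    for tag in post_tags:
        broad = merge_map.get(tag, tag)  # Unmapped tags pass through as-is.
        if broad not in merged:
            merged.append(broad)
    return merged


print(merge_tags(["Pull Requests", "Deployment", "Version Control"], MERGE_MAP))
```

Running a post's tags through the map shows what it would look like after the merge. In this case, the two narrow tags collapse into Software Development while Version Control stays put.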
For me, one of the biggest benefits of going through this process is knowing all of my tags. As a result, I can use them to inspire my next article. For instance, one of my lowest count tags is Operators. As a result, I might try to create more content in that area. If I can’t come up with anything, I might delete that tag altogether.
In addition, knowing all my tags helps me figure out where future articles fit in my website structure. For example, an article teaching readers how to create a git repo falls right into my Version Control and Open Source Software tags.
Finally, I think knowing all my tags has helped me go through old articles for tagging purposes. In other words, I’m able to bring older content back into the mix by associating it with dozens of related articles. In the past, that article might not have many related posts in its network. Now, it leads people to boatloads of fresh content.
If your WordPress theme supports tag descriptions, I recommend adding them. While tag archives alone can be helpful for directing readers around your site, the descriptions can be a great opportunity to explain the tag or even direct the reader to the most important content in that archive.
Personally, I use the descriptions to help give myself bounds for the tag, but that doesn’t stop me from also using them as calls to action. For example, here is my current description for the Hello World tag:
Hello World is a common program used to introduce language syntax. Naturally, the goal of the program is to print the “Hello, World!” string to the user. Articles with this tag demonstrate just that.
If you’re looking for somewhere to start, check out one of these articles:
If you just want to see some sample Hello World code, I have an entire GitHub repository dedicated to that sort of thing.
Naturally, this sort of description can set your archive page apart and perhaps even encourage Google to index it. After all, I’d love for my Python archive to show up for people looking for Python resources.
More realistically, however, I think this description just helps the page seem more valuable to Google. I don’t necessarily want it to rank well. I just don’t want it to contribute negatively to my site overall. After all, Google doesn’t like a high ratio of low quality pages.
At this point, that’s all I’ve got. As a result, I figured I’d summarize some of the major points from this article and let you get to work!
First, tags are a way of organizing content—similar to categories. However, instead of being hierarchical in nature, tags are meant to provide more of a network structure. After all, articles in different categories can still be connected by topics (e.g. a film article about Rihanna’s role in Ocean’s 8 and a music article about Rihanna’s song Umbrella).
Unlike categories, however, tags tend to get out of hand quickly. After all, what’s the appropriate scope for a tag? Luckily, there are some basic strategies for dealing with tags:
- Don’t—forget ’em
- Take note of existing articles and find connecting threads
- Plan out niche and use general tags to indicate connecting threads
Of course, if you already have way too many tags like I did, consider using the following strategies:
- Consolidate duplicate tags like synonyms (e.g. hat vs. cap), singulars and plurals (e.g. cat vs. cats), and misspellings (e.g. housing vs. hosing)
- Remove junk tags like overly specific keywords from one-off articles (e.g. python list comprehensions)
- Merge specific tags (e.g. flyers, ads, etc.) into more broad tags (e.g. marketing)
Eventually, you should get all your tags in order! Otherwise, feel free to share your own tips below in the comments.
If you enjoyed this piece, help me out by hopping on my mailing list, becoming a patron, or browsing the shop. Every little bit helps!
Otherwise, thanks for stopping by. I appreciate it!