How to Automate Your GitHub Wiki

About a month or so ago, I posted an update about how we’ve automated our GitHub wiki for the Sample Programs in Every Language project. In this article, we’ll cover exactly how we did it and how you can do it too.

Overview
Python Scripting
Continuous Integration
- Build Script
- Build Log
Alternatives

Overview

As a word of caution, the solution in this article is by no means the standard or de facto way to automate a GitHub wiki. In fact, I may have mentioned in the update that we considered a few alternatives.

That said, this article details how we implemented the solution, and you’re free to do the same. After all, the solution is kind of fun, though it’s a little clunky.

In general, our solution to automating the wiki came in two parts: scripting and continuous integration.

In terms of scripting, we used a Python script to gather data about our repository and generate a wiki in Markdown. Scripting alone isn’t enough to automate the solution as far as I can tell. In other words, the Python script only generates the wiki files. It does not upload them to GitHub. That said, I don’t see why it couldn’t.

Once we had the script, we used a continuous integration tool called Travis CI to trigger a fresh build of the wiki every time a commit is made to master. In other words, we never have to touch the wiki again.

At this point, let’s get into the details a bit.

Python Scripting

To understand how the Python script works, we’ll first need to understand the directory structure of the Sample Programs repo.

Directory Structure

Currently, all code snippets sit several layers deep in the repo. The typical path through the repo from the top layer to a script looks as follows:

Top -> archive -> [some letter] -> [some language] -> [some script]

Or if it makes more sense, here’s a cross-section of the collection:

archive
|--a
|  |--ada
|  |  |--README.md
|  |  |--hello-world.ada
|--b

As you can see, the archive folder contains 26 folders, one for each letter of the alphabet. Under each letter folder, we’ll find a list of language folders that share the same first letter. In each language folder, there’s a set of scripts as well as a README file and occasionally a Dockerfile.
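A small parser can make that path concrete. This is a minimal sketch and not part of the project’s generator. The names SnippetPath and parse_snippet_path are hypothetical, and the sketch assumes repo-relative POSIX paths that start at archive. It also checks the rule that a language folder sits under the letter it starts with.

```python
from pathlib import PurePosixPath
from typing import NamedTuple


class SnippetPath(NamedTuple):
    letter: str
    language: str
    file: str


def parse_snippet_path(path: str) -> SnippetPath:
    """Split archive/<letter>/<language>/<file> into its parts.

    Hypothetical helper: rejects paths whose language folder is filed
    under a letter it doesn't start with.
    """
    parts = PurePosixPath(path).parts
    if len(parts) != 4 or parts[0] != "archive":
        raise ValueError(f"not an archive snippet path: {path}")
    _, letter, language, file = parts
    if not language.startswith(letter):
        raise ValueError(f"{language} is filed under the wrong letter: {letter}")
    return SnippetPath(letter, language, file)
```

For the cross-section above, parse_snippet_path("archive/a/ada/hello-world.ada") gives SnippetPath(letter="a", language="ada", file="hello-world.ada").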

Next, we’ll take a look at the wiki plan, so we can figure out how to connect the dots.

Wiki Structure

With the directory structure in mind, all we had to do was determine what we wanted our wiki to look like, so we knew what data to collect.

In general, I just wanted to clone what already existed thanks to Alcha, one of our contributors. In their design, the wiki was composed of 27 pages: 1 alphabetical list and 26 letter pages.

The alphabetical list would contain at least the links to the 26 letter pages. From there, each letter page would contain a list of the languages for that letter.

In other words, the wiki structure would mirror the directory structure almost exactly. Of course, that’s not very exciting on its own, so we added data columns like the number of snippets per language as well as links to the open issues and the various articles.
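Because there is one page per letter, a natural first step is to bucket language names by their first letter. The sketch below uses a hypothetical group_by_letter helper. It assumes lowercase folder names like the ones in the archive, and it gives every letter a bucket so that all 26 pages exist even when a letter has no languages yet.

```python
import string
from typing import Dict, Iterable, List


def group_by_letter(languages: Iterable[str]) -> Dict[str, List[str]]:
    """Map each of the 26 letters to the sorted languages starting with it."""
    # Every letter gets an entry, so every letter page can be generated.
    groups: Dict[str, List[str]] = {letter: [] for letter in string.ascii_lowercase}
    for language in languages:
        first = language[:1].lower()
        if first in groups:  # names not starting with a-z are skipped in this sketch
            groups[first].append(language)
    for letter in groups:
        groups[letter].sort()
    return groups
```

The alphabetical list then links to the 26 keys, and each letter page renders its own bucket.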

Now, let’s dig into the Python code.

Python Structure

To be honest, I’m not super proud of the Python solution as it’s very much a quick and dirty solution. In other words, it’s clunky, so I won’t be copying the entire solution here. Instead, I’ll be sharing some of the general concepts.

To start, the Python solution models each part of the entire system using objects. For example, the following objects were used in the solution: Repo, Wiki, Page, and Language.

Repo

The Repo object models the Sample Programs repo:

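import os
from typing import List


class Repo:
    def __init__(self):
        self.source_dir: str = os.path.join("..", "archive")  # archive/, relative to tools/
        self.languages: List["Language"] = list()  # one Language per language folder
        self.total_snippets: int = 0  # snippet count across the whole repo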

In general, this object keeps track of a list of Language objects and allows for operations like computing the total number of code snippets in the repo. Of course, the main functionality of Repo is to traverse the repo and collect data.
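Here’s one way that traversal could look. It is a sketch under assumptions, not the generator’s actual code. LanguageInfo is a hypothetical, trimmed-down stand-in for Language, and analyze_repo is a hypothetical name. The function walks source_dir two levels deep and treats each folder under a letter as a language.

```python
import os
from typing import List, NamedTuple


class LanguageInfo(NamedTuple):
    """Hypothetical, trimmed-down stand-in for the Language class."""
    name: str
    path: str
    file_list: List[str]


def analyze_repo(source_dir: str) -> List[LanguageInfo]:
    """Walk <source_dir>/<letter>/<language> and collect one record per language."""
    languages: List[LanguageInfo] = []
    for letter in sorted(os.listdir(source_dir)):
        letter_path = os.path.join(source_dir, letter)
        if not os.path.isdir(letter_path):
            continue  # ignore stray files next to the letter folders
        for name in sorted(os.listdir(letter_path)):
            language_path = os.path.join(letter_path, name)
            if not os.path.isdir(language_path):
                continue  # ignore stray files inside a letter folder
            files = sorted(
                entry for entry in os.listdir(language_path)
                if os.path.isfile(os.path.join(language_path, entry))
            )
            languages.append(LanguageInfo(name, language_path, files))
    return languages
```

With records like these, Repo’s total_snippets could be computed by summing the per-language snippet counts.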

Language

The Language object tracks data related to the various language folders in the Sample Programs repository:

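from typing import List


class Language:
    def __init__(self, name: str, path: str, file_list: List[str]):
        self.name: str = name  # folder name, e.g. "ada"
        self.path: str = path  # path to the language folder
        self.file_list: List[str] = file_list  # files found in that folder
        self.total_snippets: int = 0  # starts at 0; computed from file_list
        self.total_dir_size: int = 0  # starts at 0; computed from the folder's files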

For example, it tracks data like the language name, a list of files, and the directory size. These data points are used to generate the wiki pages.
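As a minimal sketch of how those data points might be filled in, here is a hypothetical language_stats helper. It makes two assumptions. First, every file other than README.md and Dockerfile counts as a code snippet, which matches the folder contents described earlier. Second, "directory size" means the sum of the file sizes in bytes.

```python
import os
from typing import List, Tuple

# Assumption: everything except these files counts as a code snippet.
NON_SNIPPETS = {"README.md", "Dockerfile"}


def language_stats(path: str, file_list: List[str]) -> Tuple[int, int]:
    """Return (total_snippets, total_dir_size_in_bytes) for one language folder."""
    total_snippets = sum(1 for name in file_list if name not in NON_SNIPPETS)
    total_dir_size = sum(os.path.getsize(os.path.join(path, name)) for name in file_list)
    return total_snippets, total_dir_size
```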

Wiki

Once we’ve generated our Repo object, we’re able to start generating the Wiki object:

from typing import List, Optional


class Wiki:
    def __init__(self):
        self.repo: Optional["Repo"] = None  # set once the repo has been analyzed
        # URL prefixes used to build links in the generated pages
        self.wiki_url_base: str = "/jrg94/sample-programs/wiki/"
        self.repo_url_base: str = "/jrg94/sample-programs/tree/master/archive/"
        self.tag_url_base: str = "https://therenegadecoder.com/tag/"
        self.issue_url_base: str = "/jrg94/sample-programs/issues?utf8=%E2%9C%93&q=is%3Aissue+is%3Aopen+"
        self.pages: List["Page"] = list()  # one Page per generated wiki page

This object takes the Repo and uses it to create wiki pages. In general, there are two phases of wiki generation: the alphabet catalog and the alphabet pages.
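The two phases might look roughly like this. The sketch reuses the URL bases from the Wiki class above, but several details are hypothetical: the function names, the catalog heading, the table columns, and the page naming ("A", "B", …). An issues column is omitted because the query text appended to issue_url_base isn’t shown.

```python
from typing import List, Tuple

# URL bases copied from the Wiki class above.
WIKI_URL_BASE = "/jrg94/sample-programs/wiki/"
REPO_URL_BASE = "/jrg94/sample-programs/tree/master/archive/"
TAG_URL_BASE = "https://therenegadecoder.com/tag/"


def build_catalog(letters: List[str]) -> List[str]:
    """Phase 1: the alphabet catalog, one Markdown link per letter page."""
    rows = ["# Alphabetical Language List", ""]
    for letter in letters:
        page = letter.upper()  # assumed page naming: "A", "B", ...
        rows.append(f"- [{page}]({WIKI_URL_BASE}{page})")
    return rows


def build_letter_page(letter: str, languages: List[Tuple[str, int]]) -> List[str]:
    """Phase 2: one Markdown table per letter from (language, snippet_count) pairs."""
    rows = [
        "| Language | Snippets | Source | Articles |",
        "| --- | --- | --- | --- |",
    ]
    for name, count in languages:
        source = f"[{name}]({REPO_URL_BASE}{letter}/{name})"
        articles = f"[{name}]({TAG_URL_BASE}{name})"
        rows.append(f"| {name} | {count} | {source} | {articles} |")
    return rows
```

Returning lists of rows, rather than one big string, matches the way a Page holds its content as a list of data rows.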

Page

Finally, the Page object represents a wiki page:

from typing import List


class Page:
    def __init__(self, name: str):
        self.name: str = name  # also used as the output file name
        self.wiki_url_base: str = "/jrg94/sample-programs/wiki/"
        self.content: List[str] = list()  # rows of Markdown, in order

In general, a wiki page is composed of a name, a URL, and a list of data rows. When it’s time to generate the physical pages, we create a file using the name field, generate a string from the content, and output that string to our new file.
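That write-out step might look something like the sketch below. SketchPage, add_row, and output_page are hypothetical names, and writing each page to <name>.md is an assumption. GitHub wikis accept Markdown files. The build script later moves everything from tools/wiki into the wiki repo, so a dump directory named wiki would line up with it.

```python
import os
from typing import List


class SketchPage:
    """Minimal stand-in for the Page class: a name plus a list of Markdown rows."""

    def __init__(self, name: str):
        self.name: str = name
        self.content: List[str] = []

    def add_row(self, row: str) -> None:
        self.content.append(row)

    def output_page(self, dump_dir: str) -> str:
        """Write <dump_dir>/<name>.md from the joined rows and return its path."""
        os.makedirs(dump_dir, exist_ok=True)
        file_path = os.path.join(dump_dir, f"{self.name}.md")
        with open(file_path, "w", encoding="utf-8") as page_file:
            page_file.write("\n".join(self.content) + "\n")
        return file_path
```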

Solution

Using everything we learned so far, the following code is enough to generate the wiki:

if __name__ == '__main__':
    wiki = Wiki()
    wiki.build_wiki()

As stated previously, the solution works by gathering data about the repo and storing it in objects. Then, we use those objects to build up the Markdown pages for the wiki.

Currently, there’s a lot of hardcoding and other trickery to get this to work, but it works. If you’re interested in contributing to the wiki, check out the source code for the generator.py script.

Continuous Integration

While the Python script is wonderful, it doesn’t actually automate the wiki building on its own. That’s why I opted for continuous integration using Travis CI.

Travis CI works by tapping into our GitHub pull request and commit system. Every time a pull request or commit is made, a build is triggered. Usually, teams use Travis CI for testing, but I figured we could leverage it for building and deploying our own wiki.

Build Script

For Travis CI to know what to do, we have to provide it with a build script. Fortunately, the build script is simple enough to include here:

language: python
python:
  - "3.6"

branches:
  only:
    - master

script:
  - cd tools
  - python generate-wiki.py
  - cd ..

after_success:
  - cd ..
  - git clone "https://${GITHUB_TOKEN}@github.com/${GITHUB_USERNAME}/sample-programs.wiki.git"
  - mv -v sample-programs/tools/wiki/* sample-programs.wiki/
  - cd sample-programs.wiki
  - git add .
  - git commit -m "Generated Wiki via Travis-CI"
  - git push

notifications:
  email: false

Whenever a build is triggered, Travis CI reads this script from the .travis.yml file at the root of the repo and runs it.

Of course, what exactly does this script tell Travis CI? Well, for starters, there are several configuration headings which all come together to specify the build parameters:

language
branches
script
after_success
notifications

In the following sections, we’ll briefly cover each of those headings.

Language

As you can probably imagine, the language heading specifies the language to be loaded on the build machine:

language: python
python:
  - "3.6"

In this case, we’ve chosen to specify our target language as Python 3.6.

Branches

The branches heading can be used to specify which branches to include or exclude when building:

branches:
  only:
    - master

In our case, we only want builds to occur on the master branch. More specifically, we want to exclude pull requests and focus only on commits to master. That way, we aren’t rebuilding the wiki every time someone makes or changes a pull request.

It’s important to note that I had to specifically uncheck “Build pushed pull requests” in the Travis CI settings to get the exact behavior I wanted.

Script

The script heading is where the build actually occurs:

script:
  - cd tools
  - python generate-wiki.py
  - cd ..

In this case, we’re defining three commands to be run as bash commands. The first command moves into the tools directory, which holds the Python script. Then, we execute our wiki generation script and move back out to our initial location.

If the wiki generation fails for whatever reason, the build fails and the after_success steps never run.
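That failure path relies on exit statuses. Travis CI marks the build as failed when a script command exits with a nonzero status, and it then skips after_success. The small sketch below uses a hypothetical helper, exit_status, to show that an uncaught exception ends a Python script with status 1. That is all it takes for a crash in the wiki generator to stop the deploy step.

```python
import subprocess
import sys


def exit_status(code: str) -> int:
    """Run Python code in a fresh interpreter and return the exit status it ends with."""
    completed = subprocess.run(
        [sys.executable, "-c", code],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,  # hide the traceback; only the status matters here
    )
    return completed.returncode
```

For example, exit_status("print('wiki written')") returns 0, while exit_status("raise RuntimeError('wiki generation failed')") returns 1.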

After Success

Naturally, the after_success heading specifies the actions to run once the build succeeds:

after_success:
  - cd ..
  - git clone "https://${GITHUB_TOKEN}@github.com/${GITHUB_USERNAME}/sample-programs.wiki.git"
  - mv -v sample-programs/tools/wiki/* sample-programs.wiki/
  - cd sample-programs.wiki
  - git add .
  - git commit -m "Generated Wiki via Travis-CI"
  - git push

In our case, we’ve specified several bash commands, some of which use environment variables.

To start, we move out of the repository directory before cloning the wiki. That way, both projects sit side by side. It’s important to note that I had to generate a GitHub access token (referenced in the script as GITHUB_TOKEN) to be able to do this.

Next, we grab all the wiki files generated by the Python script and move them into the wiki. As a result, we overwrite any files with the same name.

When we’re done, we navigate into the wiki directory, add the files to git, commit the files, and push those files to GitHub. Then, we’re done! The wiki is now fully automated.

Notifications

Finally, we have the notifications heading which I used to turn off emails:

notifications:
  email: false

Personally, I found the emails kind of annoying.

Build Log

As mentioned already, the build script above sets the parameters surrounding the build. However, to run the actual build script, we’ll need to commit something to master.

Once the build is triggered, we’ll see a config that looks like the following:

{
  "os": "linux",
  "dist": "trusty",
  "group": "stable",
  "python": "3.6",
  "script": [
    "cd tools",
    "python generate-wiki.py",
    "cd .."
  ],
  "language": "python",
  "after_success": [
    "cd ..",
    "git clone \"https://${GITHUB_TOKEN}@github.com/${GITHUB_USERNAME}/sample-programs.wiki.git\"",
    "mv -v sample-programs/tools/wiki/* sample-programs.wiki/",
    "cd sample-programs.wiki",
    "git add .",
    "git commit -m \"Generated Wiki via Travis-CI\"",
    "git push"
  ]
}

In addition, we’ll see a rather lengthy log that I won’t bother to share here. Instead, I’ll share a link to what you might expect a build to look like.

Alternatives

With the Python script being automated by Travis CI, we’re done!

Of course, there are several other ways to accomplish what we got to work. For instance, we could have done almost everything in the Python script alone, including all the git commands. We would just need some way to automate it. Perhaps a cron job would do.
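As a rough sketch of that first alternative, the after_success steps could move into Python using subprocess and shutil. Several parts are hypothetical: the function names, the commit message, and the choice to copy rather than move the files. The sketch reads GITHUB_TOKEN and GITHUB_USERNAME from the environment, just like the build script. It still needs something, such as Travis CI or a cron job, to run it.

```python
import os
import shutil
import subprocess
from typing import List


def wiki_clone_url(token: str, username: str) -> str:
    """Same URL shape the Travis CI build script clones."""
    return f"https://{token}@github.com/{username}/sample-programs.wiki.git"


def copy_pages(src_dir: str, wiki_dir: str) -> List[str]:
    """Copy generated pages into the wiki checkout, overwriting same-named files."""
    copied = []
    for name in sorted(os.listdir(src_dir)):
        src = os.path.join(src_dir, name)
        if os.path.isfile(src):
            shutil.copy2(src, os.path.join(wiki_dir, name))
            copied.append(name)
    return copied


def publish_wiki(src_dir: str, workdir: str) -> None:
    """Clone the wiki, copy the pages in, then commit and push (needs network + token)."""
    url = wiki_clone_url(os.environ["GITHUB_TOKEN"], os.environ["GITHUB_USERNAME"])
    wiki_dir = os.path.join(workdir, "sample-programs.wiki")
    subprocess.run(["git", "clone", url, wiki_dir], check=True)
    copy_pages(src_dir, wiki_dir)
    subprocess.run(["git", "add", "."], cwd=wiki_dir, check=True)
    # git commit exits nonzero when nothing changed, so check=True raises in that case.
    subprocess.run(["git", "commit", "-m", "Generated Wiki via Python"], cwd=wiki_dir, check=True)
    subprocess.run(["git", "push"], cwd=wiki_dir, check=True)
```

Keeping the URL building and file copying in separate functions lets those parts be checked locally without touching GitHub.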

Alternatively, we could have built a Probot app, which would act like a regular user. To be honest, we may move in this direction in the future. For now, though, we’re happy with our solution.

If you’d like to help with the wiki automation, check out the Sample Programs repo. While you’re at it, why not subscribe to The Renegade Coder? I appreciate the support!
