The Complete Guide to Subete: A Python Library for Browsing Code Snippets


To kick off my new series on Python libraries, I figured I’d start with a library of my own: Subete. After all, I know it inside and out, so I can write up some genuinely useful docs. Let’s get into it!


What Is Subete?

Subete is a library I created to allow programmatic browsing of the code snippets in the Sample Programs repo. It was originally designed for writing documentation automatically, but it has since expanded into a tool with many uses.

At the moment, I use it in the following places:

All of the code in the Sample Programs repository is under the MIT license, so you’re free to use Subete to look up and use code snippets however you like. I use it for various projects related to the Sample Programs repo, but I also use it in one of my Discord bots. It’s cool to be able to pull up a random code snippet at any time.

How Do I Install Subete?

Unfortunately, Subete is a third-party package, so you will have to install it yourself. That said, installation is straightforward. Like most Python packages, Subete can be installed using `pip`:

pip install subete

Here’s what you should see:

C:\Users\jerem>pip install subete
Collecting subete
  Using cached subete-0.9.3-py3-none-any.whl (9.6 kB)
Collecting PyYAML>=5
  Using cached PyYAML-6.0-cp310-cp310-win_amd64.whl (151 kB)
Collecting GitPython>=3
  Using cached GitPython-3.1.27-py3-none-any.whl (181 kB)
Collecting gitdb<5,>=4.0.1
  Using cached gitdb-4.0.9-py3-none-any.whl (63 kB)
Collecting smmap<6,>=3.0.1
  Using cached smmap-5.0.0-py3-none-any.whl (24 kB)
Installing collected packages: smmap, PyYAML, gitdb, GitPython, subete
Successfully installed GitPython-3.1.27 PyYAML-6.0 gitdb-4.0.9 smmap-5.0.0 subete-0.9.3

At the time of writing, the latest version of Subete was 0.9.3, so everything below is based on that version. Future versions of the library may add or remove features.

How Do I Use Subete?

To use Subete, you need to make sense of its structure.

Subete Structure

The first thing to note is that Subete is object-oriented. Specifically, it’s designed to mirror the Sample Programs repository. Therefore, the primary Subete object is the `Repo` object.

Inside the `Repo` object, you’ll find a list of `LanguageCollection` objects. A `LanguageCollection` is exactly what its name suggests: a collection of programs for a specific language. There should be a couple hundred of these, since the Sample Programs repo supports a couple hundred languages.

Finally, inside each `LanguageCollection` is a list of `SampleProgram` objects. These objects represent each individual program in the repository. In total, there are around 600 of these at the time of writing.

Generating a Repo Object

Out of the box, Subete has a single function, `load()`, which creates a `Repo` object. You can use it as follows:

import subete

repo: subete.Repo = subete.load()

And to prove it works, here’s what you might see in IDLE:

import subete

subete.load()
<subete.repo.Repo object at 0x0000020C75829E10>

As currently constructed, this function generates all of the language collections and sample programs from the latest version of the Sample Programs repository. This can take a while, since the entire repo has to be downloaded first. If you already have a copy of the repo on your machine, you can speed up the process as follows:

import subete

repo: subete.Repo = subete.load(source_dir="path/to/sample-programs/archive")

Instead of downloading the git repo, you can point the load function to an existing repo. From there, the language collections and sample programs will be generated.
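If you bounce between machines, you may want to use a local copy when one exists and fall back to downloading otherwise. Here’s a minimal sketch of that idea. The `SAMPLE_PROGRAMS_ARCHIVE` environment variable and the `pick_source_dir()` helper are hypothetical names I made up for illustration; only `subete.load()` and its `source_dir` argument come from the library.

```python
import os
from typing import Optional


def pick_source_dir(env_var: str = "SAMPLE_PROGRAMS_ARCHIVE") -> Optional[str]:
    """Return the path stored in env_var if it names an existing directory.

    SAMPLE_PROGRAMS_ARCHIVE is a hypothetical variable name, not something
    Subete reads. Returning None means "no local copy, download instead".
    """
    path = os.environ.get(env_var)
    if path and os.path.isdir(path):
        return path
    return None


# Usage (requires Subete to be installed):
#   import subete
#   source_dir = pick_source_dir()
#   repo = subete.load(source_dir=source_dir) if source_dir else subete.load()
```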

Using the Repo Object

The `Repo` object is somewhat limited in what you can do with it. As of right now, you can only use it to look up information about the repository. For example, the following methods can be used to learn about the repo:

# Returns a list of Project objects that are supported
projects = repo.approved_projects()

# Returns a random SampleProgram object from the Repo
program = repo.random_program()

# Returns the number of approved projects
count = repo.total_approved_projects()

# Returns the number of programs in the Repo
count = repo.total_programs()

# Returns the number of tested languages in the Repo
count = repo.total_tests()

In addition, there are a handful of convenience methods that can be used to get collection information:

# Returns all of the languages that start with a certain letter
langs = repo.languages_by_letter('p')

# Returns a sorted list of letters with languages in the Repo
letters = repo.sorted_language_letters()
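Together, those two methods are enough to build an alphabetical index of the repo. The sketch below assumes, based on the method names, that `languages_by_letter()` returns `LanguageCollection` objects and `sorted_language_letters()` returns letter strings. It also uses the `name()` method covered later in this article. `letter_index()` is a hypothetical helper, not part of Subete.

```python
from typing import Dict, List


def letter_index(repo) -> Dict[str, List[str]]:
    """Map each letter to the names of the languages filed under it.

    Assumes repo.languages_by_letter() yields LanguageCollection objects,
    each of which exposes name().
    """
    return {
        letter: [language.name() for language in repo.languages_by_letter(letter)]
        for letter in repo.sorted_language_letters()
    }


# Usage (requires Subete):
#   index = letter_index(subete.load())
#   print(index["p"])
```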

And here’s what you might see if you actually run these methods:

repo.approved_projects()
[<subete.repo.Project object at 0x0000020C75F0EA70>, <subete.repo.Project object at 0x0000020C75F0EB00>, <subete.repo.Project object at 0x0000020C75F0EB90>, <subete.repo.Project object at 0x0000020C75F0EA40>, <subete.repo.Project object at 0x0000020C75F0E800>, <subete.repo.Project object at 0x0000020C75F0EBC0>, <subete.repo.Project object at 0x0000020C75F0EAA0>, <subete.repo.Project object at 0x0000020C75F0E7A0>, <subete.repo.Project object at 0x0000020C75F0E770>, <subete.repo.Project object at 0x0000020C75F0E8F0>, <subete.repo.Project object at 0x0000020C75F0E8C0>, <subete.repo.Project object at 0x0000020C75F0E890>, <subete.repo.Project object at 0x0000020C75F0F070>, <subete.repo.Project object at 0x0000020C75F0F040>, <subete.repo.Project object at 0x0000020C75F0EE00>, <subete.repo.Project object at 0x0000020C75F0ED40>, <subete.repo.Project object at 0x0000020C75F0ECB0>, <subete.repo.Project object at 0x0000020C75F0F160>, <subete.repo.Project object at 0x0000020C75F0F1C0>, <subete.repo.Project object at 0x0000020C75F0F220>, <subete.repo.Project object at 0x0000020C75F0F280>, <subete.repo.Project object at 0x0000020C75F0F2E0>, <subete.repo.Project object at 0x0000020C75F0F340>, <subete.repo.Project object at 0x0000020C75F0F3A0>, <subete.repo.Project object at 0x0000020C75F0F400>, <subete.repo.Project object at 0x0000020C75F0F460>, <subete.repo.Project object at 0x0000020C75F0F4C0>, <subete.repo.Project object at 0x0000020C75F0F520>, <subete.repo.Project object at 0x0000020C75F0F580>, <subete.repo.Project object at 0x0000020C75F0F5E0>, <subete.repo.Project object at 0x0000020C75F0F640>, <subete.repo.Project object at 0x0000020C75F0F6A0>, <subete.repo.Project object at 0x0000020C75F0F700>, <subete.repo.Project object at 0x0000020C75F0F760>, <subete.repo.Project object at 0x0000020C75F0F7C0>, <subete.repo.Project object at 0x0000020C75F0F820>, <subete.repo.Project object at 0x0000020C75F0F880>, <subete.repo.Project object at 0x0000020C75F0F8E0>, <subete.repo.Project object at 0x0000020C75F0F940>, <subete.repo.Project object at 0x0000020C75F0F9A0>]

repo.random_program()
<subete.repo.SampleProgram object at 0x0000020C75F0FCD0>

repo.total_approved_projects()
40

repo.total_programs()
617

repo.total_tests()
37

Outside of these methods, the `Repo` object is fairly limited. Up next, we’ll learn how to iterate over all the languages in the repo.

Traversing the Language Collections

One thing you might notice is that the `Repo` object has no methods for getting the list of language collections. That’s because the `Repo` object can be iterated over directly:

for language in repo:
    print(language)

It can also be indexed directly by language name:

python = repo["Python"]

The lookup functionality is a bit more rigid, as it relies on knowing the exact string used for the language. Currently, lookup uses title case for all of the languages, but not every language can be looked up the way you might expect (e.g., Javascript vs. JavaScript). That said, if you know the right key, you should have no problems.
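If you don’t know the exact key, one workaround is to scan the repo yourself and compare names case-insensitively. This is a sketch, not a Subete feature: `find_language()` is a hypothetical helper, and it relies on iterating the `Repo` (as above) and on the `LanguageCollection.name()` method covered later. It’s a linear scan, which is fine for a couple hundred languages.

```python
from typing import Iterable, Optional


def find_language(languages: Iterable, query: str) -> Optional[object]:
    """Return the first collection whose name() matches query, ignoring case.

    Iterating a Repo yields LanguageCollection objects, so the Repo itself
    can be passed as `languages`. Returns None when nothing matches.
    """
    target = query.casefold()
    for language in languages:
        if language.name().casefold() == target:
            return language
    return None


# Usage (requires Subete):
#   repo = subete.load()
#   js = find_language(repo, "javascript")  # matches "Javascript" or "JavaScript"
```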

Here’s what the code actually looks like when executed:

for language in repo:
    print(language)

Abap
Ada
Agda
Algol68
...
Visual Basic
Whitespace
Wren
Wu
Wyvern
Zig

repo["Python"]
<subete.repo.LanguageCollection object at 0x0000020C75FDDF90>

Up next, we’ll look at how to make use of these `LanguageCollection` objects.

Using the LanguageCollection Objects

Once you’ve obtained the `LanguageCollection` you’ve been looking for, you’ll have access to a series of functions that might be useful. For example, here are all the functions related to testing:

# Returns True if the language has a testing file
state = language.has_testinfo()

# Returns the parsed contents of the testing file
test = language.testinfo()

# Returns the testinfo file URL
url = language.testinfo_url()
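Since `testinfo()` hands back a plain dictionary, you can dig into it directly. The sketch below pulls out the Docker image used for testing. The `container`, `image`, and `tag` keys are taken from the Python output shown later in this section, so I’m assuming other languages follow the same layout; `container_image()` is a hypothetical helper.

```python
from typing import Optional


def container_image(language) -> Optional[str]:
    """Return "image:tag" from a language's testinfo, or None if unavailable.

    The "container", "image", and "tag" keys mirror the Python testinfo
    shown in this article; .get() guards against files laid out differently.
    """
    if not language.has_testinfo():
        return None
    container = language.testinfo().get("container", {})
    image, tag = container.get("image"), container.get("tag")
    if image is None:
        return None
    return f"{image}:{tag}" if tag else image


# Usage (requires Subete):
#   print(container_image(subete.load()["Python"]))
```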

Likewise, there are useful functions for collecting data about a particular language:

# Returns the total number of lines of code in that particular language
count = language.total_line_count()

# Returns the total number of programs in that language
count = language.total_programs()

# Returns the total size of the language in bytes
size = language.total_size()
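These numbers combine into simple derived stats. For example, here’s a sketch of average program length (the `average_lines_per_program()` name is mine, not Subete’s). Plugging in the Python figures reported later in this section, 1248 lines across 32 programs, gives 1248 / 32 = 39 lines per program.

```python
def average_lines_per_program(language) -> float:
    """Average lines of code per program; 0.0 if the language has no programs."""
    programs = language.total_programs()
    if programs == 0:
        return 0.0
    return language.total_line_count() / programs


# Usage (requires Subete):
#   print(average_lines_per_program(subete.load()["Python"]))
```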

Similarly, there are a handful of useful URL methods for linking up to data related to that language:

# Returns the language documentation URL
url = language.lang_docs_url()

# Returns the testinfo URL (mentioned already)
url = language.testinfo_url()

In addition, if you’re interested in adding programs to this specific language, you can always look up which programs are missing and how many there are:

# Returns the number of missing programs for this language
count = language.missing_programs_count()

# Returns the list of missing programs for this language (as Project objects)
projects = language.missing_programs()
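Because `missing_programs()` returns `Project` objects rather than strings, you’ll usually want to convert them before displaying them. Here’s a minimal sketch using `Project.name()`, which is covered near the end of this article; `missing_project_names()` is a hypothetical helper.

```python
from typing import List


def missing_project_names(language) -> List[str]:
    """Alphabetized, human-readable names of the projects a language still lacks."""
    return sorted(project.name() for project in language.missing_programs())


# Usage (requires Subete):
#   print(missing_project_names(subete.load()["Python"]))
```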

Finally, there are a couple of methods you can use to retrieve the name of the language:

# Returns the name of the language as it would be used in Repo lookup (e.g., Python, C++)
name = language.name()

# Returns the name of the language as it would be used in a URL (e.g., python, c-plus-plus)
name = language.pathlike_name()

As you can see, these methods are a bit more friendly as they get at useful details about a language in the repo. As usual, here’s what they all look like in action, using Python as the example language:

python.has_testinfo()
True

python.testinfo()
{'folder': {'extension': '.py', 'naming': 'underscore'}, 'container': {'image': 'python', 'tag': '3.7-alpine', 'cmd': 'python {{ source.name }}{{ source.extension }}'}}

python.testinfo_url()
'https://github.com/TheRenegadeCoder/sample-programs/blob/main/archive/p/python/testinfo.yml'

python.total_line_count()
1248

python.total_programs()
32

python.total_size()
31401

python.lang_docs_url()
'https://sampleprograms.io/languages/python'

python.testinfo_url()
'https://github.com/TheRenegadeCoder/sample-programs/blob/main/archive/p/python/testinfo.yml'

python.missing_programs_count()
8

python.missing_programs()
[<subete.repo.Project object at 0x0000020C75F0F9A0>, <subete.repo.Project object at 0x0000020C75F0F760>, <subete.repo.Project object at 0x0000020C75F0E7A0>, <subete.repo.Project object at 0x0000020C75F0ECB0>, <subete.repo.Project object at 0x0000020C75F0F3A0>, <subete.repo.Project object at 0x0000020C75F0F220>, <subete.repo.Project object at 0x0000020C75F0EAA0>, <subete.repo.Project object at 0x0000020C75F0F280>]

python.name()
'Python'

python.pathlike_name()
'python'

Up next, we’ll take a look at how we can loop over the language collection to see each program.

Traversing the Sample Programs

To keep things consistent, the `LanguageCollection` objects work just like `Repo` objects. As a result, you can iterate over them with ease:

for program in language:
    print(program)

And just like the `Repo` object, a `LanguageCollection` object is subscriptable, meaning it can be indexed:

hello_world = language["Hello World"]
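The `Repo` object only offers a repo-wide `random_program()`. If you want a random program from a single language, iteration is enough to sketch one. `random_program_from()` is a hypothetical helper, and the optional `rng` parameter just makes the choice reproducible in tests.

```python
import random
from typing import Iterable, Optional


def random_program_from(language: Iterable, rng: Optional[random.Random] = None):
    """Pick one program from an iterable collection such as a LanguageCollection."""
    programs = list(language)
    if not programs:
        raise ValueError("collection has no programs")
    # Fall back to the module-level generator when no rng is supplied
    return (rng or random).choice(programs)


# Usage (requires Subete):
#   program = random_program_from(subete.load()["Python"])
```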

In return, you’ll get a `SampleProgram` object. Once again, let’s take a peek at what this looks like for real, using Python as the example language and “Hello World” as the example program:

for program in python:
    print(program)

    
Baklava in Python
Binary Search in Python
Bubble Sort in Python
...
Roman Numeral in Python
Rot 13 in Python
Selection Sort in Python
Sleep Sort in Python

python["Hello World"]
<subete.repo.SampleProgram object at 0x0000020C75FDE7D0>

Up next, we’ll learn how to make sense of these `SampleProgram` objects.

Using the SampleProgram Objects

At the bottom of this colossal data structure is the `SampleProgram` object which represents an individual program in the repo. As a result, each program has a lot of fun features. For example, you can access the code directly:

# Returns the code of a program
code = program.code()

As with languages, sample programs also have a few related URLs you can pull:

# Returns the documentation URL for the project this program is implementing
url = program.documentation_url()

# Returns the URL to a GitHub query for articles related to this program
url = program.article_issue_query_url()

Meanwhile, there are several convenience methods for looking up data about the Sample Program:

# Returns the language collection that this program belongs to
language = program.language_collection()

# Returns the language name in its human-readable form (e.g., Python)
name = program.language_name()

# Returns the language name in its URL form (e.g., python)
name = program.language_pathlike_name()

# Returns the project object associated with this program
project = program.project()

# Returns the project name in its human-readable form (e.g., Hello World)
name = program.project_name()

# Returns the project name in its URL form (e.g., hello-world)
name = program.project_pathlike_name()

And then of course, we wouldn’t be able to survive without some data methods:

# Returns the number of lines in the program
count = program.line_count()

# Returns the size of the program in bytes
size = program.size()
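Since a `LanguageCollection` can be iterated, these data methods make it easy to rank programs. Here’s a sketch that returns the project names of the longest programs in a language; `longest_programs()` is a hypothetical helper, not part of Subete.

```python
from typing import Iterable, List


def longest_programs(programs: Iterable, n: int = 3) -> List[str]:
    """Return the project names of the n longest programs, longest first."""
    ranked = sorted(programs, key=lambda program: program.line_count(), reverse=True)
    return [program.project_name() for program in ranked[:n]]


# Usage (requires Subete):
#   python = subete.load()["Python"]
#   print(longest_programs(python))
```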

And there we have it, a full breakdown of the `Repo` object and its constituent parts. Here’s the usual rundown with actual code:

hello_world.code()
"print('Hello, World!')\n"

hello_world.documentation_url()
'https://sampleprograms.io/projects/hello-world/python'

hello_world.article_issue_query_url()
'https://github.com//TheRenegadeCoder/sample-programs-website/issues?utf8=%E2%9C%93&q=is%3Aissue+is%3Aopen+hello+world+python'

hello_world.language_collection()
<subete.repo.LanguageCollection object at 0x0000020C75FDDF90>

hello_world.language_name()
'Python'

hello_world.language_pathlike_name()
'python'

hello_world.project()
<subete.repo.Project object at 0x0000020C75FDE770>

hello_world.project_name()
'Hello World'

hello_world.project_pathlike_name()
'hello-world'

hello_world.line_count()
1

hello_world.size()
24

Finally, in the next section, we’ll talk about this mysterious project object.

Making Sense of the Project Object

In addition to the three objects mentioned previously, there’s actually a fourth object for convenience purposes: `Project`. The `Project` object exists because internally it’s somewhat annoying to handle projects as strings. As a result, I wrapped them in objects.

That said, the `Project` object is really only good for a couple of things. First, its main purpose is getting project names in their respective formats:

# Returns project name in human-readable format (e.g., Hello World)
name = project.name()

# Returns project name in URL format (e.g., hello-world)
name = project.pathlike_name()

In addition, you can get the project requirements URL from this object as well:

# Returns the project requirements URL
url = project.requirements_url()

But that’s it! As usual, here’s what these methods actually do, using the Hello World project:

project.name()
'Hello World'

project.pathlike_name()
'hello-world'

project.requirements_url()
'https://sampleprograms.io/projects/hello-world'
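Because `SampleProgram.project_name()` and `Project.name()` both produce the human-readable form (e.g., Hello World), the two can be matched up. As one illustration, here’s a sketch that counts how many languages implement each approved project, which can surface under-served projects. `project_coverage()` is a hypothetical helper; it takes the projects and the languages separately so you can pass `repo.approved_projects()` and the `Repo` itself.

```python
from typing import Dict, Iterable


def project_coverage(projects: Iterable, languages: Iterable) -> Dict[str, int]:
    """Count how many languages implement each project.

    Every approved project starts at 0, so unimplemented projects still appear.
    """
    counts = {project.name(): 0 for project in projects}
    for language in languages:
        for program in language:
            name = program.project_name()
            counts[name] = counts.get(name, 0) + 1
    return counts


# Usage (requires Subete):
#   repo = subete.load()
#   coverage = project_coverage(repo.approved_projects(), repo)
#   print(sorted(coverage.items(), key=lambda item: item[1])[:5])
```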

Up next, we’ll take a look at how these objects can be used in actual code.

Real World Use of Subete

Now that we’ve taken a look at Subete under the hood, here are some examples of where I’ve actually used it.

GitHub Profile Automation

First, let’s take a look at my GitHub profile. Each week, I generate a new code snippet using GitHub Actions and a simple Python script:

import subete
from subete.repo import SampleProgram

repo = subete.load()

def get_code_snippet() -> SampleProgram:
    code = repo.random_program()
    return code

if __name__ == "__main__":
    code = get_code_snippet()

There’s a bit of context missing, but you get the idea. Once I’ve retrieved the code snippet, I dump it into a markdown file using my SnakeMD library (maybe a topic for next time).
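If you’d rather not pull in another dependency, here’s a rough plain-string sketch of that Markdown step. It isn’t the script behind my profile, and `snippet_markdown()` is a hypothetical helper; it only uses `SampleProgram` methods covered above.

```python
def snippet_markdown(program) -> str:
    """Render a SampleProgram as a small Markdown section (no SnakeMD needed).

    Uses a ~~~ fence so backtick lines inside the snippet can't close it early.
    """
    title = f"{program.project_name()} in {program.language_name()}"
    return "\n".join([
        f"## {title}",
        "",
        "~~~",
        program.code().rstrip("\n"),
        "~~~",
        "",
        f"[Read more]({program.documentation_url()})",
        "",
    ])


# Usage (requires Subete):
#   repo = subete.load()
#   with open("snippet.md", "w", encoding="utf-8") as f:
#       f.write(snippet_markdown(repo.random_program()))
```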

Sample Programs Website Automation

Recently, I started automating the Sample Programs website. As a result, I use Subete extensively to do things like this:

import pathlib

import snakemd
import subete


def generate_projects_index(repo: subete.Repo):
    projects_index_path = pathlib.Path("docs/projects")
    projects_index = snakemd.new_doc("index")
    # _generate_front_matter() is a helper defined elsewhere in the website's build script
    _generate_front_matter(
        projects_index,
        projects_index_path / "front_matter.yaml",
        "Projects"
    )
    projects_index.add_paragraph(
        "Welcome to the Projects page! Here, you'll find a list of all of the projects represented in the collection."
    )
    projects_index.add_header("Projects List", level=2)
    projects_index.add_paragraph(
        "To help you navigate the collection, the following projects are organized alphabetically."
    )
    # Sort the approved projects case-insensitively by name without mutating the repo's list
    sorted_projects = sorted(repo.approved_projects(), key=lambda x: x.name().casefold())
    projects = [
        snakemd.InlineText(
            project.name(),
            url=project.requirements_url()
        )
        for project in sorted_projects
    ]
    projects_index.add_element(snakemd.MDList(projects))
    projects_index.output_page(str(projects_index_path))

This function generates the Projects page of the Sample Programs website. It uses the `approved_projects()` method of `Repo` to get a list of `Project` objects, sorts them by name, and then turns each one into a link using its `name()` and `requirements_url()` methods. And of course, I make use of SnakeMD here as well.

What Other Libraries Would You Like to See?

That’s about all there is to say about Subete. It’s a library I wrote to navigate the existing code base of the Sample Programs collection. If you like it, I recommend heading over to GitHub to give it a star. Hell, try it out while you’re at it too!

In the meantime, I’d appreciate it if you took a minute to check out my article on ways to grow the site. Google does a pretty terrible job of ranking this style of content, so if you want direct access to it, that link is a good starting place. If you’re still not sure, check out some of these related articles:

Otherwise, that’s all I have! Thanks for stopping by and take care.

The Python Docs You Didn't Know You Needed (2 Articles)—Series Navigation

After toying with Python for a long time, I’ve come to realize that both the standard libraries and third-party libraries are a major draw of the language. Very rarely do you ever need to write anything from scratch. That said, despite this amazing ecosystem, it can be hard to make sense of all the libraries and their often limited documentation. As a result, I decided to make a series where I share a library and how to use it.

Jeremy Grifski

Jeremy grew up in a small town where he enjoyed playing soccer and video games, practicing taekwondo, and trading Pokémon cards. Once out of the nest, he pursued a bachelor's in Computer Engineering with a minor in Game Design. After college, he spent about two years writing software for a major engineering company. Then, he earned a master's in Computer Science and Engineering. Today, he pursues a PhD in Engineering Education in order to ultimately land a teaching gig. In his spare time, Jeremy enjoys spending time with his wife, playing Overwatch and Phantasy Star Online 2, practicing trombone, watching Penguins hockey, and traveling the world.
