To kick off my new series on Python libraries, I figured I’d start with a library of my own: Subete. After all, I know it inside and out, so I should be able to write up some genuinely useful docs. Let’s get into it!
Table of Contents
- What Is Subete?
- How Do I Install Subete?
- How Do I Use Subete?
- Real World Use of Subete
- What Other Libraries Would You Like to See?
What Is Subete?
Subete is a library that I created to allow for programmatic browsing of code snippets in the Sample Programs repo. It was originally designed for writing up documentation automatically, but it has since grown into a general-purpose tool with many uses.
At the moment, I use it in the following places:
- The Sample Programs Website
- The Sample Programs READMEs
- The Sample Programs Wiki
- My GitHub Profile README
All of the code in the Sample Programs repository is under the MIT license, so you’re free to use Subete to look up and use code snippets however you like. I use it for various projects related to the Sample Programs repo, but I also use it in one of my Discord bots. It’s cool to be able to pull up a random code snippet at any time.
How Do I Install Subete?
Unfortunately, Subete is a third-party package, so you will have to install it yourself. That said, Subete is pretty straightforward to install. Like most Python packages, you can install it using pip:
pip install subete
Here’s what you should see:
C:\Users\jerem>pip install subete
Collecting subete
  Using cached subete-0.9.3-py3-none-any.whl (9.6 kB)
Collecting PyYAML>=5
  Using cached PyYAML-6.0-cp310-cp310-win_amd64.whl (151 kB)
Collecting GitPython>=3
  Using cached GitPython-3.1.27-py3-none-any.whl (181 kB)
Collecting gitdb<5,>=4.0.1
  Using cached gitdb-4.0.9-py3-none-any.whl (63 kB)
Collecting smmap<6,>=3.0.1
  Using cached smmap-5.0.0-py3-none-any.whl (24 kB)
Installing collected packages: smmap, PyYAML, gitdb, GitPython, subete
Successfully installed GitPython-3.1.27 PyYAML-6.0 gitdb-4.0.9 smmap-5.0.0 subete-0.9.3
At the time of writing, the latest version of Subete was 0.9.3, so all of the documentation below is based on it. Future versions of the library may add or remove features.
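If you want to confirm which version you actually have before following along, the standard library’s importlib.metadata module (Python 3.8+) can report it. This is a small sketch of my own, not part of Subete; the DOCUMENTED_VERSION constant just records the 0.9.3 release this article is based on. If you’d rather match exactly, pip install subete==0.9.3 pins that release.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional

# The Subete release this article was written against
DOCUMENTED_VERSION = "0.9.3"


def installed_version(package: str = "subete") -> Optional[str]:
    """Return the installed version of a package, or None if it isn't installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version()
    if found is None:
        print("Subete isn't installed; try: pip install subete")
    elif found != DOCUMENTED_VERSION:
        print(f"Subete {found} is installed, but these docs describe {DOCUMENTED_VERSION}")
```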
How Do I Use Subete?
To be able to use Subete, you need to make sense of its structure.
Subete Structure
The first thing to note is that Subete is object-oriented. Specifically, it’s designed to mirror the Sample Programs repository. Therefore, the primary Subete object is the Repo object.
Inside the Repo object, you’ll find a list of LanguageCollection objects. A LanguageCollection is exactly what it sounds like: a collection of programs for a specific language. There should be a couple hundred of these, one for each language the Sample Programs repo supports.
Finally, inside each LanguageCollection is a list of SampleProgram objects. These objects represent each individual program in the repository. In total, there are around 600 of these at the time of writing.
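To make the three-level hierarchy concrete, here’s a minimal sketch that walks all of it and builds a plain dictionary mapping each language name to the names of its programs. It relies on two behaviors covered later in this article: Repo and LanguageCollection objects can be iterated over directly, and printing a LanguageCollection or SampleProgram shows a readable name (e.g., Python, Hello World in Python). The program_index() function is a hypothetical helper, not part of Subete.

```python
from typing import Dict, List


def program_index(repo) -> Dict[str, List[str]]:
    """Walk Repo -> LanguageCollection -> SampleProgram and index programs by language."""
    index = {}
    for language in repo:  # each item is a LanguageCollection
        # str(language) is the language name (e.g., "Python"), and
        # str(program) reads like "Hello World in Python"
        index[str(language)] = [str(program) for program in language]
    return index
```

Usage would look something like index = program_index(subete.load()), after which index["Python"] lists every Python program by name.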
Generating a Repo Object
Out of the box, Subete has a single function, load(), which creates a Repo object. You can use it as follows:
import subete

repo: subete.Repo = subete.load()
And to prove it works, here’s what you might see in IDLE:
>>> import subete
>>> subete.load()
<subete.repo.Repo object at 0x0000020C75829E10>
As currently constructed, this function will generate all of the language collections and sample programs from the latest version of the Sample Programs repository. This takes quite a bit of time because the entire repo has to be downloaded first. If you already have a copy of the repo downloaded, you can speed up the process as follows:
import subete

repo: subete.Repo = subete.load(source_dir="path/to/sample-programs/archive")
Instead of downloading the git repo, you can point the load function to an existing repo. From there, the language collections and sample programs will be generated.
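If you sometimes have a local clone and sometimes don’t, one option is to decide at runtime. The sketch below uses a hypothetical load_kwargs() helper (not part of Subete) that only passes source_dir when the given path is an existing directory, and otherwise falls back to a plain load(), which downloads the repo. Subete is imported inside load_repo() so the path logic itself works without Subete installed.

```python
from pathlib import Path
from typing import Optional


def load_kwargs(archive_dir: Optional[str]) -> dict:
    """Build keyword arguments for subete.load(), preferring a local clone."""
    if archive_dir and Path(archive_dir).is_dir():
        # Point Subete at the existing archive instead of downloading the repo
        return {"source_dir": str(Path(archive_dir))}
    # No usable local copy: let subete.load() fetch the repo itself
    return {}


def load_repo(archive_dir: Optional[str] = None):
    """Hypothetical wrapper: load from a local archive if present, else download."""
    import subete  # third-party: pip install subete
    return subete.load(**load_kwargs(archive_dir))
```

With that in place, repo = load_repo("path/to/sample-programs/archive") uses the local copy when it exists and downloads the repo otherwise.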
Using the Repo Object
The Repo object is somewhat limited in what you can do with it. As of right now, you can only use it to look up information about the repository. For example, the following methods can be used to learn about the repo:
# Returns a list of Project objects that are supported
projects = repo.approved_projects()
# Returns a random SampleProgram object from the Repo
program = repo.random_program()
# Returns the number of approved projects
count = repo.total_approved_projects()
# Returns the number of programs in the Repo
count = repo.total_programs()
# Returns the number of tested languages in the Repo
count = repo.total_tests()
In addition, there are a handful of convenience methods that can be used to get collection information:
# Returns all of the languages that start with a certain letter
langs = repo.languages_by_letter('p')
# Returns a sorted list of letters with languages in the Repo
letters = repo.sorted_language_letters()
Here’s what you might see if you actually run the first set of methods:
>>> repo.approved_projects()
[<subete.repo.Project object at 0x0000020C75F0EA70>, <subete.repo.Project object at 0x0000020C75F0EB00>, <subete.repo.Project object at 0x0000020C75F0EB90>, ..., <subete.repo.Project object at 0x0000020C75F0F940>, <subete.repo.Project object at 0x0000020C75F0F9A0>]
>>> repo.random_program()
<subete.repo.SampleProgram object at 0x0000020C75F0FCD0>
>>> repo.total_approved_projects()
40
>>> repo.total_programs()
617
>>> repo.total_tests()
37
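If you want all of those numbers at once, say for a status report, you can gather them into a dictionary. This is a minimal sketch with a hypothetical repo_summary() helper that only calls the Repo methods listed above. It assumes languages_by_letter() returns a sized collection such as a list, which the method description (“all of the languages that start with a certain letter”) suggests but doesn’t state outright.

```python
def repo_summary(repo) -> dict:
    """Collect the Repo-level statistics into a single dict."""
    letters = repo.sorted_language_letters()
    return {
        "programs": repo.total_programs(),
        "approved_projects": repo.total_approved_projects(),
        "tested_languages": repo.total_tests(),
        # Number of languages filed under each letter, e.g. {"a": 12, "b": 5, ...}
        "languages_per_letter": {
            letter: len(repo.languages_by_letter(letter)) for letter in letters
        },
    }
```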
Outside of these methods, the Repo object is fairly limited. Up next, we’ll learn how to iterate over all the languages in the repo.
Traversing the Language Collections
One thing you might notice is that the Repo object has no methods for getting the list of language collections. That’s because the Repo object can actually be iterated over directly:
for language in repo:
    print(language)
It can also be indexed directly by language name:
python = repo["Python"]
The lookup functionality is a bit more rigid and relies on knowing the exact string used for the language. Currently, lookup uses title case for all of the languages, but not every language can be looked up as you might expect (e.g., Javascript vs. JavaScript). That said, if you happen to know the right key, you should have no problems.
Here’s what the code actually looks like when executed:
>>> for language in repo:
    print(language)

Abap
Ada
Agda
Algol68
...
Visual Basic
Whitespace
Wren
Wu
Wyvern
Zig
>>> repo["Python"]
<subete.repo.LanguageCollection object at 0x0000020C75FDDF90>
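Given the casing caveat with lookup, one workaround is a case-insensitive search over the Repo. This is a hypothetical helper rather than part of Subete’s API: it only uses iteration and the LanguageCollection name() method covered in the next section, and it returns None instead of guessing when nothing matches.

```python
def find_language(repo, name: str):
    """Case-insensitive alternative to repo[name]; returns None if nothing matches."""
    wanted = name.casefold()
    for language in repo:
        # name() returns the Repo lookup key, e.g. "Python" or "C++"
        if language.name().casefold() == wanted:
            return language
    return None
```

For example, find_language(repo, "JavaScript") and find_language(repo, "javascript") should both find the same collection, whichever casing the repo actually uses. The trade-off is a linear scan instead of a direct lookup, which is negligible for a couple hundred languages.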
Up next, we’ll look at how to make use of these LanguageCollection objects.
Using the LanguageCollection Objects
Once you’ve obtained the LanguageCollection you’ve been looking for, you’ll have access to a series of methods that might be useful. For example, here are all the methods related to testing:
# Returns True if the language has a testing file
state = language.has_testinfo()
# Returns the contents of the testing file
test = language.testinfo()
# Returns the testinfo file URL
url = language.testinfo_url()
Likewise, there are useful functions for collecting data about a particular language:
# Returns the total number of lines of code in that particular language
count = language.total_line_count()
# Returns the total number of programs in that language
count = language.total_programs()
# Returns the total size of the language in bytes
size = language.total_size()
Similarly, there are a handful of useful URL methods for linking up to data related to that language:
# Returns the language documentation URL
url = language.lang_docs_url()
# Returns the testinfo URL (mentioned already)
url = language.testinfo_url()
In addition, if you’re interested in adding programs to this specific language, you can always look up which programs are missing and how many there are:
# Returns the number of missing programs for this language
count = language.missing_programs_count()
# Returns the list of missing programs (as Project objects) for this language
programs = language.missing_programs()
Finally, there are a couple of methods you can use to retrieve the name of the language:
# Returns the name of the language as it would be used in Repo lookup (e.g., Python, C++)
name = language.name()
# Returns the name of the language as it would be used in a URL (e.g., python, c-plus-plus)
name = language.pathlike_name()
As you can see, these methods are a bit more friendly as they get at useful details about a language in the repo. As usual, here’s what they all look like in action, using Python as the example language:
>>> python.has_testinfo()
True
>>> python.testinfo()
{'folder': {'extension': '.py', 'naming': 'underscore'}, 'container': {'image': 'python', 'tag': '3.7-alpine', 'cmd': 'python {{ source.name }}{{ source.extension }}'}}
>>> python.testinfo_url()
'https://github.com/TheRenegadeCoder/sample-programs/blob/main/archive/p/python/testinfo.yml'
>>> python.total_line_count()
1248
>>> python.total_programs()
32
>>> python.total_size()
31401
>>> python.lang_docs_url()
'https://sampleprograms.io/languages/python'
>>> python.testinfo_url()
'https://github.com/TheRenegadeCoder/sample-programs/blob/main/archive/p/python/testinfo.yml'
>>> python.missing_programs_count()
8
>>> python.missing_programs()
[<subete.repo.Project object at 0x0000020C75F0F9A0>, <subete.repo.Project object at 0x0000020C75F0F760>, <subete.repo.Project object at 0x0000020C75F0E7A0>, <subete.repo.Project object at 0x0000020C75F0ECB0>, <subete.repo.Project object at 0x0000020C75F0F3A0>, <subete.repo.Project object at 0x0000020C75F0F220>, <subete.repo.Project object at 0x0000020C75F0EAA0>, <subete.repo.Project object at 0x0000020C75F0F280>]
>>> python.name()
'Python'
>>> python.pathlike_name()
'python'
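As a worked example of combining these methods, the sketch below summarizes a single language and ranks languages by how many programs they’re missing. Both functions are hypothetical helpers, not part of Subete. The average is simply total_line_count() divided by total_programs(), so with the Python numbers above it works out to 1248 / 32 = 39.0 lines per program.

```python
def language_stats(language) -> dict:
    """Summarize one LanguageCollection using its data methods."""
    programs = language.total_programs()
    lines = language.total_line_count()
    return {
        "name": language.name(),
        "programs": programs,
        "missing": language.missing_programs_count(),
        # Guard against dividing by zero for a language with no programs yet
        "avg_lines": lines / programs if programs else 0.0,
    }


def most_incomplete(repo, n: int = 5) -> list:
    """Return the n LanguageCollections with the most missing programs."""
    return sorted(repo, key=lambda lang: lang.missing_programs_count(), reverse=True)[:n]
```

Something like [lang.name() for lang in most_incomplete(repo)] would then give a short list of languages that could use new contributions.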
Up next, we’ll take a look at how we can loop over the language collection to see each program.
Traversing the Sample Programs
To keep things consistent, the LanguageCollection objects work just like Repo objects. As a result, you can iterate over them with ease:
for program in language:
    print(program)
And just like the Repo object, a LanguageCollection object is subscriptable, meaning it can be indexed:
hello_world = language["Hello World"]
In return, you’ll get a SampleProgram object. Once again, let’s take a peek at what this looks like for real, using the Python collection and “Hello World” as the examples:
>>> for program in python:
    print(program)

Baklava in Python
Binary Search in Python
Bubble Sort in Python
...
Roman Numeral in Python
Rot 13 in Python
Selection Sort in Python
Sleep Sort in Python
>>> python["Hello World"]
<subete.repo.SampleProgram object at 0x0000020C75FDE7D0>
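Since indexing expects the human-readable project name, a small helper can also accept the URL form (e.g., hello-world) and ignore casing. This is a sketch under the assumption that the SampleProgram methods project_name() and project_pathlike_name(), covered in the next section, behave as described there; find_program() is hypothetical and returns None when nothing matches.

```python
def find_program(language, project: str):
    """Find a program in a LanguageCollection by "Hello World" or "hello-world" style name."""
    wanted = project.casefold()
    for program in language:
        # Compare against both the human-readable and URL forms of the project name
        names = (program.project_name().casefold(), program.project_pathlike_name().casefold())
        if wanted in names:
            return program
    return None
```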
Up next, we’ll learn how to make sense of these SampleProgram objects.
Using the SampleProgram Objects
At the bottom of this colossal data structure is the SampleProgram object, which represents an individual program in the repo. As a result, each program has a lot of fun features. For example, you can access the code directly:
# Returns the code of a program
code = program.code()
As with languages, sample programs also have a few related URLs you can pull:
# Returns the documentation URL for the project this program is implementing
url = program.documentation_url()
# Returns the URL to a GitHub query for articles related to this program
url = program.article_issue_query_url()
Meanwhile, there are several convenience methods for looking up data about the Sample Program:
# Returns the language collection that this program belongs to
language = program.language_collection()
# Returns the language name in its human-readable form (e.g., Python)
name = program.language_name()
# Returns the language name in its URL form (e.g., python)
name = program.language_pathlike_name()
# Returns the project object associated with this program
project = program.project()
# Returns the project name in its human-readable form (e.g., Hello World)
name = program.project_name()
# Returns the project name in its URL form (e.g., hello-world)
name = program.project_pathlike_name()
And then of course, we wouldn’t be able to survive without some data methods:
# Returns the number of lines in the program
count = program.line_count()
# Returns the size of the program in bytes
size = program.size()
And there we have it, a full breakdown of the Repo object and its constituent parts. Here’s the usual rundown with actual code:
>>> hello_world.code()
"print('Hello, World!')\n"
>>> hello_world.documentation_url()
'https://sampleprograms.io/projects/hello-world/python'
>>> hello_world.article_issue_query_url()
'https://github.com//TheRenegadeCoder/sample-programs-website/issues?utf8=%E2%9C%93&q=is%3Aissue+is%3Aopen+hello+world+python'
>>> hello_world.language_collection()
<subete.repo.LanguageCollection object at 0x0000020C75FDDF90>
>>> hello_world.language_name()
'Python'
>>> hello_world.language_pathlike_name()
'python'
>>> hello_world.project()
<subete.repo.Project object at 0x0000020C75FDE770>
>>> hello_world.project_name()
'Hello World'
>>> hello_world.project_pathlike_name()
'hello-world'
>>> hello_world.line_count()
1
>>> hello_world.size()
24
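Since these methods cover everything needed to show a program off, here’s a sketch that renders a SampleProgram as a small Markdown snippet, similar in spirit to the GitHub profile automation covered later (which uses SnakeMD rather than plain strings). The program_to_markdown() function is hypothetical. Using language_pathlike_name() as the code fence’s language tag is an assumption: it works for names like python, but values like c-plus-plus may not match every renderer’s highlighter names.

```python
def program_to_markdown(program) -> str:
    """Render a SampleProgram as a Markdown heading, code block, and docs link."""
    # Build the fence this way so it doesn't clash with surrounding fences
    fence = "`" * 3
    return "\n".join([
        f"### {program.project_name()} in {program.language_name()}",
        "",
        f"{fence}{program.language_pathlike_name()}",
        program.code().rstrip("\n"),
        fence,
        "",
        f"[Documentation]({program.documentation_url()})",
    ])
```

Pairing it with the Repo method shown earlier, program_to_markdown(repo.random_program()) would produce a random snippet ready to drop into a README.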
Finally, in the next section, we’ll talk about this mysterious project object.
Making Sense of the Project Object
In addition to the three objects mentioned previously, there’s actually a fourth object for convenience purposes: Project. The Project object exists because internally it’s somewhat annoying to handle projects as strings. As a result, I wrapped them in objects.
That said, the Project object is really only good for a couple of things. Its main purpose is getting project names in their respective formats:
# Returns project name in human-readable format (e.g., Hello World)
name = project.name()
# Returns project name in URL format (e.g., hello-world)
name = project.pathlike_name()
In addition, you can get the project requirements URL from this object as well:
# Returns the project requirements URL
url = project.requirements_url()
But that’s it! As usual, here’s what these methods actually do, using the Hello World project:
>>> project.name()
'Hello World'
>>> project.pathlike_name()
'hello-world'
>>> project.requirements_url()
'https://sampleprograms.io/projects/hello-world'
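One thing Project objects make easy is comparing coverage across the collection. The sketch below counts how many languages implement each approved project, including projects no language has implemented yet. The project_coverage() function is hypothetical; it keys on project name strings rather than Project objects because it doesn’t assume Project objects define equality or hashing.

```python
from collections import Counter


def project_coverage(repo) -> dict:
    """Map each approved project's name to the number of programs implementing it."""
    # Count implementations by walking every program in every language
    counts = Counter(
        program.project_name() for language in repo for program in language
    )
    # Counter returns 0 for missing keys, so unimplemented projects show up as 0
    return {project.name(): counts[project.name()] for project in repo.approved_projects()}
```

Sorting the result, e.g. sorted(project_coverage(repo).items(), key=lambda item: item[1])[:5], would surface the least-implemented projects.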
Up next, we’ll take a look at how these objects can be used in actual code.
Real World Use of Subete
Now that we’ve taken a chance to look at Subete under the hood, here are some examples of where I’ve actually used it.
GitHub Profile Automation
First, let’s take a look at my GitHub profile. Each week, I generate a new code snippet using GitHub Actions and a simple Python script:
import subete
from subete.repo import SampleProgram

repo = subete.load()

def get_code_snippet() -> SampleProgram:
    code = repo.random_program()
    return code

if __name__ == "__main__":
    code = get_code_snippet()
There’s a bit of context missing, but you get the idea. Once I’ve retrieved the code snippet, I dump it into a markdown file using my SnakeMD library (maybe a topic for next time).
Sample Programs Website Automation
Recently, I started automating the Sample Programs website. As a result, I use Subete extensively to do things like this:
import pathlib

import snakemd
import subete

def generate_projects_index(repo: subete.Repo):
    projects_index_path = pathlib.Path("docs/projects")
    projects_index = snakemd.new_doc("index")
    # _generate_front_matter() is a helper defined elsewhere in the script (not shown)
    _generate_front_matter(
        projects_index,
        projects_index_path / "front_matter.yaml",
        "Projects"
    )
    projects_index.add_paragraph(
        "Welcome to the Projects page! Here, you'll find a list of all of the projects represented in the collection."
    )
    projects_index.add_header("Projects List", level=2)
    projects_index.add_paragraph(
        "To help you navigate the collection, the following projects are organized alphabetically."
    )
    # sorted() builds a new list, so the ordering holds whether or not
    # approved_projects() returns a fresh list on every call
    sorted_projects = sorted(repo.approved_projects(), key=lambda x: x.name().casefold())
    projects = [
        snakemd.InlineText(
            project.name(),
            url=project.requirements_url()
        )
        for project in sorted_projects
    ]
    projects_index.add_element(snakemd.MDList(projects))
    projects_index.output_page(str(projects_index_path))
This function generates the Projects page of the Sample Programs website. It makes use of the approved_projects() method of Repo to get a list of Project objects. These objects are then turned into a list of links on the projects page using their name() and requirements_url() methods. And of course, I make use of SnakeMD here as well.
What Other Libraries Would You Like to See?
With all that said, there’s not much else to say about Subete. It’s a library I wrote to navigate the existing code base of the Sample Programs collection. If you like it, I recommend heading over to GitHub to give it a star. Hell, try it out while you’re at it too!
In the meantime, I’d appreciate it if you took a minute to check out my article on ways to grow the site. Google does a pretty terrible job of ranking this style of content, so if you want direct access to it, that link is a good starting place. If you’re still not sure, check out some of these related articles:
- Write a Python Script to Autogenerate Google Form Responses
- How to Use Python to Build a Simple Visualization Dashboard Using Plotly
Otherwise, that’s all I have! Thanks for stopping by and take care.