At long last, I’ve decided to finally get over my fear of Input/Output long enough to write another article about files. In particular, we’re going to take a look at the process behind opening a file in Python.
For those of you short on time, the quickest way to open a file in Python is to take advantage of the `open()` function. Specifically, all we need to do is pass a path to the function: `open('/path/to/file')`. Alternatively, we can take advantage of the `pathlib` module, which allows us to store `Path` objects.
If that’s not enough to get you started, keep reading! Otherwise, I’d appreciate it if you took a moment to check out the list of ways to help grow the site. Thanks again for the support!
As this series grows, I find myself constantly pushed into uncomfortable domains. For example, a lot of people use Python for data science, so I feel some pressure to write about libraries like Pandas and Numpy. Likewise, one topic that comes up a lot is Input/Output—specifically working with files.
Now, I’ve sort of avoided talking about files in this series because files are complex. They can come in many, many different shapes and sizes, and they’re never consistent across platforms.
To add insult to injury, Python has expanded its file support over time. As a result, you really have to take care when listing solutions because they almost certainly won’t work in all versions of Python. In fact, I saw this issue in my file existence article from way back.
That said, today I’ve decided to wade back out into the dark territory that is IO. Specifically, we’re going to talk about how to open a file in Python. Basically, that means we’re going to look at some different ways to access a file for reading and writing.
Fortunately, Python is quite a bit less painful to work with than languages like Java or C. In other words, we should find IO to be a piece of cake (with lots of caveats along the way).
If you’ve been around this series for any amount of time, you know that I like to pool together a whole series of solutions. Of course, each list comes with the caveat that not all solutions are applicable in every scenario. For example, the first solution in this list should almost never be used, but I included it for the sake of tradition.
With that said, let’s go ahead and take a look at a few ways to open a file in Python.
Open a File with Shell Commands
With Python being a high-level language, there are tons of utilities built directly into the language for opening files. Of course, if you know me, I always like to take my first swipe at the challenge the hard way. In other words, I wanted to see if there was a way to open a file without using any straightforward functions.
Naturally, the first thing I thought about was shell commands. In other words, what if there was some way to interact with the command line directly? That way, I could just run Windows or Linux commands to open a file.
Unsurprisingly, Python has an interface for this. All we have to do is import the `os` library and run the commands directly:

    import os

    os.system('type NUL > out.txt')  # Windows only

Here, we create an empty file called “out.txt” in the current working directory. Unfortunately, this doesn’t really open a file in the sense that we don’t have a file reference to play with—though I’m sure we could read a file using this same syntax.
That said, this solution gives us a lot of flexibility, and if we want even more flexibility, we can rely on the `subprocess` module. However, I have no desire to go down that rabbit hole when there are so many better solutions to follow.
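For anyone curious about the `subprocess` route, here is a minimal sketch rather than a recommendation. To stay portable, it launches a second Python interpreter through `sys.executable` instead of a shell built-in like `type`. The temporary directory and the `hello` contents are assumptions made purely for illustration.

```python
import subprocess
import sys
import tempfile
from pathlib import Path

# Work in a throwaway directory so the sketch leaves nothing behind.
with tempfile.TemporaryDirectory() as tmp_dir:
    target = Path(tmp_dir) / "out.txt"

    # Create the file from a child process, much like the shell command does.
    subprocess.run(
        [sys.executable, "-c",
         "import sys; open(sys.argv[1], 'w').write('hello')", str(target)],
        check=True,  # raise CalledProcessError if the child process fails
    )

    # "Read" the file by capturing the child process's standard output.
    result = subprocess.run(
        [sys.executable, "-c",
         "import sys; print(open(sys.argv[1]).read())", str(target)],
        check=True,
        capture_output=True,  # requires Python 3.7+
        text=True,
    )
    contents = result.stdout.strip()
```

Even here, we only ever get the file's text back as a string. There is still no file reference to work with, which is the same limitation as `os.system()`.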
Open a File with the Open Function
If you’re like me, and your first language was Java, you know how painful it can be to open a file. Luckily, Python has a built-in function to make opening a file easy: `open('/path/to/file')`.
Of course, it’s a bit more clunky to use because it can throw an exception. For example, if the file doesn’t exist, the code will crash with the following error:
    >>> open('/path/to/file')
    Traceback (most recent call last):
      File "<pyshell#0>", line 1, in <module>
        open('/path/to/file')
    FileNotFoundError: [Errno 2] No such file or directory: '/path/to/file'

As a result, a call to `open()` is usually wrapped in a try/except:

    try:
        open('/path/to/file')
    except FileNotFoundError:
        pass

That way, if the error does arise, we have a mechanism for dealing with it.
As an added wrinkle, opening a file introduces a resource to our program. As a result, it’s also good practice to close the file when we’re done with it:
    try:
        my_file = open('/path/to/file')
        my_file.close()
    except FileNotFoundError:
        pass

Or, if we’re clever, we can take advantage of the `with` statement:

    try:
        with open('/path/to/file') as my_file:
            pass
    except FileNotFoundError:
        pass

This cleans up the code quite a bit! Now, we don’t have to explicitly close the file.
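To see that the `with` statement really does close the file for us, here is a small sketch that checks the `closed` attribute inside and outside the block. The temporary file stands in for a real path and is only an assumption of this example.

```python
import os
import tempfile

# Create a throwaway file so the example has something real to open.
fd, path = tempfile.mkstemp(suffix=".txt")
os.close(fd)

try:
    with open(path) as my_file:
        inside = my_file.closed   # the file is still open inside the block
    outside = my_file.closed      # leaving the block closed it for us
except FileNotFoundError:
    inside = outside = None
finally:
    os.remove(path)  # clean up the throwaway file
```

Here, `inside` ends up `False` and `outside` ends up `True`, even though `close()` is never called explicitly.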
The only thing left to mention is our options. After all, it’s not enough just to open the file. We need to specify some parameters. For example, are we going to open the file just for reading? Then, we should probably open in reading mode:

    try:
        with open('/path/to/file', 'r') as my_file:
            pass
    except FileNotFoundError:
        pass

Alternatively, if we want to both read and write to the file, we can use “r+”:

    try:
        with open('/path/to/file', 'r+') as my_file:
            pass
    except FileNotFoundError:
        pass

For those that are interested, here’s a (mostly) complete table of modes:
|Mode|Description|
|---|---|
|r|Opens an existing file as text for reading only|
|w|Creates a new file or overwrites an existing file as text for writing only|
|a|Creates a new file or opens an existing file as text for writing, where new text is added to the end of the file (i.e. append)|
|r+|Opens an existing file as text for reading and writing|
|w+|Creates a new file or overwrites an existing file as text for reading and writing|
|a+|Creates a new file or opens an existing file as text for reading and writing, where new text is added to the end of the file (i.e. append)|
|rb|Opens an existing file as binary for reading only|
|wb|Creates a new file or overwrites an existing file as binary for writing only|
|ab|Creates a new file or opens an existing file as binary for writing, where new bytes are added to the end of the file (i.e. append)|
|rb+|Opens an existing file as binary for reading and writing|
|wb+|Creates a new file or overwrites an existing file as binary for reading and writing|
|ab+|Creates a new file or opens an existing file as binary for reading and writing, where new bytes are added to the end of the file (i.e. append)|
In addition, there are a handful of other modes that you can read more about in the documentation. That said, keep in mind that a lot of the concepts mentioned here are still useful in the solutions that follow.
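To make a few of these modes concrete, here is a short sketch that contrasts writing, appending, and binary reading. The file name `modes.txt` and the temporary directory are assumptions for illustration only.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp_dir:
    path = str(Path(tmp_dir) / "modes.txt")

    with open(path, "w") as f:      # creates the file
        f.write("first\n")

    with open(path, "a") as f:      # keeps "first" and adds to the end
        f.write("second\n")

    with open(path, "r") as f:
        after_append = f.read()     # both lines are present

    with open(path, "w") as f:      # truncates: the earlier lines are gone
        f.write("third\n")

    with open(path, "r") as f:
        after_write = f.read()      # only the last write survives

    with open(path, "rb") as f:     # binary mode returns bytes, not str
        raw = f.read()
```

The key takeaway is that “w” throws away whatever was in the file, while “a” preserves it.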
Open a File with the pathlib Module
While the `open()` function is handy, there is another option that’s a bit more robust: the `pathlib` module. Basically, this module allows us to think of files at a higher level by wrapping them in a `Path` object:

    from pathlib import Path

    my_file = Path('/path/to/file')

Then, opening the file is as easy as using the `open()` method: `my_file.open()`.
That said, many of the same issues still apply. For example, running the code above will result in the following error:
    >>> my_file = Path('/path/to/file')
    >>> my_file.open()
    Traceback (most recent call last):
      File "<pyshell#16>", line 1, in <module>
        my_file.open()
      File "C:\Users\Jeremy Grifski\AppData\Local\Programs\Python\Python38-32\lib\pathlib.py", line 1213, in open
        return io.open(self, mode, buffering, encoding, errors, newline,
      File "C:\Users\Jeremy Grifski\AppData\Local\Programs\Python\Python38-32\lib\pathlib.py", line 1069, in _opener
        return self._accessor.open(self, flags, mode)
    FileNotFoundError: [Errno 2] No such file or directory: '\\path\\to\\file'

Look familiar? It should! After all, we ran into this error when we tried to open this imaginary file before. In other words, all the same rules apply. For example, a mode can be passed along as needed, as in `my_file.open('a')`.
That said, `pathlib` is nice because it provides a lot of helpful methods. For instance, instead of using a try/except, we can use one of the helpful boolean methods:

    if my_file.exists():
        my_file.open('a')

Of course, there’s a bit of a catch here. If for some reason the file is deleted after we check if it exists, there will be an error. As a result, it’s usually a safer bet to use the try/except strategy from before.
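For comparison, here is a minimal sketch of the try/except strategy applied to a `Path` object. The helper name `read_if_present` is hypothetical and not part of `pathlib`. Because the open itself is what gets guarded, there is no gap between checking for the file and using it.

```python
import tempfile
from pathlib import Path


def read_if_present(path):
    """Return the file's text, or None if the file does not exist.

    Hypothetical helper for illustration; not part of pathlib.
    """
    try:
        with Path(path).open("r") as my_file:
            return my_file.read()
    except FileNotFoundError:
        return None


# Try it on one file that exists and one that does not.
with tempfile.TemporaryDirectory() as tmp_dir:
    existing = Path(tmp_dir) / "exists.txt"
    existing.write_text("hello")

    found = read_if_present(existing)
    missing = read_if_present(Path(tmp_dir) / "missing.txt")
```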
Overall, I’m a big fan of this solution, especially when I want to do more than read the file. For instance, here’s a table of helpful methods that can be executed on these `Path` objects:
|Method|Description|
|---|---|
|chmod()|Changes the file mode and permissions|
|is_file()|Returns True if the path is a file|
|mkdir()|Creates a directory at the given path|
|rename()|Renames the file/directory at the given path|
|touch()|Creates a file at the given path|
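Here is a quick sketch of a few of these methods working together. The directory and file names (`notes`, `draft.txt`, `final.txt`) are made up for the example, and everything happens inside a temporary directory.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp_dir:
    base = Path(tmp_dir)

    folder = base / "notes"
    folder.mkdir()                 # create a directory

    draft = folder / "draft.txt"
    draft.touch()                  # create an empty file
    created = draft.is_file()      # check that the path is now a file

    final = folder / "final.txt"
    draft.rename(final)            # rename the file
    renamed = final.is_file() and not draft.exists()
```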
Of course, if you’re interested in browsing the entire suite of methods, check out the documentation. In the meantime, we’re going to move on to performance.
In my experience, IO is a bit of a pain to test because we usually need to run our tests for at least two scenarios: the file either exists or it doesn’t. In other words, for every possible test we come up with, we have to test it once for an existing file and again for a nonexistent file.
Now, to make matters worse, we also have a ton of modes to explore. Since I didn’t purposefully limit the scope of this article, that means we have a lot to test. For simplicity, I’m going to only test two modes: read and write. I have no idea if there will be a performance difference here, but I’m interested in exploring it.
With those caveats out of the way, let me remind everyone that I use `timeit` for all my performance tests. For these tests, we’ll need to create strings of all the different tests we’d like to try. Then, it’s just a matter of running them. If you’re interested in learning more about this process, I have an article about performance testing just for you. Otherwise, here are the strings:

    setup = """
    import os
    from pathlib import Path
    """

    system_commands = """
    os.system('type NUL > out.txt')
    """

    open_r = """
    open("out.txt", "r")  # Existing file
    """

    open_w = """
    open("out.txt", "w")  # Existing file
    """

    path_r = """
    Path("out.txt").open("r")  # Existing file
    """

    path_w = """
    Path("out.txt").open("w")  # Existing file
    """

As we can see, none of these solutions are written with a nonexistent file in mind. I realized that those would be a bit more difficult to test because we would have to delete the file between executions (at least for the write solutions). As a result, I chose to leave them out. Feel free to test them yourself and let me know what you find.
At any rate, now that we have our strings, we can begin testing:
    >>> import timeit
    >>> min(timeit.repeat(setup=setup, stmt=open_r))
    462.8889031000001
    >>> min(timeit.repeat(setup=setup, stmt=open_w))
    201.32850720000033
    >>> min(timeit.repeat(setup=setup, stmt=path_r))
    576.0263794000002
    >>> min(timeit.repeat(setup=setup, stmt=path_w))
    460.5153201000003
One thing that’s worth mentioning before we discuss the results is that I had to exclude the system command solution. Whenever it was executed, a command prompt launched on my system. It was so slow that I didn’t bother finishing the test.
With that said, IO is an extremely slow process in general. Even without the fun little window spam, these solutions took forever to test. In fact, I wouldn’t even read too far into these metrics because there’s just too much variability between runs.
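Part of the wait comes from `timeit` itself, which executes each statement one million times per run by default. As a rough sketch for quicker experiments, we can pass a smaller `number`. The temporary file and the choice of 1,000 executions are assumptions, and the timings this produces are not comparable to the results above.

```python
import os
import tempfile
import timeit

# Create a throwaway file to open repeatedly.
fd, path = tempfile.mkstemp(suffix=".txt")
os.close(fd)

# Use `with` so each handle is closed explicitly on every iteration.
read_stmt = """
with open(path, "r") as f:
    pass
"""

try:
    # Three runs of 1,000 executions each, instead of 1,000,000 per run.
    timings = timeit.repeat(
        stmt=read_stmt,
        globals={"path": path},  # make `path` visible to the statement
        repeat=3,
        number=1000,
    )
    best = min(timings)
finally:
    os.remove(path)  # clean up the throwaway file
```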
That said, I’m most interested in the difference between the speed of reading versus writing when using the `open()` function. It makes me wonder how much more work goes into preparing a file for reading versus writing. However, I didn’t see quite as dramatic a difference with the `pathlib` solutions.
If anyone is interested in doing a bit more research, I’d love to know more about the inner workings of these solutions. In general, I’m fairly skeptical of my metrics, but I don’t have a ton of time to play around with these sorts of things.
At any rate, let’s move on to the challenge!
Now that we’ve had a chance to look at the performance, we can move on to the challenge. After having a chance to play around with file opening, I figured the sky’s the limit for IO challenges. As a result, I wasn’t really sure where to start.
At first, I thought it might be interesting to try to put together a quine, which is a program that duplicates itself. Unfortunately, these are usually done through standard output and not to files. In fact, I wasn’t able to find any examples that output to a file, so I decided that wasn’t the way to go.
Instead, I figured we could take this idea of opening files a step further by moving on to file reading. In other words, now that we know how to open a file, what would it take to read the contents of that file? Specifically, I’m interested in writing a program similar to `cat` for Linux users:
cat example.txt # Outputs the contents of the file
This program should prompt the user for a file name and output the contents to standard out. In addition, it’s safe to assume the supplied file is text, but you’re welcome to create a more robust program if desired:
    >>> Please enter the path to a text file: example.txt
    Here are some sample file contents!

Naturally, a solution to this challenge will involve one of the file opening methods discussed in this article. From there, it’s up to you to decide how you want to read and display the file.
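If you want a starting point, here is one possible sketch. It is just one approach among many. The function names `cat` and `main`, and the error message, are hypothetical choices for this example.

```python
import sys


def cat(path, out=sys.stdout):
    """Write the contents of the text file at `path` to `out`."""
    try:
        with open(path, "r") as text_file:
            for line in text_file:  # read line by line to avoid loading it all
                out.write(line)
    except FileNotFoundError:
        out.write(f"No such file: {path}\n")


def main():
    """Prompt for a path and print that file's contents."""
    cat(input("Please enter the path to a text file: "))
```

Calling `main()` runs the prompt. Taking `out` as a parameter makes the function easy to test without touching standard output.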
As always, I’ve come up with a solution already!
If you’d like to share your own solution, head on over to Twitter and share your solution using the hashtag #RenegadePython. Alternatively, you can share your solution with our GitHub repo, and I’ll tweet it out if you want. I’m excited to see what you come up with!
A Little Recap
At long last, we’re finished! Here are all the solutions in one place:

    # "Open" a file with system commands
    import os
    os.system('type NUL > out.txt')

    # Open a file for reading with the open() function
    open("out.txt", "r")

    # Open a file for reading with the pathlib module
    from pathlib import Path
    Path("out.txt").open("r")

While you’re here, here are some helpful resources from Amazon (ad):
- Effective Python: 90 Specific Ways to Write Better Python
- Python Tricks: A Buffet of Awesome Python Features
- Python Programming: An Introduction to Computer Science
Otherwise, thanks for sticking around! I hope to see you back here soon.