Handling Files in Python: A Guide for Google Colab Users
Python for the Humanities 16
Welcome back to our Python course. Today, we’re diving into the essential topics of file handling and the basics of text mining. Before we can analyze text, we first need to understand how to get that text into our program. This brings up a unique challenge when working in a cloud-based environment like Google Colab.
Understanding the Google Colab Environment
The first thing to grasp about Google Colab is that your code doesn’t run on your own computer. When you write and execute a Python script in a Colab notebook, your browser sends that code to a server in one of Google’s data centers. A virtual machine is spun up just for you, your code runs there, and the output is sent back to your browser.
This is why the first time you run a cell, there’s often a short delay. Google is busy allocating a server and setting up the Python environment for you. Subsequent runs are faster because that connection remains active for a while.
This remote execution model has a critical implication for file handling: your Colab notebook cannot directly access files on your local hard drive. The “local” file system for your Colab program lives on that temporary Google server, which contains none of your own files when a session starts and is wiped when the session ends. If you try to open a file like C:\Users\YourName\Documents\my_file.txt, your program will fail with a FileNotFoundError, because that path doesn’t exist on the Google server.
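To see this for yourself, you can ask Python where it is running and whether a path from your own computer exists there. The sketch below uses only the standard library. The Windows path is the made-up example from the paragraph above, and the helper name can_open is hypothetical.

```python
import os

# The example path from the text: it can exist (at most) on your own Windows PC,
# not on the Linux server that runs your Colab notebook.
local_path = r"C:\Users\YourName\Documents\my_file.txt"


def can_open(path):
    """Hypothetical helper: return True if the file can be opened for reading."""
    try:
        with open(path, "r") as file_handle:
            file_handle.read(1)  # read a single character to prove we have access
        return True
    except FileNotFoundError:
        return False


print("Current working directory:", os.getcwd())  # typically /content in Colab
print("Path exists here?", os.path.exists(local_path))
print("Can we open it?", can_open(local_path))
```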
The Solution: Uploading Files on Demand
So, how do we work with files? There are two main approaches in Colab:
Mounting Google Drive: You can connect your Google Drive to your Colab instance, allowing your program to read and write files directly from your cloud storage. This is powerful but can add complexity, so we will skip it for now to stay focused on Python fundamentals.
Uploading Files Directly: For many data processing and analysis tasks, you only need to work with one or two files at a time. The simplest method is to upload these files directly to the Colab environment when you need them.
We will focus on the second method, which is perfect for getting started.
Reading an Uploaded File: Step-by-Step
The google.colab library provides a simple function, files.upload(), that triggers a file selection dialog in your browser. Let’s walk through the code to upload a text file and read its contents.
# First, we need to import the 'files' module from the google.colab library
from google.colab import files
# Prompt the user to upload a file
print("Please upload your text file.")
uploaded = files.upload()
# The 'upload' function returns a dictionary whose keys are the filenames.
# We get the filename of the first uploaded file.
filename = list(uploaded.keys())[0]
print(f"\nSuccessfully uploaded file: '{filename}'")
# Now, we open and read the file in a safe way
print("\n--- File Content ---")
with open(filename, 'r') as file_handle:
    for line in file_handle:
        # .strip() removes leading/trailing whitespace, including the newline character
        print(line.strip())
Let’s break down what’s happening here:
from google.colab import files: This line imports the necessary tools for file operations in Colab.
uploaded = files.upload(): This is the magic. Executing this line displays a “Choose Files” button in your notebook. When you select a file and the upload completes, the function returns a dictionary.
filename = list(uploaded.keys())[0]: The returned uploaded dictionary uses the filenames as keys and the file contents (as raw bytes) as values. To get the name of the file you just uploaded, we take a list of the dictionary’s keys and pick its first element.
with open(...): This is the standard, modern Python way to work with files, which we will explore in detail next.
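Because files.upload() only works inside Colab, the sketch below uses a hand-made dictionary in place of its return value, so you can experiment with the dictionary anywhere. The filename poem.txt and its text are invented for illustration. In Colab, the values are the raw bytes of each file, and upload() also saves each file to the server’s current working directory, which is why the main example can simply call open(filename). The sketch additionally checks for an empty dictionary first, because list(uploaded.keys())[0] raises an IndexError when there are no keys.

```python
# Stand-in for the dictionary returned by files.upload() in Colab:
# keys are filenames, values are the raw file contents as bytes.
# ("poem.txt" and its text are invented for this example.)
uploaded = {"poem.txt": b"Line one\nLine two\n"}

if not uploaded:
    # An empty dictionary means nothing was uploaded; indexing [0] would fail.
    raise ValueError("No file was uploaded. Please run the cell again.")

filename = list(uploaded.keys())[0]   # first (and here only) filename
raw_bytes = uploaded[filename]        # the file content as bytes
text = raw_bytes.decode("utf-8")      # assumption: the file is UTF-8 text

print(f"Uploaded {filename!r}: {len(raw_bytes)} bytes")
print(text.splitlines())
```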
The Pythonic Way to Work with Files: The with Statement
When you work with files in any programming language, you are interacting with the operating system (OS). The process generally involves three steps: open, read/write, and close.
Open: You tell the OS you want to access a specific file. You also specify your intent: are you reading from it ('r'), writing to it ('w'), or appending to it ('a')? The OS then gives you a file handle. Think of it as a ticket or a key that grants you access to the file.
Read/Write: You use the file handle to perform your operations.
Close: You tell the OS you are finished. This is critically important. If you open a file for writing and forget to close it, some operating systems (Windows in particular) may keep the file “locked,” preventing other programs (or even you) from using it. It can also lead to data loss: Python holds written data in a memory buffer, and if the program terminates unexpectedly before that buffer is flushed to disk, the data never arrives.
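Here are the three steps written out by hand, without any safety net, so you can see each one. The filename example.txt is made up. The sketch first writes the file with mode 'w' (which creates it, or empties it if it already exists), then appends a line with mode 'a', and finally reads everything back with mode 'r'. Notice that every open() needs its own matching close().

```python
# Step by step, with explicit open() and close() calls.
# "example.txt" is a made-up filename for this sketch.

file_handle = open("example.txt", "w")   # open for writing ('w' empties an existing file)
file_handle.write("Hello, file!\n")      # write
file_handle.close()                      # close

file_handle = open("example.txt", "a")   # open for appending
file_handle.write("A second line.\n")    # write at the end of the file
file_handle.close()                      # close

file_handle = open("example.txt", "r")   # open for reading
text = file_handle.read()                # read the whole file into one string
file_handle.close()                      # close

print(text)
print("Closed?", file_handle.closed)     # the handle can no longer be used
```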
Forgetting to close a file is a common bug. Python provides an elegant solution: the with statement.
with open(filename, 'r') as file_handle:
    # Code inside this block can use file_handle, for example:
    text = file_handle.read()
# Once the block is exited, Python automatically closes the file for you.
The with statement creates a block of code. Within this block, the file is guaranteed to be open. As soon as your program exits the block (either by finishing or due to an error), Python automatically and safely closes the file for you. This is why it’s considered the “safe” way to handle files.
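To see this guarantee in action, the sketch below raises an error on purpose inside a with block and then checks that the file was closed anyway. It also shows, roughly, the try/finally pattern that the with statement saves you from writing by hand (the real mechanism goes through the file object’s context-manager methods, but the effect on closing is the same). The filename notes.txt is hypothetical.

```python
# Create a small sample file ("notes.txt" is a made-up name)
with open("notes.txt", "w") as out:
    out.write("first line\nsecond line\n")

# Roughly what the with statement does for you: close in a 'finally' clause,
# which runs whether or not an error occurs in the 'try' part.
file_handle = open("notes.txt", "r")
try:
    first = file_handle.readline()
finally:
    file_handle.close()

# The with statement also closes the file when an error interrupts the block.
try:
    with open("notes.txt", "r") as handle:
        raise ValueError("something went wrong during processing")
except ValueError as error:
    print("Caught:", error)

print(first.strip())
print(file_handle.closed, handle.closed)  # both handles are closed
```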
Processing the File Content
Once you have an open file handle, you can treat it as an iterable sequence of lines. This makes it incredibly easy to process a text file line by line with a simple for loop.
# Inside the 'with' block from our example:
for line in file_handle:
    print(line.strip())
Here, line will be a string containing one line from the file on each iteration. We use the .strip() string method to remove any leading or trailing whitespace, including the invisible newline character (\n) at the end of each line, which makes for cleaner output.
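The sketch below makes the newline visible by collecting the raw lines in a list and printing the list, which shows each string with its \n spelled out. It also contrasts .strip() with .rstrip("\n"): the first removes whitespace from both ends, including any indentation at the start of a line, while the second removes only the trailing newline. For poetry or other texts where indentation carries meaning, the second may be the safer choice. The filename verse.txt and its two lines are invented.

```python
# Create a tiny sample file ("verse.txt" and its text are made up)
with open("verse.txt", "w") as out:
    out.write("A first line\n    an indented second line\n")

raw_lines = []
with open("verse.txt", "r") as file_handle:
    for line in file_handle:      # each 'line' still ends with '\n'
        raw_lines.append(line)

stripped = [line.strip() for line in raw_lines]            # trims both ends
newline_only = [line.rstrip("\n") for line in raw_lines]   # keeps indentation

print(raw_lines)
print(stripped)
print(newline_only)
```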
A Note on Best Practices
While our example prints the content directly from within the with block, a better practice for larger programs is to keep the file open for the shortest time possible.
Imagine your file processing takes a long time: minutes or even hours. If you do all that work inside the with block, the file stays open for that entire duration, and on some systems (notably Windows) that means other programs cannot move, rename, or delete it until you are done.
A better pattern is:
Open the file.
Read all its contents into a variable (e.g., a list of lines).
Close the file (by exiting the with block).
Process the data from the variable.
# A better practice for longer processing tasks
with open(filename, 'r') as file_handle:
    lines = file_handle.readlines()  # Read all lines into a list
# The file is now closed. The user is free to move or delete it.
# We can now process the data from the 'lines' variable at our leisure.
for line in lines:
    # Do some time-consuming processing here...
    print(line.strip())
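This approach minimizes the time the file is held open, making your program more robust and more considerate of the rest of the system. The trade-off is memory: the whole file now sits in RAM, which is fine for typical texts but may not be for files of many gigabytes.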
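One way to package this pattern is a small function that opens the file, reads everything, and hands back the lines, so the file is already closed by the time your processing starts. The function name load_lines and the sample file are hypothetical. Unlike readlines(), this sketch uses read().splitlines(), which drops the newline characters, so you no longer need .strip() just to remove them. It also passes encoding="utf-8" explicitly, an assumption that suits most modern texts but should be changed if your file uses a different encoding.

```python
def load_lines(path):
    """Hypothetical helper: return all lines of a text file, without newlines.

    The file is open only while the 'with' block runs; returning from inside
    the block still closes it.
    """
    with open(path, "r", encoding="utf-8") as file_handle:  # assumption: UTF-8 text
        return file_handle.read().splitlines()


# Create a sample file ("sample.txt" is a made-up name)
with open("sample.txt", "w", encoding="utf-8") as out:
    out.write("Call me Ishmael.\n\nSome years ago, never mind how long precisely.\n")

lines = load_lines("sample.txt")  # the file is already closed at this point
non_empty = [line for line in lines if line.strip()]

print(len(lines), "lines,", len(non_empty), "of them non-empty")
```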
Now that you know how to read files, we’re ready for the next step: analyzing their content. In our next session, we’ll use these skills to count word frequencies in a text file. See you there!


