Managing the many libraries and packages used by an application is complex and carries some hidden risks. The difficulties increase when you want to run an application in a container, because you have to manage your development environment while also producing a separate set of libraries and packages for the containerized application. This article discusses some of the common problems Python developers face when containerizing Python applications, and how Pipenv and Source-to-Image (S2I) can help to resolve those problems. We will build a simple Python application on Red Hat OpenShift Container Platform using those tools.
Shortcomings of pip
Pip is the nearly universal tool employed by Python programmers to install dependent packages. Pip is incredibly simple and powerful. But that simplicity creates several weak points that make it easy for both new and experienced developers to unknowingly introduce problems for themselves.
The central challenge developers face with dependencies is controlling the versions of the packages they need to install. A requirements.txt file is commonly used to track packages that need to be installed in the container. At first glance, the requirements file appears to meet this challenge, and only requires developers to run pip install -r requirements.txt in the container build process.
However, problems can still occur with a requirements.txt file. A developer might specify package A in the requirements file, which then automatically installs package B as a dependency of A. This might work perfectly today, but potentially introduces a future dependency problem.
Package A probably defines the requirement for B as simply B>=1.0.0 and does not specify an upper limit for the dependency version. At some point, package B can release an update that removes a feature that A is using, and your application breaks. Many related problems can also occur, as bugs and feature changes are introduced into dependencies.
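To make the failure mode concrete, here is a hypothetical requirements.txt (package-a and its dependency B are illustrative names, not real PyPI projects):

# requirements.txt
# We pin only our direct dependency...
package-a==1.2.0
# ...but package-a itself declares "B>=1.0.0" with no upper bound,
# so B is resolved fresh at every build. A breaking release of B
# changes the image even though this file never changed.

You can pin every transitive dependency by hand with pip freeze, but that list is tedious to maintain, which is exactly the gap Pipenv fills.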
Introducing Pipenv, Pipfiles, and Pipfile.lock files
Pipenv attempts to solve many of these problems. It replaces Pip as the tool for installing packages. Unlike some package managers, such as Conda, Pipenv installs the same packages from the PyPI repository that are available with Pip.
If you're already using Python, you can get Pipenv by executing:
pip install pipenv
Once the Pipenv package is installed, you are ready to start installing additional packages specific to your project. Where you previously would have run pip install requests, you can instead run pipenv install requests to get the exact same packages.
After you run Pipenv in a project for the first time, the tool creates a file called Pipfile. The generated Pipfile looks like this:
[[source]]
url = "https://pypi.org/simple"
verify_ssl = true
name = "pypi"

[packages]
requests = "*"

[dev-packages]

[requires]
python_version = "3.9"
Just like requirements.txt in Pip, the Pipfile can capture which packages you wish to install. But Pipenv is able to automatically maintain the file for you. It also captures some other useful information, such as the Python version you are using.
Additionally, a Pipfile has a dev-packages section. If you wish to use a package during development but not in production, such as the automatic code formatter black, you can simply run pipenv install black --dev. The --dev option captures development packages separately from the application's packages, so you can use packages during development while keeping them out of the production application.
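After that command, Pipenv records the new dependency under dev-packages rather than packages, so the relevant sections of the Pipfile look something like this:

[packages]
requests = "*"

[dev-packages]
black = "*"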
Keeping dependencies from breaking a build
Pipenv creates another file called Pipfile.lock. The lock file pins the versions of all the packages you have installed and their dependencies, similar to running pip freeze > requirements.txt.
Pipfile.lock allows you to reinstall the exact same versions of all components you used before, even if newer versions of those components have come out since you last ran Pipenv. If you need to rebuild your container several months down the line, running pipenv install --deploy installs the exact package versions specified in the lock file, ensuring that changes in upstream dependencies don't accidentally break your application.
Although Pipfile.lock is automatically generated, it is intended to be checked into source control, along with your Pipfile.
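The lock file itself is plain JSON. A single entry looks roughly like the following sketch (the package, version, and hash are illustrative placeholders; a real file contains an entry like this for every transitive dependency):

{
    "default": {
        "requests": {
            "hashes": [
                "sha256:<hash of the exact release artifact>"
            ],
            "index": "pypi",
            "version": "==2.28.1"
        }
    }
}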
Virtual environments
Another mistake that new Python developers often make is working from their global user Python environment. Virtual environments allow you to create a "clean" Python environment where you can install and manage packages independently from the global Python environment. Python offers a number of tools and methods for creating and managing virtual environments, which can be a bit overwhelming.
Thankfully, as its name implies, Pipenv can manage the environment for you. When you run pipenv install, Pipenv automatically detects whether a virtual environment was already created for this project, and either creates a new virtual environment or installs the packages into the existing one. That virtual environment can easily be activated with pipenv shell, allowing you to work with and run your application and packages from that virtual environment.
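If you want to confirm where the environment lives and which interpreter it uses, Pipenv has flags for that:

pipenv --venv   # print the path of the project's virtual environment
pipenv --py     # print the path of the Python interpreter inside it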
Tip: By default, Pipenv generates the environment in a centrally located folder. I prefer to keep my virtual environment in my project folder with my Pipfile. You can change the default behavior by setting the following definition in your .bashrc file:

export PIPENV_VENV_IN_PROJECT=1
With this option set, Pipenv creates a .venv folder to manage the virtual environment directly in your project folder. This folder can easily be deleted if you want to rebuild it from scratch or you just need to clean up disk space. .venv is a standard folder naming convention for virtual environments and should already be included in any standard Python .gitignore file.
Benefits of Source-to-Image
Source-to-Image (S2I) is a tool that enables developers to easily generate a container image from source code without having to write a Dockerfile. Creating an accurate Dockerfile may sound like a minor task for a seasoned containers expert, but generating an optimized image involves a number of "gotchas" that many developers aren't aware of. You need to manage layers correctly, clean up unneeded install artifacts, and run applications as a non-root user. Slipping up on any of those tasks can lead to a sub-optimal or nonfunctional image.
To combat these problems, organizations often maintain "reference" Dockerfiles and tell their developers, "Go copy this Dockerfile for your Python app and modify it as needed." That workaround creates a challenging maintenance task down the road.
S2I instead does away with the Dockerfile and simply ships the instructions for building the image in the image itself. This procedure does require that you have an S2I-enabled image for the language you are attempting to build. The good news is that nearly all of the language-specific images shipped with OpenShift are enabled for S2I.
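If you want to experiment locally, the standalone s2i CLI takes a source location, a builder image, and a tag for the output image. A minimal invocation might look like this (the builder image name is an assumption; substitute whichever Python builder image your registry provides):

s2i build . registry.access.redhat.com/ubi8/python-39 hello-world

On OpenShift itself you won't run this by hand; BuildConfigs invoke the same machinery for you.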
S2I images expect your application to follow some standard structural conventions for the language. But if necessary, you can set your own conventions by modifying or extending Python S2I's default assemble and run scripts. The assemble script expects the application to have a requirements.txt file, and the run script looks for an app.py file. The assemble script also defines some options that can be customized for Pipenv, as we will explore later.
Tip: When you have to deal with more advanced configuration options in S2I, it's always valuable to refer to the source code to see exactly what S2I is running. You can exec into the container to view the assemble and run scripts directly, but most of the time I find it easier to just look the scripts up on GitHub. The S2I scripts for Python 3.9 can be found at this GitHub repository.
Building an example application with Pipenv and S2I
To demonstrate the capabilities of Pipenv and S2I, we will build a simple "Hello World" application that exposes an interface through FastAPI. To view the completed application, get the source code from this GitHub repository.
Installing the initial dependencies
To begin, create a new Pipfile and virtual environment with FastAPI by executing:

pipenv install fastapi
As discussed previously, Pipenv creates the Pipfile, the Pipfile.lock file, and a virtual environment with FastAPI installed. Verify that you can activate the virtual environment and list the packages with the following commands:
pipenv shell
pip list
The output should show FastAPI and its dependencies.
While still in the shell, you can install additional packages such as black. Because black is needed only in the development environment and not in the production application, use the --dev flag:
pipenv install black --dev
Creating the application
Next, create the FastAPI example application based on the FastAPI first-steps tutorial. The code will be in hello_world/main.py:
from fastapi import FastAPI

app = FastAPI()


@app.get("/")
async def root():
    return {"message": "Hello World"}
Additionally, it is always a best practice to create an empty file called __init__.py in the folder containing the Python components.
At this point, your folder structure should look like this:
.
├── hello_world/
│ ├── __init__.py
│ └── main.py
├── Pipfile
└── Pipfile.lock
The application is now ready to start in your local environment. With the virtual environment still active, you can run the following command to start the application:
uvicorn hello_world.main:app
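With the server running, you can check the endpoint from a second terminal. Uvicorn listens on port 8000 by default:

curl http://127.0.0.1:8000/
# {"message":"Hello World"}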
I have chosen to put the application file in a subfolder inside of my Git repository instead of creating the application in the root of the project. Although we don't have much in our hello_world folder, most real applications require additional files and folders. By keeping the main application in a subfolder, you can keep the root folder of the project relatively clean and readable, while maintaining future flexibility for the application.
Setting up the launch of the application
The application is now functioning and you are ready to consider how to containerize it. The first question to answer is how the application will start.
As mentioned earlier, Python-S2I looks for an app.py file in the root of the project and attempts to use that to start the application. However, the run script also allows you to start the application from a file named app.sh if an app.py file isn't found. One option is to include the uvicorn command shown earlier in the app.sh file, but I prefer to keep everything as Python code. So you can start the application with the following app.py file, which binds to port 8080, the port OpenShift's Python images expose by default:
from hello_world.main import app
import uvicorn

if __name__ == "__main__":
    uvicorn.run(
        app,
        host="0.0.0.0",
        port=8080,
    )
To test the file, run the following:
python app.py
This time, you will encounter an error because you're missing the uvicorn package:
Traceback (most recent call last):
  File "/home/troyer/code/pipenv-tutorial/app.py", line 3, in <module>
    import uvicorn
ModuleNotFoundError: No module named 'uvicorn'
To resolve this problem, simply add the package with Pipenv:
pipenv install uvicorn
Pipenv will capture the new dependency in the Pipfile and Pipfile.lock file automatically.
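For example, the [packages] section of the Pipfile should now list both direct dependencies:

[packages]
fastapi = "*"
uvicorn = "*"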
Running app.py again should now function correctly.
Configuring the S2I build
Next, you need to consider how to build the application. As mentioned before, Python-S2I looks for a requirements.txt file by default, but it does support other build options. The assemble script refers to two different environment variables you can use: ENABLE_PIPENV and ENABLE_MICROPIPENV.
ENABLE_PIPENV allows the assemble script to install packages from Pipfile.lock using the standard Pipenv package. ENABLE_MICROPIPENV also installs packages from the Pipfile.lock file, but uses a tool called micropipenv from Project Thoth, an open source group sponsored by Red Hat.
Micropipenv has a few advantages over Pipenv: it is smaller, optimized for installing packages in containers, and incredibly fast. It has the added benefit of supporting Poetry, another popular dependency manager that is an alternative to Pip and Pipenv.
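micropipenv is an ordinary PyPI package, so you can also try it outside of S2I. A minimal session, assuming micropipenv's install subcommand reads the Pipfile.lock in the current directory, might look like:

pip install micropipenv
micropipenv install   # install exactly what Pipfile.lock specifies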
To enable micropipenv, set the ENABLE_MICROPIPENV environment variable directly in the Git repository by creating the following .s2i/environment file:

ENABLE_MICROPIPENV=True
Finally, consider which files to include in the image. By default, S2I does the equivalent of Docker's COPY . . statement, which copies everything in the Git repository into the image. Our example application doesn't have much extra in it now, but copying everything might accidentally introduce unwanted artifacts into the image. For example, if you later add a tests folder, you don't want to include those tests in the image. To manage what gets added to the final image, use a .s2iignore file. This file functions exactly like .gitignore, but determines what to ignore when copying the contents of the repo to the image.
While most .gitignore files list the files you don't want to include in the Git repository, I generally prefer to start by excluding all files in my .s2iignore and then explicitly add back the ones I do need. This practice helps prevent extra files from accidentally slipping through later on, and keeps the image size to a minimum. A typical .s2iignore file looks like this:
# Ignore everything
*
# Allow specific files
!.s2iignore
!.s2i/
!hello_world/
!LICENSE
!Pipfile
!Pipfile.lock
!app.py
After pushing your code to GitHub, you are ready to build the application with OpenShift.
Building and deploying the container
For the final step of building and deploying the container on OpenShift, you can create the necessary artifacts from the command line with oc new-app or through the user interface (UI) using the +Add menu.
Creating the application from the command line
Before creating the application from the command line, make sure you have selected the project with an oc project command. Then run the new-app command as follows:
oc new-app openshift/python:3.9-ubi8~https://github.com/strangiato/pipenv-s2i-example.git --name hello-world
A new application should appear in OpenShift, a build should run relatively quickly, and the application should start successfully.
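You can follow the build as it runs (the BuildConfig name matches the --name you passed to oc new-app):

oc logs -f bc/hello-world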
To test the application, create a route with the following command:
oc expose svc/hello-world
You should now be able to visit the API endpoint at that route and see the "Hello World" message.
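If you prefer the command line, you can fetch the route's hostname and hit the endpoint directly:

curl "http://$(oc get route hello-world -o jsonpath='{.spec.host}')/"
# {"message":"Hello World"}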
Creating the application from the web console
To perform the same actions from the UI, navigate to the +Add menu in the Developer view. Next, select Import from Git and copy the Git URL into the Git Repo URL field. Click Edit Import Strategy, select Python, and make sure that a 3.9 image is automatically selected. Update any of the object names and click Create.
Just as with the oc new-app command, a new build should kick off and the application should deploy successfully. Because the UI defaults to creating a route, you should have access to the API endpoint right away.
Pipenv and S2I simplify container builds for Python applications
This article discussed some of the common problems Python developers encounter when attempting to containerize applications, and how you can solve some of those problems with Pipenv and S2I. Additionally, we created a simple web application using Pipenv and Python S2I on OpenShift.
Last updated: August 14, 2023