Managing the many libraries and packages that an application uses is complex and carries some hidden risks. The difficulty increases when you run the application in a container, because you have to manage your development environment while also maintaining a separate set of libraries and packages for the containerized application. This article discusses some of the common problems Python developers face when containerizing Python applications, and how Pipenv and Source-to-Image (S2I) can help to resolve those problems. We will build a simple Python application on the Red Hat OpenShift container platform using those tools.

Shortcomings of pip

Pip is the nearly universal tool employed by Python programmers to install dependent packages. Pip is incredibly simple and powerful. But that simplicity creates several weak points that make it easy for both new and experienced developers to unknowingly introduce problems for themselves.

The central challenge developers face with dependencies is controlling the versions of packages they need to install. A requirements.txt file is commonly used to track packages that need to be installed in the container. At first glance, the requirements file appears to meet this challenge, and only requires developers to run pip install -r requirements.txt in the container build process.

However, problems can still occur with a requirements.txt file. A developer might specify package A in the requirements file, which then automatically installs package B as a dependency of A. This might work perfectly today, but potentially introduces a future dependency problem.

Package A probably defines the requirement for B as simply B>=1.0.0 and does not specify an upper limit for the dependency version. At some point, package B can release an update that removes a feature that A is using, and your application breaks. Many related problems can also occur, as bugs and feature changes are introduced into dependencies.
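
To make the risk concrete, the following minimal sketch models how an open-ended requirement such as B>=1.0.0 behaves as new releases appear. The release list and the helper names are hypothetical, and the parsing handles only simple dotted numbers; real installers apply the full Python version-specifier rules.

```python
def parse_version(text):
    """Convert a simple dotted version such as "1.4.2" into a comparable tuple."""
    return tuple(int(part) for part in text.split("."))


def satisfies_minimum(candidate, minimum):
    """Return True when candidate meets a lower bound such as B>=minimum."""
    return parse_version(candidate) >= parse_version(minimum)


# Hypothetical release history for package B.
releases = ["0.9.0", "1.0.0", "1.4.2", "2.0.0"]

# Package A only declares B>=1.0.0, so every later release is acceptable.
allowed = [release for release in releases if satisfies_minimum(release, "1.0.0")]
newest_allowed = max(allowed, key=parse_version)

print(allowed)
print(newest_allowed)
```

Because nothing caps the upper end of the range, a fresh install in this model picks up 2.0.0, even if that release removed the feature that package A depends on.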

Introducing Pipenv, Pipfiles, and Pipfile.lock files

Pipenv attempts to solve many of these problems. Pipenv replaces Pip as the tool for installing packages. Unlike some package managers, such as Conda, Pipenv installs the same packages from the PyPI repository that are available with Pip.

If you're already using Python, you can get Pipenv by executing:

pip install pipenv

Once the Pipenv package is installed, you are ready to start installing additional packages specific to your project. Where you previously would have run pip install requests, you can instead run pipenv install requests to get the exact same packages.

After you run Pipenv in a project for the first time, the tool creates a file called Pipfile. The generated Pipfile looks like this:

[[source]]
url = "https://pypi.org/simple"
verify_ssl = true
name = "pypi"

[packages]
requests = "*"

[dev-packages]

[requires]
python_version = "3.9"

Just like requirements.txt in Pip, the Pipfile can capture which packages you wish to install. But Pipenv is able to automatically maintain the file for you. It also captures some other useful information, such as the Python version you are using.

Additionally, a Pipfile has a dev-packages section. If you wish to use a package during development but not in production, such as the automatic code formatter black, you can simply run pipenv install black --dev. The --dev option captures development packages separately from the application's packages, so you can use packages during development while keeping them out of the production application.

Keeping dependencies from breaking a build

Pipenv creates another file called Pipfile.lock. The lock file pins the exact versions of all the packages you have installed and their dependencies, similar to running pip freeze > requirements.txt.

Pipfile.lock allows you to reinstall the exact same versions of all components you used before, even if newer versions of those components have been released since you last ran Pipenv. If you need to rebuild your container several months down the line, running pipenv install --deploy installs the package versions recorded in the lock file and aborts if the lock file is out of date with the Pipfile, ensuring that changes in upstream dependencies don't accidentally break your application.

Although Pipfile.lock is automatically generated, it is intended to be checked into source control, along with your Pipfile.
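
Pipfile.lock is a JSON document, so it is easy to inspect what has been pinned. The sketch below reads a trimmed, hypothetical lock file and lists the pinned versions per section. The package versions are illustrative only, the pinned_versions helper is not part of Pipenv, and the sketch assumes every entry has a version field, which may not hold for packages installed from Git or a local path.

```python
import json

# Trimmed, hypothetical Pipfile.lock content. Real lock files also record
# hashes for every package and more details in the "_meta" section.
LOCK_TEXT = """
{
    "_meta": {"requires": {"python_version": "3.9"}},
    "default": {
        "requests": {"version": "==2.26.0"},
        "urllib3": {"version": "==1.26.7"}
    },
    "develop": {
        "black": {"version": "==21.9b0"}
    }
}
"""


def pinned_versions(lock_text, section="default"):
    """Return {package: pinned version specifier} for one lock file section."""
    lock = json.loads(lock_text)
    return {name: entry["version"] for name, entry in lock.get(section, {}).items()}


print(pinned_versions(LOCK_TEXT))
print(pinned_versions(LOCK_TEXT, section="develop"))
```

The default section holds the application's packages together with their transitive dependencies, while develop holds the packages installed with --dev.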

Virtual environments

Another mistake that new Python developers often make is working from their global user Python environment. Virtual environments allow you to create a "clean" Python environment where you can install and manage packages independently from the global Python environment. Python offers a number of tools and methods for creating and managing virtual environments, which can be a bit overwhelming.

Thankfully, as its name implies, Pipenv can manage the environment for you. When you run pipenv install, Pipenv automatically detects whether a virtual environment was already created for this project, and either creates a new virtual environment or installs the packages into the existing virtual environment. That virtual environment can easily be activated with pipenv shell, allowing you to work with and run your application and packages from that virtual environment.

Tip: By default, Pipenv generates the environment in a centrally located folder. I prefer to keep my virtual environment in my project folder with my Pipfile. You can change the default behavior by adding the following line to your .bashrc file:

export PIPENV_VENV_IN_PROJECT=1

With this option set, Pipenv creates a .venv folder to manage the virtual environment directly in your project folder. This folder can easily be deleted if you want to rebuild it from scratch or you just need to free up disk space. .venv is a common naming convention for virtual environment folders and should already be included in any standard Python .gitignore file.
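
If you are ever unsure whether a command is running inside the virtual environment or in the global one, Python can tell you. The following is a minimal sketch that assumes a reasonably recent Python 3 interpreter and virtual environment tooling; the two helper names are hypothetical.

```python
import sys
from pathlib import Path


def in_virtual_environment():
    """Return True when the running interpreter belongs to a virtual environment."""
    # Inside a virtual environment, sys.prefix points at the environment while
    # sys.base_prefix still points at the interpreter it was created from.
    return sys.prefix != sys.base_prefix


def project_venv_path(project_dir="."):
    """Return the .venv folder that PIPENV_VENV_IN_PROJECT=1 would use."""
    return Path(project_dir) / ".venv"


print("virtual environment active:", in_virtual_environment())
print("project-local environment exists:", project_venv_path().is_dir())
```

Running this script with the global interpreter and again after pipenv shell should show the first value change.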

Benefits of Source-to-Image

Source-to-Image (S2I) is a tool that enables developers to easily generate a container image from source code without having to write a Dockerfile. Creating an accurate Dockerfile may sound like a minor task for a seasoned containers expert, but generating an optimized image involves a number of "gotchas" that many developers aren't aware of. You need to manage layers correctly, clean up unneeded install artifacts, and run applications as a non-root user. Slipping up on any of those tasks can lead to a sub-optimal or nonfunctional image.

To combat these problems, organizations often maintain "reference" Dockerfiles and tell their developers, "Go copy this Dockerfile for your Python app and modify it as needed." That workaround creates a challenging maintenance task down the road.

S2I instead does away with the Dockerfile and simply ships the instructions for building the image in the image itself. This approach does require that you have an S2I-enabled image for the language you are attempting to build. The good news is that nearly all of the language-specific images shipped with OpenShift are enabled for S2I.

S2I images expect you to follow some standard conventions for the language in the application structure. But if necessary, you can set your own conventions by modifying or extending Python S2I's default assemble and run scripts. The Python assemble script expects the application to have a requirements.txt file and the run script looks for an app.py file. The assemble script defines some options that can be customized for Pipenv, as we will explore later.

Tip: When you have to deal with more advanced configuration options in S2I, it's always valuable to refer to the source code to see exactly what S2I is running. You can exec into the container to view the assemble and run scripts directly, but most of the time I find it easier to just look the scripts up on GitHub. The S2I scripts for Python 3.9 can be found at this GitHub repository.

Building an example application with Pipenv and S2I

To demonstrate the capabilities of Pipenv and S2I, we will build a simple "Hello World" application that exposes an interface through FastAPI. To view the completed application, get the source code from this GitHub repository.

Installing the initial dependencies

To begin, create a new Pipfile and virtual environment with FastAPI by executing:

pipenv install fastapi

As discussed previously, Pipenv creates the Pipfile, Pipfile.lock file, and a virtual environment with FastAPI installed. Verify that you can activate the virtual environment and list the packages with the following commands:

pipenv shell
pip list

The output should show FastAPI and its dependencies.

While still in the shell, you can install additional packages such as black. Because black is needed only in the development environment and not in the production application, use the --dev flag:

pipenv install black --dev

Creating the application

Next, create the FastAPI example application based on the FastAPI first-steps tutorial. The code will be in hello_world/main.py:

from fastapi import FastAPI

app = FastAPI()


@app.get("/")
async def root():
    return {"message": "Hello World"}

It is also good practice to create an empty file called __init__.py in the folder containing the Python modules, which marks the folder as a package.

At this point, your folder structure should look like this:

.
├── hello_world/
│   ├── __init__.py
│   └── main.py
├── Pipfile
└── Pipfile.lock

The application is now ready to start in your local environment. With the virtual environment still active, you can run the following command to start the application:

uvicorn hello_world.main:app

I have chosen to put the application file in a subfolder inside of my Git repository instead of creating the application in the root of the project. Although we don't have much in our hello_world folder, most real applications require additional files and folders. By keeping the main application in a subfolder, you can keep the root folder of the project relatively clean and readable, while maintaining future flexibility for the application.

Setting up the launch of the application

The application is now functioning and you are ready to consider how to containerize it. The first question to answer is how the application will start.

As mentioned earlier, Python S2I looks for an app.py file in the root of the project and attempts to use that to start the application. However, the run script also allows you to start the application from a file named app.sh if an app.py file isn't found. One option is to put the uvicorn command shown earlier in an app.sh file, but I prefer to keep everything as Python code. You can start the application with the following app.py file:

from hello_world.main import app

import uvicorn


if __name__ == "__main__":
    uvicorn.run(
        app,
        host="0.0.0.0",
        port=8080,
    )

To test the file, run the following:

python app.py

If uvicorn is not yet installed in the project's virtual environment, you will encounter an error:

Traceback (most recent call last):
  File "/home/troyer/code/pipenv-tutorial/app.py", line 3, in <module>
    import uvicorn
ModuleNotFoundError: No module named 'uvicorn'

To resolve this problem, simply add the package with Pipenv:

pipenv install uvicorn

Pipenv will capture the new dependency in the Pipfile and Pipfile.lock file automatically.

Running app.py again should now function correctly.

Configuring the S2I build

Next, you need to consider how to build the application. As mentioned before, Python S2I looks for a requirements.txt file by default, but it does support other build options. The assemble script checks two environment variables that you can use: ENABLE_PIPENV and ENABLE_MICROPIPENV.

ENABLE_PIPENV allows the assemble script to install packages from Pipfile.lock using the standard Pipenv package. ENABLE_MICROPIPENV also installs packages from the Pipfile.lock file, but uses a tool called micropipenv from Project Thoth, an open source group sponsored by Red Hat.

Micropipenv has a few advantages over Pipenv: micropipenv is smaller, optimized for installing packages in containers, and incredibly fast. It has the added benefit of supporting Poetry, another popular dependency manager that is an alternative to Pip and Pipenv.

To enable micropipenv, set the ENABLE_MICROPIPENV environment variable directly in the Git repository by creating the following .s2i/environment file:

ENABLE_MICROPIPENV=True

Finally, consider which files to include in the image. By default, S2I does the equivalent of Docker's COPY . . statement, which copies everything in the Git repository into the image. Our example application doesn't contain much extra right now, but copying everything might accidentally introduce unwanted artifacts into the image. For example, if you later add a tests folder, you don't want to include those tests in the image. To manage what gets added to the final image, use a .s2iignore file. This file uses pattern rules much like those in .gitignore, but determines what to ignore when copying the contents of the repo to the image.

While most .gitignore files list the files you don't want to include in the Git repository, I generally prefer to start by excluding all files in my .s2iignore and then explicitly add back the ones I do need. This practice helps prevent extra files from accidentally slipping through later on, and keeps the image size to a minimum. A typical .s2iignore file looks like this:

# Ignore everything
*

# Allow specific files
!.s2iignore
!.s2i/
!hello_world/
!LICENSE
!Pipfile
!Pipfile.lock
!app.py
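
The ignore-everything-then-allow approach can be modeled in a few lines of Python. This is only a simplified sketch of the idea, not S2I's actual matching code: it assumes the last matching pattern wins, treats a pattern ending in a slash as a folder prefix, and uses fnmatch for everything else. The is_ignored helper and the list of repository files are hypothetical.

```python
from fnmatch import fnmatch

S2IIGNORE = """\
# Ignore everything
*

# Allow specific files
!.s2iignore
!.s2i/
!hello_world/
!LICENSE
!Pipfile
!Pipfile.lock
!app.py
"""


def is_ignored(path, patterns):
    """Apply the patterns in order; the last pattern that matches decides."""
    ignored = False
    for raw in patterns:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        negate = line.startswith("!")
        pattern = line[1:] if negate else line
        if pattern.endswith("/"):
            matched = path.startswith(pattern)  # folder prefix match
        else:
            matched = fnmatch(path, pattern)
        if matched:
            ignored = not negate
    return ignored


# Hypothetical repository contents, including files added later on.
repo_files = [
    ".s2i/environment",
    "hello_world/main.py",
    "tests/test_main.py",
    "README.md",
    "Pipfile",
    "Pipfile.lock",
    "app.py",
]
copied = [path for path in repo_files if not is_ignored(path, S2IIGNORE.splitlines())]
print(copied)
```

In this model, the tests folder and README.md are left out of the image without anyone having to remember to exclude them.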

After pushing your code to GitHub, you are ready to build the application with OpenShift.

Building and deploying the container

For the final step of building and deploying the container on OpenShift, you can create the necessary artifacts from the command line with oc new-app or through the user interface (UI) using the +Add interface.

Creating the application from the command line

Before creating the application from the command line, make sure you have selected the right project with the oc project command. Then run the new-app command as follows:

oc new-app openshift/python:3.9-ubi8~https://github.com/strangiato/pipenv-s2i-example.git --name hello-world

A new application should appear in OpenShift, a build should run relatively quickly, and the application should start successfully.

To test the application, create a route with the following command:

oc expose svc/hello-world

You should now be able to visit the API endpoint at that route and see the "Hello World" message.
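
If you would rather check the endpoint from a script than from a browser, a minimal sketch using only the standard library could look like the following. The fetch_message helper and its opener parameter are hypothetical conveniences (the parameter exists so the function can be exercised without a network connection), and the route URL must be replaced with the host that OpenShift assigned to your route.

```python
import json
from urllib.request import urlopen


def fetch_message(url, opener=urlopen):
    """Request the root endpoint and return the "message" field of its JSON body."""
    with opener(url) as response:
        payload = json.load(response)
    return payload["message"]


if __name__ == "__main__":
    # Placeholder URL: substitute the host shown by "oc get route hello-world".
    route_url = "http://hello-world.example.invalid/"
    try:
        print(fetch_message(route_url))
    except OSError as error:
        print("could not reach the route:", error)
```

With the real route URL filled in, the script should print the same "Hello World" message returned by the root() function in hello_world/main.py.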

Creating the application from the web console

To perform the same actions from the UI, navigate to the +Add menu in the Developer view. Next, select Import from Git and copy the Git URL into the Git Repo URL field. Click Edit Import Strategy, select Python, and confirm that a Python 3.9 image is selected. Update any of the object names as needed and click Create.

Just as with the oc new-app command, a new build should kick off and the application should deploy successfully. Because the UI defaults to creating a route, you should have access to the API endpoint right away.

Pipenv and S2I simplify container building with Python applications

This article discussed some of the common problems Python developers encounter when attempting to containerize applications, and how you can solve some of those problems with Pipenv and S2I. Additionally, we created a simple web application using Pipenv and Python S2I on OpenShift.
