Developers and data scientists who want to build healthy, high-performance Python applications often face challenges with dependency management, including the security risks that installed dependencies can introduce. This article is a quick introduction to managing Python dependencies with Project Thoth. The included video tutorial shows how Thoth's cloud-based resolver finds problems in your Python dependencies and execution environment. Thoth's resolver is a drop-in replacement for other Python resolvers such as pip, Pipenv, or Poetry, and its resolution process can also be used in containerized environments.
Thoth security for Python applications
Containerized environments offer a way to deploy applications to cluster orchestrators such as Kubernetes and Red Hat OpenShift. The base container image used also provides software that can be shipped with the application. Figure 1 shows the hardware and software underlying a typical Python application.
Thoth can be used to discover security issues in containerized environments and to guide how they are addressed through dependency resolution. The following video tutorial gives an overview of how Thoth's cloud-based resolver resolves Python application dependencies.
Managing vulnerabilities with Thoth
Once you have an idea of how Thoth works, you can get started using its resolver to manage your Python dependencies. Our Managing vulnerabilities with Thoth tutorial guides you through installing and setting up the environment for Thoth's command-line utility, Thamos. You can start by using pip to install the utility:
pip install thamos
Once you've installed Thamos, you can follow the instructions in the tutorial to inspect an application present in the Thoth Station cli-examples repository. The tutorial also illustrates how to manage applications and application dependencies using the classic Game of Life application:
git clone https://github.com/thoth-station/cli-examples
cd cli-examples
thamos advise
The tutorial also presents a variety of command outputs and shows how to detect security flaws in your Python application dependencies. The linked extended video can walk you through key Thoth resolver features.
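If you prefer to script the steps above, they can be sketched in Python as follows. This is a minimal illustration, not part of Thamos or the tutorial: the helper names `tutorial_steps` and `run_steps` are hypothetical, and the script defaults to a dry run that only prints the commands. It assumes `pip`, `git`, and (after installation) `thamos` are available on your PATH; any extra configuration that `thamos advise` might need is covered by the tutorial and is not handled here.

```python
import shlex
import subprocess

# Repository named in the article; everything else in this sketch is illustrative.
REPO_URL = "https://github.com/thoth-station/cli-examples"


def tutorial_steps(repo_url=REPO_URL):
    """Return (command, working_directory) pairs in the order the article lists them."""
    # "git clone <url>" creates a directory named after the last URL segment.
    repo_dir = repo_url.rstrip("/").rsplit("/", 1)[-1]
    return [
        (["pip", "install", "thamos"], None),
        (["git", "clone", repo_url], None),
        (["thamos", "advise"], repo_dir),
    ]


def run_steps(steps, dry_run=True):
    """Print each step; execute it only when dry_run is False."""
    printed = []
    for command, cwd in steps:
        line = shlex.join(command)
        if cwd:
            line = f"(cd {cwd} && {line})"
        printed.append(line)
        print(line)
        if not dry_run:
            # check=True stops at the first failing command.
            subprocess.run(command, cwd=cwd, check=True)
    return printed


if __name__ == "__main__":
    # Dry run by default: nothing is installed, cloned, or sent to Thoth.
    run_steps(tutorial_steps())
```

Calling `run_steps(tutorial_steps(), dry_run=False)` would execute the commands for real. Keeping the dry run as the default makes it easy to review exactly what will run before anything is installed.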
Developing Project Thoth
Project Thoth started as a research project in the Artificial Intelligence Center of Excellence (AICoE) group in 2018. Initially, the Thoth team consisted of two engineers, but it quickly expanded with new interns and hires. From 2018 until the time of this writing, the core repositories of Project Thoth have accepted contributions from 49 engineers, approximately half of them external to the Thoth team. The number of repositories associated with the thoth-station organization on GitHub has grown to more than 180 (60 of which are now archived).
Note: Project Thoth is also known as AIDevSecOps because of its role as part of a DevSecOps strategy.
To support data aggregation, we've switched our main database twice, and over the whole development phase the project has been deployed on seven OpenShift clusters. The system generated more than 1.9 TiB of data in these clusters, which was stored in Ceph. The production PostgreSQL database keeps more than 27 GiB of mostly Python dependency data, collected by background aggregation logic that uses Argo Workflows and Strimzi.
Argo CD helps enforce GitOps best practices and supports observability through Grafana and the metrics that OpenShift exposes. Tekton and AICoE-CI help automate builds of container images that are hosted on Quay. Prow checks make sure that developers deliver high-quality contributions.
Engineers have given talks about various parts of the Thoth project more than 25 times in North America and Europe.
All statistics are current as of this writing, and we believe the project will continue to expand. You can learn more about Project Thoth by reading the following articles on Red Hat Developer:
- Build and extend containerized applications with Project Thoth
- Customize Python dependency resolution with machine learning
- micropipenv: Installing Python dependencies in containerized applications
- Managing Python dependencies with the Thoth JupyterLab extension
- Use Kebechet machine learning to perform source code operations
- Microbenchmarks for AI applications using Red Hat OpenShift on PSI in project Thoth
Connect with the Thoth team!
As part of Project Thoth, we are accumulating knowledge to help Python developers create healthy applications. If you would like to follow updates, feel free to subscribe to our YouTube channel or follow @ThothStation on Twitter.
Even though the project is in its early stages, we are constantly improving its stability and reliability, and we would be happy to receive any feedback. To send us feedback or get involved in improving the Python ecosystem, please reach out through the Thoth Station support repository or contact the Thoth team directly on Twitter. You can report issues you've spotted in open source Python libraries to the support repository, or write prescriptions for the resolver yourself and send them to our prescriptions repository. By participating in these ways, you can help Thoth's cloud-based Python resolver come up with better recommendations for the whole Python community.
Last updated: September 20, 2023