Editable dependencies are becoming more popular, especially for installing packages directly from a version control system. But --editable is not without dangers. This article explains why declaring editable dependencies should be considered a bad practice, and why it is a particularly bad practice for data scientists using Project Thoth.
The use case for editable dependencies
With Python’s pip and with Pipenv, you can install dependencies in editable form. For example, imagine you want to fix a bug in a library. You can install the package from its version control repository in editable mode:
pipenv install -e git+https://github.com/requests/requests.git#egg=requests
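Because the package is installed in editable mode, changes you make to the checked-out source take effect without reinstalling it, so you can fix the bug and test the fix on your local machine.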
Over time, however, we have seen practices that are generally acceptable but that data scientists should not follow. One of them is declaring an application's dependencies as editable. If your goal is to change the package itself, as a developer or open source contributor would, --editable is indeed a good practice. But let's focus on why editable dependencies are bad in the context of data science.
Note: Red Hat's now+Next blog includes posts that discuss technologies that are under active development in upstream open source communities and at Red Hat. We believe in sharing the things we're working on early and often, but we want to note that, unless otherwise stated, the technologies and how-tos shared here aren't part of supported products, nor promised to be in the future.
Editable dependencies and Project Thoth
Project Thoth works with a number of software stacks that run in Jupyter notebooks, which are themselves run as containers in Open Data Hub. These software stacks are considered immutable: each one comes from a container image that was built up front, and that image is read-only.
Even if the dependencies were installed as editable during the container build, the resulting container image is still immutable, so editability offers no benefit once the image is built. More importantly, these dependencies should not be edited at all: they are the known and trusted versions, and they are meant to be included as-is in the immutable container images that Red Hat OpenShift builds using Tekton pipelines.
--editable breaks provenance checks
An editable install points the environment at a local checkout of a package's source code. That checkout can easily be modified, and once it has been, there is no direct way to control or verify what the package source actually contains. Changes to the package's own dependencies cannot be tracked either. A build pipeline that is meant to be repeatable and traceable but relies on editable installs therefore opens the door to untrackable changes.
Using an editable dependency catapults us back to a bygone era in which every deployment is tended like fine china instead of being treated like paper plates, and there is no way to tell what software is actually being executed!
We strongly recommend using provenance checks to verify the origin of every software package that goes into a deployment.
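A full provenance check goes beyond file integrity, but one basic building block is confirming that the artifact you are about to install matches a hash recorded when the dependencies were locked. The following Python sketch assumes the sha256:<hex> format used for hashes in Pipfile.lock and by pip's --hash option; sha256_of and matches_pinned_hash are hypothetical helper names, not part of Thoth, pip, or Pipenv.

```python
import hashlib
import hmac


def sha256_of(path, chunk_size=65536):
    """Compute the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_pinned_hash(path, pinned_hashes):
    """Return True if the file matches one of the pinned 'sha256:<hex>' entries.

    A lock file can pin several hashes for one package (one per wheel or
    sdist), so the artifact only has to match one of them.
    """
    actual = "sha256:" + sha256_of(path)
    # compare_digest avoids short-circuiting string comparison.
    return any(hmac.compare_digest(actual, pinned.lower()) for pinned in pinned_hashes)
```

pip can enforce the same kind of check itself in hash-checking mode (--require-hashes). Note that an editable install has no artifact to hash at all, which is one reason it escapes this kind of verification.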
--editable breaks recommendations
Thoth analyzes dependencies and aggregates information about them, so it has extensive knowledge of which packages are “good” or “bad” with respect to various aspects of the software, such as performance indicators, security issue scores from the Bandit Python security linter, and CVE information.
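A locally edited copy of a package is not something Thoth has aggregated information about, so this knowledge cannot be applied to it and Thoth cannot make recommendations for it. Packages coming from a local filesystem, or pulled ad hoc from the internet, can introduce malicious or unpredictable behavior. Therefore, we strongly recommend reviewing every package that goes into a deployment and having a justification for each one.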
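One way to put that review into practice is to scan the lock file for entries that need a justification because they did not come from a package index. The sketch below assumes Pipenv's Pipfile.lock layout, with "default" and "develop" sections whose entries carry keys such as editable, git, path, file, hashes, and version; verify that assumption against the lock files your Pipenv version produces. The function name audit_lockfile and the sample data are hypothetical.

```python
import json

# Keys that, in the assumed Pipfile.lock layout, indicate a package was not
# resolved from a package index.
NON_INDEX_KEYS = ("git", "path", "file")


def audit_lockfile(lock):
    """Return (section, package, reason) tuples for entries needing a justification.

    Assumes a Pipfile.lock-style dict with "default" and "develop" sections.
    """
    findings = []
    for section in ("default", "develop"):
        for name, entry in lock.get(section, {}).items():
            if entry.get("editable"):
                findings.append((section, name, "editable install"))
            elif any(key in entry for key in NON_INDEX_KEYS):
                findings.append((section, name, "not from a package index"))
            elif not entry.get("hashes"):
                findings.append((section, name, "no artifact hashes"))
    return findings


if __name__ == "__main__":
    # Hypothetical lock file content, for illustration only.
    sample = json.loads("""
    {
      "default": {
        "requests": {"editable": true,
                     "git": "https://github.com/requests/requests.git"},
        "six": {"hashes": ["sha256:<hash>"], "index": "pypi",
                "version": "==1.16.0"}
      },
      "develop": {}
    }
    """)
    for section, name, reason in audit_lockfile(sample):
        print(f"[{section}] {name}: {reason}")
```

Against a real project you would pass json.load() of the Pipfile.lock file instead of the sample. Each finding is a prompt for a human justification, not an automatic rejection.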
--editable leads to unpredictable software stacks
Because source code brought in through an editable install may have been modified locally, the same dependency specification can yield different code on different machines. We therefore do not recommend using editable installs for any purpose other than local development or debugging an application.
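To enforce this in a build pipeline, for example before a container image is finalized, you can look for editable installs in the target environment. The Python sketch below relies on the direct_url.json metadata file defined by PEP 610, which recent pip versions write for installs from a URL or local directory and which marks editable installs with "dir_info": {"editable": true}. Editable installs made by older tooling may not write this file, so treat the result as a best-effort check. The function names are hypothetical.

```python
import json
from importlib.metadata import distributions


def is_editable(direct_url):
    """Return True if parsed PEP 610 direct_url.json content marks an editable install."""
    dir_info = direct_url.get("dir_info", {})
    return bool(dir_info.get("editable", False))


def find_editable_installs():
    """Return (name, url) pairs for editable distributions in the current environment."""
    found = []
    for dist in distributions():
        raw = dist.read_text("direct_url.json")
        if raw is None:
            # No direct URL recorded: typically an install from a package index.
            continue
        info = json.loads(raw)
        if is_editable(info):
            found.append((dist.metadata["Name"], info.get("url", "")))
    return found


if __name__ == "__main__":
    for name, url in find_editable_installs():
        print(f"editable install: {name} -> {url}")
    # A CI step could exit with a non-zero status when anything is printed.
```

Running this inside the image build, after dependencies are installed, turns "no editable installs in deployments" from a guideline into a check.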
Conclusion: Work toward predictable software stacks
Always use sources you have vetted and trust, and use properly packaged Python distributions released on package indexes that conform to the relevant Python Enhancement Proposal (PEP) standards. These practices help ensure that your applications do not introduce unpredictable issues.
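If you use pip requirements files rather than Pipenv, a similar check can reject lines that bypass the package index and require exact pins with hashes. The sketch below is a deliberately simplified linter, not a full requirements-file parser: it handles comments and backslash continuations, flags -e/--editable, VCS URLs (git+, hg+, svn+, bzr+), local paths, and PEP 508 direct references (name @ url), and skips other pip options. The names logical_lines and lint_requirements are hypothetical.

```python
import re

# pip treats "#" as a comment when it starts a line or follows whitespace,
# so URL fragments such as "#egg=requests" are preserved.
COMMENT = re.compile(r"(^|\s+)#.*$")
VCS_PREFIXES = ("git+", "hg+", "svn+", "bzr+")
LOCAL_PREFIXES = (".", "/", "file:")
# A project name, optional extras, then an exact "==" pin.
PINNED = re.compile(r"^[A-Za-z0-9][A-Za-z0-9._-]*(\[[^\]]*\])?\s*==\s*[^\s;]+")


def logical_lines(text):
    """Strip comments and join backslash-continued lines."""
    parts = []
    for raw in text.splitlines():
        line = COMMENT.sub("", raw).rstrip()
        if line.endswith("\\"):
            parts.append(line[:-1].strip())
            continue
        parts.append(line.strip())
        joined = " ".join(p for p in parts if p)
        parts = []
        if joined:
            yield joined
    leftover = " ".join(p for p in parts if p)
    if leftover:  # input ended with a dangling continuation
        yield leftover


def lint_requirements(text):
    """Return (line, reason) pairs for requirement lines that break the policy."""
    problems = []
    for line in logical_lines(text):
        if line.startswith(("-e", "--editable")):
            problems.append((line, "editable install"))
        elif line.startswith("-"):
            continue  # other pip options (-r, --index-url, ...) are out of scope
        elif line.startswith(VCS_PREFIXES + LOCAL_PREFIXES) or "@" in line:
            problems.append((line, "not from a package index"))
        elif not PINNED.match(line):
            problems.append((line, "version not pinned with =="))
        elif "--hash=" not in line:
            problems.append((line, "no --hash for provenance"))
    return problems
```

For example, the editable requests install shown earlier in this article would be reported as "editable install", while a line such as six==1.16.0 followed by a --hash=sha256:... option would pass.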
The CI/CD pipelines in the Operate First initiative are a good place to find resources. Project Thoth supports this mindset through services such as the Khebhut GitHub Marketplace app and the thamos command-line tool, and offers a rich knowledge graph as a foundation for your package selection and, ultimately, for deciding what you put into production.