AI software stack inspection with Thoth and TensorFlow

September 30, 2020
Francesco Murdaca
Related topics: Artificial intelligence, CI/CD
Related products: Red Hat Enterprise Linux


    Project Thoth develops open source tools that enhance the day-to-day life of developers and data scientists. Thoth uses machine-generated knowledge to boost the performance, security, and quality of your applications using artificial intelligence (AI) through reinforcement learning (RL). This machine-learning approach is implemented in Thoth adviser, which Thoth integrations use to recommend a software stack based on user inputs.

    In this article, I introduce a case study—a recent inspection of a runtime issue when importing TensorFlow 2.1.0—to demonstrate the human-machine interaction between the Thoth team and Thoth components. By following the case study from start to finish, you will learn how Thoth gathers and analyzes some of the data to provide advice to its users, including bots such as Kebechet, AI-backed continuous integration pipelines, and developers using GitHub apps.

    Both the Thoth machinery and team rely on bots and automated pipelines running on Red Hat OpenShift. Thoth takes a variety of inputs to determine the correct advice:

    • Solver, which Thoth uses to discover if something can be installed in a particular runtime environment, such as Red Hat Enterprise Linux (RHEL) 8 with Python 3.6.
    • Security indicators, which uncover vulnerabilities of various kinds and feed into security advice.
    • Project meta information, such as project-maintenance status or development-process behavior that affects the overall project.
    • Inspections, which Thoth uses to discover code quality issues or performance across packages.

    This article focuses on inspections. I will show you the results from an automated software stack inspection run through Project Thoth's Dependency Monkey and Amun components. Thoth uses automated inspections to introduce new advice about software stacks for Thoth users. Another way to integrate advice could be via automated pipelines that can:

    • Boost performance
    • Optimize machine learning (ML) model inference
    • Ensure that there are no failures during the model runtime (for example, during inference)
    • Avoid using software stacks that do not guarantee security

    Thoth components: Amun and Dependency Monkey

    Given the list of packages that should be installed and the hardware requested to run the application, Amun executes the requested application stack in the requested environment. Amun acts as an execution engine for Thoth. Applications are then built and tested using Thoth Performance Indicators (PI). See Amun's README documentation for more information about this service.

    Another Thoth component, Dependency Monkey, can be used to schedule Amun. Dependency Monkey was designed to automate the evaluation of certain aspects of a software stack, such as code quality or performance. Therefore, it aims to automatically verify software stacks and aggregate relevant observations.

    From these two components, the Thoth team created Thoth Performance Datasets, which contain observations about performance for software stacks. For example, Thoth Performance Datasets could use PIconv2d to obtain performance data for different application types (such as machine learning) and code quality. It could then use a performance indicator like PiImport to discover errors during an application run.

    Transparent and reproducible datasets

    In the spirit of open source, the Thoth team wants to guarantee that the datasets and knowledge that we collect and use are transparent and reproducible. Machine learning models, such as the reinforcement learning model leveraged by Thoth Adviser, should be as transparent as the datasets they are working on.

    For transparency, we've introduced Thoth Datasets, where we share the notebooks that we used to analyze a data collection and all of the results. We encourage anyone interested in the topic to use Thoth Datasets to verify our findings or for other purposes.

    For reproducibility, we've introduced Dependency Monkey Zoo, where we collect all of the specifications used to run an analysis. Having all of the specs in one place allows us to reproduce the results of a study. Anyone can use the specs to perform similar studies in different environments for comparison.

    Case study: Automated software stack inspection for TensorFlow 2.1.0

    For this case study, we will use Thoth's Amun and Dependency Monkey components to automatically produce data. We'll then introduce reusable Jupyter notebook templates to extract specific information from the datasets. Finally, we'll create new advice based on the results.

    The human side of this human-machine interaction focuses on assessing the quality of the results and formulating the advice. The rest of the process is machine-automated. Automation makes the process easy to repeat to produce a new source of information for analysis.

    In the next sections, I introduce the initial problem, then describe the analysis performed and the resulting new advice for Thoth users.

    Initial request

    Our goal with this inspection is to analyze build- and runtime failures when importing TensorFlow 2.1.0 and use these to derive observations about the quality of the software stack.

    For this analysis, Dependency Monkey sampled the state space of all of the possible TensorFlow==2.1.0 stacks (from upstream builds). For inspection purposes, we built and ran the application using the PiMatmul performance indicator.
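
    To make the idea of sampling a state space of stacks concrete, here is a minimal sketch. It is not Dependency Monkey's implementation: the candidate releases, the function name sample_stacks, and the exhaustive enumeration are all assumptions chosen for illustration. A real resolver would derive candidates from package metadata and would not need to enumerate every combination.

```python
import itertools
import random

# Hypothetical candidate releases per package (illustrative only).
CANDIDATES = {
    "tensorflow": ["2.1.0"],
    "six": ["1.12.0", "1.13.0"],
    "urllib3": ["1.5", "1.24.1", "1.25.7"],
}


def sample_stacks(candidates, count, seed=0):
    """Return up to `count` distinct fully pinned stacks.

    Every stack picks exactly one release for each package. The full
    Cartesian product is enumerated first, which is only feasible for
    small examples like this one.
    """
    names = sorted(candidates)
    all_stacks = [
        dict(zip(names, versions))
        for versions in itertools.product(*(candidates[n] for n in names))
    ]
    rng = random.Random(seed)  # fixed seed keeps the sample reproducible
    return rng.sample(all_stacks, min(count, len(all_stacks)))


stacks = sample_stacks(CANDIDATES, count=4)
for stack in stacks:
    # Print each stack as requirements-style pins.
    print(" ".join(f"{name}=={version}" for name, version in stack.items()))
```

    Each sampled stack would then be built and run with a performance indicator so that failures can be traced back to specific package releases.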

    The sections below detail the Dependency Monkey inspection results and the resulting analysis.

    The first analysis

    From the software stack analysis of inspection results, we discovered that importing TensorFlow 2.1.0 failed in approximately 50% of the inspections in a run. The error is shown in the following output from the Jupyter Notebook:

    2020-09-05 07:14:36.333589: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libnvinfer.so.6'; dlerror: libnvinfer.so.6: cannot open shared object file: No such file or directory
    2020-09-05 07:14:36.333811: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libnvinfer_plugin.so.6'; dlerror: libnvinfer_plugin.so.6: cannot open shared object file: No such file or directory
    2020-09-05 07:14:36.333844: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:30] Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.
    /opt/app-root/lib/python3.6/site-packages/h5py/__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
      from ._conv import register_converters as _register_converters
    /opt/app-root/lib/python3.6/site-packages/requests/__init__.py:91: RequestsDependencyWarning: urllib3 (1.5) or chardet (2.3.0) doesn't match a supported version!
      RequestsDependencyWarning)
    Traceback (most recent call last):
      File "/home/amun/script", line 14, in <module>
        import tensorflow as tf
      File "/opt/app-root/lib/python3.6/site-packages/tensorflow/__init__.py", line 101, in <module>
        from tensorflow_core import *
      File "/opt/app-root/lib/python3.6/site-packages/tensorflow_core/__init__.py", line 40, in <module>
        from tensorflow.python.tools import module_util as _module_util
      File "/opt/app-root/lib/python3.6/site-packages/tensorflow/__init__.py", line 50, in __getattr__
        module = self._load()
      File "/opt/app-root/lib/python3.6/site-packages/tensorflow/__init__.py", line 44, in _load
        module = _importlib.import_module(self.__name__)
      File "/opt/app-root/lib64/python3.6/importlib/__init__.py", line 126, in import_module
        return _bootstrap._gcd_import(name[level:], package, level)
      File "/opt/app-root/lib/python3.6/site-packages/tensorflow_core/python/__init__.py", line 95, in <module>
        from tensorflow.python import keras
      File "/opt/app-root/lib/python3.6/site-packages/tensorflow_core/python/keras/__init__.py", line 27, in <module>
        from tensorflow.python.keras import models
      File "/opt/app-root/lib/python3.6/site-packages/tensorflow_core/python/keras/__init__.py", line 27, in <module>
        from tensorflow.python.keras import models
      File "/opt/app-root/lib/python3.6/site-packages/tensorflow_core/python/keras/models.py", line 25, in <module>
        from tensorflow.python.keras.engine import network
      File "/opt/app-root/lib/python3.6/site-packages/tensorflow_core/python/keras/engine/network.py", line 46, in <module>
        from tensorflow.python.keras.saving import hdf5_format
      File "/opt/app-root/lib/python3.6/site-packages/tensorflow_core/python/keras/saving/hdf5_format.py", line 32, in <module>
        from tensorflow.python.keras.utils import conv_utils
      File "/opt/app-root/lib/python3.6/site-packages/tensorflow_core/python/keras/utils/conv_utils.py", line 22, in <module>
        from six.moves import range  # pylint: disable=redefined-builtin
    ImportError: cannot import name 'range'

    Specifically, we could see that some combinations of six and urllib3 produced that error, as described in the following output:

    =============================================
    urllib3
    =============================================
    
    In successfull inspections:
    ['urllib3-1.10.4-pypi-org' 'urllib3-1.16-pypi-org' 'urllib3-0.3-pypi-org'
    'urllib3-1.21.1-pypi-org' 'urllib3-1.25.1-pypi-org'
    'urllib3-1.25-pypi-org' 'urllib3-1.18.1-pypi-org'
    'urllib3-1.24.1-pypi-org' 'urllib3-1.10.1-pypi-org'
    'urllib3-1.10.3-pypi-org' 'urllib3-1.25.7-pypi-org'
    'urllib3-1.10-pypi-org' 'urllib3-1.7.1-pypi-org' 'urllib3-1.13-pypi-org'
    'urllib3-1.19.1-pypi-org' 'urllib3-1.11-pypi-org'
    'urllib3-1.10.2-pypi-org' 'urllib3-1.15.1-pypi-org'
    'urllib3-1.25.3-pypi-org' 'urllib3-1.13.1-pypi-org'
    'urllib3-1.21-pypi-org' 'urllib3-1.17-pypi-org' 'urllib3-1.23-pypi-org']
    
    In failed inspections:
    ['urllib3-1.5-pypi-org']
    
    In failed inspections but not in successfull:
    {'urllib3-1.5-pypi-org'}
    
    In failed inspections and in successfull:
    set()
    
    
    =============================================
    six
    =============================================
    
    In successfull inspections:
    ['six-1.13.0-pypi-org' 'six-1.12.0-pypi-org']
    
    In failed inspections:
    ['six-1.13.0-pypi-org' 'six-1.12.0-pypi-org']
    
    In failed inspections but not in successfull:
    set()
    
    In failed inspections and in successfull:
    {'six-1.13.0-pypi-org', 'six-1.12.0-pypi-org'}

    Therefore, we discovered that the same urllib3 release (1.5) appeared in every failed inspection and in none of the successful inspections, while the six releases (1.12.0 and 1.13.0) showed no difference between failed and successful ones.
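
    The comparison behind this kind of report can be sketched with plain set operations. The code below is an illustration, not the notebook used for the study: the record layout (a "failed" flag plus a "packages" mapping), the function names, and the four sample records are assumptions.

```python
from collections import defaultdict


def releases_by_outcome(inspections):
    """Group the releases seen for each package by inspection outcome."""
    seen = defaultdict(lambda: {"successful": set(), "failed": set()})
    for inspection in inspections:
        outcome = "failed" if inspection["failed"] else "successful"
        for name, version in inspection["packages"].items():
            seen[name][outcome].add(f"{name}-{version}")
    return seen


def suspects(seen):
    """Return releases that occur only in failed inspections, per package."""
    result = {}
    for name, sets in seen.items():
        only_failed = sets["failed"] - sets["successful"]
        if only_failed:
            result[name] = only_failed
    return result


# Illustrative records (assumed layout), echoing the pattern reported above.
inspections = [
    {"failed": False, "packages": {"urllib3": "1.25.7", "six": "1.13.0"}},
    {"failed": False, "packages": {"urllib3": "1.24.1", "six": "1.12.0"}},
    {"failed": True, "packages": {"urllib3": "1.5", "six": "1.13.0"}},
    {"failed": True, "packages": {"urllib3": "1.5", "six": "1.12.0"}},
]

seen = releases_by_outcome(inspections)
found = suspects(seen)
print(found)
```

    A package whose releases appear on both sides, like six here, drops out of the suspects, which leaves the releases worth a closer look.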

    The second analysis

    For our next step, we decided to run another analysis to restrict the cases. For this run, we used a newly created performance indicator called PiImport as shown in Table 1.

    Table 1: The PiImport performance indicator.
    Description:   Dependency Monkey sampled the state space of all the possible TensorFlow==2.1.0 stacks (from upstream builds). The application was built and run using the PiImport performance indicator.
    Specification: Dependency Monkey specification
    Goal:          Identify the specific versions that fail, in order to produce the final advice.
    Reference:     Issue
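
    An import-only check of this kind can be sketched in a few lines. This is not the actual PiImport code: the function name check_import and the shape of the returned dictionary are assumptions made for illustration. The point is that a failed import is recorded as data instead of crashing the run.

```python
import importlib
import json
import traceback


def check_import(module_name):
    """Try to import `module_name` and report the outcome as a dictionary."""
    try:
        importlib.import_module(module_name)
    except Exception as exc:  # ImportError or anything else raised at import time
        return {
            "module": module_name,
            "ok": False,
            "error": f"{type(exc).__name__}: {exc}",
            "traceback": traceback.format_exc(),
        }
    return {"module": module_name, "ok": True, "error": None, "traceback": None}


# A module from the standard library and a name that should not exist.
result_ok = check_import("json")
result_bad = check_import("module_that_does_not_exist_xyz")
print(json.dumps({"json": result_ok["ok"], "missing": result_bad["ok"]}))
```

    In an inspection, the module name would be tensorflow, and the recorded error and traceback would be stored with the pinned stack that produced them.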

    Results of the second analysis

    From the new analysis, we were able to identify all of the specific versions of urllib3 and six that did not work together and that were causing issues during runtime. The output in Figure 1 shows the incompatible versions of the two packages.

    Figure 1: The incompatible versions of urllib3 and six that prevent TensorFlow 2.1.0 from running.

    The advice

    All of this backtracing led to an adviser step called TensorFlow21Urllib3Step, which you can find among the adviser steps. With this step, we can penalize software stacks containing the specific versions of urllib3 that cause runtime issues when attempting to import TensorFlow 2.1.0. The following prediction, created by Thoth, results in a higher quality software stack for users.

    Table 2: The TensorFlow21Urllib3Step adviser step.
    Title:             TensorFlow in version 2.1 can cause runtime errors when imported, caused by incompatibility between urllib3 and six packages.
    Issue description: Package urllib3 in some versions is shipped with a bundled version of six, which has its own mechanism for imports and import context handling. Importing urllib3 in the TensorFlow codebase causes initialization of the bundled six module, which collides with a subsequent import from unbundled six modules.
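
    The idea of penalizing a stack can be illustrated with a small scoring function. This sketch does not use the Thoth adviser API and is not the TensorFlow21Urllib3Step implementation: the function name score_package, the penalty value, and the justification layout are hypothetical. The set of affected releases contains only urllib3 1.5, the release seen in the failed inspections of the first analysis; the complete list from the second analysis is not reproduced here.

```python
from typing import Dict, List, Tuple

# Only the release observed in the first analysis (assumption: incomplete list).
AFFECTED_URLLIB3 = {"1.5"}
# Hypothetical penalty; a negative score makes a stack less attractive.
PENALTY = -0.5


def score_package(
    stack: Dict[str, str], name: str, version: str
) -> Tuple[float, List[Dict[str, str]]]:
    """Score adding `name==version` to a partially resolved `stack`.

    Returns a score adjustment and a list of justification messages.
    """
    if (
        name == "urllib3"
        and version in AFFECTED_URLLIB3
        and stack.get("tensorflow") == "2.1.0"
    ):
        message = (
            f"urllib3 {version} can cause runtime errors when importing "
            "TensorFlow 2.1.0 because of its bundled six module"
        )
        return PENALTY, [{"type": "WARNING", "message": message}]
    return 0.0, []


stack = {"tensorflow": "2.1.0", "six": "1.13.0"}
bad_score, bad_notes = score_package(stack, "urllib3", "1.5")
good_score, good_notes = score_package(stack, "urllib3", "1.25.7")
print(bad_score, good_score)
```

    Returning a justification alongside the score lets the recommendation explain why a particular release was avoided.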

    You can find the complete issue description, and the recommended resolution, here.

    Last updated: June 20, 2022
