Knowledge meets machine learning for smarter decisions, Part 1

January 14, 2021
Donato Marrazzo
Related topics: Artificial intelligence, Python

    Drools is a popular open source project known for its powerful rules engine. Few users realize that it can also be a gateway to the amazing possibilities of artificial intelligence. This two-part article introduces you to using Red Hat Decision Manager and its Drools-based rules engine to combine machine learning predictions with deterministic reasoning. In Part 1, we'll prepare our machine learning logic. In Part 2, you'll learn how to use the machine learning model from a knowledge service.

    Note: Examples in this article are based on Red Hat Decision Manager, but all of the technologies used are open source.

    Machine learning meets knowledge engineering

    Few Red Hat Decision Manager users know about its roots in artificial intelligence (AI), specifically the AI branch of knowledge engineering (also known as knowledge representation and reasoning). This branch aims to solve the problem of how to organize human knowledge so that a computer can process it. Knowledge engineering uses business rules: a set of knowledge metaphors that subject matter experts can easily understand and use.

    The Decision Model and Notation (DMN) standard recently introduced a new model and notation for subject matter experts. After years of using different methodologies and tools, we finally have a common language for sharing knowledge representation. A hidden treasure of DMN is that it makes dealing with machine learning algorithms easier. The connecting link is another well-known standard in data science: the Predictive Model Markup Language, or PMML.

    Using these tools to connect knowledge engineering and machine learning empowers both domains, so that the whole is greater than the sum of its parts. It opens up a wide range of use cases where combining deterministic knowledge and data science predictions leads to smarter decisions.

    A use case for cooperation

    The idea of algorithms that can learn from large sets of data and understand patterns that we humans cannot see is fascinating. However, overconfidence in machine learning technology leads us to underestimate the value of human knowledge.

    Let’s take an example from our daily experience: We are all used to algorithms that use our internet browsing history to show us ads for products we've already purchased. This happens because it’s quite difficult to train a machine learning algorithm to exclude ads for previously purchased products.

    What is a difficult problem for machine learning is very easy for knowledge engineering to solve. On the flip side, encoding all possible relationships between searched words and suggested products is extremely tedious. In this realm, machine learning complements knowledge engineering.
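To make the division of labor concrete, here is a minimal sketch in which a deterministic rule filters the output of a predictive model. The product names and the `recommend` helper are hypothetical and only illustrate the idea; they are not part of the article's project.

```python
def recommend(ranked_products, purchased):
    """Apply a deterministic rule on top of a model's ranking.

    ranked_products: product names ordered by predicted interest (best first).
    purchased: set of products the customer already bought.
    """
    # Knowledge-engineering rule: never advertise something already purchased.
    return [product for product in ranked_products if product not in purchased]


# Hypothetical model output and purchase history.
ranked = ["phone", "printer", "laptop", "monitor"]
purchased = {"printer", "phone"}

suggestions = recommend(ranked, purchased)
print(suggestions)
```

The model still decides what is relevant; the rule only encodes the one piece of knowledge that is hard to learn from browsing data.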

    Artificial intelligence has many branches—machine learning, knowledge engineering, search optimization, natural language processing, and more. Why not use more than one technique to achieve more intelligent behavior?

    Artificial intelligence, machine learning, and data science

    Artificial intelligence, machine learning, and data science are often used interchangeably. Actually, they are different but overlapping domains. As I already noted, artificial intelligence has a broader scope than machine learning. Machine learning is just one facet of artificial intelligence. Similarly, some argue that data science is a facet of artificial intelligence. Others say the opposite, that data science includes AI.

    In the field, data scientists and AI experts offer different kinds of expertise with some overlap. Data science uses many machine learning algorithms, but not all of them. The Venn diagram in Figure 1 shows the spaces where artificial intelligence, machine learning, and data science overlap.

    Figure 1: The overlaps between artificial intelligence, machine learning, and data science.

    Note: See Data Science vs. Machine Learning and Artificial Intelligence for more about each of these technology domains and the spaces where they meet.

    Craft your own machine learning model

    Data scientists are in charge of defining machine learning models after careful preparation. This section will look at some of the techniques data scientists use to select and tune a machine learning algorithm. The goal is to understand the workflow and learn how to craft a model that can cope with prediction problems.

    Note: To learn more about data science methods and processes, see Wikipedia's Cross-industry standard process for data mining (CRISP-DM) page.

    Prepare and train a machine learning algorithm

    The first step for preparing and training a machine learning algorithm is to collect, analyze, and clean the data that we will use. Data preparation is an important phase that significantly impacts the quality of the final outcome. Data scientists use mathematics and statistics for this phase.

    For simplicity, let’s say we have a reliable data set based on a manager’s historical decisions in an order-fulfillment process. The manager receives the following information: product type (for example, phone or printer), price, urgency, and category. There are two categories: basic, for when the product is required employee equipment, and optional, for when the product is not necessary for the role.

    The two decision outcomes are approved or denied. Automating this decision will free the manager from a repetitive task and speed up the overall order-fulfillment process.

    As a first attempt, we could take the data as-is to train the model. Instead, let's introduce a bit of contextual knowledge. In our fictitious organization, the purchasing department has a price-reference table where target prices are defined for all product types. We can use this information to improve the quality of the data. Instead of training our algorithm to focus on the product type, we’ll train it to consider the target price. This way, we won't need to re-train the model when the reference price list changes.
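The substitution described above could look like the following sketch. The field names, the `TARGET_PRICES` table, and its values are assumptions made for illustration; the real reference table belongs to the purchasing department.

```python
# Hypothetical price-reference table maintained by the purchasing department.
TARGET_PRICES = {"phone": 500.0, "printer": 150.0}


def to_features(order, target_prices=TARGET_PRICES):
    """Replace the product type with its target price.

    The model then never sees the product type, so a change in the
    reference table does not require retraining.
    """
    return {
        "target_price": target_prices[order["product_type"]],
        "price": order["price"],
        "urgency": order["urgency"],
        "category": order["category"],
    }


order = {"product_type": "phone", "price": 550.0, "urgency": "high", "category": "basic"}
features = to_features(order)
print(features)
```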

    Choosing a machine learning algorithm

    We now have a typical classification problem: Given the incoming data, the algorithm must find a class for those data. In other words, it has to label each data item approved or denied. Because we have the manager’s collected responses, we can use a supervised learning method. We only need to choose the correct algorithm. The major machine learning algorithms are:

    • Linear Regression
    • Logistic Regression
    • K-Nearest Neighbors
    • Support Vector Machines
    • Decision Trees and Random Forests
    • Neural Networks

    Note: For more about each of these algorithms, see 9 Key Machine Learning Algorithms Explained in Plain English.

    Except for linear regression, we could apply any of these algorithms to our classification problem. For this use case, we will use a Logistic Regression model. Fortunately, we don't need to understand the algorithm's implementation details. We can rely on existing tools for implementation.
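Although the implementation details are not needed to follow along, the core idea of logistic regression fits in a few lines: a weighted sum of the inputs is passed through the logistic (sigmoid) function to obtain a probability, which is then compared with a threshold. The weights, bias, and threshold below are made-up values for illustration, not parameters from the article's trained model.

```python
import math


def predict_proba(features, weights, bias):
    """Probability of the positive class for one numeric feature vector."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))  # logistic (sigmoid) function


def classify(features, weights, bias, threshold=0.5):
    """Label the input as positive when the probability reaches the threshold."""
    return predict_proba(features, weights, bias) >= threshold


# Hypothetical weights: a weighted sum of zero sits exactly on the decision boundary.
print(predict_proba([0.0, 0.0], [1.0, 1.0], 0.0))
print(classify([2.0, 1.0], [1.0, -1.0], 0.0))
```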

    Python and scikit-learn

    We will use Python and the scikit-learn library to train our Logistic Regression model. We choose Python because it is concise and easy to understand and learn. It is also the de facto standard for data scientists. Many libraries expressly designed for data science are written in Python.

    The example project

    Before we go further, download the project source code here. Open the python folder to find the model training code (ml-training.py) and the CSV file we'll use to train the algorithm.

    Even without experience with Python and machine learning, the code is easy to understand and adapt. The program's logical steps are:

    1. Initialize the algorithm to train.
    2. Read the available data from a CSV file.
    3. Randomly split the training and test data sets (40% is used for testing).
    4. Train the model.
    5. Test the model against the testing data set.
    6. Print the test results.
    7. Save the trained model in PMML.
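Steps 2 and 3 above can be sketched with the standard library alone. This is not the code from ml-training.py: the column names, the inline CSV rows, and the `split_rows` helper are assumptions made so that the example is self-contained. Training and PMML export are left out because they depend on scikit-learn and a PMML exporter.

```python
import csv
import io
import random


def split_rows(rows, test_fraction=0.4, seed=None):
    """Shuffle the rows and return (training_rows, test_rows)."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)  # pass a seed for a repeatable split
    n_test = round(len(rows) * test_fraction)
    return rows[n_test:], rows[:n_test]


# Hypothetical data; the real project reads a CSV file from disk.
CSV_TEXT = """target_price,price,urgency,category,approved
500,550,high,basic,true
500,900,low,optional,false
150,140,low,basic,true
150,400,high,optional,false
500,480,low,basic,true
"""

rows = list(csv.DictReader(io.StringIO(CSV_TEXT)))
train, test = split_rows(rows, test_fraction=0.4, seed=42)
print(len(train), len(test))
```

Leaving `seed` unset gives a different split on every run, which is why repeated executions report slightly different results.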

    A nice feature of the scikit-learn library is that its machine learning algorithms expose nearly all the same APIs. You can switch between the available algorithms by changing one line of code. This means you can easily benchmark different algorithms for accuracy and decide which one best fits your use case. This type of benchmarking is common because it's often hard to know in advance which algorithm will perform better for a use case.

    Run the program

    If you run the Python program, you should see results similar to the following, but not exactly the same. The training and test data are randomly selected, so the results will differ each time. The point is to verify that the algorithm works consistently across multiple executions.

    Results for model LogisticRegression
    Correct: 1522
    Incorrect: 78
    Accuracy: 95.12%
    True Positive Rate: 93.35%
    True Negative Rate: 97.10%
    

    The results are quite accurate, at 95%. More importantly, the True Negative Rate (measuring specificity) is very high, at 97.1%. In general, there is a tradeoff between the True Negative Rate and True Positive Rate, which measures sensitivity. Intuitively, you can liken the prediction sensitivity to a car alarm: If we increase an alarm's sensitivity, it is more likely to go off by mistake and increase the number of false positives. The increase in false positives lowers specificity.
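The following sketch shows how these three figures are derived from predictions and how sensitivity trades off against specificity. It uses a decision threshold on predicted scores as the "alarm sensitivity" knob; the labels and scores are invented for illustration and are unrelated to the run shown above.

```python
def rates(labels, scores, threshold):
    """Return (accuracy, true_positive_rate, true_negative_rate)."""
    tp = fp = tn = fn = 0
    for actual, score in zip(labels, scores):
        predicted = score >= threshold
        if predicted and actual:
            tp += 1
        elif predicted and not actual:
            fp += 1
        elif actual:
            fn += 1
        else:
            tn += 1
    accuracy = (tp + tn) / len(labels)
    tpr = tp / (tp + fn)  # sensitivity
    tnr = tn / (tn + fp)  # specificity
    return accuracy, tpr, tnr


# Hypothetical ground truth and model scores.
labels = [True, True, True, True, False, False, False, False]
scores = [0.9, 0.8, 0.6, 0.4, 0.7, 0.3, 0.2, 0.1]

sensitive = rates(labels, scores, threshold=0.35)  # alarm goes off easily
strict = rates(labels, scores, threshold=0.75)     # alarm rarely goes off
print(sensitive, strict)
```

With the low threshold every positive is caught but one negative is wrongly flagged; with the high threshold no negative is flagged but half of the positives are missed.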

    Tune the algorithm

    In this particular use case, of approving or rejecting a product order, we would rather reject an order when in doubt. Falling back to manual approval is better than having too many false positives, which would lead to wrongly approved orders. To improve our results, we can adjust the logistic regression to reduce the prediction sensitivity.

    Predictive machine learning models of this kind are known as classification algorithms because they place each input item in a specific class. In our case, we have two classes:

    • "true" to approve the order.
    • "false" to refuse it.

    To reduce the likelihood of a false positive, we can tune the "true" class weight (note that 1 is the default):

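    from sklearn.linear_model import LogisticRegression

    # Down-weight the "true" (approve) class so that the model approves less readily.
    model = LogisticRegression(class_weight={
        "true": 0.6,   # default weight is 1
        "false": 1,
    })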
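To see why a lower weight on the "true" class reduces false positives, consider a simplified version of the weighted log loss that a class-weighted classifier minimizes. This sketch ignores regularization and is not scikit-learn's exact objective; it uses booleans instead of the "true" and "false" string labels, and the probabilities are made up.

```python
import math

# Same weights as in the snippet above, keyed by boolean labels.
CLASS_WEIGHT = {True: 0.6, False: 1.0}


def weighted_log_loss(labels, probabilities, class_weight=CLASS_WEIGHT):
    """Average log loss, with each sample scaled by the weight of its true class."""
    total = 0.0
    for actual, p in zip(labels, probabilities):
        loss = -math.log(p if actual else 1.0 - p)
        total += class_weight[actual] * loss
    return total / len(labels)


# A missed approval (true order, predicted 20% likely to be approved) ...
missed_approval = weighted_log_loss([True], [0.2])
# ... costs less than an equally confident wrong approval.
wrong_approval = weighted_log_loss([False], [0.8])
print(missed_approval, wrong_approval)
```

Because wrong approvals are penalized more heavily than missed approvals, training shifts the decision boundary toward predicting "false" in borderline cases.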
    

    Store the model in a PMML file

    Python is handy for analysis, but we might prefer another language or product for running a machine learning model in production. Reasons include better performance and integration with the enterprise ecosystem.

    What we need is a way to exchange machine learning model definitions between different software. The PMML format is commonly used for this purpose. The DMN specification includes a direct reference to a PMML model, which makes this option straightforward.

    You should make a couple of changes to the PMML file before importing it to the DMN editor. First, you might need to change the version attribute of the generated PMML file (and the matching namespace) to 4.3, which is the version supported by Decision Manager 7.7 (the current version as of this writing):

    <PMML version="4.3" xmlns="http://www.dmg.org/PMML-4_3" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
    

    Next, you want to be able to easily identify the predictive model from the DMN modeler. Use the modelName attribute to name your model:

    <RegressionModel modelName="approvalRegression" functionName="classification" normalizationMethod="logit">
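Both edits can also be scripted rather than made by hand. The sketch below uses Python's standard xml.etree.ElementTree module; the `patch_pmml` helper and the input fragment (including its 4.4 version) are hypothetical, so check the result against the file your exporter actually produces.

```python
import xml.etree.ElementTree as ET

NEW_NS = "http://www.dmg.org/PMML-4_3"


def patch_pmml(xml_text, model_name, version="4.3"):
    """Set the PMML version and namespace, and name the regression model."""
    root = ET.fromstring(xml_text)
    for element in root.iter():
        # Tags look like "{namespace}Local"; keep the local part only.
        local = element.tag.rsplit("}", 1)[-1]
        element.tag = f"{{{NEW_NS}}}{local}"
        if local == "RegressionModel":
            element.set("modelName", model_name)
    root.set("version", version)
    ET.register_namespace("", NEW_NS)  # serialize with a default namespace
    return ET.tostring(root, encoding="unicode")


# Hypothetical exporter output, reduced to the two relevant elements.
SAMPLE = (
    '<PMML version="4.4" xmlns="http://www.dmg.org/PMML-4_4">'
    '<RegressionModel functionName="classification" normalizationMethod="logit"/>'
    "</PMML>"
)

patched = patch_pmml(SAMPLE, "approvalRegression")
print(patched)
```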
    

    The diagram in Figure 2 shows where we are currently with this project.

    Figure 2: A usage block diagram for scikit-learn.

    Conclusion

    So far, you've seen how to create a machine learning model and store it in a PMML file. In the second half of this article, you will learn more about using PMML to store and transfer machine learning models. You'll also discover how to consume a predictive model from a deterministic decision using DMN. Finally, we'll review the advantages of creating more cooperation between the deterministic world and the predictive one.

    Last updated: January 13, 2021
