How to set up and reproduce data science experiments

In this learning path, you will learn how to set up data science projects, how to consistently reproduce and execute the Jupyter notebooks in those projects, and how to serve the developed models as a web service on top of Red Hat OpenShift.

Serve the model

To serve the model, you need to update files in the project to perform the following tasks:

  1. Wrap the model inference code into a prediction function (prediction.py).
  2. Build a Flask application to serve the prediction function (wsgi.py).
  3. Run and test the Flask application locally before deploying it to OpenShift (flask_run.ipynb and flask_test.ipynb).
  4. Create a Dockerfile to containerize everything: the Flask application, prediction code, model, and dependencies (Dockerfile).
  5. Deploy the container to OpenShift and run the prediction model via the REST API.

The files in the previous list are provided in the repository. A few things to note:

  • The prediction function is the 05-model-inference.ipynb notebook refactored into a function (a minimal sketch follows this list).
  • The Flask application exposes a web endpoint that feeds the user input into the prediction function and returns the result (a second sketch follows).
  • The Dockerfile copies all the necessary files, including the saved model, and installs all the necessary packages.
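
The following is a minimal sketch of what prediction.py could look like. The model file name, serialization format, and feature handling shown here are assumptions for illustration; the version in the repository may differ.

# prediction.py (illustrative sketch; file names and details are assumptions)
import pickle

import pandas as pd

# Assumption: the trained model was pickled by the training notebooks.
with open("sepsis_model.pkl", "rb") as f:
    model = pickle.load(f)


def predict(payload):
    """Run the model on a single JSON payload of vitals and lab values."""
    # Build a one-row DataFrame; fields sent as null arrive as None/NaN.
    features = pd.DataFrame([payload])
    result = model.predict(features)[0]
    return {"prediction": "Sepsis" if result == 1 else "No Sepsis"}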

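Similarly, a minimal wsgi.py might wire that function to a /predictions endpoint with Flask. The route name below matches the curl example later in this article, but the repository's implementation may differ.

# wsgi.py (illustrative sketch; the repository's version may differ)
from flask import Flask, jsonify, request

from prediction import predict

application = Flask(__name__)


@application.route("/predictions", methods=["POST"])
def predictions():
    # Parse the JSON body from the client and pass it to the prediction function.
    data = request.get_json(force=True)
    return jsonify(predict(data))


if __name__ == "__main__":
    # For local testing only; in the container, a WSGI server such as
    # gunicorn would typically serve the "application" object.
    application.run(host="0.0.0.0", port=8080)
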
On the Red Hat OpenShift Data Science page, click the Rubik’s cube icon to list the Red Hat applications, then click the OpenShift console menu option (Figure 16).

Figure 16: Get access to the OpenShift console from the Red Hat OpenShift Data Science console.

Once in the OpenShift console, click Topology. Right-click in the panel on the right to bring up the Add to Project menu, then select Import from Git (Figure 17).

Figure 17: Creating an application using the “Import from Git” option.

Enter the URL of the desired Git repository, https://github.com/rh-aiservices-bu/peer-review.git, and click Create (Figure 18).

Figure 18: Enter the GitHub URL and click Create.

Wait for the build to complete (Figure 19) and for the pod to be ready. Once the pod is ready, you can see the route to access the application on the bottom right side of the screen (Figure 20).

Figure 19: A build is in progress.

 

Figure 20: The build is complete and the pod is running.

Test the application using the curl command from the flask_test.ipynb notebook. The default results from your local notebook are shown in Figure 21. Then substitute the route shown in Figure 20 (as illustrated in Figure 22); the results should look like Figure 23.

Figure 21: A curl command tests the deployment.

 

Figure 22: Replace the host with the route from the deployment.

 

Figure 23: The new results reflect the deployment.

Alternatively, paste the curl command into the terminal to get the prediction for sample data:

[1002700000@jupyterhub-nb-panbalag experiments]$ curl -X POST -H 
"Content-Type: application/json" --data '{"HR": 103, "O2Sat": 
90, "Temp": null, "SBP": null, "MAP": null, "DBP": null, "Resp": 
30, "EtCO2": null, "BaseExcess": 21, "HCO3": 45, "FiO2": null, 
"pH": 7.37, "PaCO2": 90, "SaO2": 91, "AST": 16, "BUN": 14, 
"Alkalinephos": 98, "Calcium": 9.3, "Chloride": 85, 
"Creatinine": 0.7, "Glucose": 193, "Lactate": null, "Magnesium": 
2, "Phosphate": 3.3, "Potassium": 3.8, "Bilirubin_total": 0.3, 
"Hct": 37.2, "Hgb": 12.5, "PTT": null, "WBC": 5.7, "Fibrinogen": 
null, "Platelets": 317}' 
http://peer-review-panbalag-dev.apps.rhods-sb-prod.3sox.p1.openshiftapps.com/predictions
{
  "prediction": "No Sepsis"
}
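
If you prefer to test from a notebook cell instead of the terminal, the same request can be made with the Python requests library. This is a sketch: only part of the payload is shown, and you should substitute the route of your own deployment.

import requests

# Same payload as the curl example; only a subset of the fields is shown here.
sample = {
    "HR": 103, "O2Sat": 90, "Temp": None, "Resp": 30,
    "BaseExcess": 21, "HCO3": 45, "pH": 7.37, "PaCO2": 90, "SaO2": 91,
    # ...remaining fields from the curl payload above
    "WBC": 5.7, "Platelets": 317,
}

# Replace this with the route of your own deployment (see Figure 20).
url = "http://peer-review-panbalag-dev.apps.rhods-sb-prod.3sox.p1.openshiftapps.com/predictions"

response = requests.post(url, json=sample)
print(response.json())  # for example: {'prediction': 'No Sepsis'}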
 