How to set up and reproduce data science experiments

In this learning path, you will learn how to set up data science projects. You will also learn how to consistently reproduce or execute the Jupyter notebooks in those projects and serve the developed models as a web service on top of Red Hat OpenShift.

Reproduce notebooks with custom data

Replicating the experiments with custom data takes two steps: upload your new data, then update the configuration to point to it. For example, assume that you have data in a comma-separated values (CSV) file named my_input.csv. Upload that file to the data/input folder, then update the value for input in the [Data] section of the configuration file. Remember also to increment the version for the current run. The resulting configuration looks like Figure 13.

Figure 13: The configuration reflects the custom data in my_input.csv under data/input.
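If you prefer to script the configuration change instead of editing the file by hand, the following is a minimal sketch using Python's standard configparser module. It assumes the configuration is an INI-style file, which the [Data] section name suggests but the text does not confirm. The file path you pass in, the helper name point_config_at_custom_data, and the idea that version is an integer stored in the same [Data] section are all assumptions made for illustration; adjust them to match your project's actual configuration.

```python
import configparser
from pathlib import Path


def point_config_at_custom_data(config_path, input_file, section="Data"):
    """Set `input` in the given section and increment an integer `version`.

    Hypothetical helper: the key names and the location of `version`
    are assumptions, not taken from the project's real configuration.
    Returns the new version number.
    """
    # ConfigParser.read() silently ignores missing files, so check first.
    if not Path(config_path).is_file():
        raise FileNotFoundError(config_path)

    config = configparser.ConfigParser()
    config.read(config_path)

    if not config.has_section(section):
        config.add_section(section)

    # Point the run at the custom data file.
    config.set(section, "input", input_file)

    # Assumption: version is an integer in the same section; start at 0 if absent.
    new_version = config.getint(section, "version", fallback=0) + 1
    config.set(section, "version", str(new_version))

    # Note: rewriting the file this way drops any comments it contained.
    with open(config_path, "w") as handle:
        config.write(handle)

    return new_version
```

For example, calling point_config_at_custom_data("config.ini", "my_input.csv") on a file whose [Data] section holds version = 1 would write input = my_input.csv and version = 2. Here config.ini is a placeholder name for your configuration file.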

Then rerun all the notebooks. Results are stored in the experiments/experiment_2 folder (Figure 14).

Figure 14: The experiment_2 directory contains results from the second run of the notebooks with custom data.
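If you want to rerun the notebooks from a script rather than from the Jupyter interface, one possible approach is to call the jupyter nbconvert command-line tool on each notebook. The sketch below is not the method this learning path prescribes. It assumes that all notebooks live in a single directory, that sorting them by file name gives the intended execution order, and that Jupyter with nbconvert is installed. The function names are hypothetical.

```python
import subprocess
from pathlib import Path


def build_rerun_commands(notebook_dir):
    """Return one `jupyter nbconvert` command per notebook, in file-name order.

    Assumption: file-name order matches the order the notebooks should run in.
    """
    notebooks = sorted(Path(notebook_dir).glob("*.ipynb"))
    return [
        # --execute runs the notebook; --inplace overwrites it with the executed copy.
        ["jupyter", "nbconvert", "--to", "notebook", "--execute", "--inplace", str(nb)]
        for nb in notebooks
    ]


def rerun_notebooks(notebook_dir):
    """Execute every notebook in place, stopping at the first failure."""
    for command in build_rerun_commands(notebook_dir):
        # check=True raises CalledProcessError if a notebook fails to execute.
        subprocess.run(command, check=True)
```

Separating command construction from execution lets you print and inspect the commands before running anything.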

Now you can compare your results against the published version. Figure 15 compares the results of the trained models from both runs.

Figure 15: Results of the new run are compared against the published version.