How to access, download, and analyze data for S3 usage

Learning path | 4 resources | 45 mins | Published on March 14, 2022

In this learning path, you will start your Jupyter notebook server and select preferences for S3 usage. You will also learn how to access and download the data you create as well as analyze it, using a variety of skills and tools.

Access and download S3 data

August 10, 2022

You are now inside your JupyterLab environment (Figure 2). It's a web-based environment, but everything you do here is actually happening on the OpenShift Data Science cluster. This means that, without having to install and maintain anything on your own computer, and without consuming lots of local resources such as CPU and RAM, you can conduct your data science work in this powerful and stable managed environment.

Figure 2. The Name explorer panel in a JupyterLab workspace shows available options.

You are now in a window that resembles a file browser on a desktop. The window displays the files and folders that are saved in your personal space inside OpenShift Data Science. The window is pretty empty right now, though. So the first thing we will do is add content into this environment by using Git.

Cloning a GitHub repository

You can clone a Git repository in JupyterLab through the left-hand toolbar or the Git menu option in the main menu (Figure 3).

Figure 3. Access to Git repositories is available in the main menu at the top of the screen and in the toolbar on the left.

Let's clone a repository using the left-hand toolbar. Click the Git icon, shown in Figure 4.

Figure 4. The Git icon is the third icon from the top in the JupyterLab toolbar.

Then click Clone a Repository (Figure 5).

Figure 5. After selecting the Git icon, select “Clone a Repository”.

Enter your Git repository URL, which for this learning path is https://github.com/rh-aiservices-bu/access-s3-data. Then click CLONE (Figure 6).

Figure 6. Finish cloning the repository by clicking the CLONE button.
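If you would rather clone from a notebook cell than from the toolbar, the same step can be sketched in Python. This is a minimal sketch, not part of the learning path: it assumes the `git` command is available in your JupyterLab environment, and the helper names `clone_command` and `clone_repository` are hypothetical.

```python
import subprocess
from pathlib import Path

# Repository URL used in this learning path.
REPO_URL = "https://github.com/rh-aiservices-bu/access-s3-data"


def clone_command(url, parent_dir="."):
    """Build the git command and the folder the clone will land in."""
    name = url.rstrip("/").split("/")[-1]
    if name.endswith(".git"):
        name = name[:-4]
    target = Path(parent_dir) / name
    return ["git", "clone", url, str(target)], target


def clone_repository(url, parent_dir="."):
    """Clone the repository unless the target folder already exists."""
    command, target = clone_command(url, parent_dir)
    if target.exists():
        # Assumption: an existing folder means the clone was already done.
        return target
    subprocess.run(command, check=True)  # raises if git reports an error
    return target
```

Calling `clone_repository(REPO_URL)` from a notebook in your home folder should leave you with the same access-s3-data folder that the CLONE button creates.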

Cloning takes a few seconds. When it finishes, double-click the newly created folder, access-s3-data, to open your cloned Git repository. The repository contains an empty datasets directory and the following files (Figure 7):

```
downloadData.ipynb
simpleCalc.ipynb
Requirements.txt
README.md
```

Figure 7. The user interface shows a list of downloaded files.

Access and download S3 data

In the Name menu, double-click the downloadData.ipynb notebook (Figure 8).

Figure 8. Open the downloadData Jupyter notebook.

Run each cell in the notebook, using the Shift-Enter key combination, and pay attention to the execution results. Using this notebook, we will:

Make a connection to an AWS S3 storage bucket
Download a CSV file into the ‘datasets’ folder
Rename the downloaded CSV file to 'newtruckdata.csv'
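The three steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the contents of downloadData.ipynb: it assumes the boto3 library is installed and that credentials are available through boto3's default credential chain, and the helper names, bucket name, and object key are hypothetical placeholders.

```python
from pathlib import Path


def make_s3_client():
    """Create an S3 client using boto3's default credential chain."""
    # Imported here so the download helper can be exercised without boto3.
    import boto3
    return boto3.client("s3")


def download_and_rename(s3_client, bucket, key, dest_dir="datasets",
                        new_name="newtruckdata.csv"):
    """Download one object into dest_dir, then rename it to new_name."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)

    # Save the object under its original file name first.
    downloaded = dest / Path(key).name
    s3_client.download_file(bucket, key, str(downloaded))

    # Rename the downloaded file, overwriting any earlier copy.
    target = dest / new_name
    downloaded.replace(target)
    return target
```

With real values substituted, a call such as `download_and_rename(make_s3_client(), "example-bucket", "example/truckdata.csv")` would leave the file at datasets/newtruckdata.csv. The notebook in the repository remains the reference for the actual bucket and file names.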

View your new CSV file

Inside the 'datasets' directory, double-click the 'newtruckdata.csv' file. The file contents should appear as shown in Figure 9.

Figure 9. The user interface shows the contents of the newtruckdata.csv file.
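You can also check the file from a notebook cell instead of the file viewer. The sketch below uses only Python's standard csv module; the helper name `preview_csv` is hypothetical, and it assumes the first row of the file is a header.

```python
import csv
from itertools import islice


def preview_csv(path, max_rows=5):
    """Return the header row and up to max_rows data rows of a CSV file."""
    with open(path, newline="") as handle:
        reader = csv.reader(handle)
        header = next(reader, [])  # empty list if the file has no rows
        rows = list(islice(reader, max_rows))
    return header, rows
```

For example, `header, rows = preview_csv("datasets/newtruckdata.csv")` returns the column names and the first few records, which is a quick way to confirm that the download and rename succeeded.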

The file contains the data you will analyze. Now we can move to the next learning resource and perform some analytics.
