Manage data and work with notebooks in the cloud

estimated time to complete: 10 minutes

What will you learn?

In this tutorial you will learn how to start a notebook in the cloud on the infrastructure and environment of your choosing. You will also learn how to upload data to the cloud and download the results back to your local machine. We will continue with the Keras example introduced in the first tutorial, so if you haven’t seen it yet, I suggest you check it out first.

Once you are finished with this tutorial you will know how to:

  • Spin up a cloud notebook on a chosen infrastructure.
  • Upload and download data to and from the cloud and use it in notebooks.
  • Upload and run your local notebook in the cloud.
  • Manage channels in the cloud notebook.
  • Use outputs from one cloud experiment in another.

Before we start: download the data and notebooks

Please go to the neptune-tutorials repository and download the tutorial-3 folder. It contains the auxiliary_data.pkl file with the data we will need for this tutorial, and two example notebooks that we will be using throughout.

Part I: Data management

Let’s learn how to upload, download, and remove data from the Neptune cloud.

Upload data

There are a few options to upload the data to the cloud:

Command Line Interface

You can upload the data from the CLI by running the following command:

neptune data upload --project USERNAME/PROJECT_NAME LOCAL_FILEPATH

For example:

neptune data upload --project neptune-ml/neptune-tutorials auxiliary_data.pkl

Browser

You can also upload and download files directly from the browser. Go to the Uploads section on the right.


Click on the browse button to upload your file, or simply drag and drop it.


List uploads

Command Line Interface

You can list the uploaded files by typing the following command:

neptune data ls --project USERNAME/PROJECT_NAME
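For example, for the tutorial project used above:

neptune data ls --project neptune-ml/neptune-tutorials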

Browser

You can go directly to the Neptune app and check the Uploads section.


Download data

You also have two options to download the data from the Neptune cloud.

Command Line Interface

Specify your --project and the path of the data to download:

neptune data download --project USERNAME/PROJECT_NAME CLOUD_FILEPATH

For example:

neptune data download --project neptune-ml/neptune-tutorials auxiliary_data.pkl

Browser

Click on the download button next to your file.


Remove data

You also have two options to remove data from the Neptune cloud.

Command Line Interface

You need to specify your --project and the file or folder to be removed:

neptune data rm --project USERNAME/PROJECT_NAME CLOUD_FILEPATH

For example:

neptune data rm --project neptune-ml/neptune-tutorials auxiliary_data.pkl

Browser

Go to the Uploads section and click on the bucket icon next to your file.


Check if your auxiliary_data.pkl was uploaded

You might have accidentally deleted the file while playing with neptune data rm. If so, upload it again with the neptune data upload command.
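For example, assuming the same tutorial project as before, you can first list the files in the cloud storage and re-upload the pickle if it is missing:

# check whether auxiliary_data.pkl is still in the cloud storage
neptune data ls --project neptune-ml/neptune-tutorials

# re-upload it if it is missing
neptune data upload --project neptune-ml/neptune-tutorials auxiliary_data.pkl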

Part II: Create a new notebook

Firing up a GPU-fueled notebook with an environment that has all the libraries you need is really easy with Neptune. Let’s do it together, step by step.

Click on the Start notebook button at the top

You can also choose a name for your experiment if you like.


Choose the infrastructure and the environment

Neptune runs on top of the Google Cloud Platform, so there are plenty of options for picking the right machine to spin up. You can choose preemptible instances, multi-GPU machines, or a small and cheap CPU instance. Whatever suits your needs.

When it comes to the environment, you can choose the Python version and the base Docker image on which your environment will be started. There are environments with PyTorch, Keras, TensorFlow and others. Again, choose whatever you need!


Choose the data that you want to mount

If you want to mount data that you have previously uploaded to the cloud, you need to specify it in the Files section. In our case we need to mount the auxiliary_data.pkl file that we uploaded, so the data path should read uploads/auxiliary_data.pkl.


Upload local notebook

If you have a notebook that you worked on locally and would like to keep working on it while benefiting from better infrastructure for your computations, this is the option for you. Simply click on the browse button. I will upload the 6-example-cloud-notebook-essentials.ipynb from the neptune-tutorials repository.


Click Start and your notebook should be up in no time


You have successfully started your first Neptune notebook

Part III: Working with the cloud notebook

A cloud notebook is just the Jupyter notebook that we all know and love, but to make your work easier we have added some extra Neptune features.

Neptune Context

You should not create a Neptune Context instance in cloud notebooks, because one is created for you at runtime as ctx. You can see for yourself by printing it:

ctx

Do not instantiate Neptune Context
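You can use this ready-made context just like one created locally. As a minimal sketch, assuming the context exposes the Neptune client’s channel_send method, you could log a value to a custom numeric channel like this (the channel name and value are just examples):

# ctx is injected by Neptune at runtime, so there is no need to construct it
ctx.channel_send('my_custom_metric', 0.92)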

Load data

All the data that you mounted is available in the /input directory, and you can access it as usual. For example, you can load it:

from sklearn.externals import joblib

# mounted files are available under /input
auxiliary_data = joblib.load('/input/auxiliary_data.pkl')

x_aux, y_aux = auxiliary_data

Data is available in the /input directory
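If you are unsure what exactly got mounted, you can list the contents of that directory first; this is just plain Python, nothing Neptune-specific:

import os

# list everything that was mounted into the notebook
print(os.listdir('/input'))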

Channel reset

When training models in the cloud notebook, all the charts are created nicely in a separate section on the right.


However, if you decide to restart training with the Neptune callbacks, you may run into errors.


That is why you should reset all the channels that you plan on using with the .reset_all_channels() method, just like we do in the snippet below:

# reset the channels used by the Neptune callbacks before re-running training
ctx.reset_all_channels()

model.fit(x_train, y_train,
          epochs=5, batch_size=BATCH_SIZE,
          callbacks=[NeptuneMonitor()])

With that, all the training should go smoothly!

Save data

You can save models, predictions, or any other data with no problem. Neptune has a special /output directory designed exactly for this, so put whatever you want to keep for later in there. For example, let’s save the model predictions on the test data:

# anything written to /output is kept after the notebook is stopped
y_test_pred = model.predict(x_test)
joblib.dump(y_test_pred, '/output/test_predictions.pkl')
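If you also want to keep the trained model itself, you can save it to the same directory. The filename below is just an example:

# persist the trained Keras model next to the predictions
model.save('/output/model.h5')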

Part IV: Work on data from previous or multiple experiments

Clone notebook

Sometimes you want to pause your work and come back to it later. You can simply stop your current notebook and use the Clone button on the left to restart it.


Remember to save all the data you will need for later

Stopping your notebook will kill the kernel and forget its state. If you have something important, you can always save it in the special /output folder.

Start a new notebook with mounted data from the previous experiment

Sometimes you may want to use outputs from previous cloud experiments in other experiments. For example, you may want to analyse the predictions on test data that you saved in the previous part of the tutorial.

Check the id of the previous experiment

In our case it is the TT-54 experiment.

Start a notebook

Click on the Start notebook button and specify the path to the previous experiment. Neptune uses a convention where you need to add ../ before the experiment id. Since we want to mount the /output folder from that experiment, in our case the path will read ../TT-54/output.


Load the data in the new notebook

All the mounted data is available in the /input folder, so you just need to load it. In our case:

from sklearn.externals import joblib

test_predictions = joblib.load('/input/output/test_predictions.pkl')


Data from the previous experiment is loaded

You can now analyze it. Let’s plot the distribution of the test predictions for each class:

import seaborn as sns
import pandas as pd
import numpy as np

pred_melted = pd.melt(pd.DataFrame(test_predictions))
g = sns.FacetGrid(pred_melted, col="variable", col_wrap=3)
g = g.map(sns.distplot, "value")
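If you would like to keep the resulting figure as well, you can write it to the /output directory too; the filename here is just an example:

# save the figure alongside the other experiment outputs
g.savefig('/output/test_prediction_distributions.png')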


Summary

In this tutorial you learnt how to manage Neptune cloud storage. You learnt how to start a notebook with the infrastructure, environment and data that you need. Finally, you got to know how to train models in cloud notebooks and how to use outputs from one experiment as inputs to the next one.

Code examples are available on GitHub and on Neptune. The latter also contains the experiments.

Tutorial in a nutshell

upload data

neptune data upload --project USER_NAME/PROJECT_NAME filepath

download data

neptune data download --project USER_NAME/PROJECT_NAME filepath

remove data

neptune data rm --project USER_NAME/PROJECT_NAME filepath

list data

neptune data ls --project USER_NAME/PROJECT_NAME

reset channels

ctx.reset_all_channels()