Manage data and work with notebooks in the cloud¶
Estimated time to complete: 10 minutes
What will you learn?
In this tutorial you will learn how to start a notebook in the cloud on the infrastructure and environment of your choosing. You will also learn, among other things, how to upload data to the cloud and download the results to your local machine. We will continue with the Keras example introduced in the first tutorial, so if you haven’t seen it yet, we suggest you check it out first.
Once you are finished with this tutorial you will know how to:
- Spin up a cloud notebook on a chosen infrastructure.
- Upload and download data to and from the cloud and use it in notebooks.
- Upload and run your local notebook in the cloud.
- Manage channels in the cloud notebook.
- Use outputs from one cloud experiment in another.
Before we start: download the data and notebooks
Please go to the neptune-tutorials repository and download the tutorial-3 folder. It contains the auxiliary_data.pkl file with the data we will need for this tutorial, and the two example notebooks that we will be using throughout.
Part I: Data management¶
Let's learn how to upload, download, and remove data from the Neptune cloud.
There are a few options for uploading data to the cloud:
Command Line Interface
You can upload data from the CLI by running the following command:

```
neptune data upload --project USERNAME/PROJECT_NAME LOCAL_FILEPATH
```

For example:

```
neptune data upload --project neptune-ml/neptune-tutorials auxiliary_data.pkl
```
You can also upload and download files directly from the browser. Go to the Uploads section on the right and click on the browse button to upload your file (or simply drag it in):
Command Line Interface
You can list the uploaded files by typing the following command:

```
neptune data ls --project USERNAME/PROJECT_NAME
```
You can also go directly to the Neptune app and check the Uploads section.
You also have two options to download data from the Neptune cloud.
Command Line Interface
You need to specify your --project and the data path:

```
neptune data download --project USERNAME/PROJECT_NAME CLOUD_FILEPATH
```

For example:

```
neptune data download --project neptune-ml/neptune-tutorials auxiliary_data.pkl
```
Click on the download button next to your file:
You also have two options to remove data from the Neptune cloud.
Command Line Interface
You need to specify your --project and the file or folder to be removed:

```
neptune data rm --project USERNAME/PROJECT_NAME CLOUD_FILEPATH
```

For example:

```
neptune data rm --project neptune-ml/neptune-tutorials auxiliary_data.pkl
```
Go to the Uploads section and click on the bucket icon next to your file:
Check that your auxiliary_data.pkl is still uploaded
You might have accidentally deleted the file while playing with neptune data rm. If so, upload it again using the neptune data upload command.
Part II: Create a new notebook¶
Firing up a GPU-fueled notebook with an environment that has all the libraries you need is really easy with Neptune. Let's do it together, step by step.
Click on the Start notebook button at the top. You can also choose a name for your experiment if you like.
Choose the infrastructure and the environment
Neptune runs on top of the Google Cloud Platform, so there are plenty of machine types to choose from: preemptible instances, multi-GPU machines, or a small and cheap CPU instance. Whatever suits your needs.
When it comes to the environment, you can choose the Python version and the base Docker image on which your environment will be started. There are environments with PyTorch, Keras, TensorFlow, and others. Again, choose whatever you need!
Choose the data that you want to mount
If you want to mount data that you previously uploaded to the cloud, you need to specify it when starting the notebook. In our case we need to mount the auxiliary_data.pkl file that we uploaded, so the data path should read:
Upload local notebook
If you have a notebook that you worked on locally and would like to continue working on it while benefiting from better infrastructure for your computations, this is the option for you.
Simply click on the browse button. We will upload the 6-example-cloud-notebook-essentials.ipynb from the neptune-tutorials repository.
Click start and your notebook should be up in no time.
You have successfully started your first Neptune notebook!
Part III: Working with the cloud notebook¶
A cloud notebook is just the Jupyter notebook that we all know and love, but we have added some extra Neptune features to make your work easier.
You should not create a Neptune Context instance in cloud notebooks, because one is created for you at runtime as ctx. You can see this for yourself by printing it:
Do not instantiate Neptune Context
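As a rough sketch of what this means in practice (the `neptune_context_or_none` helper below is hypothetical, not part of Neptune's API), you can check for the injected `ctx` before using it:

```python
def neptune_context_or_none():
    """Return the runtime-injected Neptune Context if one exists, else None."""
    try:
        return ctx  # noqa: F821 - `ctx` is injected by Neptune's cloud runtime
    except NameError:
        return None

context = neptune_context_or_none()
if context is None:
    print("No injected Context - probably not running in a Neptune cloud notebook")
```

Run locally, this prints the fallback message; in a cloud notebook it would return the injected Context.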
All the data that you mounted is available in the /input directory, and you can access it as usual. For example, you can load it like this:
```python
from sklearn.externals import joblib

auxiliary_data = joblib.load('/input/auxiliary_data.pkl')
x_aux, y_aux = auxiliary_data
```
Data is available in the /input directory
When training models in the cloud notebook, all the charts are created nicely in a separate section on the right:
However, if you decide to restart training with Neptune callbacks, you may run into errors like this one:
That is why you should reset all the channels that you plan on using, just like we've done in the snippet below:
```python
ctx.reset_all_channels()

model.fit(x_train, y_train,
          epochs=5,
          batch_size=BATCH_SIZE,
          callbacks=[NeptuneMonitor()])
```
With that, training should go smoothly!
You can save models, predictions, or any other data without any problems. Neptune has a special /output directory designed exactly for this, so put whatever you want to keep for later in there! For example, let's save the model's predictions on the test data:
```python
y_test_pred = model.predict(x_test)
joblib.dump(y_test_pred, '/output/test_predictions.pkl')
```
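To see the save-and-restore pattern end to end without a trained model or the /output mount, here is a minimal, self-contained sketch. It uses the standard library's pickle and a temporary directory as a stand-in for /output; the toy predictions replace model.predict:

```python
import os
import pickle
import tempfile

# Stand-in for Neptune's /output mount; in a cloud notebook you would use '/output'.
output_dir = tempfile.mkdtemp()

# Toy predictions in place of model.predict(x_test).
y_test_pred = [0.1, 0.7, 0.2]

path = os.path.join(output_dir, 'test_predictions.pkl')
with open(path, 'wb') as f:
    pickle.dump(y_test_pred, f)

# Anything written to /output outlives the notebook's kernel, so a later
# experiment can mount that folder and read the file back.
with open(path, 'rb') as f:
    restored = pickle.load(f)

print(restored == y_test_pred)  # True
```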
Part IV: Work on data from previous or multiple experiments¶
Sometimes you want to pause your work and get back to it later.
You can simply stop your current notebook and use the Clone button on the left to restart it later:
Remember to save all the data you will need later
Stopping your notebook kills the kernel and forgets the notebook's state. If you have something important, you can always save it in the special /output directory.
Start a new notebook with mounted data from the previous experiment¶
Sometimes you may want to use outputs from previous cloud experiments in other experiments. For example, you may want to analyse the predictions on test data that you saved in the previous part of the tutorial.
Check the id of the previous experiment
In our case it is the
Start a notebook
Click on the Start notebook button and specify the path to the previous experiment.
Neptune uses a convention where you need to add ../ before the experiment id. Since we want to mount the /output folder from that experiment, in our case the path will read:
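To make the convention concrete, here is a tiny sketch that builds such a mount path. The experiment id SAN-123 is made up for illustration; use the id you looked up in the Neptune app:

```python
# Hypothetical experiment id; check the id of your own experiment in the app.
experiment_id = 'SAN-123'

# Neptune's convention: prefix the experiment id with '../',
# then append the folder you want to mount.
mount_path = '../' + experiment_id + '/output'
print(mount_path)  # ../SAN-123/output
```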
Load the data in the new notebook¶
All the mounted data is available in the /input directory, so you just need to load it. In our case:
```python
from sklearn.externals import joblib

test_predictions = joblib.load('/input/output/test_predictions.pkl')
```
Data from the previous experiment is loaded
You can now analyze it. Let's plot the distribution of the test predictions for each class:
```python
import seaborn as sns
import pandas as pd
import numpy as np

pred_melted = pd.melt(pd.DataFrame(test_predictions))
g = sns.FacetGrid(pred_melted, col="variable", col_wrap=3)
g = g.map(sns.distplot, "value")
```
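If the reshape above is unfamiliar: pd.melt turns a wide predictions table (one column per class) into a long one with `variable` and `value` columns, which is the shape FacetGrid expects. A tiny illustration with toy numbers, not the tutorial's data:

```python
import pandas as pd

# Two samples, two classes: columns are class ids, rows are samples.
preds = pd.DataFrame({0: [0.9, 0.1], 1: [0.1, 0.9]})

melted = pd.melt(preds)
print(list(melted.columns))  # ['variable', 'value']
print(len(melted))           # 4 rows: one per (sample, class) pair
```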
In this tutorial you learnt how to manage Neptune cloud storage and how to start a notebook with the infrastructure, environment, and data that you need. Finally, you got to know how to train models in cloud notebooks and how to use outputs from one experiment as inputs to the next.
Tutorial in a nutshell¶
```
neptune data upload --project USER_NAME/PROJECT_NAME filepath
neptune data download --project USER_NAME/PROJECT_NAME filepath
neptune data rm --project USER_NAME/PROJECT_NAME filepath
neptune data ls --project USER_NAME/PROJECT_NAME
```