Organize experimentation process

estimated time to complete: 20 minutes

What will you learn?

This tutorial will guide you through the process of organizing your experimentation with Neptune. We will continue with the keras example introduced in the first tutorial, so if you haven't seen it yet, I suggest going through it first.

Once you are finished with this tutorial, you will know how to:

  • Create a project and customize the experiments view for your needs,
  • Create Neptune configuration file,
  • Tag experiments and use custom filters to organize your experiments,
  • Log images to Neptune,
  • Access hyperparameters programmatically,
  • Run grid search over hyperparameters.

Part I: Create a project

By default all experiments that you run, like the ones you ran in the first tutorial, end up in the USER_NAME/sandbox project. That is not desirable, especially if you are working on multiple problems at the same time. With Neptune you can organize your work in projects. In order to do that you need to create a new project first.

Open neptune.ml and log in.

Go to the Projects section (top left).

Click on the New project button, choose a project name, add a short description and specify whether you want it to be public or private.

Public vs. private project

Public projects can be shared with anyone and are searchable in Neptune. If you want to make sure that your experimentation process and code cannot be accessed by anyone without permission, create a Private project.

You have created your first project

Congrats! Your new project is now available in the Projects section as a tile.


Part II: Neptune configuration file

The Neptune configuration file is a simple .yaml file that keeps information about the project, such as its name, model hyperparameters and environment. Let's look at an example neptune.yaml config:

project: neptune-ml/neptune-tutorials

tags: [keras]

properties: 
   - key: data_version
     value: '0.1'

parameters:
   epoch_nr: 5
   batch_size: 64
   dropout: 0.5

exclude:
   - neptune.log
   - output

pip-requirements-file: requirements.txt

In the next few sections we’ll discuss each part in a bit more detail.

Project

The full project name has the form USERNAME/PROJECT_NAME, for example neptune-ml/neptune-tutorials. I suggest that you put the USERNAME/PROJECT_NAME of the project you created in Part I of this tutorial.
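
As a quick sketch, with placeholder names standing in for your own, the entry in neptune.yaml would look like this:

# replace with the project you created in Part I
project: YOUR_USERNAME/YOUR_PROJECT_NAME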

Tags

Tagged experiments are easy to filter, group and organize. It is a simple yet really powerful feature. Anything that describes what you are working on at a high level should go into tags. For example, preprocessing_v1 or augmentations are good tags. For now we are working with the keras framework, but we may switch to pytorch later, so the framework is a good candidate for a tag as well.
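
For instance, a tags entry combining the examples above could look like the snippet below (the exact tag names are only illustrations):

tags: [keras, preprocessing_v1, augmentations]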

Properties

Properties are key:value pairs. You can add them programmatically from your code, as explained in the first tutorial, or define them in the config. Use them for data paths, data versions and other experiment features that you want to keep track of but that are not strictly model parameters.
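
As a minimal sketch of the programmatic route (the key and value here are just examples), setting a property from code looks like this:

import neptune

ctx = neptune.Context()

# has the same effect as defining the property in the config file
ctx.properties['data_version'] = '0.1'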

Parameters

This is the place for all your model parameters. It is good practice to move all the magic numbers from your model training script in here. Yes, I know it's hard, but your future self will thank you for it. In our example we've put epoch_nr, batch_size and dropout, but you could also parametrize the architecture and put it here, and when you develop a pre-processing step its parameters belong in the config too. This makes it easy to explore the model configuration later and keeps everything in one place.
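
As an illustration, a parameters section that also parametrizes the architecture could look like the sketch below; dense_units and learning_rate are hypothetical names that the example script does not read, they only show the idea:

parameters:
   epoch_nr: 5
   batch_size: 64
   dropout: 0.5
   # hypothetical architecture and optimizer parameters
   dense_units: 512
   learning_rate: 0.001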

Exclude

Neptune automatically logs your working directory by sending all its content to the cloud. That can sometimes be troublesome, for instance when you have a data folder in the working directory or are storing logs there. In that case, you should exclude the heavy files and folders from being logged to Neptune. All you need to do is list the files and folders that you want to skip in the exclude section of the config file.
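
For example, assuming your working directory contains a data folder and a logs folder (hypothetical names), the exclude section could look like this:

exclude:
   - data
   - logs
   - neptune.log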

You can exclude everything

If you want to exclude everything and not send any files, you can go with:

exclude:
   - [*]

Pip-requirements-file

If you want to install additional packages before the experiment is executed, put them in a standard requirements.txt file and add a pip-requirements-file section to the Neptune configuration file:

pip-requirements-file: requirements.txt

If you add this section to the configuration file, Neptune will try to install the packages every single time you run your script. If you only want to do it on demand, pass the requirements file as an argument to your command instead:

neptune run --pip-requirements-file requirements.txt main.py

For example, we will need Pillow, seaborn and scikit-learn later (seaborn will pull in matplotlib as a dependency), so you should:

Prepare a requirements.txt file like this:

Pillow==5.1.0
seaborn==0.9.0
scikit-learn==0.19.1

Other options

You can specify your command in the config

For example, you can pass environment variables or parameters to your command. Just add a command section to the Neptune configuration file:

#(...)
command: ["CUDA_VISIBLE_DEVICES=0", "4-example-access-parameters-and-tags.py"]
#(...)

You can name your experiments

If you want to name your experiments, just add a name: YOUR_EXPERIMENT_NAME section to the config:

#(...)
name: neural_network_experiments
#(...)

Part III: Use Neptune context

The Neptune Context is the way to communicate with Neptune programmatically. It lets you access parameters, change tags, and log numerical values or images to Neptune. Let's learn how to do all that!

Instantiate

To start talking to Neptune all you need to do is create a Neptune Context instance:

import neptune

ctx = neptune.Context()

Access parameters

You can access hyperparameters defined in the Neptune configuration file from your code. Simply use the .params attribute of the Neptune Context instance:

BATCH_SIZE = ctx.params.batch_size
DROPOUT = ctx.params.dropout

Modify experiment properties or tags

Since ctx.properties is a Python dictionary and ctx.tags is just a Python list, you can add or delete tags and properties programmatically. For example, if you wanted to add a tag based on a condition, you could do:

if DROPOUT > 0.5:
    ctx.tags.append('large_dropout')
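
Since these are plain Python objects, deleting works the same way. Here is a small sketch (the tag and property values are only illustrative):

# remove a tag you no longer need
if 'keras' in ctx.tags:
    ctx.tags.remove('keras')

# overwrite a property defined in the config
ctx.properties['data_version'] = '0.2'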

At this point it is worth running some experiments with different hyperparameters, tags or properties. Below is a modified minimal example that you can copy and paste into a new file: 4-example-access-parameters-and-tags.py. Play around!

import neptune
import keras
from keras import backend as K
from keras.callbacks import Callback

ctx = neptune.Context()

# read parameters and add condition tag
BATCH_SIZE = ctx.params.batch_size
DROPOUT = ctx.params.dropout

if DROPOUT > 0.5:
    ctx.tags.append('large_dropout')

# prepare data
mnist = keras.datasets.mnist
(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0
n_batches = x_train.shape[0] // BATCH_SIZE + 1

# prepare model
model = keras.models.Sequential([
  keras.layers.Flatten(),
  keras.layers.Dense(512, activation=K.relu),
  keras.layers.Dropout(DROPOUT),
  keras.layers.Dense(10, activation=K.softmax)
])
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])


# prepare keras callback to monitor training progress
class NeptuneMonitor(Callback):
    def __init__(self):
        super().__init__()
        self.current_epoch = 0

    def on_batch_end(self, batch, logs=None):
        x = (self.current_epoch * n_batches) + batch
        ctx.channel_send(name='batch end accuracy', x=x, y=logs['acc'])
        ctx.channel_send(name='batch end loss', x=x, y=logs['loss'])

    def on_epoch_end(self, epoch, logs=None):
        ctx.channel_send(name='epoch end accuracy', x=epoch, y=logs['acc'])
        ctx.channel_send(name='epoch end loss', x=epoch, y=logs['loss'])
        self.current_epoch += 1


# fit the model to data
model.fit(x_train, y_train,
          epochs=5, batch_size=BATCH_SIZE,
          callbacks=[NeptuneMonitor()])

Part IV: Advanced logging

If you went through the first tutorial, you already know the basic logging option: the numerical channel. There are, however, two other very useful options: text and image channels. You will learn about them in the next sections.

Log numeric values

Just to refresh your memory: to log a numeric value to Neptune, use .channel_send, specifying the channel name and the numerical value.

ctx.channel_send("metric", metric_value)

You can easily log a sequence of values, for example losses at the end of each epoch, by running .channel_send in a loop:

for epoch, value in enumerate(epoch_losses):
    ctx.channel_send("epoch_losses", epoch, value)

Log text

You can log text values in the very same fashion.

ctx.channel_send("My Text Channel", "Incredibly valuable text information")

Log images and charts

Neptune lets you log images as well. This is very helpful for debugging while you are developing models, and for getting a better view of the results after training is done. In the following examples we will see how to log a list of training images and a confusion matrix of test predictions to Neptune.

Image

In order to log an image to Neptune, you need to convert it to the PIL.Image format and create an instance of the neptune.Image object. Apart from data, which is your PIL.Image, neptune.Image needs name and description arguments. They can be empty strings '' if you want.

neptune.Image(name='IMAGE_TITLE', description='OPTIONAL_LONGER_DESCRIPTION', data=PIL.Image)

Let's log the first 20 images from the training set along with the corresponding labels as an example:

from PIL import Image
import numpy as np

#(...)

# log some train images
for idx, (img, target) in enumerate(zip(x_train, y_train)):
    img_pil = Image.fromarray((img*255).astype(np.uint8))
    img_neptune = neptune.Image(name='label {}'.format(target),
                            description='image idx {} \nimage true label {}'.format(idx, target),
                            data=img_pil)
    ctx.channel_send('training image sample', img_neptune)

    if idx==20:
        break

Images are available in the Channels section, marked in blue.

Chart

You can also log matplotlib charts to Neptune, or any image object that can be converted to the PIL.Image format. For example, you may want to save a confusion matrix or ROC curve during or after training. A quick glance at it should give you a good idea of how your model is doing.

In order to do that, you need to convert the matplotlib fig object to PIL.Image. For example, you can use the following function:

def fig2pil(fig):
    fig.canvas.draw()

    w, h = fig.canvas.get_width_height()
    buf = np.frombuffer(fig.canvas.tostring_argb(), dtype=np.uint8).reshape(h, w, 4)

    # reorder ARGB -> RGBA so PIL can read it
    buf = np.roll(buf, 3, axis=2)
    return Image.frombytes("RGBA", (w, h), buf.tobytes())

Now you can create your confusion matrix and log it to Neptune:

from sklearn.metrics import confusion_matrix
import matplotlib.pyplot as plt
import seaborn as sns

#(...)

y_test_pred = np.argmax(model.predict(x_test), axis=1)
conf_matrix = confusion_matrix(y_test, y_test_pred)

fig = plt.figure(figsize=(16, 12))
sns.heatmap(conf_matrix, annot=True)

fig_neptune = neptune.Image(name='', description='', data=fig2pil(fig))
ctx.channel_send('confusion matrix', fig_neptune)


Below is the full example script that you can copy and paste into a new file: 5-example-log-images.py.
import neptune
import keras
from keras import backend as K
from keras.callbacks import Callback
from PIL import Image
import numpy as np
from sklearn.metrics import confusion_matrix
import matplotlib.pyplot as plt
import seaborn as sns

ctx = neptune.Context()

# read parameters and add condition tag
BATCH_SIZE = ctx.params.batch_size
DROPOUT = ctx.params.dropout

if DROPOUT > 0.5:
    ctx.tags.append('large_dropout')

# prepare data
mnist = keras.datasets.mnist
(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0
n_batches = x_train.shape[0] // BATCH_SIZE + 1

# log some train images
for idx, (img, target) in enumerate(zip(x_train, y_train)):
    img_pil = Image.fromarray((img*255).astype(np.uint8))
    img_neptune = neptune.Image(name='label {}'.format(target),
                            description='image idx {} \nimage true label {}'.format(idx, target),
                            data=img_pil)
    ctx.channel_send('training image sample', img_neptune)

    if idx==20:
        break

# prepare model
model = keras.models.Sequential([
  keras.layers.Flatten(),
  keras.layers.Dense(512, activation=K.relu),
  keras.layers.Dropout(DROPOUT),
  keras.layers.Dense(10, activation=K.softmax)
])
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])


# prepare keras callback to monitor training progress
class NeptuneMonitor(Callback):
    def __init__(self):
        super().__init__()
        self.current_epoch = 0

    def on_batch_end(self, batch, logs=None):
        x = (self.current_epoch * n_batches) + batch
        ctx.channel_send(name='batch end accuracy', x=x, y=logs['acc'])
        ctx.channel_send(name='batch end loss', x=x, y=logs['loss'])

    def on_epoch_end(self, epoch, logs=None):
        ctx.channel_send(name='epoch end accuracy', x=epoch, y=logs['acc'])
        ctx.channel_send(name='epoch end loss', x=epoch, y=logs['loss'])
        self.current_epoch += 1


# fit the model to data
model.fit(x_train, y_train,
          epochs=5, batch_size=BATCH_SIZE,
          callbacks=[NeptuneMonitor()])

# evaluate model on test data
names = model.metrics_names
values = model.evaluate(x_test, y_test)
ctx.properties[names[0]] = values[0]
ctx.properties[names[1]] = values[1]

# create confusion matrix
y_test_pred = np.argmax(model.predict(x_test), axis=1)
conf_matrix = confusion_matrix(y_test, y_test_pred)

fig = plt.figure(figsize=(16, 12))
sns.heatmap(conf_matrix, annot=True)

def fig2pil(fig):
    fig.canvas.draw()

    w, h = fig.canvas.get_width_height()
    buf = np.frombuffer(fig.canvas.tostring_argb(), dtype=np.uint8).reshape(h, w, 4)

    # reorder ARGB -> RGBA so PIL can read it
    buf = np.roll(buf, 3, axis=2)
    return Image.frombytes("RGBA", (w, h), buf.tobytes())

fig_neptune = neptune.Image(name='', description='', data=fig2pil(fig))
ctx.channel_send('confusion matrix', fig_neptune)

You could encounter a _tkinter.TclError complaining about the DISPLAY environment variable

If that happens, it means you need to change your matplotlib backend. It is simple: just add the following snippet at the top of your script.

import matplotlib
matplotlib.use('Agg')

Play around with the parameters and run a few experiments

It is always best to learn by doing, so run a few experiments where you change the learning rate, dropout or network architecture. Then go to Neptune and explore the learning curves, confusion matrices and final results.
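
For example, assuming your config file is named neptune.yaml and you saved the script as 5-example-log-images.py, a single run could look like this:

neptune run --config neptune.yaml 5-example-log-images.py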

You have logged images and charts to Neptune

Part V: Run a grid search over hyperparameters

With Neptune you can run a grid search over hyperparameters with just two small adjustments.

Go to your Neptune configuration .yaml file and add the metric section

There, you define which channel should be treated as the metric to optimize. You should also define a goal: maximize or minimize, depending on the problem.

metric:
   channel: acc
   goal: maximize

Modify the parameters that you would like to search over from a single value to a list of values

parameters:
   epoch_nr: 5
   batch_size: [16, 64, 256]
   dropout: [0.1, 0.2, 0.5, 0.9]

You can also add a grid_search tag so that you can easily find those experiments later

tags: [keras, grid_search]

The full grid-search configuration file to copy and paste:
project: neptune-ml/neptune-tutorials

tags: [keras, grid_search]

metric:
   channel: acc
   goal: maximize

properties: 
   - key: data_version
     value: '0.1'

parameters:
   epoch_nr: 5
   batch_size: [16, 64, 128, 256]
   dropout: [0.1, 0.2, 0.5, 0.9]

exclude:
   - neptune.log
   - output

pip-requirements-file: requirements.txt

Run the script normally but point to your grid search config, for example neptune-grid-search.yaml

neptune run --config neptune-grid-search.yaml 5-example-log-images.py

You should see a special GRP experiment in Neptune. It groups all the grid search runs so that you can inspect them easily.

You have run your first grid search

Part VI: Customize experiments view

Now that you have experimented with different parameter values, we should organize our experiment view to make sure we are always looking at the things that are important to us. Nothing more, nothing less.

Customize dashboard columns

Drop all the columns that you don’t find useful by clicking on the x

I will drop Name, epoch_end_accuracy, epoch_end_loss, Host, accuracy_information and loss_information.


Ok, it looks cleaner already.


Go to the Manage columns section on the right and search for the columns that are important

I would suggest that you always look at the metrics, so add those straight away. It is often useful to see the model parameters and data version so you can add those too if you want to.


Drag and drop columns around to have it the way you like it


You will be looking at the experiment view during the entire project so make sure it shines

You have customized your experiment view

Custom filters

You can group experiments into logical pieces by using custom filters. You can organize by the experiment owner (the person who runs the experiment), by a list of tags, or by date. Let's now add a custom filter.

Click on the add custom filter button on the left

Choose the name of the filter/group and select the criteria


Add a few more filters to see how it works

If you want to access a defined group of experiments, just click on it!


You have added a custom filter

Summary

In this tutorial you learnt how to create a new project and organize it by managing columns and adding custom filters. You learnt how to create a Neptune configuration file that keeps all the meta-information about your experimentation in one place, and how to access that information from your code and change it programmatically. You then discovered advanced logging options that let you keep track of images and charts. Finally, with a few simple tweaks, you ran a quick grid search over hyperparameters to figure out what works best.

Code examples are available on GitHub and on Neptune. The latter also contains the experiments themselves.

Tutorial in a nutshell

Configuration file example neptune.yaml

project: neptune-ml/neptune-tutorials

tags: [keras]

properties: 
   - key: data_version
     value: '0.1'

parameters:
   epoch_nr: 5
   batch_size: 64
   dropout: 0.5

exclude:
   - neptune.log
   - output

Grid search

metric:
   channel: acc
   goal: maximize

# (...)

parameters:
   epoch_nr: 5
   batch_size: [16, 64, 256]
   dropout: [0.1, 0.2, 0.5, 0.9]

# (...)

Parameters and tags

ctx = neptune.Context()

# read parameters and add condition tag
BATCH_SIZE = ctx.params.batch_size
DROPOUT = ctx.params.dropout

if DROPOUT > 0.5:
    ctx.tags.append('large_dropout')

Images

img_pil = Image.fromarray((img*255).astype(np.uint8))
img_neptune = neptune.Image(name='label {}'.format(target),
                            description='image idx {} \nimage true label {}'.format(idx, target),
                            data=img_pil)
ctx.channel_send('training image sample', img_neptune)