Step by step run through
This page describes the workflow of a typical notebook run-through in a step-by-step fashion. You can also refer to this video for a complete run-through or take a look at the manual as PDF here.
The steps are as follows:
- Decide what task you want to perform using ZeroCostDL4Mic: we currently provide published networks that can perform image segmentation, denoising, restoration and artificial labelling. Please refer to our main Wiki page for info on the currently implemented notebooks. You should also read our paper about the general framework to understand whether that's the right tool for you.
- Now, you need a training dataset. You can either test the networks on the example training dataset that we provide or generate your own training dataset to train a model for your data. The different pages of our Wiki describe how to acquire the data you need in order to train the different networks.
- The data should be divided into two sets:
- Train dataset: Contains the images that will be used to train the deep learning model.
- Test dataset: The images that will be used for the Quality Check of the trained model. These images should never be included in the model training.
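As a minimal sketch of such a split, assuming you start from a single list of image filenames, the hypothetical helper below sets aside a random fraction of them for the Quality Check. The function name, the 10% default and the fixed seed are illustrative assumptions and are not taken from the ZeroCostDL4Mic notebooks.

```python
import random


def split_train_test(filenames, test_fraction=0.1, seed=0):
    """Randomly split filenames into (train, test) lists with no overlap.

    test_fraction and seed are illustrative defaults, not ZeroCostDL4Mic settings.
    """
    names = sorted(filenames)      # sort first so the result is reproducible
    rng = random.Random(seed)      # local generator, leaves global state untouched
    rng.shuffle(names)
    # Keep at least one test image whenever there are at least two images.
    n_test = max(1, round(len(names) * test_fraction)) if len(names) > 1 else 0
    return names[n_test:], names[:n_test]


all_names = [f"img_{i:03d}.tif" for i in range(20)]
train, test = split_train_test(all_names)
print(len(train), "training images,", len(test), "test images")
```

Because the test filenames are removed from the training list, the same names can then be used to move both the source and the target images into the test folders.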
For most tasks, the data needs to be organised in two subfolders, one with the source images (e.g., brightfield microscopy images) and another with the target images (e.g., masks of the cells segmented in the brightfield microscopy images). When using supervised learning methods, paired images in these folders must share identical filenames so that each source image is matched to its target. For other methods, the filenames can differ. For the quality check, the names should be the same in both folders.
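Before uploading, it can help to check the pairing automatically. The following is a small sketch, not part of the notebooks, that lists filenames present in only one of the two folders; the function name is hypothetical.

```python
from pathlib import Path


def find_unpaired(source_dir, target_dir):
    """Return (source-only, target-only) filenames as two sorted lists."""
    source = {p.name for p in Path(source_dir).iterdir() if p.is_file()}
    target = {p.name for p in Path(target_dir).iterdir() if p.is_file()}
    return sorted(source - target), sorted(target - source)
```

For example, `find_unpaired("Training_source", "Training_target")` (folder names chosen here for illustration) should return two empty lists when every source image has a matching target.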
The notebooks are designed to work with both .png and .tiff file formats. For specific applications, such as SMLM, the data needs to be given in text format (.csv).
If you simply want to test the networks on our example training dataset, follow the Zenodo links that we provide on our main Wiki page. You can download the whole dataset by clicking on the Download button on the Zenodo pages. (For U-net, we do not provide a dataset ourselves but point towards an available dataset originally generated for the ISBI 2012 Segmentation challenge; see the U-net page for details.)
- For Google Colab to have access to these data, they need to be uploaded to your Google Drive. You can do that by logging into your Google Drive account and using the Folder upload option. Please ensure that ALL the data is uploaded properly before proceeding further.
- You can open the Google Colab notebook directly from our main Wiki page by clicking on the corresponding Open In Colab badge. This will automatically open the notebook on Google Colab. Here, we've chosen to use the Dark mode of the notebook (white text on black background) for improved visibility.
On the left-hand side, you will find a Table of contents that can be navigated by clicking on the different items. It is important that you read the 0. Before getting started section.
At this stage, you will not be able to make changes to the notebook unless you save a copy of it. This is easily done by going to File and selecting Save a copy in Drive.... If you are signed into a Google Drive account, this will automatically save a copy of the notebook in your Google Drive in the Colab Notebooks folder.
- Now that you are in the notebook, the first step is to check whether Google Colab will currently allow GPU access and, if so, what type. You can do this by running the 1.1 Change the run-time section. The GPU type is given on the last line of the cell output. Here, we have been allocated a Tesla P100-PCIE-16GB.
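If you want to query the GPU yourself, the sketch below shows one way to do it from Python by calling the `nvidia-smi` command-line tool. It is an independent illustration, not the code of the notebook cell, and the function name is hypothetical. It returns nothing useful on machines without NVIDIA driver tools.

```python
import shutil
import subprocess


def describe_gpu():
    """Return the GPU name(s) reported by nvidia-smi, or None if unavailable."""
    if shutil.which("nvidia-smi") is None:
        return None  # no NVIDIA driver tools found on this machine
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=name", "--format=csv,noheader"],
        capture_output=True,
        text=True,
        check=False,
    )
    name = result.stdout.strip()
    return name if result.returncode == 0 and name else None


print(describe_gpu() or "No GPU detected")
```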
- Then, you need to give Google Colab access to the data that you uploaded into your Google Drive. This step is called Mounting the drive and is executed in the 1.2 Mount your Google Drive section. You will need to be logged in and provide an access code obtained via the link given in the notebook.
Once mounted, the drive and all its contents are accessible via the dedicated tab on the left of the page.
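For reference, mounting from Python on Colab typically looks like the sketch below. The `google.colab` package only exists inside Colab, so the import is guarded; the wrapper function and the `/content/gdrive` mount point are assumptions made for this example and may differ from what the notebook cell uses.

```python
def mount_google_drive(mount_point="/content/gdrive"):
    """Mount Google Drive when running on Colab; return True if a mount was attempted."""
    try:
        from google.colab import drive  # only importable inside Google Colab
    except ImportError:
        return False  # not running on Colab, nothing to mount
    drive.mount(mount_point)  # triggers the Google authorisation prompt
    return True
```

Calling `mount_google_drive()` outside Colab simply returns `False`, which makes the same script usable when running notebooks locally.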
- We are then ready to install the network and the required dependencies on the virtual machine made available to us by Google Colab. This is done by running the cell corresponding to section 2 of the notebook.
After running section 2, if you already have a trained model and only wish to perform prediction using this trained model on new data, you can then jump straight to the prediction section (Section 6, described further down this page).
- Section 3 provides very important information and requires user input to prepare for the network training. The first part of section 3 describes the parameters that the user needs to provide in order to perform the training. These include the location (path on the Google Drive) of the different parts of the training dataset (source and target for fully supervised networks such as U-net, Stardist, CARE, and Label-free prediction, or only the source training dataset for unsupervised networks like Noise2Void). The other user inputs are the training parameters, e.g. number of epochs, number of steps, and batch size. If you are unsure about the meaning of these parameters, you can refer to our Glossary page. Some parameters are currently considered advanced and can simply be left at their default values if you want to get started with a simple training session.
A typical section 3 user input will look like the figure below. Do not forget to run this cell for the notebook to take your input into account.
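To make the relationship between these inputs more concrete, here is a hypothetical container for the kind of values section 3 asks for. The class, its field names, the default values and the example paths are all assumptions for illustration. The `steps_per_epoch` rule (number of patches divided by batch size, rounded up) is a common convention for seeing every patch once per epoch and is not necessarily what each notebook does.

```python
from dataclasses import dataclass
from math import ceil


@dataclass
class TrainingParameters:
    """Hypothetical container mirroring the kind of input section 3 asks for."""

    training_source: str
    training_target: str
    model_name: str
    model_path: str
    number_of_epochs: int = 100  # illustrative default
    batch_size: int = 16         # illustrative default

    def steps_per_epoch(self, number_of_patches):
        """Steps needed to see every patch once per epoch."""
        return ceil(number_of_patches / self.batch_size)


params = TrainingParameters(
    training_source="/content/gdrive/MyDrive/Training_source",  # example path
    training_target="/content/gdrive/MyDrive/Training_target",  # example path
    model_name="my_model",
    model_path="/content/gdrive/MyDrive/models",                # example path
)
print(params.steps_per_epoch(number_of_patches=1000))
```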
- You will also be able to visualize a randomly chosen image from your training dataset to check that the data was uploaded and mounted properly.
- Here we go! Now, you can start the training by running the Train the network section. In some cases, the network may throw some minor warnings about TensorFlow versions; this is not an issue and can be ignored. We have enforced specific versions of TensorFlow in order to improve the reliability of the notebooks.
This step can take a few minutes to a few hours depending on the network, training parameters, and training dataset size. So be patient! The time taken by each epoch is sometimes displayed as an output of the network training. This can be used to estimate how long the training will take (here shown as ~2 s per epoch and 400 epochs, so it should take about 15 min; yes, you have time for a coffee). The time taken for the training will be indicated once it is complete.
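The estimate is a simple multiplication, sketched below with a hypothetical helper. With the numbers quoted above (2 s per epoch, 400 epochs) it gives 800 s, a little over 13 min, which is consistent with the rough "about 15 min" figure.

```python
def estimate_training_minutes(seconds_per_epoch, number_of_epochs):
    """Return the estimated training time in minutes."""
    return seconds_per_epoch * number_of_epochs / 60


minutes = estimate_training_minutes(seconds_per_epoch=2, number_of_epochs=400)
print(f"Estimated training time: about {minutes:.0f} min")
```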
- After training is complete, the trained model is automatically saved into the model_path folder previously selected. This allows you to leave the training running without worrying about time-out issues once the training is over. The following sections of the notebook can be run in a new Colab session, provided sections 1 and 2 of the notebook are run again. You can download the trained model from your Google Drive for safekeeping or for sharing with another user (bearing in mind the limitations of using pre-trained models highlighted in our bioRxiv paper).
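If the saved model is a folder containing several files, zipping it first can make the download easier. This is an optional sketch using Python's standard library; the function name and the idea of archiving the folder are our illustration here, not a step performed by the notebooks.

```python
import shutil
from pathlib import Path


def archive_model(model_folder, archive_dir="."):
    """Zip a saved model folder and return the path of the archive."""
    model_folder = Path(model_folder).resolve()
    # Archive name without extension; make_archive appends ".zip" itself.
    base_name = (Path(archive_dir) / model_folder.name).resolve()
    return shutil.make_archive(
        str(base_name),
        "zip",
        root_dir=str(model_folder.parent),  # paths in the zip are relative to this
        base_dir=model_folder.name,         # only this folder is archived
    )
```

For example, `archive_model("/content/gdrive/MyDrive/models/my_model")` (an example path) would create `my_model.zip` in the current working directory.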
Section 5 of the notebook is very important! It allows you to perform some important quality control on the trained model. The quality of the model can be evaluated using two complementary approaches:
(1) Observe the evolution of the loss function as a function of training time. This reveals whether the model is over- or under-fitting the training dataset.
(2) Compute quantitative image metrics on a Quality control dataset (also often called "Test dataset" in the machine learning field), where the known ground truth can be compared to the prediction from this trusted dataset.
The Quality control datasets should not be included in the training dataset during training, otherwise the network performance will be over-estimated.
- First, you will need to choose whether you are performing Quality control on the current trained model (as obtained in section 4), or on a model that was previously trained in a different session. For the latter, you will need to provide a model path and name.
Below are examples of loss function curves over training time, both from the training and validation dataset. If this doesn't make sense, please see our Glossary page and also this review, which explains how to interpret the curves very well.
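As a rough illustration of what to look for in these curves, the sketch below finds the epoch at which the validation loss was lowest and flags the common over-fitting pattern in which the validation loss rises afterwards while the training loss keeps falling. The function, its output keys and the example numbers are invented for this example; it is a simple heuristic and not the analysis performed by the notebooks.

```python
def summarise_loss_curves(training_loss, validation_loss):
    """Report where validation loss was lowest and whether it rose afterwards."""
    best = min(range(len(validation_loss)), key=validation_loss.__getitem__)
    return {
        "best_epoch": best + 1,  # 1-based epoch number
        "best_validation_loss": validation_loss[best],
        # Validation loss rising after its minimum while training loss keeps
        # falling is a common sign of over-fitting.
        "possible_overfitting": (
            validation_loss[-1] > validation_loss[best]
            and training_loss[-1] < training_loss[best]
        ),
    }


summary = summarise_loss_curves(
    training_loss=[0.9, 0.6, 0.4, 0.3, 0.2],      # made-up values
    validation_loss=[1.0, 0.7, 0.5, 0.6, 0.8],    # made-up values
)
print(summary)
```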
- The second stage of quality control is to see whether the network is able to generalise to unseen data, using a Quality control dataset. This is performed here by the user providing a set of data with the known ground truth output. Depending on the type of network, we use Root Squared Error (RSE) and SSIM or Intersection over union as metrics in order to visually and quantitatively assess whether the model can provide accurate output from unseen data.
Here in the case of Stardist, because the model predicts a segmentation image, the metric used is Intersection over union. We describe the meaning of these metrics in detail in the Supplementary information of our paper.
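For readers unfamiliar with Intersection over union, here is a minimal NumPy sketch of the basic definition for two binary masks: the number of pixels that are foreground in both masks divided by the number that are foreground in at least one. The notebooks may compute it differently (for instance per object or after thresholding), so treat this as an illustration of the metric only.

```python
import numpy as np


def intersection_over_union(prediction, ground_truth):
    """IoU between two binary masks (non-zero pixels count as foreground)."""
    pred = np.asarray(prediction) > 0
    truth = np.asarray(ground_truth) > 0
    union = np.logical_or(pred, truth).sum()
    if union == 0:
        return 1.0  # both masks empty: treated here as perfect agreement
    return float(np.logical_and(pred, truth).sum() / union)


truth_mask = np.array([[1, 1, 0], [0, 1, 0]])
pred_mask = np.array([[1, 0, 0], [0, 1, 1]])
print(intersection_over_union(pred_mask, truth_mask))
```

In this toy example two pixels overlap and four pixels are foreground in at least one mask, giving an IoU of 0.5. A value of 1 means the prediction matches the ground truth exactly.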
- The last step, available in Section 6, is what you have been waiting for the whole time: the generation of predictions from unseen data using the trained model. This is done by giving the path to the Google Drive directory containing the new unseen data that you wish to run the prediction on.
A different model from the currently loaded model can also be used by providing the name and path to the model to be used.
- The notebook will show you an example prediction chosen at random from the set of data provided. In this example, we trained a Stardist model and the notebook shows an overlay of the original input image with the mask image obtained from the prediction.
- The predictions obtained from the unseen data are now available in the Results folder as chosen earlier and can therefore be downloaded from your Google Drive for further analysis.
Many thanks for trying out ZeroCostDL4Mic! Whether you find it useful, intuitive, or difficult and buggy, we want to hear from you and always welcome constructive feedback. Feel free to report issues on this page or drop us an email or simply Tweet your results using #ZeroCostDL4Mic.