At the end of each training run, we mainly looked at four values to assess the quality of the new weights: the loss (mainly the L1 loss) on the training data (loss) and on the validation data (val loss), the ’mask loss’, i.e. the pixel-wise cross-entropy loss on the validation data, and the average precision (AP) in combination with the intersection over union (IoU). The loss gave a direct indication of whether the model was improving or getting worse, and since our main goal was to generate masks for the cells, the mask loss was useful in the same manner, as it quantifies pixel-wise accuracy. The average precision was computed from precision and recall as a function of the IoU, which is the area of overlap between the predicted bounding box and the ground truth divided by the area of their union. In sum, after each training run we looked at the two losses and visually inspected the new weights by running a prediction on an image from the validation set, for which the AP was computed at different IoU thresholds.
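As an illustration, the IoU between a predicted and a ground-truth bounding box can be computed as follows (a minimal sketch of the standard formula; the function name and box format are our own, not taken from the network's code):

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes (x1, y1, x2, y2)."""
    # Intersection rectangle
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    # Union = sum of the two areas minus the intersection
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

# A prediction counts as a true positive at threshold t if iou >= t;
# AP is then computed from precision and recall over all predictions.
print(iou((0, 0, 10, 10), (5, 5, 15, 15)))  # 25 / 175 ≈ 0.1429
```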
On average, the training sessions took 6 to 10 hours per epoch, so we could only run one at a time. We used Google Colab hosted servers, as our own computers could not handle the processing. Google Colab proved unstable, and many training sessions were aborted because of the way the company allocates its servers.
D. Improvement attempts
After completing the first training sessions and obtaining the first positive results, we tried to improve the model by modifying several aspects. Firstly, we adjusted the batch size (the number of data points the network processes at each iteration) and the number of steps per epoch to find the best trade-off between execution speed and the number of images used for training. Secondly, all pictures looked similar, namely with a colony of yeast in the center of the image. To make the algorithm more robust in case a new image deviated from this pattern, and also to increase the amount of training data, we decided to use image augmentation. This function takes an original picture and randomly modifies it by rotation, mirroring, cropping, and blurring. This way, the images we train the network with have much more variability than the ones provided by the lab. The final size of the training images is 512x512 pixels, but the original images are cropped to a smaller size and magnified by a certain factor to match that size. This makes the yeast cells appear larger and allows the network to learn to generate precise masks. Note that different values were tested here; the results are presented in the next section.
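A minimal sketch of such an augmentation function, re-implemented here with NumPy for illustration (the crop ratio and the blur kernel are placeholder values, not the ones used in our training):

```python
import numpy as np

def augment(image, rng):
    """Randomly rotate, mirror, crop, and blur a grayscale image."""
    # Random rotation by a multiple of 90 degrees
    image = np.rot90(image, k=rng.integers(4))
    # Random horizontal / vertical mirroring
    if rng.random() < 0.5:
        image = np.fliplr(image)
    if rng.random() < 0.5:
        image = np.flipud(image)
    # Random crop to 75% of each side (placeholder ratio)
    h, w = image.shape
    ch, cw = int(h * 0.75), int(w * 0.75)
    top = rng.integers(h - ch + 1)
    left = rng.integers(w - cw + 1)
    image = image[top:top + ch, left:left + cw]
    # Occasional 3x3 mean blur (placeholder kernel)
    if rng.random() < 0.5:
        padded = np.pad(image.astype(float), 1, mode="edge")
        image = sum(padded[i:i + ch, j:j + cw]
                    for i in range(3) for j in range(3)) / 9
    return image

out = augment(np.zeros((512, 512)), np.random.default_rng(0))
print(out.shape)  # (384, 384)
```

Each call draws fresh random parameters, so one original picture yields many distinct training images.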
results are presented in the next section. Another modification
was to try to pre-process the images by increasing contrast
and luminosity, which made the cells much easier to see with
the naked eye, as the original images are low-contrast images.
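The contrast and luminosity adjustment can be sketched as a simple linear transform (our own illustration; the gain and offset values are placeholders, not the ones used in our configuration):

```python
import numpy as np

def enhance(image, contrast=1.5, brightness=20):
    """Increase contrast (gain) and luminosity (offset) of an 8-bit image,
    clipping the result back to the valid [0, 255] range."""
    out = image.astype(np.float64) * contrast + brightness
    return np.clip(out, 0, 255).astype(np.uint8)

img = np.array([[0, 100, 200]], dtype=np.uint8)
print(enhance(img))  # [[ 20 170 255]]
```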
V. RESULTS
In this section, we present the results of the many training runs in which we tested different parameters and image pre-processing steps to eventually obtain the weights giving the most accurate results, based on the pixel-wise cross-entropy loss on the validation dataset (val mask loss).
Firstly, we aimed to determine whether all the layers of the neural network needed to be trained. Comparing the loss values in Table I, we observe a significant difference between training all the layers and training only the head layers (the RPN, classifier, and mask heads of the network). We could also choose to train from a given stage of the ResNet-101 backbone upward (e.g. stage 4 and above), but this led to a major increase in the loss without reducing the running time, so this option was discarded.
        Head layers   All layers
Loss    1.2419        1.0246
TABLE I: Loss as a function of the layers trained
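Assuming the Matterport Mask R-CNN implementation (whose layer options match those described above), the set of trainable layers is selected via the `layers` argument of `model.train`. A configuration sketch, where `model`, `dataset_train`, `dataset_val`, and `config` are assumed to have been set up beforehand:

```python
# Train only the head layers (RPN, classifier and mask heads);
# the ResNet-101 backbone keeps its pre-trained weights.
model.train(dataset_train, dataset_val,
            learning_rate=config.LEARNING_RATE,
            epochs=1, layers="heads")

# Alternatively: "all" trains every layer, and "4+" trains
# from stage 4 of the ResNet-101 backbone upward.
model.train(dataset_train, dataset_val,
            learning_rate=config.LEARNING_RATE,
            epochs=1, layers="all")
```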
Secondly, we wanted to determine the optimal number of images per batch. To do so, we tested three different batch sizes, as shown in Table II. We can observe that the loss decreases significantly as the number of images per batch increases. However, we were unable to increase the batch size beyond 4 to find the optimal number of images, due to the considerable training time and the lack of computing power and memory.
Batch size   1        3        4
Loss         1.2384   1.0684   1.0246
TABLE II: Loss as a function of the number of images per batch
Then, we compared the number of augmentations applied per image, as presented in Table III. We observe that increasing the number of augmentations per image makes the dataset more varied, leading to a small increase of the loss on the training data but a large decrease on the validation data.
Number of augmentations   Between 0 and 2   Between 0 and 4
Loss                      1.0246            1.0388
val loss                  1.1712            1.0477
val mask loss             0.1828            0.1721
TABLE III: Loss on the training and validation data as a function of the number of augmentations
Afterwards, to try to reduce the training time without considerably increasing the loss, we trained with and without a scaling factor of 2. The results in Table IV show that the scaling factor is necessary to decrease the loss, even though removing it divides the training time by almost a factor of 2.
Scaling factor   1x (no rescaling)   2x
Loss             1.2750              1.0246
TABLE IV: Loss as a function of the scaling factor
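The crop-and-magnify step behind the 2x scaling factor can be sketched as follows (our own illustration using nearest-neighbour upscaling; the actual pipeline may interpolate differently):

```python
import numpy as np

def crop_and_scale(image, out_size=512, factor=2):
    """Crop a centered region of out_size/factor pixels per side, then
    magnify it by `factor` so the result is out_size x out_size."""
    crop = out_size // factor
    h, w = image.shape
    top, left = (h - crop) // 2, (w - crop) // 2
    patch = image[top:top + crop, left:left + crop]
    # Nearest-neighbour upscaling: repeat each pixel `factor` times per axis
    return np.repeat(np.repeat(patch, factor, axis=0), factor, axis=1)

print(crop_and_scale(np.zeros((1024, 1024))).shape)  # (512, 512)
```

With `factor=1` the full 512x512 region is used unchanged, which corresponds to the "no rescaling" column of Table IV.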
Finally, we added an option to the yeast class configuration to select a pre-processing step applied to the images before training. The available pre-processing options (changing the contrast, the luminosity, or both) were all tested. Although pre-processing allowed us to better visualize the yeast cells by eye, it did not decrease the loss, as shown in Table V.
Therefore, following these steps, we were able to optimize our model for the yeast dataset, leading to the results described below in Table VI.