
This post introduces GAN-DL1, a deep learning method proposed in a paper written by Mascolini et al.

(Figures in this post are from the paper1)

Overview of this paper

  • The authors proposed a self-supervised learning framework called “Generative Adversarial Network Discriminator Learner (GAN-DL)1”, based on NVIDIA’s StyleGAN2 architecture2.
  • GAN-DL was trained on the public RxRx19a dataset3, which consists of fluorescence microscopy images of human cells infected with SARS-CoV-2.
  • Features of the trained GAN-DL discriminator were applied to downstream tasks such as classification.
  • Methods from previous research were used to evaluate GAN-DL’s performance.

Which points are better than other methods?

  • No image annotations required.
    • Some related works4 5 need labels of the target dataset to generate their embedding data.
    • On the other hand, GAN-DL works without labels or annotations.
  • Self-supervised learning.
    • The baseline (described later) is based on traditional transfer learning.
    • GAN-DL is a self-supervised learning framework.

Which points are important?

  • StyleGAN2 training on the RxRx19a dataset was used as the pretext task.
  • The discriminator’s features give a new representation space that is used to solve the downstream tasks (a rough sketch of this idea follows).
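
As an illustration of this idea, here is a minimal sketch in PyTorch. The tiny discriminator below is only a stand-in, not the StyleGAN2 discriminator used in the paper, and the choice of which activations to expose as the representation is an assumption.

```python
import torch
import torch.nn as nn

# Core idea: use a trained GAN discriminator as a frozen feature extractor for
# downstream tasks. TinyDiscriminator is a toy stand-in for the real network.
class TinyDiscriminator(nn.Module):
    def __init__(self, in_channels=5, feat_dim=512):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(in_channels, 32, 3, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, feat_dim),
        )
        self.score = nn.Linear(feat_dim, 1)  # real/fake score used during GAN training

    def forward(self, x, return_features=False):
        f = self.features(x)
        return f if return_features else self.score(f)

disc = TinyDiscriminator()            # in practice: the trained StyleGAN2 discriminator
images = torch.randn(4, 5, 128, 128)  # batch of 5-channel fluorescence images
with torch.no_grad():
    embeddings = disc(images, return_features=True)  # representation for downstream tasks
print(embeddings.shape)  # torch.Size([4, 512])
```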

Details of the GAN-DL network

  • GAN-DL is an instance of StyleGAN2.
    • The fully connected mapping network was simplified from the original 8 layers to 3 layers (a minimal sketch appears after this list).
    • A style vector of size 512 is used as the latent space.
    • The number of channels of the convolutional layer closest to the image was set to 5 in both networks (the input images have 5 fluorescence channels rather than 3 RGB channels).
  • Training time:
    • 24 hours on TPU v3-8 node with 16GB of RAM per core.
    • 48 hours on a single Tesla V100.
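
A minimal sketch of the simplified mapping network described above. The 3 layers and the 512-dimensional style vector come from the paper summary; the LeakyReLU activation mirrors StyleGAN2's usual choice, and everything else is an assumption.

```python
import torch
import torch.nn as nn

# Simplified StyleGAN2 mapping network: 3 fully connected layers instead of the
# original 8, mapping a 512-d latent z to a 512-d style vector w.
class MappingNetwork(nn.Module):
    def __init__(self, latent_dim=512, num_layers=3):
        super().__init__()
        layers = []
        for _ in range(num_layers):
            layers += [nn.Linear(latent_dim, latent_dim), nn.LeakyReLU(0.2)]
        self.net = nn.Sequential(*layers)

    def forward(self, z):
        return self.net(z)  # style vector w, consumed by the synthesis network

w = MappingNetwork()(torch.randn(8, 512))
print(w.shape)  # torch.Size([8, 512])
```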

Compared counterparts

Baseline

  • A transfer-learning-based image embedding for RxRx19a by Cuccarese et al.6 was used as the evaluation baseline by the GAN-DL authors1.
  • Its embedding (a 1024-dimensional vector per image) is generated with a DenseNet7 CNN architecture pre-trained on ImageNet8 (a rough sketch appears after this list).
    • The input to the initial convolutional layer is resized to 512 x 512 x 5.
    • Global average pooling reduces the final feature map to a vector of length 2208.
    • A fully connected layer of dimension 1024 is added as the embedding of the image.
    • Softmax activation and ArcFace activation9 are used for two separate classification layers on top of the embedding layer.
  • The baseline was initially trained on the RxRx1 dataset10, which is about 300 GB of annotated microscopy images.
  • Its downstream tasks are then performed on the RxRx19a dataset.
  • The baseline’s embedding data is available, but its source code and trained model are not.
  • The GAN-DL authors1 therefore used the published embedding data6 as the baseline in this paper.
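
For intuition, here is a hedged sketch of such an embedding pipeline. DenseNet-161 is only inferred from the 2208-dimensional feature map; the re-created 5-channel first convolution (which discards its pretrained weights) and the omission of the softmax/ArcFace heads are simplifications, not the authors' exact implementation.

```python
import torch
import torch.nn as nn
import torchvision

# ImageNet-pretrained DenseNet backbone (DenseNet-161 ends in a 2208-d feature map),
# first conv adapted to 5-channel 512x512 inputs, then global average pooling and a
# fully connected 1024-d embedding layer.
backbone = torchvision.models.densenet161(weights="DEFAULT")  # torchvision >= 0.13, downloads weights
backbone.features.conv0 = nn.Conv2d(5, 96, kernel_size=7, stride=2, padding=3, bias=False)

embedder = nn.Sequential(
    backbone.features,
    nn.ReLU(inplace=True),
    nn.AdaptiveAvgPool2d(1),  # global average pooling -> 2208-d vector
    nn.Flatten(),
    nn.Linear(2208, 1024),    # the image embedding used downstream
)

x = torch.randn(2, 5, 512, 512)
print(embedder(x).shape)  # torch.Size([2, 1024])
```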

DenseNet CNN

  • The GAN-DL authors1 implemented another DenseNet CNN network as a competitor.
  • DenseNet accepts RGB images, but RxRx1 images have 6 channels and RxRx19a images have 5 channels. Therefore the authors1 took two approaches to reproduce the baseline architecture6:
    1. The ImageNet-collapsed strategy: a trainable convolutional layer (kernel size 1) is added as the first layer of the RGB pre-trained network. It collapses the fluorescence channels into a single grayscale channel, which is then duplicated into three channels to form pseudo-RGB images. The added layer is learnt via fine-tuning on the given downstream task (Adam optimizer, learning rate 0.001, only a few epochs). A sketch follows this list.
    2. The ImageNet-concatenated strategy: each channel of the fluorescence image is fed to the network separately as a single-channel image, and the resulting embeddings are concatenated. The embedding size is 5120 (1024 x 5).
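
A minimal sketch of the ImageNet-collapsed trick. The DenseNet-121 backbone is an assumption (suggested by the 1024-d per-channel embedding of the concatenated variant), as is the frozen/trainable split; only the shape of the channel-collapsing idea is taken from the paper summary.

```python
import torch
import torch.nn as nn
import torchvision

# "ImageNet-collapsed" sketch: a trainable 1x1 convolution collapses the 5
# fluorescence channels into one grayscale channel, which is duplicated into
# three channels so an RGB-pretrained backbone can consume it. Only the added
# 1x1 layer would be fine-tuned on the downstream task (e.g. Adam, lr 0.001).
class CollapsedBackbone(nn.Module):
    def __init__(self, in_channels=5):
        super().__init__()
        self.collapse = nn.Conv2d(in_channels, 1, kernel_size=1)  # the trainable layer
        self.backbone = torchvision.models.densenet121(weights="DEFAULT")
        self.backbone.classifier = nn.Identity()  # expose the 1024-d features
        for p in self.backbone.parameters():
            p.requires_grad = False  # keep the pretrained weights frozen

    def forward(self, x):
        gray = self.collapse(x)        # (N, 1, H, W)
        rgb = gray.repeat(1, 3, 1, 1)  # pseudo-RGB image
        return self.backbone(rgb)      # 1024-d embedding per image

emb = CollapsedBackbone()(torch.randn(2, 5, 224, 224))
print(emb.shape)  # torch.Size([2, 1024])
```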

Convolutional autoencoder (ConvAE11)

  • ConvAE was trained on RxRx19a.
  • The authors1 implemented this network referring to Wallace et al.12
  • The architecture was modified by adding the two elements below:
    1. The residual connection scheme used in StyleGAN2’s generator.
    2. A perceptual loss function computed with a ResNet5013 pre-trained on ImageNet (a sketch follows this list).
  • The last layer is used as the embedding (size 1024).
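
A hedged sketch of a ResNet50-based perceptual loss. Which layer's features are compared, and how the 5-channel fluorescence images are adapted to the 3-channel ResNet input, are not specified, so this only illustrates the mechanism.

```python
import torch
import torch.nn as nn
import torchvision

# Perceptual loss in the spirit of the ConvAE competitor: reconstruction quality is
# measured in the feature space of an ImageNet-pretrained ResNet50 rather than in
# pixel space. Using the last convolutional block's features is an assumption.
resnet = torchvision.models.resnet50(weights="DEFAULT")
feature_extractor = nn.Sequential(*list(resnet.children())[:-2]).eval()  # drop avgpool + fc
for p in feature_extractor.parameters():
    p.requires_grad = False

def perceptual_loss(reconstruction, target):
    # Both tensors are 3-channel images (ImageNet-style normalization assumed).
    return nn.functional.mse_loss(feature_extractor(reconstruction),
                                  feature_extractor(target))

x = torch.randn(2, 3, 224, 224)
x_hat = torch.randn(2, 3, 224, 224, requires_grad=True)
loss = perceptual_loss(x_hat, x)
loss.backward()  # gradients flow into the reconstruction, not the frozen extractor
print(loss.item())
```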

Preparation of the RxRx19a dataset

  • Two control groups were created from the RxRx19a dataset, which contains 305,520 fluorescence microscopy images (height: 1024, width: 1024, 5 channels).
    1. One consists of conditioned media preparations generated from uninfected cells (Mock).
    2. The other consists of cells infected in vitro with active SARS-CoV-2 and not treated with any compound.
  • 75% of the control images were used for training and 25% for testing (randomly split).
  • Class imbalance was normalized with weights inversely proportional to the class frequencies (presumably the number of images per class); a toy sketch follows this list.
  • Images outside the control groups were used for the dose-response evaluation, but they were not used for training the downstream tasks.
  • The authors used standard post-processing6, including normalization to remove inter-plate variance, for both RxRx19a and RxRx1.
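
A toy sketch of the split and inverse-frequency class weighting described above. The label array is made up, and both the stratified split and the exact weighting formula are assumptions.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy labels: 0 = Mock control, 1 = infected, untreated control.
labels = np.array([0] * 300 + [1] * 100)
idx = np.arange(len(labels))

# Random 75/25 split (stratification added here for stable class counts).
train_idx, test_idx = train_test_split(idx, test_size=0.25, stratify=labels, random_state=0)

# Per-class weights inversely proportional to class frequency in the training split.
counts = np.bincount(labels[train_idx])
class_weights = len(train_idx) / (len(counts) * counts)
print(class_weights)  # e.g. [0.667, 2.0]: the rarer class gets the larger weight
```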

Evaluating the ability to solve downstream tasks

1. Controls classification and 3. Cell models classification

A linear support vector machine (SVM) is applied to the classification tasks using GAN-DL’s style vectors and the competitors’ embeddings (a toy sketch follows the bullets below).

  • The baseline shows the best performance.
  • GAN-DL performs slightly worse than the baseline, but much better than DenseNet and ConvAE.
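
A toy sketch of this evaluation protocol: a linear SVM trained on top of frozen embeddings. The embeddings and labels below are random placeholders, and the SVM hyperparameters are assumptions.

```python
import numpy as np
from sklearn.svm import LinearSVC
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
embeddings = rng.normal(size=(400, 512))   # one 512-d GAN-DL-style vector per image
labels = rng.integers(0, 2, size=400)      # e.g. Mock vs. infected controls

# Linear SVM on top of the frozen representation; the representation itself is not trained.
clf = LinearSVC(C=1.0, max_iter=10_000)
scores = cross_val_score(clf, embeddings, labels, cv=5)
print(scores.mean())  # chance level here, since the toy embeddings carry no signal
```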

2. Dose-response modelling

  • Omitted here.

Zero-shot representation learning

  • GAN-DL’s embedding learnt on RxRx19a was applied to a zero-shot representation learning task on RxRx1.
  • The baseline was not used for this experiment because it was pre-trained on RxRx1.
  • A soft-margin linear SVM was built on top of GAN-DL’s embedding.
  • The SVM classified the input RxRx1 data into 4 classes.
  • GAN-DL showed better performance than DenseNet and ConvAE, except on human umbilical vein endothelial cell (HUVEC) images.

What was discussed?

  • The GAN-DL authors1 mentioned that the baseline is generally more accurate than the proposed GAN-DL in the classification tasks. However, they stated that GAN-DL’s reusability for other tasks is an advantage: the baseline was trained on the large RxRx1 dataset with its labels, while GAN-DL does not need any annotations or labels during training.
  • The authors guess that ConvAE’s representation capability is lower than DenseNet’s because an autoencoder’s ability to generate high-quality images is limited.

Is source code available?

  • Not available.
  • But the authors have published the embedding data produced by GAN-DL.

Is dataset available?

  • Yes.

What I learned

  • NVIDIA’s StyleGAN22 belongs to the Wasserstein Generative Adversarial Network (W-GAN)14 family.
  • Goldsborough et al. published the first work15 applying self-supervised representation learning (SSRL) tasks to biological images (fluorescence microscopy), but the results were not good, according to the authors1.
  • The original idea of using a GAN’s discriminator as a feature extractor was shown by Radford et al.16.
  • It has been proposed that StyleGAN2 is resistant to the mode collapse phenomenon14 17.
  • W-GANs can mitigate two problems of GAN training:
    1. Mode collapse
    • The GAN learns only a subset of the data distribution.
    • The collapsed distribution generates a single image or a small discrete set of images.
    • This means the model is heavily over-fitted to that particular subset.
    • The discriminator is trapped in a local minimum and the generator keeps producing the same images for it.
    2. Lack of convergence
    • Either the generator or the discriminator improves much faster than the other network, which prevents mutual improvement.
  • W-GANs reduce these problems by replacing the classical discriminator with a Wasserstein-distance-based critic that scores the realness of a given image (the objective is written out after this list).
  • StyleGAN2 is an instance of W-GAN and applies residual connections in both networks.
  • The capability to generate high-quality images (i.e., to extract the training data’s features well) should carry over to solving other downstream tasks when the pre-trained features are applied to them.
  • StyleGAN22
  • W-GAN14
  • Goldsborough et al. (the first paper that applied self-supervised representation learning tasks to biological images)15
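
For reference, the W-GAN objective from Arjovsky et al.14 can be written as follows, where the critic f replaces the classifier-style discriminator and must be (approximately) 1-Lipschitz:

$$
\min_{G}\ \max_{\lVert f \rVert_{L} \le 1}\ \mathbb{E}_{x \sim p_{\mathrm{data}}}\left[f(x)\right] - \mathbb{E}_{z \sim p(z)}\left[f(G(z))\right]
$$

The inner maximization estimates the Wasserstein-1 distance between the real and generated distributions, which is why the critic outputs an unbounded realness score rather than a real/fake probability.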

  1. Alessio Mascolini, Dario Cardamone, Francesco Ponzio, Santa Di Cataldo and Elisa Ficarra. Exploiting generative self-supervised learning for the assessment of biological images with lack of annotations. BMC Bioinformatics, 23, 295, 2022. doi.org/10.1186/s12859-022-04845-1↩︎ ↩︎ ↩︎ ↩︎ ↩︎ ↩︎ ↩︎ ↩︎ ↩︎ ↩︎

  2. Tero Karras, Samuli Laine, Miika Aittala, Janne Hellsten, Jaakko Lehtinen and Timo Aila. Analyzing and Improving the Image Quality of StyleGAN. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 8110–19, 2020. doi:10.1109/CVPR42600.2020.00813 or arXiv:1912.04958. (Github repository). ↩︎ ↩︎ ↩︎

  3. Recursion. RxRx19a dataset. https://www.rxrx.ai/rxrx19a, 2020. ↩︎

  4. Dylan Zhuang and Ali K. Ibrahim. Deep Learning for Drug Discovery: A Study of Identifying High Efficacy Drug Compounds Using a Cascade Transfer Learning Approach. Applied Sciences, 11(17):7772, 2021. doi:10.3390/app11177772↩︎

  5. M. Sadegh Saberian, Kathleen P. Moriarty, Andrea D. Olmstead, Christian Hallgrimson, François Jean, Ivan R. Nabi, Maxwell W. Libbrecht and Ghassan Hamarneh. DEEMD: Drug Efficacy Estimation Against SARS-CoV-2 Based on Cell Morphology With Deep Multiple Instance Learning. IEEE Transactions on Medical Imaging, 2022. doi:10.1109/TMI.2022.3178523 ↩︎

  6. Michael F. Cuccarese et al. Functional immune mapping with deep-learning enabled phenomics applied to immunomodulatory and COVID-19 drug discovery, BioRxiv 2020. doi:10.1101/2020.08.02.233064↩︎ ↩︎ ↩︎ ↩︎

  7. Gao Huang, Zhuang Liu, Laurens van der Maaten and Kilian Q. Weinberger. Densely Connected Convolutional Networks. arXiv preprint arXiv:1608.06993, 2016. (Github repository). ↩︎

  8. ImageNet. https://www.image-net.org/↩︎

  9. Jiankang Deng, Jia Guo, Niannan Xue and Stefanos Zafeiriou. ArcFace: Additive Angular Margin Loss for Deep Face Recognition. 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp.4685-4694, 2019. doi:10.1109/CVPR.2019.00482↩︎

  10. Recursion. RxRx1 dataset. https://www.rxrx.ai/rxrx1, 2019. ↩︎

  11. Dong Jin Ji, Jinsol Park and Dong-Ho Cho. ConvAE: A New Channel Autoencoder Based on Convolutional Layers and Residual Connections. IEEE Communications Letters, vol.23, no.10, pp.1769-1772, 2019. doi:10.1109/LCOMM.2019.2930287↩︎

  12. Bram Wallace and Bharath Hariharan. Extending and Analyzing Self-Supervised Learning Across Domains. Computer Vision – ECCV 2020. Lecture Notes in Computer Science, vol.12371, Springer, 2020. doi:10.1007/978-3-030-58574-7_43 ↩︎

  13. Gustav Grund Pihlgren, Fredrik Sandin and Marcus Liwicki. Improving Image Autoencoder Embeddings with Perceptual Loss. arXiv preprint arXiv:2001.03444, 2020.↩︎

  14. Martin Arjovsky, Soumith Chintala and Léon Bottou. Wasserstein GAN. In Proceedings of the 34th International Conference on Machine Learning, PMLR 70:214-223, 2017. link or arXiv:1701.07875 ↩︎ ↩︎ ↩︎

  15. Peter Goldsborough, Nick Pawlowski, Juan C Caicedo, Shantanu Singh and Anne E Carpenter. CytoGAN: Generative Modeling of Cell Images. BioRxiv 2017. doi:10.1101/227645↩︎ ↩︎

  16. Alec Radford, Luke Metz and Soumith Chintala. Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks. arXiv preprint arXiv:1511.06434, 2015.↩︎

  17. Mingyang Zhang, Maoguo Gong, Yishun Mao, Jun Li and Yue Wu. Unsupervised Feature Extraction in Hyperspectral Images Based on Wasserstein Generative Adversarial Network. IEEE Transactions on Geoscience and Remote Sensing, vol.57, no.5, pp.2669-2688, 2019. doi:10.1109/TGRS.2018.2876123↩︎