Authors: Joshua Placidi, Sara Sabzikari, Vincenzo Incutti, Ka Yeon Kim
This is a project originally built in 12 hours for a Biology + Generative Artificial Intelligence Hackathon. We trained a variational auto-encoder (VAE) to learn a latent space representation of the physiology of mushrooms, the fruiting bodies of fungi.
It has been estimated that more than 90% of all fungal species have yet to be described by science. We built and trained a VAE from scratch to synthesize what new, as yet undiscovered, mushrooms could look like. This project was built as a fun exploration of how an auto-encoder model learns to represent images of mushrooms in a latent space. The culmination of our work can be seen in the gifs at the top of the page.
In our project, we delved into the intriguing world of fungi, specifically focusing on the physiology of their fruiting bodies - mushrooms.
We recognized that mushrooms, as the visible fruiting bodies of fungi, offer a more accessible means of differentiating mushroom-producing species than examining mycelium or mycelial networks alone. Some distinctive physical characteristics are:
- shape
- color
- gill type
These are valuable markers for identifying and categorizing different species. However, relying solely on morphology for fungal classification may overlook substantial biological information inherent in these organisms. We therefore acknowledge the limitations of morphology-based classification and recognise that incorporating genomic, environmental and other data would provide a more accurate means of classifying mushrooms and of exploring viable hypothetical species in the latent space.
We demonstrate the feasibility and potential of using the latent space representation of physiological variables as a proof of concept. This approach opens up exciting possibilities for exploring unknown species and broadening our understanding of the diverse world of fungi.
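VAEs learn in a self-supervised manner to predict their own input: given an input image, the encoder maps it to a distribution over a low-dimensional latent space, and the decoder reconstructs the image from a point sampled from that distribution.
To generate new samples, a latent vector is drawn from the prior over the latent space (typically a standard normal distribution) and passed through the decoder.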
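The two sampling steps of a VAE can be sketched in a few lines of plain Python. This is a minimal illustration of the textbook formulation, assuming a diagonal Gaussian encoder and a standard normal prior; it is not the project's actual model code, and the function names (`reparameterize`, `kl_to_standard_normal`, `sample_prior`) are hypothetical.

```python
import math
import random


def reparameterize(mu, log_var, rng=random):
    """Draw z = mu + sigma * eps with eps ~ N(0, 1), element-wise.

    mu and log_var stand in for the encoder outputs for one image
    (assumed here to be plain lists of floats).
    """
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, log_var)]


def kl_to_standard_normal(mu, log_var):
    """Closed-form KL(N(mu, sigma^2) || N(0, 1)), summed over latent dims.

    This is the regularisation term usually added to the reconstruction
    loss when training a VAE.
    """
    return -0.5 * sum(1.0 + lv - m * m - math.exp(lv)
                      for m, lv in zip(mu, log_var))


def sample_prior(latent_dim, rng=random):
    """Draw a latent vector from the standard normal prior.

    Passing this vector through a trained decoder yields a new sample.
    """
    return [rng.gauss(0.0, 1.0) for _ in range(latent_dim)]
```

During training, `reparameterize` keeps the sampling step differentiable with respect to the encoder outputs; at generation time only `sample_prior` and the decoder are needed.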
We used two datasets:
- Mushroom Common Genus: a dataset containing images from 7 mushroom genera
- Danish Fungi Dataset: a large dataset of over 100,000 images of Danish mushrooms
We ran a pretraining cycle on the Danish Fungi Dataset, and then fine-tuned the model on the Mushroom Common Genus dataset. We used the following hyperparameters:
- batch_size: 64
- initial_learning_rate: $1 \times 10^{-4}$
- num_epochs: 20
- split_ratio: 0.9
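
As a rough sketch, the hyperparameters above could be collected in a small config object and used to split a dataset. The values come from the list above; the `TrainConfig` class, the `train_val_split` helper, the seed, and the reading of `split_ratio` as the fraction of images used for training are all assumptions made for illustration.

```python
import math
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainConfig:
    batch_size: int = 64
    initial_learning_rate: float = 1e-4
    num_epochs: int = 20
    split_ratio: float = 0.9  # assumed: fraction of images used for training


def train_val_split(items, split_ratio, seed=0):
    """Shuffle items reproducibly and split them into train/validation lists."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * split_ratio)
    return shuffled[:cut], shuffled[cut:]


config = TrainConfig()

# Stand-in for a list of image paths; a real run would list dataset files.
image_ids = range(1000)
train_ids, val_ids = train_val_split(image_ids, config.split_ratio)

# Number of optimizer steps per epoch, counting the final partial batch.
steps_per_epoch = math.ceil(len(train_ids) / config.batch_size)
```

With 1,000 placeholder items this gives 900 training and 100 validation examples, so each epoch runs 15 batches of up to 64 images.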