Project 7 – Deep learning methods for species classification from seed images: AI applications in bioimaging
Applying for Summer 2025
Supervisors: Dr Jose De Vega, Dr Felix Shaw and Jonathan Ashworth
School/Institute: Earlham Institute
Introduction: Our lab aims to explore computer vision methods that can classify each seed in a mixed package based on its morphological traits. This effort is part of a larger project to evaluate the quality of the native flower seed market and to assess the fitness of these seeds when used for creating new habitats.
We conducted a proof of concept using “classic” supervised machine learning models (Ashworth et al., in press). We extracted over 50 numerical features describing each seed's size, shape, texture, and colour, using a non-ML ImageJ plugin for image segmentation. We then predicted the species using gradient boosting, random forest, or MLP classifiers, achieving an F1 score and accuracy of approximately 90%. The models can also reject seeds from species that were not encountered during training (open-set classification).
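As an illustration of the open-set idea, one common approach is to reject a seed when the classifier's highest class probability falls below a cut-off. The sketch below assumes this thresholding strategy; it is not necessarily the method used in the proof of concept. The function names, the "unknown" label, the 0.7 threshold, and the example probabilities are all hypothetical.

```python
from typing import Dict, List

UNKNOWN = "unknown"  # hypothetical label for seeds rejected as out-of-set


def predict_open_set(probabilities: Dict[str, float], threshold: float = 0.7) -> str:
    """Return the most probable species, or UNKNOWN if confidence is too low."""
    species, confidence = max(probabilities.items(), key=lambda item: item[1])
    return species if confidence >= threshold else UNKNOWN


def classify_seeds(batch: List[Dict[str, float]], threshold: float = 0.7) -> List[str]:
    """Apply the open-set rule to the class probabilities of each seed."""
    return [predict_open_set(probabilities, threshold) for probabilities in batch]


# Made-up class probabilities for three seeds (illustration only).
example_batch = [
    {"species_a": 0.92, "species_b": 0.05, "species_c": 0.03},  # confident
    {"species_a": 0.40, "species_b": 0.35, "species_c": 0.25},  # ambiguous
    {"species_a": 0.10, "species_b": 0.75, "species_c": 0.15},  # confident
]

predictions = classify_seeds(example_batch)
print(predictions)
```

In this sketch the second seed is rejected because no species reaches the threshold. The threshold trades off missed known species against accepted unknown ones, so it would need tuning on held-out data.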
Aims and Objectives: The objective of this studentship is to build on these results and streamline the classification by using deep learning models to segment the images and automatically classify the resulting masks (seeds). This means applying models directly to the images rather than relying on tabulated features extracted beforehand.
The dataset comprises images of seeds laid out on a flat surface, with between 50 and 500 seeds in each image. Each image shows either a single species or a mix of species, and together the images cover approximately the top 10 wildflower species commercialised in the UK. Altogether, they represent around 150,000 seeds (instances).
The student will explore different DL approaches that combine segmentation and classification of instances directly from images (e.g. Mask R-CNN, YOLACT, SOLOv2). The student will use the existing ImageJ pipeline to obtain “ground truth” masks for training, combined with semi-automatic annotation tools applied to the image dataset.
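Instance segmentation models of this kind are typically trained on one binary mask and one bounding box per object. The sketch below shows how a labelled image could be split into such per-seed targets. It assumes the segmentation step exports an integer label image (0 for background, a distinct positive integer per seed), which the existing pipeline may or may not do; the function name and output keys are hypothetical. Plain Python lists keep the example dependency-free, whereas a real pipeline would more likely use array libraries.

```python
from typing import Dict, List

LabelImage = List[List[int]]  # 0 = background, k > 0 = pixels of seed k


def instances_from_label_image(label_image: LabelImage) -> List[Dict]:
    """Split a labelled image into per-seed binary masks and bounding boxes."""
    # First pass: track the pixel extent [x_min, y_min, x_max, y_max] per seed.
    extents: Dict[int, List[int]] = {}
    for y, row in enumerate(label_image):
        for x, label in enumerate(row):
            if label == 0:
                continue
            if label not in extents:
                extents[label] = [x, y, x, y]
            else:
                extent = extents[label]
                extent[0] = min(extent[0], x)
                extent[1] = min(extent[1], y)
                extent[2] = max(extent[2], x)
                extent[3] = max(extent[3], y)

    # Second pass: build one binary mask per seed.
    instances = []
    for label in sorted(extents):
        x_min, y_min, x_max, y_max = extents[label]
        mask = [[1 if value == label else 0 for value in row] for row in label_image]
        instances.append({
            "seed_id": label,
            # Upper edges are exclusive, so a one-pixel seed still has area.
            "box": [x_min, y_min, x_max + 1, y_max + 1],
            "mask": mask,
        })
    return instances


# Tiny made-up label image containing two seeds.
label_image = [
    [0, 1, 1, 0, 0],
    [0, 1, 1, 0, 2],
    [0, 0, 0, 0, 2],
]

instances = instances_from_label_image(label_image)
for instance in instances:
    print(instance["seed_id"], instance["box"])
```

A species label per instance would be added alongside each mask; for single-species images it can be taken from the image itself, while mixed images would need per-seed annotation.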
This project aims to compare seeds and cells, potentially making significant contributions to bioimaging research. It intends to establish a benchmark for analysing phenotypic patterns, which will assist in the effective and automated monitoring and treatment of cancer.
Skills gained and required: The student will have access to a comprehensive image dataset, modern HPC and GPU infrastructure at Earlham Institute, and interaction with other researchers in the field. We expect the student to work at the institute most days of the week. The student will receive guidance to develop and advance the project but will be given a good degree of independence to explore different options. They will contribute to and make a real impact on an ongoing project in our lab.
This project is intended for a student with a basic understanding of Python programming and machine learning who wants to enhance their research and biocomputational analysis skills. The project will train the student in essential bioimaging and data analysis skills that will be useful in any professional setting. It will cover transversal skills, including randomisation, replication, statistical power, cross-validation, and other related competencies. Additionally, the project provides a fundamental introduction to machine learning. The student will generate more data based on their preferred research questions and experimental design. This project is 100% computational biology; there is no lab benchwork.