Producing plankton classifiers that are robust to dataset shift
Christine Chen, Sreenath P. Kyathanahally, Marta Reyes

et al.

Limnology and Oceanography Methods, Journal Year: 2024, Volume and Issue: unknown

Published: Nov. 27, 2024

Abstract Modern plankton high‐throughput monitoring relies on deep learning classifiers for species recognition in water ecosystems. Despite satisfactory nominal performances, a significant challenge arises from dataset shift, which causes performances to drop during deployment. In our study, we integrate the ZooLake dataset, which consists of dark‐field images of lake plankton (Kyathanahally et al. 2021a), with manually annotated images from 10 independent days of deployment, serving as test cells to benchmark out‐of‐dataset (OOD) performances. Our analysis reveals instances where classifiers that initially perform well under in‐dataset conditions encounter notable failures in practical scenarios. For example, a MobileNet with 92% nominal accuracy shows only 77% OOD accuracy. We systematically investigate the conditions leading to performance drops, propose a preemptive assessment method to identify potential pitfalls when classifying new data, and pinpoint features that adversely impact classification. We present a three‐step pipeline: (i) identifying OOD degradation compared to nominal performance, (ii) conducting a diagnostic analysis of its causes, and (iii) providing solutions. We find that ensembles of BEiT vision transformers, with targeted augmentations addressing OOD robustness, geometric ensembling, and rotation‐based test‐time augmentation, constitute the most robust model, which we call BEsT. It achieves an 83% OOD accuracy, with errors concentrated in container classes. Moreover, it exhibits lower sensitivity to dataset shift and reproduces the taxon abundances well. The proposed pipeline is applicable to generic classifiers, contingent on the availability of suitable test cells. By identifying critical shortcomings and offering procedures to fortify models against dataset shift, our study contributes to the development of more reliable plankton classification technologies.
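The two robustness ingredients named in the abstract, rotation‐based test‐time augmentation and geometric ensembling, can be sketched in a few lines. This is a minimal illustration only: the stand-in classifiers and the 3-class setup below are hypothetical, not the paper's BEiT models or its ZooLake classes.

```python
# Sketch of rotation-based test-time augmentation (TTA) plus geometric
# (multiplicative-mean) ensembling. Images are plain nested lists and the
# "models" are dummy probability functions, for illustration only.

def rotate90(img):
    """Rotate a 2D image (list of rows) by 90 degrees."""
    return [list(row) for row in zip(*img)][::-1]

def tta_predict(model, img, n_classes):
    """Average a model's class probabilities over the four 90-degree rotations."""
    probs = [0.0] * n_classes
    view = img
    for _ in range(4):
        p = model(view)
        probs = [a + b / 4 for a, b in zip(probs, p)]
        view = rotate90(view)
    return probs

def geometric_ensemble(models, img, n_classes):
    """Combine models by the geometric mean of their TTA probabilities."""
    combined = [1.0] * n_classes
    for m in models:
        p = tta_predict(m, img, n_classes)
        combined = [c * pi for c, pi in zip(combined, p)]
    combined = [c ** (1 / len(models)) for c in combined]
    total = sum(combined)
    return [c / total for c in combined]  # renormalize to a distribution

# Toy usage: two dummy "classifiers" over 3 classes.
img = [[0, 1], [2, 3]]
model_a = lambda x: [0.7, 0.2, 0.1]
model_b = lambda x: [0.6, 0.3, 0.1]
probs = geometric_ensemble([model_a, model_b], img, 3)
```

The geometric mean is a common choice for ensembling probabilistic classifiers because a class must score well under every member to score well overall, which tends to suppress confident single-model mistakes on shifted data.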

Language: English


Citations: 1