Computers, Materials & Continua (Print), Year: 2024, Issue: 80(2), pp. 1985-2003
Published: Jan. 1, 2024
The task of food image recognition, a nuanced subset of fine-grained image recognition, grapples with substantial intra-class variation and minimal inter-class differences. These challenges are compounded by the irregular and multi-scale nature of food images. Addressing these complexities, our study introduces an advanced model that leverages multiple attention mechanisms and multi-stage local fusion, grounded in the ConvNeXt architecture. Our model employs hybrid attention (HA) mechanisms to pinpoint critical discriminative regions within images, substantially mitigating the influence of background noise. Furthermore, it introduces a multi-stage local fusion (MSLF) module, fostering long-distance dependencies between feature maps at varying stages. This approach facilitates the assimilation of complementary features across scales, significantly bolstering the model's capacity for feature extraction. In addition, we constructed a dataset named Roushi60, which consists of 60 different categories of common meat dishes. Empirical evaluation on the ETH Food-101, ChineseFoodNet, and Roushi60 datasets reveals that our model achieves recognition accuracies of 91.12%, 82.86%, and 92.50%, respectively. These figures not only mark an improvement of 1.04%, 3.42%, and 1.36% over the foundational network but also surpass the performance of most contemporary food image recognition methods. Such advancements underscore the efficacy of our proposed model in navigating the intricate landscape of food image recognition, setting a new benchmark for the field.
Language: English
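The abstract reports final accuracies and the gains over the foundational network, but not the baseline accuracies themselves. As a minimal sketch, and assuming the gains are absolute differences in percentage points (the abstract does not say whether they are absolute or relative), the implied baselines can be recovered by subtraction. The function name `implied_baselines` is hypothetical and used here only for illustration.

```python
from decimal import Decimal

# Accuracy (%) and gain over the foundational network, as reported in the abstract.
reported = {
    "ETH Food-101": (Decimal("91.12"), Decimal("1.04")),
    "ChineseFoodNet": (Decimal("82.86"), Decimal("3.42")),
    "Roushi60": (Decimal("92.50"), Decimal("1.36")),
}


def implied_baselines(results):
    """Return accuracy minus gain for each dataset.

    Assumption (not stated in the abstract): each gain is an absolute
    difference in percentage points, not a relative improvement.
    """
    return {name: acc - gain for name, (acc, gain) in results.items()}


baselines = implied_baselines(reported)
for name, base in baselines.items():
    print(f"{name}: implied baseline accuracy {base}%")
```

Decimal arithmetic is used instead of floats so that the two-decimal figures subtract exactly.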