Sound Event Detection Based on Mel Spectral Envelope Estimation and Regression Detection DOI

Maocun Tian,

Ruwei Li,

Weidong An

et al.

2022 IEEE International Conference on Signal Processing, Communications and Computing (ICSPCC), Journal Year: 2023, Volume and Issue: 10, P. 1 - 5

Published: Nov. 14, 2023

Binary metrics are employed in traditional deep learning methods of sound event detection (SED) to determine the presence or absence of an event. However, these binary activity labels inadequately characterize the nuanced states of events, which limits the performance of current detection algorithms, particularly in scenarios involving overlapping events. Concurrently, conventional algorithms suffer from sluggish detection speeds, resulting in substantial temporal costs. To solve the above problems, a novel algorithm based on amplitude envelope estimation and regression detection (EERD) is proposed in this paper. In this algorithm, firstly, the Mel Frequency Cepstrum Coefficient (MFCC) amplitude envelope of the audio signal is estimated, thereby obtaining enhanced information concerning sound events. Secondly, regression-based detection is introduced into the network model, so that the algorithm's reliance on post-processing is reduced and its detection speed is concomitantly improved. Empirical validation is conducted on the TUT dataset. Experiments show that the algorithm in this paper attains a superior F-measure in contrast to benchmark methods, hence its heightened performance is substantiated. At the same time, it achieves detection at least sixfold faster than the segmentation-by-classification approach.
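The MFCC front-end mentioned above rests on the mel scale, which warps frequency to approximate human pitch perception. As a minimal illustrative sketch (not the paper's implementation), the standard mel conversion and the centre frequencies of a triangular mel filterbank can be computed as follows; the 0–8 kHz range and 26 filters are common defaults, assumed here for illustration:

```python
import math

def hz_to_mel(f_hz):
    # O'Shaughnessy mel-scale formula used by standard MFCC front-ends.
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(m):
    # Inverse mapping: mel back to Hz.
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filter_edges(f_min, f_max, n_filters):
    # Edge/centre frequencies of a triangular mel filterbank:
    # spaced uniformly on the mel scale, then mapped back to Hz.
    lo, hi = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (hi - lo) / (n_filters + 1)
    return [mel_to_hz(lo + i * step) for i in range(n_filters + 2)]

edges = mel_filter_edges(0.0, 8000.0, 26)
```

Each of the 26 filters spans three consecutive edges, so 28 frequency points are produced; spacing is narrow at low frequencies and widens towards 8 kHz.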

Language: English

Computational bioacoustics with deep learning: a review and roadmap DOI Creative Commons
Dan Stowell

PeerJ, Journal Year: 2022, Volume and Issue: 10, P. e13152 - e13152

Published: March 21, 2022

Animal vocalisations and natural soundscapes are fascinating objects of study, and contain valuable evidence about animal behaviours, populations and ecosystems. They are studied in bioacoustics and ecoacoustics, with signal processing and analysis an important component. Computational bioacoustics has accelerated in recent decades due to the growth of affordable digital sound recording devices, and to huge progress in informatics such as big data and machine learning. Methods are inherited from the wider field of deep learning, including speech and image processing. However, the tasks, demands and data characteristics are often different from those addressed in speech or music analysis. There remain unsolved problems, and tasks for which evidence is surely present in many acoustic signals, but not yet realised. In this paper I perform a review of the state of the art in deep learning for computational bioacoustics, aiming to clarify key concepts and to identify and analyse knowledge gaps. Based on this, I offer a subjective but principled roadmap for computational bioacoustics with deep learning: topics that the community should aim to address, in order to make the most of future developments in AI and informatics, and to use audio data in answering zoological and ecological questions.

Language: English

Citations

197

A review of automatic recognition technology for bird vocalizations in the deep learning era DOI Open Access
Jiangjian Xie,

Yujie Zhong,

Junguo Zhang

et al.

Ecological Informatics, Journal Year: 2022, Volume and Issue: 73, P. 101927 - 101927

Published: Nov. 25, 2022

Language: English

Citations

62

You Only Hear Once: A YOLO-like Algorithm for Audio Segmentation and Sound Event Detection DOI Creative Commons
Satvik Venkatesh, David Moffat, Eduardo Reck Miranda

et al.

Applied Sciences, Journal Year: 2022, Volume and Issue: 12(7), P. 3293 - 3293

Published: March 24, 2022

Audio segmentation and sound event detection are crucial topics in machine listening that aim to detect acoustic classes and their respective boundaries. They are useful for audio-content analysis, speech recognition, audio-indexing, and music information retrieval. In recent years, most research articles have adopted segmentation-by-classification. This technique divides audio into small frames and individually performs classification on these frames. In this paper, we present a novel approach called You Only Hear Once (YOHO), which is inspired by the YOLO algorithm popularly adopted in Computer Vision. We convert the detection of acoustic boundaries into a regression problem instead of frame-based classification. This is done by having separate output neurons to detect the presence of an audio class and predict its start and end points. The relative improvement in F-measure of YOHO, compared to the state-of-the-art Convolutional Recurrent Neural Network, ranged from 1% to 6% across multiple datasets for audio segmentation and sound event detection. As the output of YOHO is more end-to-end and has fewer neurons to predict, the speed of inference is at least 6 times faster than segmentation-by-classification. In addition, as YOHO predicts acoustic boundaries directly, the post-processing and smoothing is about 7 times faster.
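The regression formulation described above can be sketched by showing how a list of events might be encoded as per-bin targets: the audio is divided into output bins, and each bin carries, per class, a presence flag plus normalised start and end offsets. This is an illustrative encoding under assumed bin counts, not the paper's exact parameterisation:

```python
def yoho_targets(events, audio_len, n_bins, n_classes):
    """Encode events [(cls, start_s, end_s)] as per-bin regression targets.

    Each bin holds, per class: [presence, norm_start, norm_end], with the
    offsets normalised to the bin containing the event onset (the end
    offset may exceed 1.0 when an event spans several bins).
    Hypothetical encoding for illustration only.
    """
    bin_len = audio_len / n_bins
    targets = [[[0.0, 0.0, 0.0] for _ in range(n_classes)]
               for _ in range(n_bins)]
    for cls, start, end in events:
        b = min(int(start / bin_len), n_bins - 1)   # bin of the onset
        targets[b][cls][0] = 1.0                    # class present here
        targets[b][cls][1] = (start - b * bin_len) / bin_len
        targets[b][cls][2] = (end - b * bin_len) / bin_len
    return targets

# Two events in 4 s of audio, 4 output bins, 2 classes.
t = yoho_targets([(0, 0.5, 2.0), (1, 3.1, 3.9)],
                 audio_len=4.0, n_bins=4, n_classes=2)
```

Because the network emits start/end values directly, decoding needs no frame-wise median filtering, which is where the inference and post-processing speedups come from.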

Language: English

Citations

37

Unsupervised classification to improve the quality of a bird song recording dataset DOI Creative Commons

Félix Michaud,

Jérôme Sueur, Maxime Le Cesne

et al.

Ecological Informatics, Journal Year: 2022, Volume and Issue: 74, P. 101952 - 101952

Published: Dec. 12, 2022

Language: English

Citations

22

SILIC: A cross database framework for automatically extracting robust biodiversity information from soundscape recordings based on object detection and a tiny training dataset DOI Creative Commons
Shih-Hung Wu,

Hsueh‐Wen Chang,

Ruey‐Shing Lin

et al.

Ecological Informatics, Journal Year: 2021, Volume and Issue: 68, P. 101534 - 101534

Published: Dec. 20, 2021

Passive acoustic monitoring (PAM) offers many advantages compared with other survey methods and is gaining increasing use in terrestrial ecology, but the massive effort needed to extract species information from a large number of recordings limits its application. The convolutional neural network (CNN) has demonstrated high performance and effectiveness in identifying sound sources automatically. However, the requirement for a large amount of training data still constitutes a challenge. Object detection is used to detect multiple objects in photos or videos and is effective at detecting small objects in a complex context, such as animal sounds in a spectrogram, which shows an opportunity to build a good model with a small training dataset. Therefore, we developed Sound Identification and Labeling Intelligence for Creatures (SILIC), which integrates online sound databases, PAM databases and an object detection-based model for extracting species information from soundscape recordings. We used six owl species in Taiwan to demonstrate the effectiveness, efficiency and application potential of the SILIC framework. Using only 786 labels from 133 recordings, our model successfully identified the species' sounds collected from five PAM stations, with a macro-average AUC of 0.89 and mAP of 0.83. The model also provided time and frequency information, such as duration and bandwidth, of the sounds. To the best of our knowledge, this is the first framework that uses an object detection algorithm to identify wildlife species. With a sound-labeling platform embedded and a novel preprocessing approach (i.e., rainbow mapping) applied, SILIC can extract robust species information based on a tiny training dataset acquired from existing databases. SILIC can help expand the PAM tool to evaluate the state and change of biodiversity by, for example, providing high temporal resolution information on the continuous presence of species across a monitoring network.
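Applying an object detector to spectrograms, as SILIC does, requires mapping each labelled sound event (a time span in seconds and a frequency band in Hz) to a bounding box in spectrogram coordinates (frame index on the x-axis, FFT bin index on the y-axis). A minimal sketch of that label-preparation step, with hypothetical STFT parameters, not SILIC's actual pipeline:

```python
def event_to_bbox(t_start, t_end, f_low, f_high, hop_s, sr, n_fft):
    """Map a sound event (seconds, Hz) to a spectrogram bounding box
    (x0, y0, x1, y1) in frame/bin indices, as an object detector expects.
    Hypothetical helper illustrating label preparation, not SILIC's code."""
    hz_per_bin = sr / n_fft          # frequency spacing of STFT bins
    x0 = round(t_start / hop_s)      # onset frame
    x1 = round(t_end / hop_s)        # offset frame
    y0 = round(f_low / hz_per_bin)   # lowest frequency bin
    y1 = round(f_high / hz_per_bin)  # highest frequency bin
    return (x0, y0, x1, y1)

# A 1.0-2.5 s call occupying 1-4 kHz, with a 10 ms hop,
# 16 kHz sampling rate and a 512-point FFT (assumed values).
box = event_to_bbox(1.0, 2.5, 1000.0, 4000.0,
                    hop_s=0.01, sr=16000, n_fft=512)
```

The detector then returns boxes in the same coordinates, so the inverse mapping recovers the duration and bandwidth figures the abstract mentions.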

Language: English

Citations

18

On the role of audio frontends in bird species recognition DOI Creative Commons
Houtan Ghaffari, Paul Devos

Ecological Informatics, Journal Year: 2024, Volume and Issue: 81, P. 102573 - 102573

Published: March 26, 2024

Automatic acoustic monitoring of bird populations and their diversity is in demand for conservation planning. This requirement and recent advances in deep learning have inspired sophisticated bird species recognizers. However, there are still open challenges in creating reliable systems for natural habitats. One of many such questions is whether the predominantly used audio features like mel-filterbanks are appropriate for such analysis, since their design follows the human perception of sound, making them susceptible to discarding fine details from other animals' vocalizations. Although research shows that different features work better for particular tasks and datasets, it is hard to attribute all of the advantages to the input features when experimental setups vary. A general solution is a learnable frontend that extracts task-relevant features from the raw waveform, which contains all the information present in hand-crafted features. The current paper thoroughly analyzes the role of audio frontends in bird species recognition, which helps to evaluate the adequacy of traditional time-frequency representations (static frontends) for capturing relevant information. In particular, this paper shows that the main performance gain of learnable frontends comes from the normalization and compression operations rather than from data-driven frequency selectivity and the functional form of the filters. We observed no significant discrepancy between the frequency bands of learned and static frontends; where the performance of learnable frontends was much higher, we show that adequate normalization and compression enhance the accuracy of static frontends by more than 16% and achieve comparable results in bird species recognition. Ablation studies under various configurations and detailed noise robustness experiments provide evidence for these conclusions, validate the use of similar static features in prior works, and offer guidelines for designing future frontends. The code is available at https://github.com/houtan-ghaffari/bird-frontends.
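The abstract attributes most of the gain of learnable frontends to normalization and compression. As a toy sketch of those two operations on a vector of filterbank energies (illustrative only; the paper evaluates them inside full recognition pipelines), log compression squashes the large dynamic range of acoustic energies, and mean/variance normalization standardizes the result:

```python
import math

def log_compress(mel_energy, eps=1e-6):
    # Log compression: the static-frontend step the study identifies
    # as the main source of the learnable frontends' advantage.
    return [math.log(e + eps) for e in mel_energy]

def mean_var_normalise(features):
    # Per-utterance mean/variance normalization of compressed features.
    n = len(features)
    mu = sum(features) / n
    var = sum((x - mu) ** 2 for x in features) / n
    sd = math.sqrt(var) if var > 0 else 1.0
    return [(x - mu) / sd for x in features]

# Raw energies spanning two orders of magnitude become a
# zero-mean, unit-variance feature vector.
feats = mean_var_normalise(log_compress([1.0, 10.0, 100.0]))
```

After compression the three energies are evenly spaced (log is linear in orders of magnitude), so loud calls no longer dominate the feature scale, which is the effect the ablations isolate.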

Language: English

Citations

1

Deep Learning for Recognizing Bat Species and Bat Behavior in Audio Recordings DOI
Markus Vogelbacher, Hicham Bellafkir, Jannis Gottwald

et al.

Published: June 1, 2023

Monitoring and mitigating the continuous decline of biodiversity is a key global challenge to preserve the existential basis of human life. Bats, as one of the most widespread species among terrestrial mammals, are excellent indicators for biodiversity and hence the health of an ecosystem. Typically, bats are monitored by analyzing ultrasonic sound recordings. State-of-the-art deep learning approaches for automatic bat detection and species recognition commonly rely on audio spectrogram classification models based on fixed time segments, lacking exact call boundaries. While great progress has been made using echolocation calls, little attention has been paid to bat behavior, which provides valuable additional information about bat populations. In this paper, we present a novel end-to-end approach based on a neural network for object detection. In contrast to state-of-the-art approaches, the presented model provides accurate call boundaries. It recognizes 19 bat species and distinguishes between three different behaviors: orientation (echolocation calls), hunting (feeding buzzes), and social (social calls). Our experiments with two data sets show that our method clearly outperforms previous approaches in species recognition, achieving up to 86.2% mean average precision. It also performs very well on behavior recognition, reaching 98.4%, 98.3%, and 95.6% average precision in recognizing echolocation calls, feeding buzzes, and social calls, respectively.

Language: English

Citations

1

NEAL: an open-source tool for audio annotation DOI Creative Commons
Anthony Gibbons, Ian Donohue, Courtney E. Gorman

et al.

PeerJ, Journal Year: 2023, Volume and Issue: 11, P. e15913 - e15913

Published: Aug. 25, 2023

Passive acoustic monitoring is used widely in ecology, biodiversity, and conservation studies. Data sets collected via acoustic monitoring are often extremely large and built to be processed automatically using artificial intelligence and machine learning models, which aim to replicate the work of domain experts. These models, being supervised learning algorithms, need to be trained on high quality annotations produced by domain experts. Since experts are resource-limited, a cost-effective process for annotating audio is needed to get maximal use out of the data. We present an open-source interactive audio data annotation tool, NEAL (Nature+Energy Audio Labeller). Built using R and the associated Shiny framework, the tool provides a reactive environment where users can quickly annotate audio files and adjust settings that automatically change the corresponding elements of the user interface. The app has been designed with the goal of having both expert birders and citizen scientists contribute to annotation projects. The popularity and flexibility of the R programming language in bioacoustics means the tool can be modified for other bird labelling data sets, or even for generic audio annotation tasks. We demonstrate the tool using data collected from wind farm sites across Ireland.

Language: English

Citations

1
