Neurocomputing, Journal Year: 2024, Volume and Issue: unknown, P. 128637 - 128637
Published: Sept. 1, 2024
Language: English
ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Journal Year: 2024, Volume and Issue: unknown, P. 911 - 915
Published: March 18, 2024
Sound event detection (SED) often suffers from the data deficiency problem. Recent SED systems leverage large pretrained self-supervised learning (SelfSL) models to mitigate this restriction, where the pretrained models help produce more discriminative features for SED. However, the pretrained models are regarded as a frozen feature extractor in most SED systems, and fine-tuning of the pretrained models has been rarely studied. In this work, we study the fine-tuning method of pretrained models for SED. We introduce the frame-level audio teacher-student transformer model (ATST-Frame), our newly proposed SelfSL model, to the SED system. ATST-Frame was especially designed for learning frame-level representations of audio signals and obtained state-of-the-art (SOTA) performances on a series of downstream tasks. We then propose a fine-tuning method for ATST-Frame using both (in-domain) unlabelled and labelled SED data. Our experiments show that the proposed method overcomes the overfitting problem when fine-tuning the large pre-trained network, and our SED system obtains new SOTA results of 0.587/0.812 PSDS1/PSDS2 on the DCASE challenge task 4 dataset.
Language: English
Citations: 7
Ecological Informatics, Journal Year: 2025, Volume and Issue: unknown, P. 103010 - 103010
Published: Jan. 1, 2025
Language: English
Citations: 0
Digital Signal Processing, Journal Year: 2025, Volume and Issue: unknown, P. 105055 - 105055
Published: Feb. 1, 2025
Language: English
Citations: 0
Sensors, Journal Year: 2025, Volume and Issue: 25(8), P. 2591 - 2591
Published: April 19, 2025
Autonomous driving technologies for environmental perception are mostly based on visual cues obtained from sensors like cameras, RADAR, or LiDAR. They capture the environment as if seen through “human eyes”. If this information is complemented with auditory information, thereby also providing “ears”, driverless cars can become more reliable and safer. In this paper, an Acoustic Event Detection model is presented that can detect various acoustic events in an automotive context along with their time of occurrence to create an audio scene description. The proposed detection methodology uses the pre-trained network Bidirectional Encoder representation from Audio Transformers (BEATs) and a single-layer neural network trained on a database of real audio recordings collected from different cars. The performance of the model is evaluated for different parameters and datasets. The segment-based results for a duration of 1 s show that the model performs well for 11 sound classes, with a mean accuracy of 0.93 and an F1-Score of 0.39 for a confidence threshold of 0.5. The threshold-independent metric mAP has a value of 0.77. For mixtures containing two overlapping events, the accuracy, F1-Score, and mAP are equal to 0.89, 0.42, and 0.658, respectively.
Language: English
Citations: 0
IEEE/ACM Transactions on Audio Speech and Language Processing, Journal Year: 2024, Volume and Issue: 32, P. 3947 - 3959
Published: Jan. 1, 2024
Language: English
Citations: 0
Published: Oct. 26, 2024
Recent advances have been witnessed in audio-language joint learning, such as CLAP, that shows much success in multi-modal understanding tasks. These models usually aggregate uni-modal local representations, namely frame or word features, into global ones, on which the contrastive loss is employed to reach coarse-grained cross-modal alignment. However, frame-level correspondence with texts may be ignored, making it ill-posed on explainability and fine-grained challenges, which may also undermine performances on coarse-grained tasks. In this work, we aim to improve both coarse- and fine-grained alignment in large-scale pre-training. To unify the granularity and latent distribution of the two modalities, a shared codebook is adopted to represent features with common bases, and each codeword is regularized to encode modality-shared semantics, bridging the gap between frame and word features. Based on it, a locality-aware block is involved to purify local patterns, and a hard-negative guided loss is devised to boost alignment. Experiments on eleven zero-shot tasks suggest that our model not only surpasses the baseline CLAP significantly but also yields superior or competitive results compared to current SOTA works.
Language: English
Citations: 0
Neurocomputing, Journal Year: 2024, Volume and Issue: unknown, P. 128637 - 128637
Published: Sept. 1, 2024
Language: English
Citations: 0