LeBenchmark 2.0: A standardized, replicable and enhanced framework for self-supervised representations of French speech
Titouan Parcollet, Ha H. Nguyen, Solène Evain et al.

Computer Speech & Language, 2024, Vol. 86, Article 101622

Published: Feb. 3, 2024

Language: English

Ten deep learning techniques to address small data problems with remote sensing (Creative Commons)
Anastasiia Safonova, Gohar Ghazaryan, Stefan Stiller et al.

International Journal of Applied Earth Observation and Geoinformation, 2023, Vol. 125, Article 103569

Published: Nov. 18, 2023

Researchers and engineers have increasingly used Deep Learning (DL) for a variety of Remote Sensing (RS) tasks. However, data from local observations or via ground truth is often quite limited for training DL models, especially when these models represent key socio-environmental problems, such as the monitoring of extreme, destructive climate events, biodiversity, and sudden changes in ecosystem states. Such cases, also known as small data problems, pose significant methodological challenges. This review summarises these challenges in the RS domain and the possibility of using emerging DL techniques to overcome them. We show that the small data problem is a common challenge across disciplines and scales that results in poor model generalisability and transferability. We then introduce an overview of ten promising DL techniques: transfer learning, self-supervised learning, semi-supervised learning, few-shot learning, zero-shot learning, active learning, weakly supervised learning, multitask learning, process-aware learning, and ensemble learning; we also include a validation technique, spatial k-fold cross validation. Our particular contribution was to develop a flowchart that helps users select which DL technique to use for a given problem by answering a few questions. We hope that our article will facilitate DL applications that tackle societally important environmental problems with limited reference data.

Language: English

Cited by: 70
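The Safonova et al. review names spatial k-fold cross validation as its validation technique without spelling it out in the abstract. As a minimal sketch of the general idea, assuming each sample carries (x, y) coordinates and space is cut into square blocks, whole blocks are held out together so that spatially autocorrelated neighbours never sit on both sides of a split. The function names and the round-robin block-to-fold assignment are hypothetical choices, not the review's own procedure.

```python
from collections import defaultdict


def spatial_block_id(x, y, block_size):
    """Map a coordinate to the (column, row) index of a square spatial block."""
    return (int(x // block_size), int(y // block_size))


def spatial_k_fold(points, k, block_size):
    """Yield (train_idx, test_idx) pairs in which whole spatial blocks are held out.

    points: list of (x, y) coordinates, one per sample.
    Blocks are assigned to folds round-robin in sorted order, so samples that
    share a block always land in the same fold.
    """
    blocks = defaultdict(list)
    for i, (x, y) in enumerate(points):
        blocks[spatial_block_id(x, y, block_size)].append(i)
    block_keys = sorted(blocks)
    fold_of_block = {key: n % k for n, key in enumerate(block_keys)}
    for fold in range(k):
        test_idx = [i for key in block_keys if fold_of_block[key] == fold for i in blocks[key]]
        train_idx = [i for key in block_keys if fold_of_block[key] != fold for i in blocks[key]]
        yield train_idx, test_idx
```

Practical setups usually shuffle blocks before assigning folds and pick the block size from the range of spatial autocorrelation; both refinements are left out here for brevity.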

Self-Supervised Learning for Time Series Analysis: Taxonomy, Progress, and Prospects
Kexin Zhang, Qingsong Wen, Chaoli Zhang et al.

IEEE Transactions on Pattern Analysis and Machine Intelligence, 2024, Vol. 46(10), pp. 6775-6794

Published: Apr. 10, 2024

Self-supervised learning (SSL) has recently achieved impressive performance on various time series tasks. The most prominent advantage of SSL is that it reduces the dependence on labeled data. Based on the pre-training and fine-tuning strategy, even a small amount of labeled data can achieve high performance. Compared with the many published self-supervised surveys on computer vision and natural language processing, a comprehensive survey for time series SSL is still missing. To fill this gap, we review current state-of-the-art SSL methods for time series data in this article. To this end, we first comprehensively review existing surveys related to SSL and time series, and then provide a new taxonomy of existing time series SSL methods by summarizing them from three perspectives: generative-based, contrastive-based, and adversarial-based. These methods are further divided into ten subcategories, with detailed reviews and discussions of their key intuitions, main frameworks, advantages, and disadvantages. To facilitate the experiments and validation of time series SSL methods, we also summarize datasets commonly used in time series forecasting, classification, anomaly detection, and clustering tasks. Finally, we present future directions of SSL for time series analysis.

Language: English

Cited by: 55
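The Zhang et al. survey groups time series SSL methods into generative-, contrastive-, and adversarial-based families. To make the contrastive family concrete, here is a hedged NumPy sketch of a generic InfoNCE-style loss, assuming two embedded views of the same batch of time series windows; how the views are produced (for example, by augmentation) is left to the caller. It is not the objective of any specific method reviewed in the survey.

```python
import numpy as np


def info_nce_loss(anchors, positives, temperature=0.1):
    """Generic InfoNCE loss for a batch of paired embeddings.

    anchors, positives: arrays of shape (batch, dim). Row i of `positives` is the
    positive view of row i of `anchors`; all other rows in the batch act as negatives.
    """
    # L2-normalise so that dot products are cosine similarities.
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    logits = a @ p.T / temperature                        # (batch, batch) similarities
    logits = logits - logits.max(axis=1, keepdims=True)   # subtract row max for stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Cross-entropy where the matching pair (the diagonal) is the target class.
    return float(-np.mean(np.diag(log_probs)))
```

The loss is low when each anchor is most similar to its own positive and rises when pairs are mismatched; the temperature controls how sharply the softmax separates them.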

Machine Learning Empowering Personalized Medicine: A Comprehensive Review of Medical Image Analysis Methods (Open Access)
Irena Galić, Marija Habijan, Hrvoje Leventić et al.

Electronics, 2023, Vol. 12(21), Article 4411

Published: Oct. 25, 2023

Artificial intelligence (AI) advancements, especially deep learning, have significantly improved medical image processing and analysis in various tasks such as disease detection, classification, and anatomical structure segmentation. This work overviews fundamental concepts, state-of-the-art models, and publicly available datasets in the field of medical imaging. First, we introduce the types of learning problems commonly employed in medical image processing and then present an overview of commonly used deep learning methods, including convolutional neural networks (CNNs), recurrent neural networks (RNNs), and generative adversarial networks (GANs), with a focus on the task they are solving, such as object detection/localization, segmentation, generation, and registration. Further, we highlight studies conducted in various application areas, encompassing neurology, brain imaging, retinal analysis, pulmonary imaging, digital pathology, and breast, cardiac, bone, abdominal, and musculoskeletal imaging. The strengths and limitations of each method are carefully examined, and the paper identifies pertinent challenges that still require attention, such as the limited availability of annotated data, variability in medical images, and interpretability issues. Finally, we discuss future research directions, with a particular focus on developing explainable deep learning methods and integrating multi-modal data.

Language: English

Cited by: 34

Comparative Layer-Wise Analysis of Self-Supervised Speech Models (Open Access)
Ankita Pasad, Bowen Shi, Karen Livescu et al.

ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2023

Published: May 5, 2023

Many self-supervised speech models, varying in their pre-training objective, input modality, and pre-training data, have been proposed in the last few years. Despite impressive successes on downstream tasks, we still have a limited understanding of the properties encoded by the models and the differences across models. In this work, we examine the intermediate representations of a variety of recent models. Specifically, we measure acoustic, phonetic, and word-level properties encoded in individual layers, using a lightweight analysis tool based on canonical correlation analysis (CCA). We find that these properties evolve across layers differently depending on the model, and the variations relate to the choice of pre-training objective. We further investigate the utility of our analyses for downstream tasks by comparing the property trends with performance on speech recognition and spoken language understanding tasks. We discover that CCA trends provide reliable guidance for choosing layers of interest for downstream tasks and that single-layer performance often matches or improves upon using all layers, suggesting implications for more efficient use of pre-trained models.

Language: English

Cited by: 31
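Pasad et al. describe a lightweight layer-wise analysis based on canonical correlation analysis (CCA). A minimal NumPy sketch of plain, unregularised CCA follows: centre the two views, take orthonormal bases with QR, and read the canonical correlations off the singular values of Qx^T Qy. It assumes more frames than feature dimensions and full-rank views. The `mean_cca_score` helper and the idea of pairing one layer's features with a property representation (for example, hypothetical one-hot phone labels) are illustrative assumptions, not the authors' exact tool, which may use a regularised or projection-weighted variant.

```python
import numpy as np


def canonical_correlations(X, Y):
    """Canonical correlations between two views X (n, p) and Y (n, q).

    Assumes n is larger than p + q and both centred views have full column rank;
    no regularisation is applied.
    """
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    Qx, _ = np.linalg.qr(Xc)   # orthonormal basis of the column space of Xc
    Qy, _ = np.linalg.qr(Yc)
    # Singular values of Qx^T Qy are the cosines of the principal angles,
    # i.e. the canonical correlations, sorted in descending order.
    corrs = np.linalg.svd(Qx.T @ Qy, compute_uv=False)
    return np.clip(corrs, 0.0, 1.0)


def mean_cca_score(layer_features, property_features, top_k=None):
    """Hypothetical per-layer score: mean of the (top-k) canonical correlations."""
    corrs = canonical_correlations(layer_features, property_features)
    if top_k is not None:
        corrs = corrs[:top_k]
    return float(corrs.mean())
```

Scoring every layer against the same property representation and plotting the scores gives the kind of layer-wise trend the abstract describes.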

What Do Self-Supervised Speech Models Know About Words? (Creative Commons)
Ankita Pasad, Chung-Ming Chien, Shane Settle et al.

Transactions of the Association for Computational Linguistics, 2024, Vol. 12, pp. 372-391

Published: Jan. 1, 2024

Many self-supervised speech models (S3Ms) have been introduced over the last few years, improving performance and data efficiency on various speech tasks. However, these empirical successes alone do not give a complete picture of what is learned during pre-training. Recent work has begun analyzing how S3Ms encode certain properties, such as phonetic and speaker information, but we still lack a proper understanding of knowledge encoded at the word level and beyond. In this work, we use lightweight analysis methods to study segment-level linguistic properties (word identity, boundaries, pronunciation, syntactic features, and semantic features) encoded in S3Ms. We present a comparative layer-wise analysis of representations from ten S3Ms and find that (i) the frame-level representations within each word segment are not all equally informative, and (ii) the pre-training objective and model size heavily influence the accessibility and distribution of linguistic information across layers. We also find that on several tasks (word discrimination, word segmentation, and semantic sentence similarity), S3Ms trained with visual grounding outperform their speech-only counterparts. Finally, our task-based analyses demonstrate improved performance on word segmentation and acoustic word discrimination while using simpler methods than prior work.

Language: English

Cited by: 12

Towards Universal Speech Discrete Tokens: A Case Study for ASR and TTS (Open Access)
Yifan Yang, Feiyu Shen, Chenpeng Du et al.

ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2024

Published: Mar. 18, 2024

Self-supervised learning (SSL) proficiency in speech-related tasks has driven research into utilizing discrete tokens for speech tasks like recognition and translation, since such tokens offer lower storage requirements and great potential to employ natural language processing techniques. However, these studies, mainly single-task focused, faced challenges such as overfitting and performance degradation in speech tasks, often at the cost of sacrificing performance in multi-task scenarios. This study presents a comprehensive comparison and optimization of discrete tokens generated by various leading SSL models in speech recognition and synthesis tasks. We aim to explore the universality of speech discrete tokens across multiple speech tasks. Experimental results demonstrate that discrete tokens achieve comparable results to systems trained on FBank features in speech recognition and outperform mel-spectrogram features in speech synthesis on both subjective and objective metrics. These findings suggest that universal discrete tokens have enormous potential in various speech-related tasks. Our work is open-source and publicly available at https://github.com/k2-fsa/icefall.

Language: English

Cited by: 11
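The Yang et al. abstract compares discrete tokens generated by SSL models but does not detail how the tokens are produced. A widely used recipe is to cluster frame-level SSL features with k-means and use each frame's nearest-centroid index as its token. The sketch below assumes the features are already extracted as a (frames, dim) NumPy array; the cluster count and the deduplication step are illustrative assumptions, not the icefall recipe.

```python
import numpy as np


def tokenize(features, centroids):
    """Assign each frame (row) to its nearest centroid; the index is the discrete token."""
    dists = ((features[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
    return dists.argmin(axis=1)


def kmeans(features, n_clusters, n_iter=20, seed=0):
    """Plain Lloyd's k-means; returns centroids of shape (n_clusters, dim)."""
    rng = np.random.default_rng(seed)
    init = rng.choice(len(features), n_clusters, replace=False)
    centroids = features[init].astype(float)
    for _ in range(n_iter):
        labels = tokenize(features, centroids)
        for c in range(n_clusters):
            members = features[labels == c]
            if len(members) > 0:   # keep the old centroid if a cluster empties
                centroids[c] = members.mean(axis=0)
    return centroids


def deduplicate(tokens):
    """Collapse consecutive repeats, an optional step before text-like modelling."""
    out = []
    for t in tokens:
        if not out or out[-1] != t:
            out.append(int(t))
    return out
```

In practice the centroids are fitted once on a large pool of frames and then reused to tokenize every utterance.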

Exploring Efficient-Tuning Methods in Self-Supervised Speech Models
Zih-Ching Chen, Chin-Lun Fu, Chih-Ying Liu et al.

2022 IEEE Spoken Language Technology Workshop (SLT), 2023, pp. 1120-1127

Published: Jan. 9, 2023

In this study, we aim to explore efficient tuning methods for speech self-supervised learning. Recent studies show that self-supervised learning (SSL) can learn powerful representations for different speech tasks. However, fine-tuning pre-trained models for each downstream task is parameter-inefficient, since SSL models are notoriously large, with millions of parameters. Adapters are lightweight modules commonly used in NLP to solve this problem. In downstream tasks, the parameters of the SSL models are frozen, and only the adapters are trained. Given the lack of studies generally exploring the effectiveness of adapters for self-supervised speech tasks, we intend to fill this gap by adding various adapter modules to pre-trained speech SSL models. We show that performance parity can be achieved with over 90% parameter reduction, and we discuss the pros and cons of efficient tuning techniques. This is the first comprehensive investigation of various adapter types across speech tasks.

Language: English

Cited by: 21
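Chen et al. train only adapters while the SSL backbone stays frozen. One common adapter form is a residual bottleneck (down-projection, non-linearity, up-projection); the NumPy sketch below shows that form, with a zero-initialised up-projection so the adapter starts as an identity map. The study compares several adapter types, so this is one illustrative variant, and the dimensions used are hypothetical.

```python
import numpy as np


class BottleneckAdapter:
    """Residual bottleneck adapter: x + relu(x @ W_down + b_down) @ W_up + b_up.

    Only these weights would be trained; the backbone that produces `x` stays frozen.
    """

    def __init__(self, dim, bottleneck, seed=0):
        rng = np.random.default_rng(seed)
        self.w_down = rng.normal(scale=0.02, size=(dim, bottleneck))
        self.b_down = np.zeros(bottleneck)
        # Zero-initialised up-projection: the adapter is an identity map at the start.
        self.w_up = np.zeros((bottleneck, dim))
        self.b_up = np.zeros(dim)

    def __call__(self, x):
        hidden = np.maximum(0.0, x @ self.w_down + self.b_down)
        return x + hidden @ self.w_up + self.b_up

    def num_params(self):
        """Number of trainable parameters in this adapter."""
        return sum(p.size for p in (self.w_down, self.b_down, self.w_up, self.b_up))
```

With `dim=768` and `bottleneck=64`, one adapter has 99,136 parameters; twelve of them (one per layer of a hypothetical 12-layer backbone) come to 1,189,632 trainable parameters, roughly 1.25% of a hypothetical 95-million-parameter frozen model.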

Automated Medical Coding on MIMIC-III and MIMIC-IV: A Critical Review and Replicability Study
Joakim Edin, Alexander Junge, Jakob D. Havtorn et al.

Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2023, pp. 2572-2582

Published: Jul. 18, 2023

Medical coding is the task of assigning medical codes to clinical free-text documentation. Healthcare professionals manually assign such codes to track patient diagnoses and treatments. Automated medical coding can considerably alleviate this administrative burden. In this paper, we reproduce, compare, and analyze state-of-the-art automated medical coding machine learning models. We show that several models underperform due to weak configurations, poorly sampled train-test splits, and insufficient evaluation. In previous work, the macro F1 score has been calculated sub-optimally, and our correction doubles it. We contribute a revised model comparison using stratified sampling and identical experimental setups, including hyperparameters and decision boundary tuning. We analyze prediction errors to validate and falsify assumptions of previous works. The analysis confirms that all models struggle with rare codes, while long documents have only a negligible impact. Finally, we present the first comprehensive results on the newly released MIMIC-IV dataset using the reproduced models. We release our code, model parameters, and new MIMIC-III and MIMIC-IV training and evaluation pipelines to accommodate fair future comparisons.

Language: English

Cited by: 21
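Edin et al. report that macro F1 had been computed sub-optimally in previous work, without the abstract stating how. The sketch below does not reproduce their correction; it only illustrates one choice that visibly moves macro F1, namely which set of codes the per-code F1 scores are averaged over. Documents are modelled as sets of code strings, and `label_space` is a hypothetical parameter.

```python
def f1(tp, fp, fn):
    """F1 from counts; 0.0 when the code is neither predicted nor present."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def macro_f1(gold, pred, label_space=None):
    """Macro F1 over medical codes.

    gold, pred: lists of sets of codes, one set per document.
    label_space: codes to average over; defaults to the codes present in `gold`.
    """
    codes = set(label_space) if label_space is not None else set().union(*gold)
    scores = []
    for code in sorted(codes):
        tp = sum(code in g and code in p for g, p in zip(gold, pred))
        fp = sum(code not in g and code in p for g, p in zip(gold, pred))
        fn = sum(code in g and code not in p for g, p in zip(gold, pred))
        scores.append(f1(tp, fp, fn))
    return sum(scores) / len(scores)
```

With `gold = [{"A", "B"}, {"A"}, {"C"}]` and `pred = [{"A"}, {"A"}, {"C"}]`, averaging over the three codes seen in `gold` gives 2/3, while averaging over a five-code label space that adds two never-seen codes gives 0.4.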

Self-Supervised Spatio-Temporal Representation Learning of Satellite Image Time Series (Creative Commons)
Iris Dumeur, Silvia Valero, Jordi Inglada et al.

IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 2024, Vol. 17, pp. 4350-4367

Published: Jan. 1, 2024

In this paper, a new self-supervised strategy for learning meaningful representations of complex optical Satellite Image Time Series (SITS) is presented. The proposed methodology, named U-BARN (Unet-BERT spAtio-temporal Representation eNcoder), exploits irregularly sampled SITS. The designed architecture allows learning rich and discriminative features from unlabeled data, enhancing the synergy between the spatio-spectral and temporal dimensions. To train on unlabeled data, a time series reconstruction pretext task inspired by BERT but adapted to SITS is proposed. A large-scale unlabeled Sentinel-2 dataset is used to pre-train U-BARN. During pre-training, U-BARN processes annual time series composed of a maximum of 100 dates. To demonstrate its feature learning capability, the encoded representations are then fed into a shallow classifier to generate semantic segmentation maps. Experiments are conducted on a labeled crop-type dataset (PASTIS) as well as a dense land cover dataset (MultiSenGE). Two ways of exploiting U-BARN pre-training are considered: the weights are either frozen or fine-tuned. The obtained results demonstrate that the representations given by the frozen U-BARN are more efficient for classification than those of a supervised-trained linear layer. Then, we observe that fine-tuning boosts performance on the MultiSenGE dataset. Additionally, on PASTIS, in scenarios with scarce reference data, fine-tuning brings a significant performance gain compared to fully supervised approaches. We also investigate the influence of the percentage of elements masked during pre-training on the quality of the representation. Finally, we show that the fully supervised U-BARN architecture reaches better performance than the spatio-temporal baseline (U-TAE) on both downstream segmentation tasks.

Language: English

Cited by: 8
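Dumeur et al. pre-train U-BARN with a BERT-inspired time series reconstruction pretext task and study how the percentage of masked elements affects representation quality. As a hedged, dependency-free sketch of that idea, the code below hides a fraction of acquisition dates and computes the reconstruction loss only on the hidden ones. Masking whole dates, marking them with `None`, and using mean squared error are assumptions for illustration; the encoder that would predict the hidden dates is omitted, and a real model would use a learned mask token rather than `None`.

```python
import random


def mask_dates(series, mask_ratio, seed=0):
    """Randomly hide a fraction of acquisition dates in one pixel's time series.

    series: list of per-date feature vectors (lists of floats).
    Returns (masked_series, masked_positions); hidden dates are replaced by None.
    At least one date is masked so that the loss is always defined.
    """
    rng = random.Random(seed)
    n_mask = max(1, round(mask_ratio * len(series)))
    positions = sorted(rng.sample(range(len(series)), n_mask))
    hidden = set(positions)
    masked = [None if t in hidden else list(x) for t, x in enumerate(series)]
    return masked, positions


def masked_mse(prediction, target, positions):
    """Mean squared reconstruction error, computed only on the masked dates."""
    errors = [
        (p - t) ** 2
        for pos in positions
        for p, t in zip(prediction[pos], target[pos])
    ]
    return sum(errors) / len(errors)
```

Varying `mask_ratio` is the knob that corresponds to the masking-percentage study mentioned in the abstract.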

A review on subjective and objective evaluation of synthetic speech (Open Access)
Erica Cooper, Wen-Chin Huang, Yu Tsao et al.

Acoustical Science and Technology, 2024, Vol. 45(4), pp. 161-183

Published: Apr. 3, 2024

Evaluating synthetic speech generated by machines is a complicated process, as it involves judging along multiple dimensions, including naturalness, intelligibility, and whether the intended purpose is fulfilled. While subjective listening tests conducted with human participants have been the gold standard for synthetic speech evaluation, their costly process design has also motivated the development of automated objective evaluation protocols. In this review, we first provide a historical view of listening test methodologies, from early in-lab comprehension tests to recent large-scale crowdsourced mean opinion score (MOS) tests. We then recap the development of automatic measures, ranging from signal-based metrics to model-based approaches that utilize deep neural networks or even the latest self-supervised learning techniques. We also describe the VoiceMOS Challenge series, a scientific event we founded that aims to promote the development of data-driven synthetic speech evaluation. Finally, we provide insights into unsolved issues in this field as well as future prospects. This review is expected to serve as an entry point for early academic researchers to enrich their knowledge in this field, as well as for speech synthesis practitioners to catch up on the latest developments.

Language: English

Cited by: 8
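The Cooper et al. review centres on mean opinion score (MOS) listening tests. A minimal sketch of the underlying arithmetic follows, assuming ratings on a 1-5 scale for a single system and a normal-approximation 95% confidence interval; real MOS analyses often also model listener and utterance effects, which this sketch ignores.

```python
import math
import statistics


def mos_with_ci(ratings, z=1.96):
    """Mean opinion score with a normal-approximation confidence interval.

    ratings: listener scores on a 1-5 absolute category rating scale (at least two).
    """
    mean = statistics.mean(ratings)
    half_width = z * statistics.stdev(ratings) / math.sqrt(len(ratings))
    return mean, (mean - half_width, mean + half_width)
```

For ratings `[4, 5, 3, 4, 4]`, the MOS is 4 and the interval half-width is 1.96 * sqrt(0.5) / sqrt(5), about 0.62.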