Computer Speech & Language, Journal year: 2024, Volume 86, pp. 101622 - 101622
Published: Feb. 3, 2024
Language: English
International Journal of Applied Earth Observation and Geoinformation, Journal year: 2023, Volume 125, pp. 103569 - 103569
Published: Nov. 18, 2023
Researchers and engineers have increasingly used Deep Learning (DL) for a variety of Remote Sensing (RS) tasks. However, data from local observations or via ground truth is often quite limited for training DL models, especially when these models represent key socio-environmental problems, such as the monitoring of extreme, destructive climate events, biodiversity, and sudden changes in ecosystem states. Such cases, also known as small data problems, pose significant methodological challenges. This review summarises these challenges in the RS domain and the possibility of using emerging DL techniques to overcome them. We show that the small data problem is a common challenge across disciplines and scales that results in poor model generalisability and transferability. We then introduce an overview of ten promising DL techniques: transfer learning, self-supervised learning, semi-supervised learning, few-shot learning, zero-shot learning, active learning, weakly supervised learning, multitask learning, process-aware learning, and ensemble learning; we also include a validation technique known as spatial k-fold cross validation. Our particular contribution was to develop a flowchart that helps users select which DL technique to use for a given small data problem by answering a few questions. We hope our article will facilitate DL applications that tackle societally important environmental problems with limited reference data.
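Spatial k-fold cross validation holds out whole geographic regions rather than randomly chosen samples. This keeps spatially autocorrelated neighbours from appearing in both the training and the test fold. The sketch below is a minimal illustration under two assumptions: each sample carries (x, y) coordinates, and square grid cells serve as the blocks. The function name and the blocking scheme are hypothetical and are not taken from the review.

```python
import random
from collections import defaultdict


def spatial_block_kfold(coords, block_size, k, seed=0):
    """Yield (train_idx, test_idx) pairs in which whole spatial blocks are held out.

    coords: list of (x, y) positions, one per sample (hypothetical input format).
    block_size: side length of the square grid cells used as blocks (an assumption).
    """
    # Assign every sample to the grid cell (block) that contains it.
    blocks = defaultdict(list)
    for i, (x, y) in enumerate(coords):
        blocks[(int(x // block_size), int(y // block_size))].append(i)

    # Shuffle the blocks reproducibly, then deal them round-robin into k folds.
    # If there are fewer blocks than k, some folds will be empty.
    keys = sorted(blocks)
    random.Random(seed).shuffle(keys)
    folds = [keys[f::k] for f in range(k)]

    for f in range(k):
        test_blocks = set(folds[f])
        test_idx = [i for key in folds[f] for i in blocks[key]]
        train_idx = [i for key in keys if key not in test_blocks for i in blocks[key]]
        yield sorted(train_idx), sorted(test_idx)
```

A common guideline is to choose block_size larger than the range of spatial autocorrelation in the data. The held-out error then gives a fairer estimate of how well a model transfers to unseen regions.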
Language: English
Cited by: 70
IEEE Transactions on Pattern Analysis and Machine Intelligence, Journal year: 2024, Volume 46(10), pp. 6775 - 6794
Published: April 10, 2024
Self-supervised learning (SSL) has recently achieved impressive performance on various time series tasks. The most prominent advantage of SSL is that it reduces the dependence on labeled data. Based on the pre-training and fine-tuning strategy, even a small amount of labeled data can achieve high performance. Compared with the many published self-supervised surveys on computer vision and natural language processing, a comprehensive survey for time series SSL is still missing. To fill this gap, we review current state-of-the-art SSL methods for time series data in this article. To this end, we first comprehensively review existing surveys related to SSL and time series, and then provide a new taxonomy of existing time series SSL methods by summarizing them from three perspectives: generative-based, contrastive-based, and adversarial-based. These methods are further divided into ten subcategories, with detailed reviews and discussions about their key intuitions, main frameworks, advantages, and disadvantages. To facilitate the experiments and validation of time series SSL methods, we also summarize datasets commonly used in time series forecasting, classification, anomaly detection, and clustering tasks. Finally, we present future directions of SSL for time series analysis.
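A typical objective in the contrastive-based family of this taxonomy is an InfoNCE-style loss. Two augmented views of the same series should embed closer to each other than to the other series in the batch. The pure-Python sketch below makes three assumptions: embeddings are already computed, similarity is cosine, and a temperature scales the logits. It illustrates the general objective only and is not any specific method from the survey.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def info_nce(anchors, positives, temperature=0.1):
    """Mean InfoNCE loss over a batch.

    anchors[i] and positives[i] embed two views of the same series;
    positives[j] for j != i act as negatives for anchors[i].
    """
    total = 0.0
    for i, a in enumerate(anchors):
        logits = [cosine(a, p) / temperature for p in positives]
        # Negative log-softmax of the matching pair, using a stable log-sum-exp.
        m = max(logits)
        log_z = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_z - logits[i]
    return total / len(anchors)
```

The loss is near zero when each anchor is most similar to its own positive. It grows when a mismatched series is the closest one.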
Language: English
Cited by: 55
Electronics, Journal year: 2023, Volume 12(21), pp. 4411 - 4411
Published: Oct. 25, 2023
Artificial intelligence (AI) advancements, especially deep learning, have significantly improved medical image processing and analysis in various tasks such as disease detection, classification, and anatomical structure segmentation. This work overviews fundamental concepts, state-of-the-art models, and publicly available datasets in the field of medical imaging. First, we introduce the types of learning problems commonly employed in medical image processing and then proceed to present an overview of commonly used deep learning methods, including convolutional neural networks (CNNs), recurrent neural networks (RNNs), and generative adversarial networks (GANs), with a focus on the image analysis task they are solving, including classification, object detection/localization, segmentation, generation, and registration. Further, we highlight studies conducted in various application areas, encompassing neurology, brain imaging, retinal analysis, pulmonary, digital pathology, breast, cardiac, bone, abdominal, and musculoskeletal imaging. The strengths and limitations of each method are carefully examined, and the paper identifies pertinent challenges that still require attention, such as the limited availability of annotated data, variability in medical images, and interpretability issues. Finally, we discuss future research directions, with a particular focus on developing explainable deep learning methods and integrating multi-modal data.
Language: English
Cited by: 34
ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Journal year: 2023, Volume unknown
Published: May 5, 2023
Many self-supervised speech models, varying in their pre-training objective, input modality, and pre-training data, have been proposed in the last few years. Despite impressive successes on downstream tasks, we still have a limited understanding of the properties encoded by the models and the differences across models. In this work, we examine the intermediate representations for a variety of recent models. Specifically, we measure acoustic, phonetic, and word-level properties encoded in individual layers, using a lightweight analysis tool based on canonical correlation analysis (CCA). We find that these properties evolve across layers differently depending on the model, and that the variations relate to the choice of pre-training objective. We further investigate the utility of our analyses for downstream tasks by comparing the property trends with performance on speech recognition and spoken language understanding tasks. We discover that CCA trends provide reliable guidance to choose layers of interest for downstream tasks and that single-layer performance often matches or improves upon using all layers, suggesting implications for more efficient use of pre-trained models.
Language: English
Cited by: 31
Transactions of the Association for Computational Linguistics, Journal year: 2024, Volume 12, pp. 372 - 391
Published: Jan. 1, 2024
Many self-supervised speech models (S3Ms) have been introduced over the last few years, improving performance and data efficiency on various speech tasks. However, these empirical successes alone do not give a complete picture of what is learned during pre-training. Recent work has begun analyzing how S3Ms encode certain properties, such as phonetic and speaker information, but we still lack a proper understanding of knowledge encoded at the word level and beyond. In this work, we use lightweight analysis methods to study segment-level linguistic properties (word identity, boundaries, pronunciation, syntactic features, and semantic features) encoded in S3Ms. We present a comparative study of layer-wise representations from ten S3Ms and find that (i) the frame-level representations within each word segment are not all equally informative, and (ii) the pre-training objective and model size heavily influence the accessibility and distribution of linguistic information across layers. We also find that on several tasks (word discrimination, word segmentation, and sentence similarity), S3Ms trained with visual grounding outperform their speech-only counterparts. Finally, our task-based analyses demonstrate improved performance on word segmentation and acoustic word discrimination while using simpler methods than prior work.
Language: English
Cited by: 12
ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Journal year: 2024, Volume unknown
Published: March 18, 2024
Self-supervised learning (SSL) proficiency in speech-related tasks has driven research into utilizing discrete tokens for speech tasks like recognition and translation, which offer lower storage requirements and great potential to employ natural language processing techniques. However, these studies, mainly single-task focused, faced challenges like overfitting and performance degradation in speech recognition tasks, often at the cost of sacrificing performance in multi-task scenarios. This study presents a comprehensive comparison and optimization of discrete tokens generated by various leading SSL models in speech recognition and synthesis tasks. We aim to explore the universality of speech discrete tokens across multiple speech tasks. Experimental results demonstrate that discrete tokens achieve comparable results against systems trained on FBank features in speech recognition tasks and outperform mel-spectrogram features in speech synthesis in subjective and objective metrics. These findings suggest that universal discrete tokens have enormous potential in various speech-related tasks. Our work is open-source and publicly available at https://github.com/k2-fsa/icefall.
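This abstract does not say how the discrete tokens are produced. A widespread recipe, assumed here and not confirmed by this entry, runs k-means on the hidden features of one SSL layer and replaces each frame with the index of its nearest centroid. The sketch below shows only that assignment step, with the centroids taken as given. The function name and the optional repeat-collapsing flag are hypothetical.

```python
def quantize(frames, centroids, collapse_repeats=False):
    """Map each feature frame to the index of its nearest centroid (squared L2).

    frames: per-frame feature vectors from an SSL layer (assumed precomputed).
    centroids: cluster centres, e.g. from k-means (assumed precomputed).
    """
    tokens = []
    for f in frames:
        dists = [sum((a - b) ** 2 for a, b in zip(f, c)) for c in centroids]
        tokens.append(min(range(len(centroids)), key=dists.__getitem__))
    if collapse_repeats:
        # Optional: merge runs of identical consecutive tokens into one.
        tokens = [t for i, t in enumerate(tokens) if i == 0 or t != tokens[i - 1]]
    return tokens
```

Storing small integer IDs instead of float feature vectors is what gives discrete tokens their lower storage cost.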
Language: English
Cited by: 11
2022 IEEE Spoken Language Technology Workshop (SLT), Journal year: 2023, Volume unknown, pp. 1120 - 1127
Published: Jan. 9, 2023
In this study, we aim to explore efficient tuning methods for speech self-supervised learning. Recent studies show that self-supervised learning (SSL) can learn powerful representations for different speech tasks. However, fine-tuning pre-trained models for each downstream task is parameter-inefficient, since SSL models are notoriously large, with millions of parameters. Adapters are lightweight modules commonly used in NLP to solve this problem. In downstream tasks, the parameters of SSL models are frozen, and only the adapters are trained. Given the lack of studies generally exploring the effectiveness of adapters for self-supervised speech tasks, we intend to fill this gap by adding various adapter modules in pre-trained speech SSL models. We show that performance parity can be achieved with over 90% parameter reduction, and we discuss the pros and cons of efficient tuning techniques. This is the first comprehensive investigation of various adapter types across speech tasks.
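The abstract does not fix an adapter design. A common one, assumed here, is the residual bottleneck: a down-projection from the model width d to a small width r, a nonlinearity, an up-projection back to d, and a skip connection, with the SSL backbone frozen. The sketch counts trainable parameters and implements one adapter's forward pass on plain lists. The layer count, widths, and backbone size used below are illustrative assumptions.

```python
def adapter_params(d_model, bottleneck):
    """Parameters of one bottleneck adapter: two projections plus their biases."""
    down = d_model * bottleneck + bottleneck  # W_down (r x d) and b_down
    up = bottleneck * d_model + d_model       # W_up (d x r) and b_up
    return down + up


def trainable_fraction(backbone_params, n_layers, d_model, bottleneck):
    """Share of all parameters that are trained when the backbone is frozen and
    one adapter is inserted per layer (a hypothetical placement)."""
    trainable = n_layers * adapter_params(d_model, bottleneck)
    return trainable / (backbone_params + trainable)


def adapter_forward(h, w_down, b_down, w_up, b_up):
    """Residual bottleneck adapter: h + W_up . relu(W_down . h + b_down) + b_up."""
    z = [max(0.0, sum(w * x for w, x in zip(row, h)) + b) for row, b in zip(w_down, b_down)]
    return [x + sum(w * zi for w, zi in zip(row, z)) + b for x, row, b in zip(h, w_up, b_up)]
```

Take d = 768, r = 64, 12 adapters, and a hypothetical frozen backbone of 95M parameters. Then trainable_fraction returns about 0.012, so roughly 1% of the parameters are tuned per task. This is the kind of arithmetic behind a reduction of over 90% in per-task parameters. Initializing the up-projection to zero makes each adapter start as the identity mapping.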
Language: English
Cited by: 21
Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval, Journal year: 2023, Volume unknown, pp. 2572 - 2582
Published: July 18, 2023
Medical coding is the task of assigning medical codes to clinical free-text documentation. Healthcare professionals manually assign such codes to track patient diagnoses and treatments. Automated medical coding can considerably alleviate this administrative burden. In this paper, we reproduce, compare, and analyze state-of-the-art automated medical coding machine learning models. We show that several models underperform due to weak configurations, poorly sampled train-test splits, and insufficient evaluation. In previous work, the macro F1 score has been calculated sub-optimally, and our correction doubles it. We contribute a revised model comparison using stratified sampling and identical experimental setups, including hyperparameters and decision boundary tuning. We analyze prediction errors to validate and falsify assumptions of previous works. The analysis confirms that all models struggle with rare codes, while long documents only have a negligible impact. Finally, we present the first comprehensive results on the newly released MIMIC-IV dataset using the reproduced models. We release our code, model parameters, and new MIMIC-III and MIMIC-IV training and evaluation pipelines to accommodate fair future comparisons.
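Macro averaging gives every code equal weight, so a macro F1 score depends heavily on which codes are averaged over. The sketch below uses hypothetical helper names and toy data. It shows that adding codes which never occur in either the gold labels or the predictions, each scored as F1 = 0, halves an otherwise perfect score. This is one illustrative way a macro F1 can be understated; the abstract does not say that this is the specific correction the authors made.

```python
def f1(tp, fp, fn):
    """Per-label F1 from counts; defined here as 0.0 when the label never occurs."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def macro_f1(gold, pred, labels):
    """Macro-averaged F1 over `labels` for multi-label data.

    gold, pred: lists of sets of codes, one set per document.
    """
    scores = []
    for code in labels:
        tp = sum(1 for g, p in zip(gold, pred) if code in g and code in p)
        fp = sum(1 for g, p in zip(gold, pred) if code not in g and code in p)
        fn = sum(1 for g, p in zip(gold, pred) if code in g and code not in p)
        scores.append(f1(tp, fp, fn))
    return sum(scores) / len(scores)
```

With gold = [{"A"}, {"B"}] and identical predictions, the score is 1.0 over ["A", "B"] but 0.5 over ["A", "B", "C", "D"].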
Language: English
Cited by: 21
IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, Journal year: 2024, Volume 17, pp. 4350 - 4367
Published: Jan. 1, 2024
In this paper, a new self-supervised strategy for learning meaningful representations of complex optical Satellite Image Time Series (SITS) is presented. The proposed methodology, named U-BARN (Unet-BERT spAtio-temporal Representation eNcoder), exploits irregularly sampled SITS. The designed architecture allows learning rich and discriminative features from unlabeled data, enhancing the synergy between the spatio-spectral and the temporal dimensions. To train on unlabeled data, a time series reconstruction pretext task inspired by BERT but adapted to SITS is proposed. A large-scale unlabeled Sentinel-2 dataset is used to pre-train U-BARN. During pre-training, U-BARN processes annual SITS composed of a maximum of 100 dates. To demonstrate its feature learning capability, the encoded representations are then fed into a shallow classifier to generate semantic segmentation maps. Experiments are conducted on a labeled crop dataset (PASTIS) as well as a dense land cover dataset (MultiSenGE). Two ways of exploiting U-BARN pre-training are considered: the weights are either frozen or fine-tuned. The results obtained show that the representations given by the frozen U-BARN are more efficient for classification than those of a supervised-trained linear layer. Then, we observe that fine-tuning boosts U-BARN performance on the MultiSenGE dataset. Additionally, on PASTIS, in scenarios with scarce reference data, fine-tuning brings a significant performance gain compared to fully-supervised approaches. We also investigate the influence of the percentage of elements masked during pre-training on the quality of the SITS representation. Finally, we show that the fully supervised U-BARN architecture reaches better performance than the spatio-temporal baseline (U-TAE) on both downstream tasks: crop and dense land cover segmentation.
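A BERT-style reconstruction pretext task hides part of the input and scores the model only on what was hidden. The sketch below applies that idea to a time series of per-date feature vectors. Three choices are assumptions for illustration, not U-BARN's documented design: zero-filling as the mask token, the mask_ratio parameter, and mean squared error as the loss. Here mask_ratio plays the role of the masked percentage whose influence the paper investigates.

```python
import random


def mask_time_steps(series, mask_ratio, seed=0):
    """Hide a random subset of time steps, BERT-style.

    series: list of per-date feature vectors.
    Returns (masked_series, mask), where mask[t] is True for hidden steps.
    """
    rng = random.Random(seed)
    n_mask = max(1, round(mask_ratio * len(series)))  # assumes 0 < mask_ratio <= 1
    hidden = set(rng.sample(range(len(series)), n_mask))
    mask = [t in hidden for t in range(len(series))]
    # Replace hidden steps with zeros (a hypothetical mask token).
    masked = [[0.0] * len(x) if m else list(x) for x, m in zip(series, mask)]
    return masked, mask


def masked_mse(prediction, target, mask):
    """Reconstruction loss averaged over the hidden time steps only."""
    errs = [
        sum((p - t) ** 2 for p, t in zip(pv, tv)) / len(tv)
        for pv, tv, m in zip(prediction, target, mask)
        if m
    ]
    return sum(errs) / len(errs)
```

Restricting the loss to hidden steps stops the model from scoring well by copying its visible input.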
Language: English
Cited by: 8
Nippon Onkyo Gakkaishi/Acoustical science and technology/Nihon Onkyo Gakkaishi, Journal year: 2024, Volume 45(4), pp. 161 - 183
Published: April 3, 2024
Evaluating synthetic speech generated by machines is a complicated process, as it involves judging along multiple dimensions, including naturalness, intelligibility, and whether the intended purpose is fulfilled. While subjective listening tests conducted with human participants have been the gold standard for synthetic speech evaluation, their costly design process has also motivated the development of automated objective evaluation protocols. In this review, we first provide a historical view of listening test methodologies, from early in-lab comprehension tests to recent large-scale crowdsourcing mean opinion score (MOS) tests. We then recap the development of automatic measures, ranging from signal-based metrics to model-based approaches that utilize deep neural networks or even the latest self-supervised learning techniques. We also describe the VoiceMOS Challenge series, a scientific event we founded that aims to promote the development of data-driven synthetic speech evaluation. Finally, we provide insights into unsolved issues in this field as well as future prospects. This review is expected to serve as an entry point for academic researchers to enrich their knowledge in this field, as well as for speech synthesis practitioners to catch up on the latest developments.
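A mean opinion score is the arithmetic mean of listener ratings, typically given on a 1 to 5 absolute category rating scale. The sketch below adds a 95% confidence interval using a normal approximation (z = 1.96). That interval is an assumption for illustration. More careful analyses of large crowdsourced tests may also account for per-listener and per-utterance variation.

```python
import math
import statistics


def mos_with_ci(ratings, z=1.96):
    """Mean opinion score with a normal-approximation confidence interval.

    ratings: listener scores on a 1-5 scale (at least two ratings, since the
    sample standard deviation is used).
    """
    mean = float(statistics.mean(ratings))
    half_width = z * statistics.stdev(ratings) / math.sqrt(len(ratings))
    return mean, (mean - half_width, mean + half_width)
```

For the ratings [3, 4, 5, 4], this returns a MOS of 4.0 with an interval of roughly 3.2 to 4.8. The width of the interval shows how few ratings a single score like this rests on.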
Language: English
Cited by: 8