Published: Jan. 1, 2024
Language: English
Scientific Reports, Journal Year: 2024, Volume and Issue: 14(1)
Published: April 25, 2024
Abstract: Depression, a pervasive global mental disorder, profoundly impacts daily life. Despite numerous deep learning studies focused on depression detection through speech analysis, the shortage of large numbers of annotated samples hampers the development of effective models. In response to this challenge, our research introduces a transfer learning approach for detecting depression in speech, aiming to overcome the constraints imposed by limited resources. For feature representation, we obtain depression-related features by fine-tuning wav2vec 2.0. By integrating 1D-CNN and attention pooling structures, we generate advanced features at the segment level, thereby enhancing the model's capability to capture temporal relationships within audio frames. For prediction, we integrate LSTM and self-attention mechanisms. This incorporation assigns greater weights to segments associated with depression, improving the discernment of depression-related information. The experimental results indicate that the model achieves strong F1 scores, reaching 79% on the DAIC-WOZ dataset and 90.53% on the CMDC dataset. It outperforms recent baseline models in the field of speech-based depression detection and provides a promising solution for depression detection in low-resource environments.
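The abstract names attention pooling as the step that condenses frame-level wav2vec 2.0 features into segment-level features. The following is a minimal sketch in plain Python, assuming a single learnable scoring vector `w` (a hypothetical parameter; the paper's exact pooling layer is not described in this abstract): each frame is scored, the scores are softmax-normalized over time, and the segment vector is the weighted sum of frames.

```python
import math
from typing import List


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(frames: List[List[float]], w: List[float]) -> List[float]:
    """Collapse a sequence of frame vectors into one segment vector.

    Assumption: each frame's score is its dot product with a learnable
    vector `w` (hypothetical; passed in here as a fixed list).
    """
    # One scalar relevance score per frame.
    scores = [sum(f_i * w_i for f_i, w_i in zip(frame, w)) for frame in frames]
    # Attention weights over time sum to 1.
    alphas = softmax(scores)
    dim = len(frames[0])
    # Weighted sum of frames, dimension by dimension.
    return [sum(a * frame[d] for a, frame in zip(alphas, frames)) for d in range(dim)]
```

With `w` set to zeros, every frame receives the same weight and the result is the plain mean; a trained `w` shifts weight toward frames it scores highly, which is how such a layer can emphasize depression-relevant frames.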
Language: English
Citations: 4
Computational Intelligence, Journal Year: 2025, Volume and Issue: 41(1)
Published: Jan. 13, 2025
Abstract: Depression, a prevalent mental disorder in modern society, significantly impacts people's daily lives. Recently, there have been advancements in developing automated diagnosis models for detecting depression. However, data scarcity, primarily due to privacy concerns, has posed a challenge. Traditional speech features have limitations in representing knowledge for depression diagnosis, and the complexity of deep learning algorithms necessitates substantial data support. Furthermore, existing multimodal methods based on neural networks overlook the heterogeneity gap between different modalities, potentially resulting in redundant information. To address these issues, we propose a depression detection model based on an Enhanced Cross-Attention (ECA) Mechanism. This model effectively explores text-speech interactions while considering modality heterogeneity. Data scarcity is mitigated by fine-tuning pre-trained models. Additionally, we design a modal fusion module based on ECA, which emphasizes similarity responses and updates the weight of each feature according to the information in the features. For feature extraction, we reduce computational complexity by integrating a multi-window self-attention mechanism with the Fourier transform. The proposed model is evaluated on a public dataset, DAIC-WOZ, achieving an accuracy of 80.0% and an average F1-value improvement of 4.3% compared with relevant methods.
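The fusion module described above builds on cross-attention between modalities. The abstract does not specify what makes the ECA "enhanced", so the sketch below shows only standard scaled dot-product cross-attention, not the paper's module: queries come from one modality (for example, text token features) and keys and values from the other (for example, speech frame features). All inputs here are hypothetical toy matrices.

```python
import math
from typing import List

Matrix = List[List[float]]


def softmax(row: List[float]) -> List[float]:
    """Numerically stable softmax over one row of scores."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    s = sum(exps)
    return [e / s for e in exps]


def cross_attention(queries: Matrix, keys: Matrix, values: Matrix) -> Matrix:
    """Standard scaled dot-product cross-attention (not the paper's ECA).

    queries: one row per token of modality A (e.g. text).
    keys, values: one row per frame of modality B (e.g. speech).
    Returns one fused row per query.
    """
    d_k = len(keys[0])
    out = []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d_k).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in keys]
        weights = softmax(scores)
        # Weighted combination of the other modality's value vectors.
        out.append([sum(w * v[j] for w, v in zip(weights, values))
                    for j in range(len(values[0]))])
    return out
```

In practice, the queries, keys, and values would be learned linear projections of each modality's features; they are passed in directly here to keep the example short.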
Language: English
Citations: 0
Journal of Voice, Journal Year: 2025, Volume and Issue: unknown
Published: Jan. 1, 2025
Language: English
Citations: 0
BMC Psychiatry, Journal Year: 2025, Volume and Issue: 25(1)
Published: March 7, 2025
Speech impairments significantly affect communication and are associated with social and psychological difficulties, particularly among adults aged 45 years and older. This study examines the relationship between speech impairment and depression using data from the China Health and Retirement Longitudinal Study (CHARLS). A total of 67,014 participants aged 45 years and older were included in the analysis. The baseline characteristics of participants with and without speech impairments were compared using chi-square tests. Multivariable logistic and linear regression models were employed to assess the association between speech impairment and depression. Sensitivity and subgroup analyses were performed to explore variations across different demographic and lifestyle characteristics. Participants with speech impairments exhibited a greater likelihood of depression, with adjusted odds ratios (Model II: OR = 2.16, 95% CI: 1.56–2.97, p < 0.0001), and higher depression scores (β = 3.03, 95% CI: 2.24–3.81) after controlling for confounders. Sensitivity analysis confirmed the robustness of these findings. Subgroup analysis revealed consistent associations across all examined subgroups, with a statistically significant interaction for place of residence (p-value for interaction: 0.02), indicating a stronger association among urban residents. Speech impairment is strongly associated with depression in middle-aged and elderly Chinese adults. This finding underscores the importance of targeted mental health interventions to support this population, particularly in urban settings. Not applicable.
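The study reports adjusted odds ratios from multivariable logistic regression. As a simpler illustration of what an odds ratio and its 95% confidence interval express, the sketch below computes an unadjusted (crude) odds ratio from a 2x2 table using the Woolf log-scale interval. This does not reproduce the paper's adjusted Model II estimate, and the counts in the usage note are hypothetical.

```python
import math


def odds_ratio_ci(a: int, b: int, c: int, d: int, z: float = 1.96):
    """Unadjusted odds ratio with a Woolf (log-scale) confidence interval.

    2x2 layout (all counts must be > 0):
                            depressed   not depressed
        speech impairment       a             b
        no impairment           c             d
    z = 1.96 gives an approximate 95% interval.
    """
    or_ = (a * d) / (b * c)
    # Standard error of log(OR) under the Woolf approximation.
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lo = math.exp(math.log(or_) - z * se)
    hi = math.exp(math.log(or_) + z * se)
    return or_, lo, hi
```

For hypothetical counts `odds_ratio_ci(20, 80, 10, 90)`, the crude odds ratio is (20 × 90) / (80 × 10) = 2.25. The interval is symmetric on the log scale, which is why published intervals such as 1.56–2.97 around 2.16 look asymmetric on the original scale.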
Language: English
Citations: 0
Journal of Affective Disorders, Journal Year: 2024, Volume and Issue: 352, P. 395 - 402
Published: Feb. 9, 2024
Language: English
Citations: 3
Computer Speech & Language, Journal Year: 2023, Volume and Issue: 86, P. 101605 - 101605
Published: Dec. 26, 2023
Speech signals are valuable biomarkers for assessing an individual's mental health, including automatically identifying Major Depressive Disorder (MDD). A frequently used approach in this regard is to employ features related to speaker identity, such as speaker embeddings. However, over-reliance on speaker identity in mental health screening systems can compromise patient privacy. Moreover, some aspects of speaker identity may not be relevant for depression detection and could act as a bias factor that hampers system performance. To overcome these limitations, we propose disentangling speaker-identity information from depression-related information. Specifically, we present four distinct disentanglement methods to achieve this: adversarial speaker identification (SID)-loss maximization (ADV), SID-loss equalization with variance (LEV), SID-loss equalization using Cross-Entropy (LECE), and SID-loss equalization using KL divergence (LEKLD). Our experiments, which incorporated diverse input features and model architectures, yielded improved F1 scores for MDD detection and improved voice-privacy attributes, as quantified by Gain of Voice Distinctiveness (GVD) and De-Identification Scores (DeID). On the DAIC-WOZ dataset (English), LECE using ComparE16 features yields the best F1-Score of 80%, which represents the audio-only SOTA F1-Score, along with a GVD of −1.1 dB and a DeID of 85%. On the EATD dataset (Mandarin), ADV using the raw-audio signal achieves an F1-Score of 72.38%, surpassing the multi-modal SOTA, along with a GVD of −0.89 dB and a DeID of 51.21%. By reducing dependence on speaker-identity-related features, our method offers a promising direction for speech-based depression detection that preserves patient privacy.
Language: English
Citations: 8
Journal of Medical Internet Research, Journal Year: 2024, Volume and Issue: 26, P. e58572 - e58572
Published: Sept. 26, 2024
Background: While speech analysis holds promise for mental health assessment, research often focuses on single symptoms, despite symptom co-occurrences and interactions. In addition, predictive models in mental health do not properly assess the limitations of speech-based systems, such as uncertainty or fairness, for a safe clinical deployment. Objective: We investigated the potential of mobile-collected speech data for detecting and estimating depression, anxiety, fatigue, and insomnia in the general population, focusing on factors other than mere accuracy. Methods: We included 865 healthy adults and recorded their answers regarding their perceived mental and sleep states. We asked how they felt and whether they had slept well lately. Clinically validated questionnaires measuring symptom severity were also used. We developed a novel machine learning pipeline involving voice activity detection, feature extraction, and model training. We automatically modeled speech with pretrained deep learning models that were pretrained on a large, open, and free database, and we selected the best one on the validation set. Based on the best modeling approach, clinical threshold detection, individual score prediction, model uncertainty estimation, and performance fairness across demographics (age, sex, and education) were evaluated. We used a train-validation-test split for all evaluations: to develop our models, to select the best ones, and to assess generalizability on held-out data. Results: The best model was Whisper M with max pooling and an oversampling method. Our methods achieved good speech-based detection of depression (Patient Health Questionnaire-9: area under the curve [AUC]=0.76, F1-score=0.49; Beck Depression Inventory: AUC=0.78, F1-score=0.65), anxiety (Generalized Anxiety Disorder 7-item scale: AUC=0.77, F1-score=0.50), insomnia (Athens Insomnia Scale: AUC=0.73, F1-score=0.62), and fatigue (Multidimensional Fatigue Inventory total score: AUC=0.68, F1-score=0.88). The system performed well when it needed to abstain from making predictions, as demonstrated by low abstention rates and risk-coverage AUCs below 0.4. Individual symptom scores were accurately predicted (correlations were all significant, with Pearson strengths between 0.31 and 0.49).
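Fairness analysis revealed consistent performance across sex (average disparity ratio [DR] 0.86, SD 0.13), to a lesser extent across education level (average DR 0.47, SD 0.30), and worse performance across age groups (average DR 0.33, SD 0.30). Conclusions: This study demonstrates the potential of speech-based systems for multifaceted mental health assessment in the general population, not only for detecting clinical thresholds but also for estimating severity. Addressing fairness and incorporating uncertainty estimation with selective classification are key contributions that can enhance the clinical utility and responsible implementation of such systems.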
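Two of the evaluation quantities in this abstract, the risk-coverage AUC used for selective classification and the disparity ratio used for fairness, can be sketched as follows. These are common textbook definitions assumed for illustration; the abstract does not state the paper's exact formulas. The risk-coverage area is approximated by accepting predictions in order of decreasing confidence and averaging the error rate over every coverage level (lower is better). The disparity ratio is taken to be the worst-group metric divided by the best-group metric (1.0 means parity). Group names and numbers in the test are hypothetical.

```python
from typing import Dict, List


def risk_coverage_auc(confidences: List[float], correct: List[bool]) -> float:
    """Approximate area under the risk-coverage curve (lower is better).

    Predictions are accepted in order of decreasing confidence. At coverage
    k/n, the risk is the error rate among the k accepted predictions; the
    area is approximated by averaging that risk over all n coverage levels.
    """
    order = sorted(range(len(confidences)), key=lambda i: confidences[i], reverse=True)
    errors = 0
    risks = []
    for k, idx in enumerate(order, start=1):
        errors += 0 if correct[idx] else 1
        risks.append(errors / k)
    return sum(risks) / len(risks)


def disparity_ratio(metric_by_group: Dict[str, float]) -> float:
    """Assumed definition: worst-group metric / best-group metric (1.0 = parity)."""
    values = list(metric_by_group.values())
    return min(values) / max(values)
```

A model whose errors concentrate among its least confident predictions gets a low risk-coverage AUC, because abstaining on those cases quickly removes the errors. A model whose confident predictions are often wrong scores high.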
Language: English
Citations: 2
Language and Linguistics Compass, Journal Year: 2024, Volume and Issue: 18(5)
Published: July 18, 2024
Abstract: Phonetic data are used in several ways outside of the core field of phonetics. This paper offers the perspective of one such field, sociophonetics, on another, the study of acoustic cues to clinical depression. While sociophonetics is interested in how, when, and why phonetic variables cue information about the world, the study of depression focuses on how phonetic variables can be used by medical professionals as tools for diagnosis. The latter is interested only in identifying depression, while the former is interested in variation that can cue anything at all. Although the two fields fundamentally differ with respect to ontology, epistemology, and methodology, I argue that there are, nonetheless, possible avenues for future engagement, collaboration, and investigation. Ultimately, both fields need to engage with Crip Linguistics for any successful intervention in the relationship between speech and depression.
Language: English
Citations: 1
Frontiers in Digital Health, Journal Year: 2024, Volume and Issue: 6
Published: July 25, 2024
Machine learning (ML) algorithms have been heralded as promising solutions for the realization of assistive systems in digital healthcare, due to their ability to detect fine-grained patterns that are not easily perceived by humans. Yet, ML algorithms have also been critiqued for treating individuals differently based on their demography, thus propagating existing disparities. This paper explores gender and race bias in speech-based ML algorithms that detect behavioral and mental health outcomes.
Language: English
Citations: 1
medRxiv (Cold Spring Harbor Laboratory), Journal Year: 2024, Volume and Issue: unknown
Published: March 20, 2024
Abstract: Background: While speech analysis holds promise for mental health assessment, research often focuses on single symptoms, despite symptom co-occurrences and interactions. In addition, predictive models in mental health do not properly assess the limitations of speech-based systems, such as uncertainty or fairness, for a safe clinical deployment. Objective: We investigated the potential of mobile-collected speech data for detecting and estimating depression, anxiety, fatigue, and insomnia in the general population, focusing on factors other than mere accuracy. Methods: We included n=865 healthy adults and recorded their answers regarding their perceived mental and sleep states. We asked how they felt and whether they had slept well lately. Clinically validated questionnaires measuring symptom severity were also used. We developed a novel machine learning pipeline involving voice activity detection, feature extraction, and model training. We automatically analyzed participants' speech with a fully automatic ML pipeline to capture speech variability. Then, we modelled speech with pretrained deep learning models that were pre-trained on a large, open, and free database, and we selected the best one on the validation set. Based on the best modelling approach, we evaluated clinical threshold detection, individual score prediction, model uncertainty estimation, and performance fairness across demographics (age, sex, and education). We employed a train-validation-test split for all evaluations: to develop our models, to select the best ones, and to assess generalizability on held-out data. Results: The best model was WhisperM with max pooling and an oversampling method. Our methods achieved good speech-based detection of depression (PHQ-9: AUC=0.76, F1=0.49; BDI: AUC=0.78, F1=0.65), anxiety (GAD-7: AUC=0.77, F1=0.50), insomnia (AIS: AUC=0.73, F1=0.62), and fatigue (MFI Total Score: AUC=0.68, F1=0.88). These strengths were maintained, including for fatigue, with low abstention rates for uncertain cases (Risk-Coverage AUCs < 0.4). Individual scores were predicted accurately (correlations were significant, with Pearson strengths between 0.31 and 0.49). Fairness analysis revealed consistent performance across sex (average Disparity Ratio (DR) = 0.86), to a lesser extent across education level (DR = 0.47), and worse performance across age groups (DR = 0.33).
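Conclusions: This study demonstrates the potential of speech-based systems for multifaceted mental health assessment in the general population, not only for detecting clinical thresholds but also for estimating severity. Addressing fairness and incorporating uncertainty estimation with selective classification are key contributions that can enhance the clinical utility and responsible implementation of such systems. This approach offers more accurate and nuanced mental health assessments, benefiting both patients and clinicians.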
Language: English
Citations: 0