Stacking Ensemble Learning with 1D CNN and CapsuleNet for Speech Emotion Recognition
Bhanusree Yalamanchili, B. Janardhana Rao, K. Revathi

et al.

Cognitive Science and Technology, Journal Year: 2025, Volume and Issue: unknown, P. 69 - 80

Published: Jan. 1, 2025

Language: English

Authorship identification using ensemble learning
Ahmed Abbasi, Abdul Rehman Javed, Farkhund Iqbal

et al.

Scientific Reports, Journal Year: 2022, Volume and Issue: 12(1)

Published: June 9, 2022

Abstract With time, textual data is proliferating, primarily through the publication of articles, and with this rapid increase anonymous content is also increasing. Researchers are searching for alternative strategies to identify the author of an unknown text; there is a need for a system that attributes texts to their actual authors based on a given set of writing samples. This study presents a novel approach combining ensemble learning, DistilBERT, and conventional machine learning techniques for authorship identification. The proposed approach extracts valuable characteristics using a count vectorizer and bi-gram term frequency-inverse document frequency (TF-IDF). An extensive and detailed dataset, "All the news", is used for experimentation and divided into three subsets (article1, article2, article3). The scope is limited to ten selected authors in the first setting and twenty in the second. The experimental results show better performance on all subsets of the dataset: with 10 authors, the proposed approach provides accuracy gains of 3.14% and 2.44% on article1; similarly, with 20 authors, it provides gains of 5.25% and 7.17%, which is better than previous state-of-the-art studies.
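
As an illustration of the kind of pipeline this abstract describes, here is a minimal sketch, assuming a toy corpus: uni/bi-gram TF-IDF features feeding a soft-voting ensemble of conventional classifiers. The DistilBERT branch and the "All the news" data are omitted, and all names below are placeholders rather than the authors' implementation.

```python
# Minimal sketch of the bi-gram TF-IDF + ensemble idea (DistilBERT branch omitted).
# `texts` and `authors` are hypothetical placeholders, not the "All the news" data.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.ensemble import VotingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import train_test_split

texts = ["sample article text by author one", "another article by author two"] * 50
authors = ["author1", "author2"] * 50

X_train, X_test, y_train, y_test = train_test_split(
    texts, authors, test_size=0.2, random_state=0)

# Uni/bi-gram TF-IDF features feeding a soft-voting ensemble of two classifiers.
model = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2)),
    VotingClassifier(
        estimators=[("lr", LogisticRegression(max_iter=1000)),
                    ("rf", RandomForestClassifier(n_estimators=200))],
        voting="soft",
    ),
)
model.fit(X_train, y_train)
print("held-out accuracy:", model.score(X_test, y_test))
```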

Language: English

Citations: 43

Speech emotion recognition via graph-based representations
Anastasia Pentari, George P. Kafentzis, Manolis Tsiknakis

et al.

Scientific Reports, Journal Year: 2024, Volume and Issue: 14(1)

Published: Feb. 23, 2024

Abstract Speech emotion recognition (SER) has gained increased interest during the last decades as part of enriched affective computing. As a consequence, a variety of engineering approaches have been developed to address the challenging SER problem, exploiting different features, learning algorithms, and datasets. In this paper, we propose the application of graph theory for classifying emotionally-colored speech signals. Graph theory provides tools for extracting statistical as well as structural information from any time series, and we propose to use this information as a novel feature set. Furthermore, we suggest setting a unique feature-based identity for each emotion belonging to each speaker. Classification is performed by a Random Forest classifier in a Leave-One-Speaker-Out Cross-Validation (LOSO-CV) scheme. The proposed method is compared with two state-of-the-art approaches involving well-known hand-crafted features and deep architectures operating on mel-spectrograms. Experimental results on three datasets, EMODB (German, acted), AESDD (Greek, acted), and DEMoS (Italian, in-the-wild), reveal that our method outperforms the comparative methods on these datasets, with average UAR increases of almost 18%, 8%, and 13%, respectively.
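
The Leave-One-Speaker-Out protocol mentioned above can be sketched as follows; the graph-theoretic features are replaced by random placeholders, so this illustrates only the evaluation scheme, not the paper's feature set.

```python
# Sketch of the Leave-One-Speaker-Out evaluation scheme with a Random Forest.
# Features here are random placeholders standing in for graph-derived features.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import LeaveOneGroupOut, cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 16))            # per-utterance feature vectors (placeholder)
y = rng.integers(0, 5, size=200)          # emotion labels (placeholder)
speakers = rng.integers(0, 10, size=200)  # speaker identity per utterance

# Each CV fold holds out all utterances of one speaker, as in LOSO-CV.
scores = cross_val_score(RandomForestClassifier(n_estimators=300, random_state=0),
                         X, y, groups=speakers, cv=LeaveOneGroupOut())
print("per-speaker accuracies:", scores.round(2))
```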

Language: English

Citations: 10

Multimodal Emotion Recognition Using Bi-LG-GCN for MELD Dataset
Hussein Farooq Tayeb Al-Saadawi, Resul Daş

Balkan Journal of Electrical and Computer Engineering, Journal Year: 2024, Volume and Issue: 12(1), P. 36 - 46

Published: March 1, 2024

Emotion recognition using multimodal data is a widely adopted approach due to its potential to enhance human interactions and various applications; by leveraging multiple modalities, the quality of emotion recognition can be significantly improved. We present a novel method for the Multimodal EmotionLines Dataset (MELD) based on a bi-lateral gradient graph neural network (Bi-LG-GNN) with dedicated feature extraction and pre-processing. The dataset uses fine-grained emotion labeling across textual, audio, and visual modalities. This work aims to identify affective states that remain concealed in purely textual or audio sentiment analysis. We use pre-processing techniques to improve consistency and increase the dataset's usefulness; this includes noise removal, normalization, and linguistic processing to deal with variances in background discourse. Kernel Principal Component Analysis (K-PCA) is employed for feature extraction, aiming to derive valuable attributes from each modality and encode labels into an array of values. We propose a Bi-LG-GCN-based architecture explicitly tailored to fusing these modalities effectively. The system takes each modality's feature-extracted, pre-processed representation as input to a generator network, generating realistic synthetic samples that capture cross-modal relationships. These generated samples serve as inputs to a discriminator network trained to distinguish genuine from synthetic data. With this approach, the model can learn discriminative features and make accurate predictions regarding emotional states. Our method was evaluated on the MELD dataset, yielding notable results: accuracy of 80%, with F1-score, precision, and recall of 81%. The proposed approach, featuring synthetic sample generation and discrimination steps, outperforms contemporary techniques, thus demonstrating its practical utility.
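
A minimal sketch of the per-modality K-PCA step described above, assuming synthetic stand-ins for the text, audio, and visual features; the generator/discriminator fusion is not reproduced here.

```python
# Sketch of per-modality Kernel PCA followed by simple concatenation.
# Modality arrays are synthetic placeholders, not MELD features.
import numpy as np
from sklearn.decomposition import KernelPCA

rng = np.random.default_rng(1)
text_feats = rng.normal(size=(100, 300))   # e.g. sentence embeddings (placeholder)
audio_feats = rng.normal(size=(100, 128))  # e.g. spectral statistics (placeholder)
video_feats = rng.normal(size=(100, 512))  # e.g. frame embeddings (placeholder)

def kpca(features, dims=32):
    """Reduce one modality with an RBF-kernel PCA."""
    return KernelPCA(n_components=dims, kernel="rbf").fit_transform(features)

# Reduce each modality separately, then concatenate into one fused vector per sample.
fused = np.hstack([kpca(text_feats), kpca(audio_feats), kpca(video_feats)])
print(fused.shape)  # (100, 96)
```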

Language: English

Citations: 10

TLEFuzzyNet: Fuzzy Rank-Based Ensemble of Transfer Learning Models for Emotion Recognition From Human Speeches
Karam Kumar Sahoo, Ishan Dutta, Muhammad Fazal Ijaz

et al.

IEEE Access, Journal Year: 2021, Volume and Issue: 9, P. 166518 - 166530

Published: Jan. 1, 2021

Human speech is not only a verbose medium of communication but also conveys emotions. The past decade has seen a lot of research on speech data, which is especially important for human-computer interaction as well as healthcare, security, and entertainment. This paper proposes the TLEFuzzyNet model, a three-stage pipeline for emotion recognition from speech. The first stage performs feature extraction by augmenting speech signals into Mel spectrograms, followed by three pre-trained transfer-learning CNN models, namely ResNet18, Inception_v3, and GoogleNet, whose prediction scores are fed to the third stage. In the final stage, we assign fuzzy ranks using a modified Gompertz function, which gives the final prediction after considering the individual models' scores. We have used the Surrey Audio-Visual Expressed Emotion (SAVEE), Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS), and Berlin Database of Emotional Speech (EmoDB) datasets to evaluate the model, which achieved state-of-the-art performance and is hence a dependable framework for speech emotion recognition (SER). All codes are available at the GitHub link: https://github.com/KaramSahoo/SpeechEmotionRecognitionFuzzy.
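
The fuzzy-rank fusion can be illustrated as below. The exact re-parameterised Gompertz function is in the authors' repository, so the function here is only a plausible stand-in with the right shape: confident predictions receive low, i.e. better, rank scores, and the class with the smallest fused score wins.

```python
# Hedged sketch of fuzzy-rank fusion with a Gompertz-shaped function; the exact
# function in the paper may differ (see the linked GitHub repository).
import numpy as np

def gompertz_rank(confidences):
    """Map per-class confidences (softmax outputs) to fuzzy rank scores.
    Higher confidence -> lower (better) rank score."""
    return 1.0 - np.exp(-np.exp(-2.0 * confidences))

# Softmax outputs of the three backbone CNNs for one utterance (placeholder values).
resnet18 = np.array([0.70, 0.20, 0.10])
inception = np.array([0.60, 0.30, 0.10])
googlenet = np.array([0.55, 0.25, 0.20])

# Sum the fuzzy ranks across models; the class with the smallest fused score wins.
fused = sum(gompertz_rank(p) for p in (resnet18, inception, googlenet))
print("predicted class:", int(np.argmin(fused)))
```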

Language: English

Citations: 53

GM-TCNet: Gated Multi-scale Temporal Convolutional Network using Emotion Causality for Speech Emotion Recognition
Jiaxin Ye, Xin-Cheng Wen, Xuan-Ze Wang

et al.

Speech Communication, Journal Year: 2022, Volume and Issue: 145, P. 21 - 35

Published: Sept. 15, 2022

Language: English

Citations: 36

Short-Text Classification Detector: A Bert-Based Mental Approach
Yongjun Hu, Jia Ding, Zixin Dou

et al.

Computational Intelligence and Neuroscience, Journal Year: 2022, Volume and Issue: 2022, P. 1 - 11

Published: March 10, 2022

With the continuous development of the Internet, social media based on short texts has become popular. However, the sparsity and shortness of these texts restrict the accuracy of text classification. Therefore, based on the BERT model, we capture the mental features of reviewers and apply them to short-text classification to improve its accuracy. Specifically, we construct a mental feature model at the language level and fine-tune BERT to better embed these features. To verify this method, we compare it with a variety of machine learning methods, such as support vector machines, convolutional neural networks, and recurrent neural networks. The results show the following: (1) mental features can significantly improve classification accuracy; (2) combining mental features with the input vectors provides better results than treating them as two independent vectors; and (3) mental features can be effectively integrated into short-text classification. These results can help promote further work on short-text classification.
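
A minimal fine-tuning sketch for BERT-based short-text classification, assuming the Hugging Face transformers library; the paper's mental-feature construction is not reproduced, and the texts and labels below are placeholders.

```python
# Minimal BERT fine-tuning sketch for binary short-text classification.
# Requires the `transformers` library; the mental-feature model is omitted.
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2)

texts = ["great product, fast delivery", "terrible, broke after one day"]  # placeholders
labels = torch.tensor([1, 0])

batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
outputs = model(**batch, labels=labels)

# One gradient step of ordinary fine-tuning; a real run loops over a DataLoader.
outputs.loss.backward()
torch.optim.AdamW(model.parameters(), lr=2e-5).step()
print("loss:", float(outputs.loss))
```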

Language: English

Citations: 35

A novel convolutional neural network with gated recurrent unit for automated speech emotion recognition and classification
P. Prakash, Durairaj Anuradha, Javid Iqbal

et al.

Journal of Control and Decision, Journal Year: 2022, Volume and Issue: 10(1), P. 54 - 63

Published: June 13, 2022

Automated Speech Emotion Recognition (SER) has become more popular and has increased applicability. SER concentrates on the automatic identification of the emotional state of a human being from speech signals. It mainly depends upon an in-depth analysis of the speech signal, extracting features containing emotional details from it, and utilising pattern recognition techniques for identification. The major problem in SER is to extract discriminative, powerful, and salient acoustical features from speech content. The proposed model aims to detect and classify three emotional states: happy, neutral, and sad. The presented model makes use of a Convolutional Neural Network - Gated Recurrent Unit (CNN-GRU) based feature extraction technique which derives a set of feature vectors. A comprehensive simulation takes place on the Berlin German database and the SJTU Chinese database, which comprise numerous audio files under a collection of different emotion labels.
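
A sketch of a CNN-GRU feature extractor of the kind described, in PyTorch; the layer sizes and MFCC front end are assumptions, not the paper's configuration.

```python
# Illustrative CNN-GRU extractor for 3-class SER (happy/neutral/sad);
# layer sizes are assumptions, not the paper's configuration.
import torch
import torch.nn as nn

class CNNGRU(nn.Module):
    def __init__(self, n_mfcc=40, n_classes=3):
        super().__init__()
        # 1D convolutions slide over time; channels are MFCC coefficients.
        self.conv = nn.Sequential(
            nn.Conv1d(n_mfcc, 64, kernel_size=5, padding=2), nn.ReLU(),
            nn.Conv1d(64, 128, kernel_size=5, padding=2), nn.ReLU(),
        )
        self.gru = nn.GRU(input_size=128, hidden_size=64, batch_first=True)
        self.head = nn.Linear(64, n_classes)

    def forward(self, x):                      # x: (batch, n_mfcc, time)
        h = self.conv(x)                       # (batch, 128, time)
        _, last = self.gru(h.transpose(1, 2))  # GRU expects (batch, time, features)
        return self.head(last[-1])             # logits from the final hidden state

logits = CNNGRU()(torch.randn(8, 40, 200))
print(logits.shape)  # torch.Size([8, 3])
```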

Language: English

Citations: 31

The Emotion Probe: On the Universality of Cross-Linguistic and Cross-Gender Speech Emotion Recognition via Machine Learning
Giovanni Costantini, Emilia Parada‐Cabaleiro, Daniele Casali

et al.

Sensors, Journal Year: 2022, Volume and Issue: 22(7), P. 2461 - 2461

Published: March 23, 2022

Machine Learning (ML) algorithms within a human-computer framework are the leading force in speech emotion recognition (SER). However, few studies explore cross-corpora aspects of SER; this work investigates the feasibility and characteristics of cross-linguistic and cross-gender SER. Three ML classifiers (SVM, Naïve Bayes, and MLP) are applied to acoustic features obtained through a procedure based on Kononenko's discretization and correlation-based feature selection. The system encompasses five emotions (disgust, fear, happiness, anger, and sadness), using the Emofilm database, comprised of short clips from English movies and their respective Italian and Spanish dubbed versions, for a total of 1115 annotated utterances. The results show MLP as the most effective classifier, with accuracies higher than 90% for single-language approaches, while the cross-language classifier still yields accuracies higher than 80%. Cross-gender tasks prove more difficult than those involving two languages, suggesting greater differences between emotions expressed by male versus female subjects than between different languages. Four feature domains, namely RASTA, F0, MFCC, and spectral energy, are algorithmically assessed as the most effective, refining existing literature and approaches based on standard feature sets. To our knowledge, this is one of the first works encompassing cross-linguistic and cross-gender SER assessments.
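
The three-classifier comparison can be sketched as follows; scikit-learn's univariate F-test selection stands in for Kononenko's discretization with correlation-based feature selection, which has no direct scikit-learn equivalent, and the data are placeholders.

```python
# Sketch of the SVM / Naive Bayes / MLP comparison on selected acoustic features.
# SelectKBest(f_classif) is a stand-in for the paper's feature-selection procedure.
import numpy as np
from sklearn.svm import SVC
from sklearn.naive_bayes import GaussianNB
from sklearn.neural_network import MLPClassifier
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(2)
X = rng.normal(size=(300, 80))    # acoustic features per utterance (placeholder)
y = rng.integers(0, 5, size=300)  # five emotion labels (placeholder)

for name, clf in [("SVM", SVC()), ("NaiveBayes", GaussianNB()),
                  ("MLP", MLPClassifier(max_iter=500))]:
    pipe = make_pipeline(StandardScaler(), SelectKBest(f_classif, k=20), clf)
    print(name, cross_val_score(pipe, X, y, cv=5).mean().round(3))
```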

Language: English

Citations: 26

Fusion of spectral and prosody modelling for multilingual speech emotion conversion
Susmitha Vekkot, Deepa Gupta

Knowledge-Based Systems, Journal Year: 2022, Volume and Issue: 242, P. 108360 - 108360

Published: Feb. 9, 2022

Language: English

Citations: 24

Ensemble Learning by High-Dimensional Acoustic Features for Emotion Recognition from Speech Audio Signal
Mukkoti Maruthi Venkata Chalapathi, M. Rudra Kumar, Neeraj Sharma

et al.

Security and Communication Networks, Journal Year: 2022, Volume and Issue: 2022, P. 1 - 10

Published: Feb. 28, 2022

In the recent past, handling the high dimensionality of the auditory features of speech signals has been a primary focus of machine learning (ML) based emotion recognition. The incorporation of high-dimensional characteristics in training datasets during the learning phase influences contemporary approaches to prediction, causing significant false alerting. The curse of excessive dimensionality in the corpus is addressed by the majority of ML models. Modern models, on the other hand, place greater emphasis on merging many classifiers, which can only increase recognition accuracy even when the corpus contains high-dimensional data points. "Ensemble Learning by High-Dimensional Acoustic Features (EL-HDAF)" is an innovative ensemble model that leverages diversity assessment of feature values spanned over diversified classes to recommend the best features. Furthermore, the proposed technique employs a one-of-a-kind clustering process to limit the impact of outlier feature values. The experimental inquiry evaluates and compares emotion forecasting from spoken audio against current methods, using fourfold cross-validation on a standard corpus for performance analysis.
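
Since the abstract does not fully specify the clustering step, the following hedged sketch uses a KMeans distance trim as a stand-in for it, ahead of a soft-voting ensemble evaluated with fourfold cross-validation; all data and parameters are placeholders.

```python
# Hedged sketch of the EL-HDAF idea: trim samples far from their cluster centre
# (a stand-in for the paper's unspecified clustering step), then evaluate an
# ensemble under fourfold cross-validation.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.ensemble import VotingClassifier, RandomForestClassifier
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(3)
X = rng.normal(size=(400, 60))    # high-dimensional acoustic features (placeholder)
y = rng.integers(0, 4, size=400)  # emotion labels (placeholder)

# Cluster the feature space and drop the 5% of points farthest from their centre.
km = KMeans(n_clusters=8, n_init=10, random_state=0).fit(X)
dist = np.linalg.norm(X - km.cluster_centers_[km.labels_], axis=1)
keep = dist < np.quantile(dist, 0.95)

ensemble = VotingClassifier([("rf", RandomForestClassifier(n_estimators=200)),
                             ("svm", SVC(probability=True))], voting="soft")
print("4-fold accuracy:", cross_val_score(ensemble, X[keep], y[keep], cv=4).mean())
```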

Language: English

Citations: 23