Cited by Bibliometric analysis of natural language processing using CiteSpace and VOSviewer

Multimodal fusion-powered English speaking robot

Rong Pan

Frontiers in Neurorobotics, Journal Year: 2024, Volume and Issue: 18

Published: Nov. 15, 2024

Introduction: Speech recognition and multimodal learning are two critical areas in machine learning. Current speech systems often encounter challenges such as high computational demands and model complexity. Methods: To overcome these issues, we propose a novel framework, EnglishAL-Net, a Multimodal Fusion-powered English Speaking Robot. This framework leverages the ALBEF model, optimizing it for real-time interaction, and incorporates a newly designed text and image editor to fuse visual and textual information. The robot processes dynamic spoken input through the integration of Neural Machine Translation (NMT), enhancing its ability to understand and respond to language. Results and discussion: In the experimental section, we constructed a dataset containing various scenarios and oral instructions for testing. The results show that, compared to traditional unimodal processing methods, our framework significantly improves both language understanding accuracy and response time. This research not only enhances the performance of interaction robots but also opens up new possibilities for applications of robotic technology in education, rescue, customer service, and other fields, holding significant theoretical and practical value.

Language: English

Citations

Assessing Audio Hallucination in Large Multimodal Models

Sakuto Hanamaki, Namesa Kirishima, Sora Narumi

et al.

Published: June 10, 2024

Speech recognition systems have become increasingly integral in various applications, from virtual assistants to automated transcription services, necessitating the development of models capable of accurately processing and transcribing spoken language. The introduction of multimodal models like ChatGPT-4 and Gemini 1.5 Flash represents a significant advancement in this field, yet challenges such as audio hallucination, pronunciation handling, and punctuation placement remain critical hurdles. This study provides a comprehensive evaluation of ChatGPT-4 and Gemini 1.5 Flash, focusing on their performance with English inputs under varying conditions. By employing rigorous statistical and qualitative analysis, including metrics such as Word Error Rate (WER) and Character Error Rate (CER), the study reveals that one of the two models exhibits superior accuracy and reliability in handling complex speech patterns. Detailed examination further elucidates specific areas where each model excels or faces challenges. The findings demonstrate the importance of continuous refinement and enhancement to improve practical applicability in real-world scenarios. This research contributes valuable insights into the strengths and limitations of leading technologies, providing a benchmark for future developments in the field.
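
The two metrics named in this abstract can be illustrated with a minimal sketch. It assumes the textbook definitions (edit distance divided by reference length, over words for WER and over characters for CER) and is not the evaluation code used in the study; the function names `edit_distance`, `wer`, and `cer` are hypothetical.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or lists of words)."""
    prev = list(range(len(hyp) + 1))  # distances from an empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # turning i reference items into an empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word Error Rate: word-level edit distance / number of reference words."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character Error Rate: character-level edit distance / reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


if __name__ == "__main__":
    ref = "the cat sat on the mat"
    hyp = "the cat sat on mat"  # one word dropped by the recognizer
    print(f"WER = {wer(ref, hyp):.3f}")
    print(f"CER = {cer(ref, hyp):.3f}")
```

Because both rates are normalized by the reference length, a hypothesis with many insertions can score above 1.0.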

Language: English

Citations

Deep learning techniques for hand vein biometrics: A comprehensive review

Mustapha Hemis, Hamza Kheddar, Sami Bourouis

et al.

Information Fusion, Journal Year: 2024, Volume and Issue: unknown, P. 102716 - 102716

Published: Sept. 1, 2024

Language: English

Citations

Lightweight Multi-Domain Fusion Model for Through-Wall Human Activity Recognition Using IR-UWB Radar

Ling Huang, Lei Dong, Bowen Zheng

et al.

Applied Sciences, Journal Year: 2024, Volume and Issue: 14(20), P. 9522 - 9522

Published: Oct. 18, 2024

Impulse radio ultra-wideband (IR-UWB) radar, operating in the low-frequency band, can penetrate walls and utilize its high range resolution to recognize different human activities. Complex deep neural networks have demonstrated significant performance advantages in classifying radar spectrograms of various actions, but at the cost of a substantial computational overhead. In response, this paper proposes a lightweight model named TG2-CAFNet. First, clutter suppression and time–frequency analysis are used to obtain range–time and micro-Doppler feature maps. Then, leveraging GhostV2 convolution, a feature extraction module, TG2, suitable for radar spectrograms is constructed. Using a parallel structure, the features of the two spectrograms are extracted separately. Finally, to further explore the correlation between the two spectrograms and enhance the feature representation capabilities, an improved nonlinear fusion method called coordinate attention fusion (CAF) is proposed based on attention feature fusion (AFF). This method extends the adaptive weighting of AFF to a spatial distribution, effectively capturing subtle relationships between the spectrograms. Experiments showed that the model achieved a high degree of lightweightness, while also achieving a recognition accuracy of 99.1%.

Language: English

Citations

Parametric extended physics-informed neural networks for solid mechanics with complex mixed boundary conditions

Geyong Cao, Xiaojun Wang

Journal of the Mechanics and Physics of Solids, Journal Year: 2024, Volume and Issue: 194, P. 105944 - 105944

Published: Nov. 14, 2024

Language: English

Citations

MFFR-net: Multi-scale feature fusion and attentive recalibration network for deep neural speech enhancement

Nasir Saleem, Sami Bourouis

Digital Signal Processing, Journal Year: 2024, Volume and Issue: 156, P. 104870 - 104870

Published: Nov. 14, 2024

Language: English

Citations

Detection of hate: speech tweets based convolutional neural network and machine learning algorithms

Hameda A. Sennary, Ghada Y. Abozaid, Mohamed Eid Hussein

et al.

Scientific Reports, Journal Year: 2024, Volume and Issue: 14(1)

Published: Nov. 21, 2024

There is no doubt that social media sites have provided many benefits to humanity, such as sharing information continuously and communicating with others easily. Alongside these advantages, however, there are disadvantages for which we always strive to find a solution. One of these is hate speech. In our study, we discuss a way to address this phenomenon by using a Term Frequency-Inverse Document Frequency (TF-IDF) based approach to feature engineering on eleven machine learning and deep learning classifiers that can automatically identify hate speech. Three different databases were used: the first is "Hate speech offensive tweets Davidson et al.", the second is called "Twitter speech", and the third is merged data, "Cyberbullying dataset (toxicity_parsed_dataset)". The classifiers involved were Logistic Regression (LR), Naive Bayes (NB), Multi-layer Perceptron (MLP), Support Vector Machine (SVM), Random Forest (RF), K-Nearest Neighbor (KNN), K-Means, Decision Tree (DT), Gradient Boosting classifier (GBC), Extra Trees (ET), and a convolutional neural network (CNN). The maximum accuracy attained exceeded 99%.
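
The TF-IDF weighting mentioned in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions: it uses the plain textbook formula (relative term frequency times log(N / document frequency)) with whitespace tokenization, whereas library implementations commonly apply smoothed variants, and it is not the authors' feature pipeline. The function name `tf_idf` and the sample sentences are hypothetical.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: weight} dict per document using tf * log(N / df)."""
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)

    # Document frequency: in how many documents each term appears at least once.
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


if __name__ == "__main__":
    docs = ["you are awful", "you are kind", "have a kind day"]
    for doc, vec in zip(docs, tf_idf(docs)):
        top_term = max(vec, key=vec.get)
        print(f"{doc!r}: highest-weighted term = {top_term!r}")
```

Terms that occur in every document receive a weight of zero under this formula, which is why rarer, more discriminative words dominate the resulting feature vectors passed to a classifier.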

Language: English

Citations

Artificial Intelligence in Healthcare: A Fusion of Technologies

Eric Ayintareba Akolgo, Dennis Redeemer Korda, Emmanuel Oteng Dapaah

et al.

Journal of Computer and Communications, Journal Year: 2024, Volume and Issue: 12(12), P. 116 - 133

Published: Jan. 1, 2024

Language: English

Citations

Integrating international Chinese visualization teaching and vocational skills training: leveraging attention-connectionist temporal classification models

Yuan Yao, Zhujun Dai, Muhammad Qaiser Shahbaz

et al.

PeerJ Computer Science, Journal Year: 2024, Volume and Issue: 10, P. e2223 - e2223

Published: July 31, 2024

The teaching of Chinese as a second language has become increasingly crucial for promoting cross-cultural exchange and mutual learning worldwide. However, traditional approaches to international Chinese teaching have limitations that hinder their effectiveness, such as outdated materials, a lack of qualified instructors, and limited access to facilities. To overcome these challenges, it is imperative to develop intelligent and visually engaging methods for learners. In this article, we propose leveraging speech recognition technology within artificial intelligence to create an oral assistance platform that provides visualized pinyin-formatted feedback to learners. Additionally, this system can identify accent errors and provide vocational skills training to improve learners' communication abilities. To achieve this, we propose the Attention-Connectionist Temporal Classification (CTC) model, which utilizes a specific temporal convolutional neural network to capture the location information necessary for accurate speech recognition. Our experimental results demonstrate that this model outperforms similar approaches, with significant reductions in error rates on both the validation and test sets; compared with the original Attention model, the character error rate (CER) is reduced by 0.67%. Overall, our proposed approach has significant potential for enhancing the efficiency and effectiveness

Language: English

Citations

AugMixSpeech: A Data Augmentation Method and Consistency Regularization for Mandarin Automatic Speech Recognition

Yang Jiang, Chen Jun, Kai Han

et al.

Lecture notes in computer science, Journal Year: 2024, Volume and Issue: unknown, P. 145 - 157

Published: Oct. 31, 2024

Language: English

Citations