
Natural Language Processing Journal, Journal Year: 2024, Volume and Issue: unknown, P. 100123 - 100123
Published: Dec. 1, 2024
Language: English
Frontiers in Neurorobotics, Journal Year: 2024, Volume and Issue: 18
Published: Nov. 15, 2024
Introduction: Speech recognition and multimodal learning are two critical areas in machine learning. Current speech systems often encounter challenges such as high computational demands and model complexity. Methods: To overcome these issues, we propose a novel framework, EnglishAL-Net, a Multimodal Fusion-powered English Speaking Robot. This framework leverages the ALBEF model, optimizing it for real-time interaction, and incorporates a newly designed text and image editor to fuse visual and textual information. The robot processes dynamic spoken input through the integration of Neural Machine Translation (NMT), enhancing its ability to understand and respond to spoken language. Results and discussion: In the experimental section, we constructed a dataset containing various scenarios and oral instructions for testing. The results show that, compared with traditional unimodal processing methods, our framework significantly improves both language understanding accuracy and response time. This research not only enhances the performance of interaction robots but also opens up new possibilities for applications of robotic technology in education, rescue, customer service, and other fields, holding significant theoretical and practical value.
Language: English
Citations
3
Published: June 10, 2024
Speech recognition systems have become increasingly integral in various applications, from virtual assistants to automated transcription services, necessitating the development of models capable of accurately processing and transcribing spoken language. The introduction of multimodal models like ChatGPT-4 and Gemini 1.5 Flash represents a significant advancement in this field, yet challenges such as audio hallucination, pronunciation handling, and punctuation placement remain critical hurdles. This study provides a comprehensive evaluation of ChatGPT-4 and Gemini 1.5 Flash, focusing on their performance on English inputs under varying conditions. By employing rigorous statistical and qualitative analysis, including metrics such as Word Error Rate (WER) and Character Error Rate (CER), the study reveals that one of the two models exhibits superior accuracy and reliability in handling complex speech patterns. Detailed examination further elucidates specific areas where each model excels or faces challenges. The findings demonstrate the importance of continuous refinement and enhancement to improve practical applicability in real-world scenarios. This research contributes valuable insights into the strengths and limitations of leading technologies, providing a benchmark for future developments in the field.
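The two metrics named in this abstract, WER and CER, are conventionally defined as the edit distance between a reference transcript and the system output, divided by the reference length (in words or in characters). The sketch below shows that textbook definition only. The function names are illustrative, and the study's actual text normalization (casing, punctuation handling) is not described in the abstract and is not reproduced here.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (substitutions, insertions, deletions)."""
    prev = list(range(len(hyp) + 1))  # distances from an empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to an empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # delete r from the reference
                            curr[j - 1] + 1,      # insert h into the reference
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word Error Rate: word-level edit distance over reference word count."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character Error Rate: character-level edit distance over reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(list(reference), list(hypothesis)) / len(reference)
```

Because insertions count as errors, both rates can exceed 1.0 when the hypothesis is much longer than the reference.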
Language: English
Citations
1
Information Fusion, Journal Year: 2024, Volume and Issue: unknown, P. 102716 - 102716
Published: Sept. 1, 2024
Language: English
Citations
1
Applied Sciences, Journal Year: 2024, Volume and Issue: 14(20), P. 9522 - 9522
Published: Oct. 18, 2024
Impulse radio ultra-wideband (IR-UWB) radar, operating in the low-frequency band, can penetrate walls and utilize its high range resolution to recognize different human activities. Complex deep neural networks have demonstrated significant performance advantages in classifying radar spectrograms of various actions, but at the cost of a substantial computational overhead. In response, this paper proposes a lightweight model named TG2-CAFNet. First, clutter suppression and time–frequency analysis are used to obtain range–time and micro-Doppler feature maps. Then, leveraging GhostV2 convolution, a feature extraction module, TG2, suitable for radar spectrograms is constructed. Using a parallel structure, the features of the two spectrograms are extracted separately. Finally, to further explore the correlation between the two spectrograms and enhance feature representation capabilities, an improved nonlinear fusion method called coordinate attention fusion (CAF) is proposed, based on attentional feature fusion (AFF). This method extends the adaptive weighting of AFF to a spatial distribution, effectively capturing subtle spatial relationships between the spectrograms. Experiments showed that the model achieved a high degree of lightweightness, while also achieving a recognition accuracy of 99.1%.
Language: English
Citations
1
Journal of the Mechanics and Physics of Solids, Journal Year: 2024, Volume and Issue: 194, P. 105944 - 105944
Published: Nov. 14, 2024
Language: English
Citations
1
Digital Signal Processing, Journal Year: 2024, Volume and Issue: 156, P. 104870 - 104870
Published: Nov. 14, 2024
Language: English
Citations
1
Scientific Reports, Journal Year: 2024, Volume and Issue: 14(1)
Published: Nov. 21, 2024
There is no doubt that social media sites have provided many benefits to humanity, such as sharing information continuously and communicating with others easily. In addition to these advantages, however, there are disadvantages for which we always strive to find a solution. One of these is hate speech. In this study, we discuss a way to address this phenomenon by using a Term Frequency-Inverse Document Frequency (TF-IDF) based approach to feature engineering on eleven machine learning and deep learning classifiers that can automatically identify hate speech. Three different datasets were used: the first is "Hate speech offensive tweets Davidson et al.", the second is called "Twitter hate speech", and the third is merged data ("Cyberbullying dataset (toxicity_parsed_dataset)"). The classifiers involved were Logistic Regression (LR), Naive Bayes (NB), Multi-layer Perceptron (MLP), Support Vector Machine (SVM), Random Forest (RF), K-Nearest Neighbor (KNN), K-Means, Decision Tree (DT), Gradient Boosting classifier (GBC), Extra Trees (ET), and a convolutional neural network (CNN). The maximum accuracy attained exceeded 99%.
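The TF-IDF feature engineering mentioned in this abstract can be illustrated with a minimal sketch. It uses the textbook unsmoothed weighting, tf(t, d) × log(N / df(t)), with whitespace tokenization. The function name and tokenization are assumptions made for illustration; the abstract does not specify the study's exact weighting variant or preprocessing, and the classifiers themselves are not shown.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return one {term: weight} dict per document, using tf * log(N / df)."""
    tokenized = [doc.lower().split() for doc in docs]  # naive whitespace tokens
    n_docs = len(tokenized)

    # Document frequency: in how many documents each term appears at least once.
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        # Relative term frequency scaled by inverse document frequency.
        vectors.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors
```

Under this weighting, a term that occurs in every document gets weight zero, while a term concentrated in few documents is weighted up, which is why the resulting vectors are a common input to classifiers such as those listed above.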
Language: English
Citations
1
Journal of Computer and Communications, Journal Year: 2024, Volume and Issue: 12(12), P. 116 - 133
Published: Jan. 1, 2024
Language: English
Citations
0
PeerJ Computer Science, Journal Year: 2024, Volume and Issue: 10, P. e2223 - e2223
Published: July 31, 2024
The teaching of Chinese as a second language has become increasingly crucial for promoting cross-cultural exchange and mutual learning worldwide. However, traditional approaches to international Chinese teaching have limitations that hinder their effectiveness, such as outdated materials, a lack of qualified instructors, and limited access to facilities. To overcome these challenges, it is imperative to develop intelligent and visually engaging methods for learners. In this article, we propose leveraging speech recognition technology within artificial intelligence to create an oral assistance platform that provides visualized pinyin-formatted feedback to learners. Additionally, the system can identify accent errors and provide vocational skills training to improve learners' communication abilities. To achieve this, we use the Attention-Connectionist Temporal Classification (CTC) model, which utilizes a specific temporal convolutional neural network to capture the location information necessary for accurate speech recognition. Our experimental results demonstrate that the model outperforms similar approaches, with significant reductions in error rates on both the validation and test sets; compared with the original Attention model, the character error rate (CER) is reduced by 0.67%. Overall, our proposed approach has potential for enhancing the efficiency and effectiveness of teaching Chinese as a second language.
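As background for the CTC component named in this abstract, the sketch below shows standard greedy (best-path) CTC decoding: take the highest-scoring label per frame, collapse consecutive repeats, then drop the blank symbol. It is a generic illustration, not the authors' Attention-CTC decoder. The function name, the label list, and the choice of index 0 for the blank are assumptions.

```python
def ctc_greedy_decode(frame_scores, labels, blank=0):
    """Best-path CTC decoding over per-frame label scores.

    frame_scores: list of per-frame score lists, one score per label index.
    labels: label for each index (labels[blank] is the CTC blank symbol).
    """
    # Pick the highest-scoring label index in every frame.
    best_path = [max(range(len(scores)), key=scores.__getitem__)
                 for scores in frame_scores]

    decoded = []
    previous = None
    for index in best_path:
        # Emit only when the label changes and is not the blank;
        # a blank between two identical labels keeps both of them.
        if index != previous and index != blank:
            decoded.append(labels[index])
        previous = index
    return decoded
```

For example, a best path of `ni, ni, <blank>, ni, hao` collapses to `ni, ni, hao`: the first repeat is merged, while the blank preserves the genuinely repeated syllable.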
Language: English
Citations
0
Lecture Notes in Computer Science, Journal Year: 2024, Volume and Issue: unknown, P. 145 - 157
Published: Oct. 31, 2024
Language: English
Citations
0