Two-Stream Modality-Based Deep Learning Approach for Enhanced Two-Person Human Interaction Recognition in Videos (Creative Commons)

Hemel Sharker Akash, Md Abdur Rahim, Abu Saleh Musa Miah, et al.

Sensors, Year: 2024, Vol. 24(21), p. 7077

Published: Nov. 3, 2024

Human interaction recognition (HIR) between two people in videos is a critical field in computer vision and pattern recognition, aimed at identifying and understanding human actions for applications such as healthcare, surveillance, and human–computer interaction. Despite its significance, video-based HIR faces challenges in achieving satisfactory performance due to the complexity of human actions, variations in motion, different viewpoints, and environmental factors. In this study, we proposed a two-stream deep learning-based HIR system to address these challenges and improve the accuracy and reliability of HIR systems. In the process, the two streams extract hierarchical features based on skeleton and RGB information, respectively. In the first stream, we utilised YOLOv8-Pose for pose extraction, then extracted features with three stacked LSM modules and enhanced them with a dense layer that is considered the final feature of the first stream. In the second stream, we applied SAM to the input videos and, after filtering the Segment Anything Model (SAM) feature, employed an integrated LSTM and GRU to extract the long-range dependency feature, which was considered the final feature of the second stream module. Here, SAM was utilised for segmented mesh generation, and ImageNet was used for feature extraction from images or meshes, focusing on extracting relevant features from sequential image data. Moreover, we newly created a custom filter function to enhance computational efficiency and eliminate irrelevant keypoints and mesh components from the dataset. We concatenated the two stream features and produced the final feature that is fed into the classification module. In the extensive experiment on two benchmark datasets, the proposed model achieved 96.56% and 96.16% accuracy, respectively; this high-performance accuracy proved the proposed model's superiority.
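
The fusion step described in the abstract above (concatenating the two stream features and passing the result to a classification module) can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the feature sizes, and the single linear layer standing in for the classification module are all assumptions made for illustration.

```python
from typing import List, Sequence


def concatenate_streams(skeleton_feat: Sequence[float],
                        rgb_feat: Sequence[float]) -> List[float]:
    """Fuse two per-video feature vectors by concatenation.

    The fused vector's length is the sum of both input lengths.
    """
    return list(skeleton_feat) + list(rgb_feat)


def linear_scores(fused: Sequence[float],
                  weights: Sequence[Sequence[float]],
                  biases: Sequence[float]) -> List[float]:
    """Hypothetical stand-in for the classification module: one linear layer.

    weights holds one row per class; each row has len(fused) entries.
    """
    return [sum(w * x for w, x in zip(row, fused)) + b
            for row, b in zip(weights, biases)]


def predict(skeleton_feat: Sequence[float],
            rgb_feat: Sequence[float],
            weights: Sequence[Sequence[float]],
            biases: Sequence[float]) -> int:
    """Return the index of the highest-scoring class for the fused feature."""
    fused = concatenate_streams(skeleton_feat, rgb_feat)
    scores = linear_scores(fused, weights, biases)
    return max(range(len(scores)), key=lambda i: scores[i])


if __name__ == "__main__":
    # Toy values only: 2 skeleton features, 1 RGB feature, 2 classes.
    weights = [[1.0, 0.0, 0.0], [0.0, 1.0, 1.0]]
    biases = [0.0, 0.0]
    print(predict([0.2, 0.8], [0.5], weights, biases))
```

Concatenation keeps each stream's features intact and leaves it to the classifier to weigh them, which is why the classifier's input width must equal the combined feature length.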

Language: English

Hand gesture recognition using sEMG signals with a multi-stream time-varying feature enhancement approach (Creative Commons)

Jungpil Shin, Abu Saleh Musa Miah, Sota Konnai, et al.

Scientific Reports, Year: 2024, Vol. 14(1)

Published: Sep. 27, 2024

Language: English

Cited by: 8

Korean Sign Language Alphabet Recognition Through the Integration of Handcrafted and Deep Learning-Based Two-Stream Feature Extraction Approach (Creative Commons)

Jungpil Shin, Abu Saleh Musa Miah, Yuto Akiba, et al.

IEEE Access, Year: 2024, Vol. 12, pp. 68303-68318

Published: Jan. 1, 2024

Recognizing sign language plays a crucial role in improving communication accessibility for the Deaf and hard-of-hearing communities. In Korea, many individuals facing hearing and speech challenges depend on Korean Sign Language (KSL) as their primary means of communication. Many researchers have been working to develop sign language recognition systems for other languages, but little research has been done on KSL alphabet recognition. However, existing KSL recognition systems have faced significant performance limitations due to the ineffectiveness of their features. To address these issues, we introduce an innovative KSL recognition system employing a strategic fusion approach. In this study, we combined joint skeleton-based handcrafted features and pixel-based ResNet101 transfer learning features to overcome the limitations of traditional systems. Our proposed system consists of two distinct streams: the first stream extracts essential handcrafted features, placing emphasis on capturing hand orientation information within KSL gestures. The second stream, concurrently, employed a deep learning-based ResNet101 module to capture hierarchical representations of the KSL sign. By combining the features from the first stream with those from the second stream, we generate multiple levels of fused features with the goal of forming a comprehensive representation of the KSL gestures. Finally, we fed the concatenated feature into a classification module for classification. We conducted extensive experiments with a newly created dataset, a digit dataset, and the ArSL and ASL benchmark datasets. The proposed model undeniably shows that our fusion approach substantially improves high-performance accuracy in both cases, which proves the system's superiority.

Language: English

Cited by: 7

Fall recognition using a three stream spatio temporal GCN model with adaptive feature aggregation (Creative Commons)

Jungpil Shin, Abu Saleh Musa Miah, Rei Egawa, et al.

Scientific Reports, Year: 2025, Vol. 15(1)

Published: March 27, 2025

The prevention of falls is paramount in modern healthcare, particularly for the elderly, as falls can lead to severe injuries or even fatalities. Additionally, the growing incidence of falls among the elderly, coupled with the urgent need to prevent suicide attempts resulting from medication overdose, underscores the critical importance of accurate and efficient methods for detecting a fall. This makes a computer-aided fall detection system necessary to save elderly people's lives worldwide. Many researchers have been working to develop fall detection systems. However, existing fall detection systems often struggle with problems such as unsatisfactory accuracy, limited robustness, high computational complexity, and sensitivity to environmental factors. In response to these challenges, this paper proposes a novel three-stream spatio-temporal feature-based human fall detection system. Our system incorporates joint skeleton-based spatial and temporal Graph Convolutional Network (GCN) features, motion-based GCN features, and residual connections-based features. Each stream employs adaptive graph-based feature aggregation and consecutive separable convolutional neural networks (Sep-TCN), significantly reducing the computational complexity and the number of parameters of the model compared to prior systems. Experimental results on multiple datasets demonstrate the superior effectiveness and efficiency of our proposed system, with accuracies of 99.68%, 99.97%, 99.47%, and 98.97% achieved on the ImViA, Fall-UP, FU-Kinect, and UR-Fall datasets, respectively. The remarkable performance of the proposed system highlights its superiority, efficiency, and generalizability in real-world scenarios, offering significant advancements in healthcare and societal well-being.

Language: English

Cited by: 1

Anomaly Detection in Weakly Supervised Videos Using Multistage Graphs and General Deep Learning Based Spatial-Temporal Feature Enhancement (Creative Commons)

Jungpil Shin, Yuta Kaneko, Abu Saleh Musa Miah, et al.

IEEE Access, Year: 2024, Vol. 12, pp. 65213-65227

Published: Jan. 1, 2024

Language: English

Cited by: 5

Multimodal Fall Detection Using Spatial–Temporal Attention and Bi-LSTM-Based Feature Fusion (Creative Commons)

Jungpil Shin, Abu Saleh Musa Miah, Rei Egawa, et al.

Future Internet, Year: 2025, Vol. 17(4), p. 173

Published: April 15, 2025

Human fall detection is a significant healthcare concern, particularly among the elderly, due to its links to muscle weakness, cardiovascular issues, and locomotive syndrome. Accurate fall detection is crucial for timely intervention and injury prevention, which has led many researchers to work on developing effective detection systems. However, existing unimodal systems that rely solely on skeleton or sensor data face challenges such as poor robustness, computational inefficiency, and sensitivity to environmental conditions. While some multimodal approaches have been proposed, they often struggle to capture long-range dependencies effectively. In order to address these challenges, we propose a multimodal fall detection framework that integrates skeleton and sensor data. The system uses a Graph-based Spatial-Temporal Convolutional and Attention Neural Network (GSTCAN) to capture spatial and temporal relationships from skeleton and motion information in stream-1, while a Bi-LSTM with Channel Attention (CA) processes sensor data in stream-2, extracting both spatial and temporal features. The GSTCAN model uses AlphaPose for skeleton extraction, calculates motion between consecutive frames, and applies a graph convolutional network (GCN) with a CA mechanism to focus on relevant features while suppressing noise. In parallel, the Bi-LSTM with CA processes inertial signals, capturing long-range dependencies and refining feature representations. The features from both branches are fused and passed through a fully connected layer for classification, providing a comprehensive understanding of human motion. The proposed system was evaluated on the Fall Up and UR datasets, achieving classification accuracy of 99.09% and 99.32%, respectively, surpassing existing methods. This robust and efficient system demonstrates strong potential for accurate and continuous healthcare monitoring.
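
The abstract above mentions calculating motion between consecutive frames from extracted skeletons. One common way to do this is a per-joint coordinate difference between neighbouring frames; the sketch below assumes that definition, 2D joints, and a hypothetical function name, none of which are confirmed by the abstract.

```python
from typing import List, Tuple

Joint = Tuple[float, float]   # (x, y) position of one skeleton joint
Frame = List[Joint]           # all joints detected in one video frame


def motion_between_frames(frames: List[Frame]) -> List[List[Joint]]:
    """Return per-joint displacement between each pair of consecutive frames.

    For T input frames the result has T - 1 entries; entry t holds
    (x[t+1] - x[t], y[t+1] - y[t]) for every joint.
    Assumes every frame lists the same joints in the same order.
    """
    motion = []
    for prev, curr in zip(frames, frames[1:]):
        motion.append([(cx - px, cy - py)
                       for (px, py), (cx, cy) in zip(prev, curr)])
    return motion


if __name__ == "__main__":
    # Toy sequence: 3 frames, 2 joints each.
    frames = [
        [(0, 0), (1, 1)],
        [(1, 0), (1, 3)],
        [(1, 2), (0, 3)],
    ]
    for step in motion_between_frames(frames):
        print(step)
```

A sudden, large downward displacement across many joints is the kind of signal such motion features make easy for a downstream model to pick up.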

Language: English

Cited by: 0

Multi-view Isolated sign language recognition based on cross-view and multi-level transformer

Zhong Guan, Yongli Hu, Huajie Jiang, et al.

Multimedia Systems, Year: 2025, Vol. 31(3)

Published: May 1, 2025

Language: English

Cited by: 0

Pakistan Sign Language Recognition: From Videos to Images

Hafiz Muhammad Hamza, Aamir Wali

Signal Image and Video Processing, Year: 2025, Vol. 19(8)

Published: June 2, 2025

Language: English

Cited by: 0

Artificial intelligence in sign language recognition: A comprehensive bibliometric and visual analysis

Yanqiong Zhang, Han Yu, Zhaosong Zhu, et al.

Computers & Electrical Engineering, Year: 2024, Vol. 120, p. 109854

Published: Nov. 14, 2024

Language: English

Cited by: 1

Sign Language Interpreting System Using Recursive Neural Networks (Creative Commons)

Erick A. Borges-Galindo, Nayely Morales-Ramírez, Mario González-Lee, et al.

Applied Sciences, Year: 2024, Vol. 14(18), p. 8560

Published: Sep. 23, 2024

According to the World Health Organization (WHO), 5% of people around the world have hearing disabilities, which limits their capacity to communicate with others. Recently, scientists have proposed systems based on deep learning techniques to create a sign language-to-text translator, expecting this to help deaf people communicate; however, the performance of such systems is still low for practical scenarios. Furthermore, the proposed systems are language-oriented, which leads to particular problems related to the signs of each language. For this reason, to address this problem, in this paper, we propose a system based on a Recursive Neural Network (RNN) focused on Mexican Sign Language (MSL) that uses the spatial tracking of hands and facial expressions to predict the word that a person intends to communicate. To achieve this, we trained four RNN-based models using a dataset of 600 clips that were 30 s long; […] included […] clips. We conducted two experiments; we tailored the first experiment to determine the most well-suited model for the target application and to measure the accuracy of the resulting system in offline mode; in the second experiment, we measured the accuracy of the system in online mode. We assessed the system’s performance using the following metrics: precision, recall, F1-score, and the number of errors during online scenarios, and the results computed indicate an accuracy of 0.93 in offline mode and a higher performance in online operating mode compared to previously proposed approaches. These results underscore the potential of the proposed scheme in scenarios such as teaching, learning, commercial transactions, and daily communications among deaf and non-deaf people.
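
The abstract above names precision, recall, and F1-score as evaluation metrics. As a reminder of how these are computed for one target class, here is a small self-contained sketch using the standard definitions; the function name and the toy labels are hypothetical and are not taken from the paper.

```python
from typing import Hashable, Sequence, Tuple


def precision_recall_f1(y_true: Sequence[Hashable],
                        y_pred: Sequence[Hashable],
                        positive: Hashable) -> Tuple[float, float, float]:
    """Compute precision, recall and F1 for a single class.

    precision = TP / (TP + FP), recall = TP / (TP + FN),
    F1 = harmonic mean of precision and recall.
    Each metric is reported as 0.0 when its denominator is zero.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # Toy labels only: "A" and "B" stand for two predicted words.
    y_true = ["A", "A", "B", "A"]
    y_pred = ["A", "B", "B", "A"]
    print(precision_recall_f1(y_true, y_pred, positive="A"))
```

For a multi-word vocabulary these per-class values are usually averaged across classes; whether the paper uses macro or weighted averaging is not stated in the abstract.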

Language: English

Cited by: 0
