Custom Dataset Text Classification: An Ensemble Approach with Machine Learning and Deep Learning Models

Deekshitha Valluri,

Suneetha Manne,

Nikitha Tripuraneni

et al.

Published: Dec. 21, 2023

In academic institutions, commercial enterprises, research centers, technology-heavy businesses, and government funding agencies, maintaining consistent data is a major difficulty. For an entity, which might be anything from an object to a place or thing, most data are irregular. These days, text mining analytics is used to investigate datasets and identify significant patterns that represent the data and the links between entities. With this knowledge, alternative decisions can then be made. Text analytics finds insights by turning words into numbers, which in the end results in better organization and better conclusions. However, classifying and processing each piece of text by hand is difficult. As a result, intelligent systems have emerged in the domain of Natural Language Processing (NLP), which looks at grammatical and lexical patterns. Before mining, it is imperative to examine and comprehend the nature of the data. Text categorization requires automation because of the increasing volume of data and the requirement for accuracy and precision. Developing automatic text classification with deep learning methods that handle difficult NLP tasks with semantic constraints is an interesting study opportunity. Text classification founded on analytics can facilitate information discovery. Most of the advantages obtained by applying these insights to emerging applications lie in supporting decision-making and improving resources. Future studies will focus on improved techniques for parameter optimization that demonstrate effective knowledge discovery.

Language: English
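The abstract above describes text analytics as "turning words into numbers". Two common ways of doing that are raw bag-of-words counts and TF-IDF weights. Several entries in this list use both. The following is a minimal sketch under stated assumptions, not the authors' method:

- Tokenization is lowercase whitespace splitting.
- The IDF is the plain, unsmoothed `log(N / df)`. Library implementations often use smoothed variants.
- The function names and the sample documents are hypothetical.

```python
import math
from collections import Counter


def bag_of_words(docs):
    """Return (vocabulary, count vectors) for lowercase, whitespace-tokenised docs."""
    tokenised = [doc.lower().split() for doc in docs]
    vocab = sorted({tok for toks in tokenised for tok in toks})
    vectors = []
    for toks in tokenised:
        counts = Counter(toks)
        vectors.append([counts[term] for term in vocab])
    return vocab, vectors


def tf_idf(docs):
    """Weight each term by (count / doc length) * log(N / document frequency)."""
    vocab, counts = bag_of_words(docs)
    n_docs = len(docs)
    # document frequency: how many documents contain each vocabulary term
    df = [sum(1 for row in counts if row[j] > 0) for j in range(len(vocab))]
    weighted = []
    for row in counts:
        total = sum(row) or 1
        weighted.append([(c / total) * math.log(n_docs / df[j]) for j, c in enumerate(row)])
    return vocab, weighted


# Hypothetical toy corpus of symptom descriptions
docs = ["fever and cough", "cough and headache", "fever and rash"]
vocab, bow = bag_of_words(docs)
_, tfidf = tf_idf(docs)
```

A term that occurs in every document, such as "and" here, receives a TF-IDF weight of zero. A term unique to one document, such as "headache", receives the largest weight. This down-weighting of ubiquitous words is the main difference from plain counts.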

Rapid and accurate quality evaluation of Angelicae Sinensis Radix based on near-infrared spectroscopy and Bayesian optimized LSTM network

Lei Bai,

Zhi‐Tong Zhang,

Huanhuan Guan

et al.

Talanta, Journal Year: 2024, Volume and Issue: 275, P. 126098 - 126098

Published: April 12, 2024

Language: English

Citations

9

AI-based disease category prediction model using symptoms from low-resource Ethiopian language: Afaan Oromo text (Creative Commons)
Etana Fikadu Dinsa, Mrinal Kanti Das, Teklu Urgessa Abebe

et al.

Scientific Reports, Journal Year: 2024, Volume and Issue: 14(1)

Published: May 16, 2024

Automated disease diagnosis and prediction, powered by AI, play a crucial role in enabling medical professionals to deliver effective care to patients. While such predictive tools have been extensively explored in resource-rich languages like English, this manuscript focuses on automatically predicting disease categories from symptoms documented in the Afaan Oromo language, employing various classification algorithms. This study encompasses machine learning techniques such as support vector machines, random forests, logistic regression, and Naïve Bayes, as well as deep learning approaches including LSTM, GRU, and Bi-LSTM. Due to the unavailability of a standard corpus, we prepared three data sets with different numbers of patient symptoms arranged into 10 categories. Two feature representations, TF-IDF and word embedding, were employed. The performance of the proposed methodology has been evaluated using accuracy, recall, precision, and F1 score. The experimental results show that, among the machine learning models, the SVM model had the highest accuracy score of 94.7%, while among the deep learning models, LSTM with word2vec embedding showed an accuracy rate of 95.7% and an F1 score of 96.0%. To obtain the optimal performance of each model, several hyper-parameter tuning settings were used. The study shows that the LSTM model proves to be the best of all the models over the entire dataset.

Language: English

Citations

5
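The Afaan Oromo disease-prediction abstract above reports accuracy, precision, recall and F1 over 10 categories, but it does not say how the per-class scores are averaged. The following is a minimal sketch that assumes macro-averaging, meaning an unweighted mean over classes. The function name and the toy labels are hypothetical.

```python
def classification_report(y_true, y_pred):
    """Accuracy plus macro-averaged precision, recall and F1 (sketch)."""
    labels = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    precisions, recalls, f1s = [], [], []
    for label in labels:
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)
    n = len(labels)
    # macro average: every class counts equally, regardless of its size
    return {
        "accuracy": accuracy,
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }


# Hypothetical disease categories
report = classification_report(["flu", "flu", "cold", "cold"], ["flu", "cold", "cold", "cold"])
```

With imbalanced categories, macro-averaging penalizes poor minority-class performance more than a support-weighted average would. This is why the choice of averaging scheme is worth stating alongside reported scores.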

Optimising window size of semantic of classification model for identification of in-text citations based on context and intent (Creative Commons)
Arshad Iqbal, Abdul Shahid, Muhammad Roman

et al.

PLoS ONE, Journal Year: 2025, Volume and Issue: 20(3), P. e0309862 - e0309862

Published: March 24, 2025

Citations in scientific literature act as channels for the sharing, transfer, and development of knowledge. However, not all citations hold the same significance. Numerous taxonomies and machine learning models have been developed to analyze citations, but they often overlook the internal context of these citations. Moreover, selecting the appropriate word embedding and classification model is crucial for achieving superior results. Word embeddings offer n-dimensional distributed representations of text, striving to capture the nuanced meanings of words. Deep learning-based techniques have garnered significant attention and found application in various Natural Language Processing (NLP) tasks, including text classification, sentiment analysis, and citation analysis. Current state-of-the-art approaches use small datasets with fixed window sizes, resulting in a loss of contextual meaning. This study leverages two benchmark datasets encompassing a substantial volume of in-text citations to guide the selection of an optimal window size for classification approaches. A comparative analysis of window sizes is conducted to identify the optimal window size effectively. Additionally, Word2Vec is employed in conjunction with deep learning models such as Convolutional Neural Networks (CNNs), Gated Recurrent Units (GRUs), and Long Short-Term Memory (LSTM) networks, and with machine learning models such as Support Vector Machines (SVM), Decision Trees, and Naive Bayes. The evaluation employs precision, recall, F1-score, and accuracy metrics for each combination of model and window size. The findings reveal that, particularly for lengthy citations, larger windows are more adept at capturing the semantic essence of references. Within the scope of this study, a window size of 10 achieves superior precision for both the machine learning and deep learning models.

Language: English

Citations

0
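The window-size study above extracts the text surrounding an in-text citation before classification. The following is a minimal sketch of one plausible definition. It assumes that "window size" means the number of tokens kept on each side of the citation marker, with the marker itself excluded. The paper may define the window differently, for example in sentences or asymmetrically. The function name and the example sentence are hypothetical.

```python
def citation_context(tokens, marker_index, window):
    """Return up to `window` tokens on each side of the citation marker.

    Assumption for this sketch: the window is symmetric, measured in
    tokens, and the marker token itself is excluded from the context.
    """
    if not 0 <= marker_index < len(tokens):
        raise IndexError("marker_index out of range")
    if window < 0:
        raise ValueError("window must be non-negative")
    start = max(0, marker_index - window)
    left = tokens[start:marker_index]
    right = tokens[marker_index + 1 : marker_index + 1 + window]
    return left + right


tokens = "as shown by prior work [12] larger windows help".split()
small = citation_context(tokens, 5, 2)   # narrow context
large = citation_context(tokens, 5, 10)  # window of 10, truncated at sentence edges
```

A larger window keeps more of a long citing sentence, which matches the abstract's finding that larger windows capture more semantic context for lengthy citations. Near sentence boundaries the window simply truncates.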

Enhancing Text Classification Through Grammar-Based Feature Engineering and Learning Models (Creative Commons)
Alaa Mohasseb, Andreas Kanavos, Eslam Amer

et al.

Information, Journal Year: 2025, Volume and Issue: 16(6), P. 424 - 424

Published: May 22, 2025

Text classification remains a challenging task in natural language processing (NLP) due to linguistic complexity and data imbalance. This study proposes a hybrid approach that integrates grammar-based feature engineering with deep learning and transformer models to enhance classification performance. A dataset of factoid and non-factoid questions, further categorized into causal, choice, confirmation, hypothetical, and list types, is used to evaluate several models, including CNNs, BiLSTMs, MLPs, BERT, DistilBERT, Electra, and GPT-2. Grammatical and domain-specific features are explicitly extracted and leveraged to improve multi-class classification. To address class imbalance, the SMOTE algorithm is applied, significantly boosting recall and F1-score for minority classes. Experimental results show that DistilBERT achieves the highest binary classification accuracy, equal to 94%, while BiLSTM and CNN outperform transformers in multi-class settings, reaching up to 92% accuracy. These findings confirm that grammar-based features provide critical syntactic and semantic insights, enhancing model robustness and interpretability beyond conventional embeddings.

Language: English

Citations

0
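The grammar-based feature engineering abstract above applies SMOTE to counter class imbalance. SMOTE creates synthetic minority examples by interpolating between a minority sample and one of its nearest minority-class neighbors. The following is a minimal, stdlib-only sketch of that core step. It rests on these assumptions:

- Features are numeric vectors compared with Euclidean distance.
- The base sample is chosen at random.
- `k=5` is used as a default.
- The function name is hypothetical.

This is not the study's configuration.

```python
import random


def smote(minority, n_new, k=5, seed=0):
    """Generate `n_new` synthetic points by SMOTE-style interpolation.

    Each synthetic point lies on the segment between a randomly chosen
    minority sample and one of its k nearest minority neighbours.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)

    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest neighbours of `base` among the other minority samples
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: sq_dist(base, p))[:k]
        nn = rng.choice(neighbours)
        gap = rng.random()  # uniform in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, nn)])
    return synthetic


minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_points = smote(minority, 10, k=2)
```

Interpolating between neighbors keeps synthetic points inside the region already occupied by the minority class, unlike simple duplication. This is what lets it raise minority-class recall without exact repeats. Production code would typically use an existing implementation rather than this sketch.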

Automatic categorization of medical documents in Afaan Oromo using ensemble machine learning techniques (Creative Commons)
Etana Fikadu Dinsa, Mrinal Kanti Das, Teklu Urgessa Abebe

et al.

Deleted Journal, Journal Year: 2024, Volume and Issue: 6(11)

Published: Oct. 28, 2024

Automatic medical document classification using machine learning techniques can enhance the productivity of healthcare services by reducing processing time and cost. This work proposes an ensemble approach to develop a model that classifies electronic medical documents in Afaan Oromo. The main tasks of this work are preparing the corpus, pre-processing, training the models, and the classification process. We used term frequency-inverse document frequency (TF-IDF) and bag of words (BOW) feature extraction methods. An ensemble technique is used because it creates multiple individual classifier predictions from naïve Bayes, random forest, SVM, and logistic regression and then combines them into a more reliable and accurate classifier. Evaluation measures such as accuracy, F1-score, recall, and precision were employed for performance comparison. The efficiency of the proposed method is compared with two existing boosting approaches, namely gradient boosting and AdaBoost. The experimental results show that BOW outperforms TF-IDF on our dataset. These results also illustrate the effectiveness of the proposed ensemble approach, which scores 94.81% accuracy and a 94.84% F1-score. This work contributes to the technological enhancement of healthcare service delivery, to managing medical documents through machine learning methods, and to advancing data systems in the healthcare sector.

Language: English

Citations

3
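The medical-document ensemble above combines predictions from naïve Bayes, random forest, SVM and logistic regression, but the abstract does not say how they are combined. The following is a minimal sketch that assumes hard majority voting. Soft voting, which averages class probabilities, is a common alternative. With four voters ties are possible, and the tie-break rule here (first tied label in model order) is an arbitrary choice for illustration. The function name and category labels are hypothetical.

```python
from collections import Counter


def majority_vote(predictions_per_model):
    """Combine per-model label lists by hard (majority) voting.

    Ties are broken in favour of the tied label predicted first in model
    order, an arbitrary choice made for this sketch.
    """
    n_samples = len(predictions_per_model[0])
    if any(len(p) != n_samples for p in predictions_per_model):
        raise ValueError("all models must predict the same number of samples")
    combined = []
    for i in range(n_samples):
        votes = [preds[i] for preds in predictions_per_model]
        counts = Counter(votes)
        best = max(counts.values())
        # first label, in model order, that reaches the top vote count
        combined.append(next(v for v in votes if counts[v] == best))
    return combined


# Hypothetical predictions from four base classifiers on three documents
nb = ["cardiology", "oncology", "neurology"]
rf = ["cardiology", "neurology", "neurology"]
svm = ["oncology", "oncology", "neurology"]
lr = ["cardiology", "neurology", "oncology"]
ensemble = majority_vote([nb, rf, svm, lr])
```

An odd number of voters avoids ties in binary problems. With four models and many classes, some explicit tie-breaking rule is unavoidable, so it is worth reporting.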

Multimodal Religiously Hateful Social Media Memes Classification Based on Textual and Image Data (Open Access)
Ameer Hamza, Abdul Rehman Javed, Farkhund Iqbal

et al.

ACM Transactions on Asian and Low-Resource Language Information Processing, Journal Year: 2023, Volume and Issue: 23(8), P. 1 - 19

Published: Sept. 16, 2023

Multimodal hateful social media meme detection is an important and challenging problem in the vision-language domain. Recent studies show high accuracy for such multimodal tasks due to datasets that provide better joint multimodal embeddings to narrow the semantic gap. Religiously hateful memes have not been extensively explored among published datasets. While higher accuracy is needed on religiously hateful memes, deep learning-based models often suffer from inductive bias. This issue is addressed in this work with the following contributions. First, a religiously hateful memes dataset is created and made publicly available to advance hateful religious meme detection research. Over 2000 meme images are collected along with their corresponding text. The proposed approach compares and fine-tunes VisualBERT pre-trained on the Conceptual Caption (CC) dataset for the downstream classification task. We also extend the approach to the Facebook dataset. We extract visual features using ResNeXT-152 Aggregated Residual Transformations-based Masked Regions with Convolutional Neural Networks (R-CNN), and use Bidirectional Encoder Representations from Transformers (BERT) uncased for textual encoding in the early fusion model. The primary evaluation metric is the Area Under the Receiver Operating Characteristic Curve (AUROC), which measures model separability. Results show that the proposed model has an AUROC score of 78%, demonstrating its separability performance, and an accuracy of 70%. Considering the dataset size, it shows comparatively superior performance against ensemble-based machine learning approaches.

Language: English

Citations

7
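The hateful-memes abstract above uses AUROC to measure how well the model separates the two classes. AUROC equals the probability that a randomly chosen positive example is scored higher than a randomly chosen negative one, with ties counting as one half. This is the Mann-Whitney formulation. The following is a minimal O(n²) sketch of that definition. It assumes label 1 marks a hateful meme. The function name and scores are hypothetical.

```python
def auroc(labels, scores):
    """AUROC as P(score of random positive > score of random negative).

    Ties count as half a win (Mann-Whitney U formulation).
    Assumption for this sketch: label 1 = positive (hateful), 0 = negative.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical model scores for four memes
score = auroc([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.1])
```

Because AUROC depends only on the ranking of scores, it is threshold-free. This is why it can differ from accuracy measured at a single decision threshold, as with the 78% AUROC and 70% accuracy reported above.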

Transfer Learning-based Forensic Analysis and Classification of E-Mail Content (Open Access)
Farkhund Iqbal, Abdul Rehman Javed, Rutvij H. Jhaveri

et al.

ACM Transactions on Asian and Low-Resource Language Information Processing, Journal Year: 2023, Volume and Issue: unknown

Published: June 28, 2023

Just Accepted. Authors: Farkhund Iqbal (College of Technological Innovation, Zayed University, UAE), Abdul Rehman Javed (Department of Electrical and Computer Engineering, Lebanese American University, Lebanon), Rutvij H. Jhaveri (School of Technology, Pandit Deendayal Energy University, India), Ahmad Almadhor (Jouf University, Saudi Arabia), Umar Farooq (National University of Computer and Emerging Sciences, Pakistan). Accepted June 2023. https://doi.org/10.1145/3604592

Language: English

Citations

5

Comprehensive Review of Multimodal Medical Data Analysis: Open Issues and Future Research Directions (Creative Commons)
Shashank Shetty,

V. S. Ananthanarayana,

Ajit Mahale

et al.

Acta Informatica Pragensia, Journal Year: 2022, Volume and Issue: 11(3), P. 423 - 457

Published: Dec. 26, 2022

Over the past few decades, the enormous expansion of medical data has led to a search for ways of analyzing data in smart healthcare systems. The acquisition of data from picture archiving and communication systems, electronic health records, online documents, radiology reports, and clinical records in different styles with specific numerical information has given rise to the concept of multimodality and the need for machine learning and deep learning techniques in the system. Medical data play a vital role in medical education and diagnosis, and determining the dependency between distinct modalities is essential. This paper gives the gist of current multimodal medical data analysis and its various approaches and frameworks for representation and classification. A brief outline of existing multimodal medical data processing work is presented. The main objective of this study is to spot gaps in the surveyed area and list future tasks and challenges in radiology. The Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines were incorporated for an effective article search to investigate several relevant scientific publications. The systematic review was carried out on multimodal medical data analysis and highlighted its advantages, limitations, and strategies. The inherent benefit of the multimodal domain powered by artificial intelligence has a significant impact on the performance of disease diagnosis frameworks.

Language: English

Citations

8

Text Classification Using Deep Learning Models: A Comparative Review (Open Access)
Muhammad Zulqarnain,

Rubab Sheikh,

Shahid M. Hussain

et al.

Cloud Computing and Data Science, Journal Year: 2023, Volume and Issue: unknown, P. 80 - 96

Published: Oct. 16, 2023

With the fast popularization and continued development of web pages on the Internet, text classification has become a very serious problem in organizing and managing large amounts of digital data and documents. Deep learning approaches have been applied in several areas with comparatively outstanding results. In this article, we analyze and give comprehensive reviews of different deep learning models for text classification tasks. Based on a literature review and survey, the paper addresses three different deep learning models and identifies their gaps and limitations. We evaluate their applications and briefly discuss the Deep Neural Network (DNN) frameworks available for implementation on text datasets. This work presents guidance for future research so that more significant contributions can be made in this area. In summary, our study presents the main implications, identifies potential directions for future research, and highlights challenges within this specific field. Additionally, we aim to acquaint readers with the subtasks relevant to the text classification process. By engaging in this discussion, we aspire to inspire readers to explore novel and enhanced techniques for text classification that are applicable across diverse domains.

Language: English

Citations

4

Research Paper Classification Using Machine and Deep Learning Techniques
Joann Galopo Perez, Melvin A. Ballera

Published: Feb. 23, 2024

Language: English

Citations

0