Active Learning for News Article’s Authorship Identification DOI Creative Commons
Sidra Abbas, Shtwai Alsubai, Gabriel Avelino Sampedro

et al.

IEEE Access, Journal Year: 2023, Volume and Issue: 11, P. 98415 - 98426

Published: Jan. 1, 2023

Over time, the amount of textual data has increased drastically, especially due to the publication of articles. As a consequence, there has been a rise in anonymous content. Research is being conducted to determine alternative methods for identifying unknown text authors. To this end, a system should be developed to accurately identify the author of texts, given a group of writing samples. Active learning is utilized in this study because it iteratively selects the most informative samples to include in the training set, which enables a more precise and accurate authorship identification approach with fewer training examples. This makes it useful for analyzing the rising volume of anonymous content. This study proposes a novel approach that utilizes active learning (AL) based machine learning and deep learning models, namely Logistic Regression (AL-LR), Random Forest (AL-RF), XGBoost (AL-XGB), and Multilayer Perceptron (AL-MLP), for authorship identification. The proposed approach extracts valuable characteristics of each writer using Term Frequency-Inverse Document Frequency (TF-IDF). The study's selected comprehensive dataset, "All news," is divided into three subsets: Article 1, Article 2, and Article 3. We have restricted the dataset's scope to the top 50 authors for our experimentation. The experimental outcomes reveal that the AL-XGB model achieves superior performance on the Article 1 subset of the "All news" dataset. Further, the AL-LR and AL-MLP models also performed well. The results suggest ...

Language: English
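
The abstract above describes active learning as iteratively selecting the most informative samples for labeling. It does not state which query strategy the authors use, so the following is only a minimal sketch of one common choice, least-confidence sampling. The function name `least_confident` and the probability values are hypothetical and chosen for illustration.

```python
from typing import List, Sequence


def least_confident(probabilities: Sequence[Sequence[float]], k: int) -> List[int]:
    """Return the indices of the k pool samples the model is least sure about."""
    # Least-confidence criterion: uncertainty = 1 - highest class probability.
    uncertainty = [1.0 - max(row) for row in probabilities]
    # Rank pool indices from most to least uncertain (stable for ties).
    ranked = sorted(range(len(probabilities)), key=lambda i: -uncertainty[i])
    return ranked[:k]


# Hypothetical predicted probabilities for four unlabeled articles over three authors.
pool_probs = [
    [0.90, 0.05, 0.05],
    [0.40, 0.35, 0.25],
    [0.60, 0.30, 0.10],
    [0.34, 0.33, 0.33],
]

# Indices of the two articles that would be sent for labeling next.
query = least_confident(pool_probs, 2)
print(query)
```

In a full loop, the selected articles would be labeled, moved into the training set, and the classifier retrained before the next query round.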

A Survey of Ensemble Learning: Concepts, Algorithms, Applications, and Prospects DOI Creative Commons
Ibomoiye Domor Mienye, Yanxia Sun

IEEE Access, Journal Year: 2022, Volume and Issue: 10, P. 99129 - 99149

Published: Jan. 1, 2022

Ensemble learning techniques have achieved state-of-the-art performance in diverse machine learning applications by combining the predictions from two or more base models. This paper presents a concise overview of ensemble learning, covering the three main ensemble methods: bagging, boosting, and stacking, from their early development to recent state-of-the-art algorithms. The study focuses on widely used ensemble algorithms, including random forest, adaptive boosting (AdaBoost), gradient boosting, extreme gradient boosting (XGBoost), light gradient boosting machine (LightGBM), and categorical boosting (CatBoost). An attempt is made to concisely cover their mathematical and algorithmic representations, which is lacking in the existing literature and would be beneficial to machine learning researchers and practitioners.

Language: English

Citations

513
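
The survey abstract above describes ensembles as combining the predictions of two or more base models. As an illustrative sketch only, the simplest such combination for classification is a majority vote, which is the aggregation step typically used in bagging. The function name `majority_vote` and the labels are hypothetical.

```python
from collections import Counter
from typing import List, Sequence


def majority_vote(predictions: Sequence[Sequence[str]]) -> List[str]:
    """Combine per-model label predictions by majority vote for each sample."""
    combined = []
    # zip(*predictions) groups the votes that all models cast for one sample.
    for sample_votes in zip(*predictions):
        # most_common(1) yields the label with the highest vote count.
        combined.append(Counter(sample_votes).most_common(1)[0][0])
    return combined


# Hypothetical predictions of three base models on three samples.
base_predictions = [
    ["a", "b", "a"],  # model 1
    ["a", "a", "b"],  # model 2
    ["b", "b", "a"],  # model 3
]

ensemble = majority_vote(base_predictions)
print(ensemble)
```

Boosting and stacking combine models differently (weighted sequential learners and a learned meta-model, respectively), so this sketch covers only the voting case.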

Deepfake Audio Detection via MFCC Features Using Machine Learning DOI
Ameer Hamza, Abdul Rehman Javed, Farkhund Iqbal

et al.

IEEE Access, Journal Year: 2022, Volume and Issue: 10, P. 134018 - 134028

Published: Jan. 1, 2022

Deepfake content is created or altered synthetically using artificial intelligence (AI) approaches to appear real. It can include synthesizing audio, video, images, and text. Deepfakes may now produce natural-looking content, making them harder to identify. Much progress has been achieved in identifying video deepfakes in recent years; nevertheless, most investigations into detecting audio deepfakes have employed the ASVSpoof or AVSpoof dataset and various machine learning and deep learning algorithms. This research uses machine learning and deep learning-based approaches to identify deepfake audio. The Mel-frequency cepstral coefficients (MFCCs) technique is used to acquire useful information from the audio. We choose the Fake-or-Real dataset, which is a benchmark dataset. The dataset was created with a text-to-speech model and is divided into four sub-datasets: for-rece, for-2-sec, for-norm, and for-original. These datasets are classified into the sub-datasets mentioned above according to audio length and bit rate. The experimental results show that the support vector machine (SVM) outperformed other machine learning (ML) models in terms of accuracy on the for-rece and for-2-sec datasets, while gradient boosting performed very well on the for-norm dataset. The VGG-16 model produced highly encouraging results when applied to the for-original dataset and outperforms other state-of-the-art approaches.

Language: English

Citations

86

Prediction and explanation of debris flow velocity based on multi-strategy fusion Stacking ensemble learning model DOI
Tianlong Wang, Keying Zhang, Zhenghua Liu

et al.

Journal of Hydrology, Journal Year: 2024, Volume and Issue: 638, P. 131347 - 131347

Published: June 12, 2024

Language: English

Citations

11

LSTM‐DGWO‐Based Sentiment Analysis Framework for Analyzing Online Customer Reviews DOI Creative Commons
Kousik Barik, Sanjay Misra, Ajoy Kumar Ray

et al.

Computational Intelligence and Neuroscience, Journal Year: 2023, Volume and Issue: 2023(1)

Published: Jan. 1, 2023

Sentiment analysis furnishes consumer concerns regarding products, enabling product enhancement and development. Existing sentiment analysis using machine learning techniques is computationally intensive and less reliable. Deep learning in sentiment analysis, with approaches such as long short-term memory (LSTM), has adequately evolved, but the selection of optimal hyperparameters is a significant issue. This study combines LSTM with differential grey wolf optimization (LSTM-DGWO) in a deep learning model. The app review dataset is processed using the bidirectional encoder representations from transformers (BERT) framework for efficient word embeddings. Then, review features are extracted by the genetic algorithm (GA), and the optimal review feature set is selected using the firefly algorithm (FA). Finally, the LSTM-DGWO model categorizes the app reviews, and the DGWO algorithm optimizes the hyperparameters of the LSTM model. The proposed model outperformed conventional methods with a greater accuracy of 98.89%. The findings demonstrate that sentiment analysis can be practically applied to understand the customer's perception of enhancing products from a business perspective.

Language: English

Citations

20

Understanding writing style in social media with a supervised contrastively pre-trained transformer DOI Creative Commons
Javier Huertas‐Tato, Alejandro Martín, David Camacho

et al.

Knowledge-Based Systems, Journal Year: 2024, Volume and Issue: 296, P. 111867 - 111867

Published: April 29, 2024

We introduce the Style Transformer for Authorship Representations (STAR) to detect and characterize writing style in social media. The model is trained on a large heterogeneous corpus derived from public sources, with 4.5·10^6 authored texts from 70k authors, leveraging Supervised Contrastive Loss to minimize the distance between texts authored by the same individual. This pretext pre-training task yields competitive performance at zero-shot on PAN challenges for attribution and clustering. We attain promising results on PAN verification challenges using STAR as a feature extractor. Finally, we present results from our test partition on Reddit, where, using a support base of 8 documents of 512 tokens, we can discern authors from sets of up to 1616 authors with at least 80% accuracy. We share our pre-trained model at huggingface AIDA-UPM/star, and our code is available at jahuerta92/star.

Language: English

Citations

4
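
The STAR abstract above says the model is trained with a Supervised Contrastive Loss that pulls together texts by the same author. The abstract gives no formula, so the sketch below follows the commonly used form of that loss on toy vectors, not the authors' implementation. The function name `supcon_loss`, the `temperature` default, and the embeddings are assumptions for illustration.

```python
import math
from typing import List, Sequence


def supcon_loss(embeddings: Sequence[Sequence[float]],
                labels: Sequence[str],
                temperature: float = 0.1) -> float:
    """Mean supervised contrastive loss over anchors that have a same-label positive."""
    # L2-normalise so that dot products are cosine similarities.
    normed: List[List[float]] = []
    for vec in embeddings:
        norm = math.sqrt(sum(x * x for x in vec))
        normed.append([x / norm for x in vec])

    n = len(normed)
    total, anchors = 0.0, 0
    for i in range(n):
        positives = [j for j in range(n) if j != i and labels[j] == labels[i]]
        if not positives:
            continue  # an anchor without a same-author text contributes nothing
        # Temperature-scaled similarity of the anchor to every other sample.
        sims = {j: sum(a * b for a, b in zip(normed[i], normed[j])) / temperature
                for j in range(n) if j != i}
        log_denominator = math.log(sum(math.exp(s) for s in sims.values()))
        # Average negative log-probability of picking each positive.
        total += -sum(sims[p] - log_denominator for p in positives) / len(positives)
        anchors += 1
    return total / anchors if anchors else 0.0


authors = ["a", "a", "b", "b"]
# Hypothetical 2-D embeddings: same-author texts aligned vs. misaligned.
aligned = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
misaligned = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]
print(supcon_loss(aligned, authors), supcon_loss(misaligned, authors))
```

The loss is small when same-author embeddings coincide and large when they point toward another author's texts, which is the behaviour the abstract describes as minimizing the distance between texts by the same individual.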

An ensemble learning model for predicting the intention to quit among employees using classification algorithms DOI Creative Commons
A. K. Biswas, R. Seethalakshmi, Prabha Mariappan

et al.

Decision Analytics Journal, Journal Year: 2023, Volume and Issue: 9, P. 100335 - 100335

Published: Oct. 5, 2023

Employees are often more likely to use social media for job searching, which sometimes causes withdrawal behaviour. This study proposes an ensemble learning model for predicting the intention to quit (IQ) based on selected features, such as Job Involvement (JI), organizational commitment (OC), activities on professional networking sites (APNS), and updating profiles on job portals (PJP). The Receiver Operator Curve (ROC) is used to examine the model's accuracy. We show that the best relationship for predicting the intention to quit is between one's job and social media. Seven classification algorithms, namely Gradient Boosting, Random Forest, K-Nearest Neighbour, Logistic Regression, Neural Network, Support Vector Machine, and Naïve Bayes, are used to build the model. In addition, four combinations of the above-mentioned methods are used to construct the ensemble model. The performance comparison indicates that a combination including K-Nearest Neighbour produced the best prediction of the intention to quit. The study's contribution incorporates the stimulus-organism-response theory through information gain, emphasizing the selected features. Based on these features, the tool can be utilized to identify those who intend to resign and those who do not.

Language: English

Citations

10

Ensemble-learning for pressure prediction in vacuum circuit breaker using feature fusion of laser-induced plasma spectra and images DOI
Kexiang Wei, Jianbin Pan, Huan Yuan

et al.

Spectrochimica Acta Part B Atomic Spectroscopy, Journal Year: 2025, Volume and Issue: unknown, P. 107137 - 107137

Published: Jan. 1, 2025

Language: English

Citations

0

Building intelligence identification system via large language model watermarking: a survey and beyond DOI Creative Commons

Xuhong Wang, Haoyu Jiang, Yi Yu

et al.

Artificial Intelligence Review, Journal Year: 2025, Volume and Issue: 58(8)

Published: May 15, 2025

Language: English

Citations

0

An efficient approach for textual data classification using deep learning DOI Creative Commons
Abdullah Alqahtani, Habib Ullah Khan, Shtwai Alsubai

et al.

Frontiers in Computational Neuroscience, Journal Year: 2022, Volume and Issue: 16

Published: Sept. 15, 2022

Text categorization is an effective activity that can be accomplished using a variety of classification algorithms. In machine learning, the classifier is built by learning the features of categories from a set of preset training data. Similarly, deep learning offers enormous benefits for text classification, since deep models perform highly accurately with less feature engineering and processing. This paper employs machine learning and deep learning techniques to classify textual data. Textual data contains much useless information that must be pre-processed. We clean the data, impute missing values, and eliminate repeated columns. Next, we employ machine learning algorithms, namely logistic regression, random forest, and K-nearest neighbors (KNN), and deep learning algorithms, namely long short-term memory (LSTM), artificial neural network (ANN), and gated recurrent unit (GRU), for classification. Results reveal that LSTM achieves 92% accuracy, outperforming all other models and baseline studies.

Language: English

Citations

13

A Movie Recommender System Based on User Profile and Artificial Bee Colony Optimization DOI Creative Commons
Faezeh Rajabi Kouchi, Sahar Oftadeh Balani, Amirhossein Esmaeilpour

et al.

Computational Intelligence and Neuroscience, Journal Year: 2023, Volume and Issue: 2023(1)

Published: Jan. 1, 2023

In this study, a new algorithm for recommending movies to viewers has been proposed. To do this, the suggested method employs data mining techniques. The proposed method includes three steps for generating recommendations: "preprocessing of user profile information," "feature extraction," and "recommendation." In the first step of the proposed method, user profile information is examined and transformed into a form that can be handled in the next phases. In the second step, user attributes are extracted as a collection of their individual qualities, as well as the average rating of each user for various genres. The artificial bee colony optimization algorithm is then used to select the optimal features. Finally, in the third step, the ratings of similar users are utilized to offer movies to the target user, and the similarities between users are determined using the characteristics calculated for them and the Euclidean distance criteria. The proposed method was evaluated on the MovieLens database, and its output was assessed in terms of precision and recall criteria; these results show an increase in precision and recall by an average of 1.39% and 0.8%, respectively, compared with other algorithms.

Language: English

Citations

7
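
The recommender abstract above describes its third step as finding similar users by Euclidean distance over extracted user features and then using their ratings to suggest movies. The sketch below illustrates that step only, under stated assumptions: the feature selection by artificial bee colony optimization is omitted, and the user names, feature vectors, ratings, and function names are all hypothetical.

```python
import math
from typing import Dict, List


def euclidean(a: List[float], b: List[float]) -> float:
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest_users(target: str, features: Dict[str, List[float]], k: int) -> List[str]:
    """Return the k users whose feature vectors are closest to the target's."""
    others = [user for user in features if user != target]
    others.sort(key=lambda user: euclidean(features[target], features[user]))
    return others[:k]


def recommend(target: str, features: Dict[str, List[float]],
              ratings: Dict[str, Dict[str, float]], k: int, n: int) -> List[str]:
    """Suggest up to n unseen movies ranked by mean rating among the k nearest users."""
    seen = ratings.get(target, {})
    scores: Dict[str, List[float]] = {}
    for user in nearest_users(target, features, k):
        for movie, rating in ratings[user].items():
            if movie not in seen:
                scores.setdefault(movie, []).append(rating)
    # Highest mean rating first; the title breaks ties deterministically.
    ranked = sorted(scores, key=lambda m: (-sum(scores[m]) / len(scores[m]), m))
    return ranked[:n]


# Hypothetical per-user features (for example, average rating in two genres).
features = {"alice": [4.5, 2.0], "bob": [4.0, 2.5], "carol": [1.0, 5.0], "dave": [4.4, 1.8]}
ratings = {
    "alice": {"Heat": 5.0},
    "bob": {"Heat": 4.0, "Alien": 5.0, "Up": 3.0},
    "carol": {"Up": 5.0},
    "dave": {"Alien": 4.0, "Ran": 5.0},
}
print(recommend("alice", features, ratings, k=2, n=2))
```

Averaging neighbour ratings is one simple aggregation choice; the abstract does not say how the authors weight the ratings of similar users.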