In academic institutions, commercial enterprises, research centers, technology-heavy businesses, and government funding agencies, maintaining consistent data is a major difficulty. For an entity, which might be anything from an object to a place or thing, most data are irregular. These days, to identify significant patterns that represent the data, entity links in a dataset are investigated by text mining analytics. With this knowledge, alternatives are then taken. Analytics creates and finds patterns by turning words into numbers. In the end, better organization of the data results in better conclusions. However, classifying and processing each piece of data by hand is difficult. As a result, in the domain of Natural Language Processing (NLP), which looks at grammatical and lexical patterns, intelligent systems have emerged. Before text mining, it is imperative to examine and comprehend the nature of the data. Text categorization requires automation because of the increasing volume of data and the requirement for accuracy and precision. It is an interesting study opportunity to develop automatic categorization of texts with deep learning methods that handle difficult NLP tasks and semantic constraints. Intelligent systems founded on text analytics can facilitate information discovery. The majority of the advantages are obtained by applying these insights in emerging applications to support decision-making and improve resources. Improved techniques and parameter optimization for demonstrating effective knowledge discovery will be the focus of future studies.
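The idea of "turning words into numbers" can be illustrated with TF-IDF weighting, a common way to score how characteristic a term is of one document within a collection. The sketch below is a minimal, dependency-free illustration and not the authors' pipeline; the tokenizer, the function names, and the three sample sentences are assumptions made for this example.

```python
import math
from collections import Counter

PUNCTUATION = ".,;:!?()\"'"

def tokenize(text):
    """Lowercase, split on whitespace, and strip surrounding punctuation."""
    tokens = (word.strip(PUNCTUATION).lower() for word in text.split())
    return [token for token in tokens if token]

def tf_idf(documents):
    """Return one {term: weight} dict per document.

    tf  = occurrences of the term in the document / tokens in the document
    idf = log(number of documents / documents containing the term)
    """
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))  # count each term once per document
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        vectors.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return vectors

# Hypothetical sample documents, for illustration only.
docs = [
    "text mining finds patterns in text",
    "deep learning methods handle text",
    "entity links in a dataset",
]
vectors = tf_idf(docs)
```

A term that occurs in every document receives weight zero under this unsmoothed formula, which is why library implementations usually add smoothing to the idf term.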
Scientific Reports, Journal Year: 2024, Volume and Issue: 14(1), Published: May 16, 2024
Automated disease diagnosis and prediction, powered by AI, play a crucial role in enabling medical professionals to deliver effective care to patients. While such predictive tools have been extensively explored in resource-rich languages like English, this manuscript focuses on predicting disease categories automatically from symptoms documented in the Afaan Oromo language, employing various classification algorithms. This study encompasses machine learning techniques such as support vector machines, random forests, logistic regression, and Naïve Bayes, as well as deep learning approaches including LSTM, GRU, and Bi-LSTM. Due to the unavailability of a standard corpus, we prepared three data sets with different numbers of patient symptoms arranged into 10 categories. The two feature representations, TF-IDF and word embedding, were employed. The performance of the proposed methodology has been evaluated using accuracy, recall, precision, and F1 score. The experimental results show that, among the machine learning models, the SVM model had the highest accuracy score of 94.7%, while the LSTM model with word2vec embedding showed rates of 95.7% and 96.0% among the deep learning models. To enhance the performance of each model, several hyper-parameter tuning settings were used. This study shows that the LSTM model proves to be the best of all the models over the entire dataset.
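The four evaluation measures named above can be computed directly from true and predicted labels. The following is a minimal sketch that assumes macro averaging over classes, which the abstract does not specify; the function names and the two-class toy labels are hypothetical.

```python
def per_class_scores(y_true, y_pred, label):
    """Precision, recall and F1 for one class, treating it as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

def evaluate(y_true, y_pred):
    """Accuracy plus macro-averaged precision, recall and F1 (averaging is an assumption)."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = [per_class_scores(y_true, y_pred, label) for label in labels]
    n_labels = len(labels)
    return {
        "accuracy": sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true),
        "precision": sum(s[0] for s in scores) / n_labels,
        "recall": sum(s[1] for s in scores) / n_labels,
        "f1": sum(s[2] for s in scores) / n_labels,
    }

# Hypothetical labels, for illustration only.
y_true = ["flu", "flu", "malaria", "malaria"]
y_pred = ["flu", "malaria", "malaria", "malaria"]
report = evaluate(y_true, y_pred)
```

With imbalanced categories, macro averaging weights every class equally, so it can differ noticeably from accuracy.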
PLoS ONE, Journal Year: 2025, Volume and Issue: 20(3), P. e0309862 - e0309862, Published: March 24, 2025
Citations in scientific literature act as channels for the sharing, transfer, and development of knowledge. However, not all citations hold the same significance. Numerous taxonomies and machine learning models have been developed to analyze citations, but they often overlook the internal context of these citations. Moreover, it is worth noting that selecting the appropriate word embedding and classification models is crucial for achieving superior results. Word embeddings offer n-dimensional distributed representations of text, striving to capture the nuanced meanings of words. Deep learning-based techniques have garnered significant attention and found application in various Natural Language Processing (NLP) tasks, including text classification, sentiment analysis, and citation analysis. Current state-of-the-art techniques use small datasets with fixed window sizes, resulting in the loss of contextual meaning. This study leverages two benchmark datasets encompassing a substantial volume of in-text citations to guide the selection of an optimal word embedding window size and classification approaches. A comparative analysis of window sizes is conducted to identify citations effectively. Additionally, Word2Vec embedding is employed in conjunction with deep learning and machine learning models such as Convolutional Neural Networks (CNNs), Gated Recurrent Units (GRUs), Long Short-Term Memory (LSTM) networks, Support Vector Machines (SVM), Decision Trees, and Naive Bayes. The evaluation employs precision, recall, F1-score, and accuracy metrics for each combination of models and window sizes. The findings reveal that, particularly for lengthy in-text citations, larger windows are more adept at capturing the semantic essence of the references. Within the scope of this study, a window size of 10 achieves superior accuracy and precision with both machine learning and deep learning models.
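The effect of the window size can be made concrete by listing the (target, context) word pairs that a skip-gram style embedding model would train on. This is a simplified sketch: it uses a fixed symmetric window, whereas Word2Vec implementations typically also sample a smaller effective window at random. The function name, the `[CIT]` placeholder token, and the sample sentence are assumptions.

```python
def context_pairs(tokens, window):
    """Return every (target, context) pair where the context word lies
    at most `window` positions to the left or right of the target."""
    pairs = []
    for i, target in enumerate(tokens):
        start = max(0, i - window)
        end = min(len(tokens), i + window + 1)
        for j in range(start, end):
            if j != i:  # a word is not its own context
                pairs.append((target, tokens[j]))
    return pairs

# Hypothetical in-text citation sentence; "[CIT]" marks the citation.
tokens = ["we", "extend", "the", "method", "of", "[CIT]"]
narrow = context_pairs(tokens, window=1)
wide = context_pairs(tokens, window=5)
```

With `window=1` the citation marker is paired only with its immediate neighbour, while with `window=5` it is paired with every word in the sentence, which is the sense in which a larger window retains more of the citation's context.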
Information, Journal Year: 2025, Volume and Issue: 16(6), P. 424 - 424, Published: May 22, 2025
Text classification remains a challenging task in natural language processing (NLP) due to linguistic complexity and data imbalance. This study proposes a hybrid approach that integrates grammar-based feature engineering with deep learning and transformer models to enhance classification performance. A dataset of factoid and non-factoid questions, further categorized into causal, choice, confirmation, hypothetical, and list types, is used to evaluate several models, including CNNs, BiLSTMs, MLPs, BERT, DistilBERT, Electra, and GPT-2. Grammatical and domain-specific features are explicitly extracted and leveraged to improve multi-class classification. To address class imbalance, the SMOTE algorithm is applied, significantly boosting recall and F1-score for minority classes. Experimental results show that DistilBERT achieves the highest binary classification accuracy, equal to 94%, while BiLSTM and CNN outperform transformers in multi-class settings, reaching up to 92% accuracy. These findings confirm that grammar-based features provide critical syntactic and semantic insights, enhancing model robustness and interpretability beyond conventional embeddings.
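SMOTE balances classes by creating synthetic minority samples on the line segment between a minority sample and one of its nearest minority neighbours. The sketch below illustrates only that interpolation step on numeric feature vectors; it is not the study's implementation, and the function name, the parameters `n_new`, `k` and `seed`, and the toy points are assumptions. Text would first have to be converted to feature vectors.

```python
import random

def smote_sketch(minority, n_new, k=2, seed=0):
    """Generate `n_new` synthetic points from a list of minority-class vectors.

    Requires at least two minority samples.
    """
    rng = random.Random(seed)

    def squared_distance(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # The k nearest minority samples other than the base itself.
        others = [point for point in minority if point is not base]
        neighbours = sorted(others, key=lambda point: squared_distance(base, point))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic

# Hypothetical 2-D minority samples, for illustration only.
minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
new_points = smote_sketch(minority, n_new=5)
```

Because each synthetic point is a convex combination of two real minority samples, it always falls inside the region those samples span; oversampling should be applied to the training split only, so that no synthetic points leak into evaluation.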
Deleted Journal, Journal Year: 2024, Volume and Issue: 6(11), Published: Oct. 28, 2024
Automatic medical document classification using machine learning techniques can enhance the productivity of healthcare services by reducing processing time and cost. This work proposes an ensemble approach to develop a model that classifies electronic medical documents in Afaan Oromo. The main tasks in this work are preparing the corpus, pre-processing, training the models, and the classification process. We used term frequency-inverse document frequency (TF-IDF) and bag of words (BOW) feature extraction methods. An ensemble technique is used: it creates multiple individual classifier predictions from naïve Bayes, random forest, SVM, and logistic regression and then combines them to produce a more reliable and more accurate classifier. The evaluation measures employed were accuracy, F1-score, recall, and precision for performance comparison. The efficiency of the proposed method is compared with two existing boosting approaches, namely gradient boosting and AdaBoost. The experimental result shows that BOW performs better than TF-IDF on our dataset. These results also illustrated the effectiveness of the proposed ensemble approach, scoring 94.81% accuracy and 94.84% F1-score. This work significantly contributes to the technological enhancement of healthcare service delivery, managing medical documents through machine learning methods, and advancing data systems in healthcare sectors.
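One simple way to combine the predictions of several classifiers is hard (majority) voting. The abstract does not state which combination rule was used, so the sketch below is an assumption-laden illustration rather than the authors' method; the function name, the document labels, and the four prediction lists are hypothetical.

```python
from collections import Counter

def majority_vote(predictions):
    """Combine per-classifier label lists into one label per document.

    `predictions` holds one list of labels per classifier, all of equal
    length. Ties go to the tied label that appears first among the votes.
    """
    combined = []
    for votes in zip(*predictions):
        label, _count = Counter(votes).most_common(1)[0]
        combined.append(label)
    return combined

# Hypothetical predictions for three documents, for illustration only.
naive_bayes = ["lab", "rx", "lab"]
random_forest = ["lab", "lab", "rx"]
svm = ["rx", "rx", "lab"]
logistic_regression = ["lab", "rx", "lab"]

final = majority_vote([naive_bayes, random_forest, svm, logistic_regression])
```

With an even number of base classifiers, ties are possible, so weighting votes by predicted probability (soft voting) is a common alternative.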
ACM Transactions on Asian and Low-Resource Language Information Processing, Journal Year: 2023, Volume and Issue: 23(8), P. 1 - 19, Published: Sept. 16, 2023
Multimodal hateful social media meme detection is an important and challenging problem in the vision-language domain. Recent studies show high accuracy for such multimodal tasks due to datasets that provide better joint multimodal embedding to narrow the semantic gap. Religiously hateful meme detection is not extensively explored among published datasets. While there is a need for higher accuracy on religiously hateful memes, deep learning-based models often suffer from inductive bias. This issue is addressed in this work with the following contributions. First, a religiously hateful memes dataset is created and made public to advance hateful religious memes detection research. Over 2000 meme images are collected with their corresponding text. The proposed approach compares and fine-tunes VisualBERT pre-trained on the Conceptual Caption (CC) dataset for the downstream classification task. We also extend the dataset with the Facebook hateful memes dataset. We extract visual features using ResNeXT-152 Aggregated Residual Transformations-based Masked Regions with Convolutional Neural Networks (R-CNN) and use Bidirectional Encoder Representations from Transformers (BERT) uncased for textual encoding in the early fusion model. We use the primary evaluation metric of Area Under the Receiver Operating Characteristic curve (AUROC) to measure model separability. Results show that the proposed approach has an AUROC score of 78%, proving the model's separability performance, and an accuracy of 70%. It shows comparatively superior performance considering the dataset size and against ensemble-based machine learning approaches.
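AUROC measures separability as the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one, with ties counted as one half. The pairwise sketch below computes exactly that quantity; it is quadratic in the number of examples and meant only as an illustration. The function name and the toy labels and scores are assumptions.

```python
def auroc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    `labels` contains 1 for positive (e.g. hateful) and 0 for negative
    examples; `scores` are the model's scores for the positive class.
    """
    positives = [s for label, s in zip(labels, scores) if label == 1]
    negatives = [s for label, s in zip(labels, scores) if label == 0]
    if not positives or not negatives:
        raise ValueError("AUROC needs at least one positive and one negative example")
    wins = 0.0
    for pos_score in positives:
        for neg_score in negatives:
            if pos_score > neg_score:
                wins += 1.0
            elif pos_score == neg_score:
                wins += 0.5  # a tie counts as half a correct ordering
    return wins / (len(positives) * len(negatives))

# Hypothetical labels and scores, for illustration only.
labels = [1, 0, 1, 0]
scores = [0.9, 0.8, 0.4, 0.1]
value = auroc(labels, scores)
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect separation, which is why AUROC and accuracy at a fixed threshold can differ for the same model.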
ACM Transactions on Asian and Low-Resource Language Information Processing, Journal Year: 2023, Volume and Issue: unknown, Published: June 28, 2023
Transfer Learning-based Forensic Analysis and Classification of E-Mail Content (research article, Just Accepted)
Authors: Farkhund Iqbal (College of Technological Innovation, Zayed University, UAE); Abdul Rehman Javed (Department of Electrical and Computer Engineering, Lebanese American University, Lebanon); Rutvij H. Jhaveri (Department of Computer Science and Engineering, School of Technology, Pandit Deendayal Energy University, India); Ahmad Almadhor (Department of Computer Engineering and Networks, College of Computer and Information Sciences, Jouf University, Saudi Arabia); Umar Farooq (National University of Computer and Emerging Sciences, Pakistan)
Accepted: June 2023. Published: 28 June 2023. https://doi.org/10.1145/3604592
Acta Informatica Pragensia, Journal Year: 2022, Volume and Issue: 11(3), P. 423 - 457, Published: Dec. 26, 2022
Over the past few decades, the enormous expansion of medical data has led to searching for ways of data analysis in smart healthcare systems. Acquisition of data from pictures, archives, communication systems, electronic health records, online documents, radiology reports and clinical records of different styles with specific numerical information has given rise to the concept of multimodality and to the need for machine learning and deep learning techniques in the analysis of the healthcare system. Medical data play a vital role in medical education and diagnosis; determining the dependency between distinct modalities is essential. This paper gives a gist of current medical data analysis techniques and their various approaches and frameworks for representation and classification. A brief outline of existing multimodal medical data processing work is presented. The main objective of this study is to spot gaps in the surveyed area and list future tasks and challenges in radiology. The Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines were incorporated for effective article search and to investigate several relevant scientific publications. The systematic review was carried out on multimodal medical data analysis and highlighted advantages, limitations and strategies. The inherent benefit of multimodality in the medical domain powered by artificial intelligence has a significant impact on the performance of disease diagnosis frameworks.
Cloud Computing and Data Science, Journal Year: 2023, Volume and Issue: unknown, P. 80 - 96, Published: Oct. 16, 2023
With the fast popularization and continued development of web pages on the Internet, text classification has become a very serious problem in organizing and managing large amounts of digital data in documents. Deep learning approaches have been applied in several areas with comparatively outstanding results. In this article, we analyzed and gave comprehensive reviews of different deep learning models for text classification tasks. Based on the literature review and survey, this paper addresses three various [...] and declares their gaps and limitations. We evaluated the applications with a small discussion of the available Deep Neural Networks (DNN) frameworks for implementation and of the datasets. This work presents guidance for future research in this area. In summary, our study presented the main implications, identified potential directions for future research, and highlighted the challenges within this specific field. Additionally, our aim is to acquaint readers with the subtasks relevant and related to the text classification process. By engaging in this discussion, we aspire to inspire readers to explore novel and enhanced techniques for text classification, applicable across diverse domains.