In this research paper, we undertake a comprehensive examination of several pivotal factors that impact the performance of Arabic Disinformation Detection in the ArAIEval'2023 shared task. Our exploration encompasses the influence of surface preprocessing, morphological preprocessing, the FastText vector model, and the weighted fusion of TF-IDF features. To carry out the classification tasks, we employ the Linear Support Vector Classification (LSVC) model. In the evaluation phase, our system achieves an F1 micro score of 76.70% and 50.46% for the binary and multiple classification scenarios, respectively. These results correspond closely to the average F1 micro scores achieved by the other systems submitted to the second subtask, standing at 77.96% and 64.85%, respectively.
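
The F1 micro score reported above pools true positives, false positives, and false negatives over all classes before computing precision and recall. The following is a minimal sketch of that metric, not the shared task's official scorer; the function name `micro_f1` and the labels in the usage note are hypothetical, and it assumes each item carries exactly one gold label and one predicted label.

```python
def micro_f1(y_true, y_pred):
    """Micro-averaged F1 for single-label classification (illustrative sketch)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp = fp = fn = 0
    for gold, pred in zip(y_true, y_pred):
        if gold == pred:
            tp += 1  # correct prediction: a true positive for that class
        else:
            fp += 1  # a false positive for the predicted class
            fn += 1  # a false negative for the gold class
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical labels, used only to exercise the function.
gold = ["A", "B", "A", "C"]
pred = ["A", "A", "A", "C"]
score = micro_f1(gold, pred)
```

With three of the four hypothetical predictions correct, `score` is 0.75. Under the single-label assumption every error counts once as a false positive and once as a false negative, so micro-averaged F1 coincides with plain accuracy.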
Proceedings of the 16th International Workshop on Semantic Evaluation (SemEval-2022), Year: 2023, Number: unknown, Published: Jan. 1, 2023
We describe SemEval-2023 task 3 on Detecting the Category, Framing, and Persuasion Techniques in Online News in a Multilingual Setup: the dataset, the task organization process, the evaluation setup, the results, and the participating systems. The task focused on news articles in nine languages (six known to the participants upfront: English, French, German, Italian, Polish, and Russian, and three additional ones revealed to the participants at the testing phase: Spanish, Greek, and Georgian). The task featured three subtasks: (1) determining the genre of the article (opinion, reporting, or satire), (2) identifying one or more frames used in an article from a pool of 14 generic frames, and (3) identifying the persuasion techniques used in each paragraph of the article, using a taxonomy of 23 persuasion techniques. This was a very popular task: a total of 181 teams registered to participate, and 41 eventually made an official submission on the test set.
Scientific Reports, Year: 2025, Number: 15(1), Published: Jan. 13, 2025
During the Covid-19 pandemic, the widespread use of social media platforms has facilitated the dissemination of information, fake news, and propaganda, while also serving as a vital source of self-reported symptoms related to Covid-19. Existing graph-based models, such as Graph Neural Networks (GNNs), have achieved notable success in Natural Language Processing (NLP). However, utilizing GNN-based models for propaganda detection remains challenging because of the difficulty of mining distinct word interactions and storing nonconsecutive and broad contextual data. In this study, we propose a Hierarchical Graph-based Integration Network (H-GIN) designed for detecting propaganda in text within a defined domain using multilabel classification. H-GIN builds a bi-layer inter-intra-channel graph using Residual-driven Enhancement and Processing (RDEP) and Attention-driven Multichannel feature Fusing (ADMF), with suitable labels at two classification levels. First, RDEP procedures facilitate information interactions between distant nodes. Second, by employing these guidelines, ADMF standardizes the Tri-Channels 3-S (sequence, semantic, and syntactic) layer, enabling effective propaganda detection through the propagation of related and unrelated information from news representations into a classifier, using the existing ProText, Qprop, and PTC datasets, thereby ensuring its availability to the public. The H-GIN model demonstrated strong performance, achieving 82% accuracy and surpassing current leading models. Notably, the model's capacity to identify previously unseen examples across diverse openness scenarios on the ProText dataset was particularly significant.
We present an overview of the ArAIEval shared task, organized as part of the first ArabicNLP 2023 conference co-located with EMNLP 2023. ArAIEval offers two tasks over Arabic text: (1) persuasion technique detection, focusing on identifying persuasion techniques in tweets and news articles, and (2) disinformation detection in binary and multiclass setups over tweets. A total of 20 teams participated in the final evaluation phase, with 14 and 16 teams participating in Task 1 and Task 2, respectively. Across both tasks, we observe that fine-tuning transformer models such as AraBERT is at the core of the majority of the participating systems. We provide a description of the task setup, including the datasets construction and the evaluation setup. We also provide a brief overview of the participating systems. All datasets and evaluation scripts from the shared task are released to the research community. We hope this will enable further research on these important tasks within the Arabic NLP community.
IEEE Access, Year: 2023, Number: 11, pp. 132516-132531, Published: Jan. 1, 2023
The performance of learning models heavily relies on the availability and adequacy of training data. To address the dataset adequacy issue, researchers have extensively explored data augmentation (DA) as a promising approach. DA generates new data instances through transformations applied to the available data, thereby increasing dataset size and variability. This approach has enhanced model accuracy, particularly in addressing class imbalance problems in classification tasks. However, few studies have explored DA for the Arabic language, relying on traditional approaches such as paraphrasing or noising-based techniques. In this paper, we propose a new Arabic DA method that employs a recent powerful modeling technique, namely AraGPT-2, for the augmentation process. The generated sentences are evaluated in terms of context, semantics, diversity, and novelty using the Euclidean, cosine, Jaccard, and BLEU distances. Finally, the AraBERT transformer is used on sentiment classification tasks to evaluate the classification performance of the augmented Arabic dataset. The experiments were conducted on four sentiment Arabic datasets: AraSarcasm, ASTD, ATT, and MOVIE. The selected datasets vary in size, label number, and unbalanced classes. The results show that the proposed methodology enhanced the Arabic sentiment text classification on all datasets, with an increase in F1 score by 7%, 8%, 11%, and 13%.
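
Three of the distance measures named above can be sketched in a few lines. This is an illustration only, under stated assumptions: the abstract does not say how sentences are represented, so the sketch assumes numeric vectors (for example sentence embeddings) for the Euclidean and cosine distances and token lists for the Jaccard distance. All function names are hypothetical, and BLEU is left out because it needs n-gram counting and a brevity penalty.

```python
import math


def euclidean_distance(u, v):
    """Straight-line distance between two equal-length numeric vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def cosine_distance(u, v):
    """One minus cosine similarity; 0 means the vectors point the same way."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_u * norm_v)


def jaccard_distance(tokens_a, tokens_b):
    """One minus the ratio of shared to total distinct tokens."""
    set_a, set_b = set(tokens_a), set(tokens_b)
    union = set_a | set_b
    if not union:
        return 0.0  # two empty token lists are treated as identical
    return 1.0 - len(set_a & set_b) / len(union)


# Hypothetical inputs: an original sentence and a generated one.
original = ["the", "film", "was", "great"]
generated = ["the", "movie", "was", "great"]
lexical_gap = jaccard_distance(original, generated)
```

A low distance suggests the generated sentence stays close to the original in context and meaning, while a distance that is not too close to zero suggests it adds some diversity or novelty; how the thresholds are chosen is not stated in the abstract.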
Jakub Piskorski, Nicolas Stefanovitch, Nikolaos Nikolaidis, Giovanni Da San Martino, Preslav Nakov. Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 2023.
Abstract: The detection of toxic language in the Arabic language has emerged as an active area of research in recent years, and reviewing the existing datasets employed for training the developed solutions has become a pressing need. This paper offers a comprehensive survey of Arabic datasets focused on online toxic language. We systematically gathered a total of 54 available datasets and their corresponding papers and conducted a thorough analysis, considering 18 criteria across four primary dimensions: availability details, content, annotation process, and reusability. This analysis enabled us to identify existing gaps and make recommendations for future research works. For the convenience of the research community, the list of the analysed datasets is maintained in a GitHub repository.