Intelligent visual question answering in TCM education: An innovative application of IoT and multimodal fusion
Wei Bi, Qingyu Xiong, Xingyi Chen et al.
Alexandria Engineering Journal, Journal Year: 2025, Volume and Issue: 118, P. 325 - 336
Published: Jan. 23, 2025
Language: English
Figurative-cum-Commonsense Knowledge Infusion for Multimodal Mental Health Meme Classification
Published: April 22, 2025
Language: English
DecodEM-X: advancing multimodal meme moderation with robust AI frameworks
Knowledge and Information Systems, Journal Year: 2025, Volume and Issue: unknown
Published: May 13, 2025
Language: English
Multimodal Hateful Meme Classification Based on Transfer Learning and a Cross-Mask Mechanism
Fan Wu, Chen Guolian, Junkuo Cao et al.
Electronics, Journal Year: 2024, Volume and Issue: 13(14), P. 2780 - 2780
Published: July 15, 2024
Hateful memes are malicious and biased sentiment information widely spread on the internet. Detecting hateful memes differs from traditional multimodal tasks because, in conventional tasks, visual and textual content align semantically. However, the challenge in detecting hateful memes lies in their unique nature, where the correlation between images and text may be weak or unrelated, requiring models to understand the content and perform multimodal reasoning. To address this issue, we introduce a fine-grained detection model named “TCAM”. The model leverages advanced encoding techniques from TweetEval and CLIP and introduces enhanced Cross-Attention and Cross-Mask Mechanisms (CAM) in the feature fusion stage to improve cross-modal correlations. It also effectively embeds features of text data and image descriptions into the model through transfer learning. This paper uses the Area Under the Receiver Operating Characteristic Curve (AUROC) as the primary metric to evaluate the model’s discriminatory ability. The approach achieved an AUROC score of 0.8362 and an accuracy of 0.764 on the Facebook Hateful Memes Challenge (FHMC) dataset, confirming its high detection capability. TCAM demonstrates relatively superior performance compared to ensemble machine learning methods.
Language: English
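To make the CAM fusion step concrete, here is a minimal PyTorch sketch of cross-attention combined with a cross-mask in the spirit of the abstract above; the layer sizes, the cosine-similarity masking rule, and all module names are illustrative assumptions, not the paper's actual implementation.

```python
# Minimal sketch of cross-attention fusion with a cross-mask (CAM-style).
# Dimensions, the masking rule, and the pooling choices are assumptions.
import torch
import torch.nn as nn

class CrossAttentionMaskFusion(nn.Module):
    def __init__(self, dim=512, heads=8, mask_threshold=0.1):
        super().__init__()
        self.txt2img = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.img2txt = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.mask_threshold = mask_threshold
        self.classifier = nn.Linear(2 * dim, 1)  # hateful vs. benign logit

    def forward(self, text_feats, image_feats):
        # text_feats:  (B, Lt, D), e.g. TweetEval token embeddings
        # image_feats: (B, Li, D), e.g. CLIP patch embeddings
        # Assumed cross-mask: hide image tokens weakly related to the
        # pooled text, so loosely aligned visuals cannot dominate fusion.
        txt_pooled = text_feats.mean(dim=1, keepdim=True)               # (B, 1, D)
        sim = torch.cosine_similarity(image_feats, txt_pooled, dim=-1)  # (B, Li)
        mask = sim < self.mask_threshold                                # True = ignore
        # Always keep the best-matching token to avoid an all-masked row.
        mask[torch.arange(sim.size(0)), sim.argmax(dim=1)] = False

        t, _ = self.txt2img(text_feats, image_feats, image_feats,
                            key_padding_mask=mask)
        i, _ = self.img2txt(image_feats, text_feats, text_feats)
        fused = torch.cat([t.mean(dim=1), i.mean(dim=1)], dim=-1)       # (B, 2D)
        return self.classifier(fused)
```

An AUROC figure like the one reported above can then be computed on held-out data with a standard routine such as sklearn.metrics.roc_auc_score applied to the sigmoid of these logits.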
Enhancing Multimodal Understanding With LIUS
Chunlai Song
Journal of Organizational and End User Computing, Journal Year: 2024, Volume and Issue: 36(1), P. 1 - 17
Published: Jan. 12, 2024
VQA (visual question and answer) is the task of enabling a computer to generate accurate textual answers based on given images and related questions. It integrates computer vision and natural language processing and requires a model that is able to understand not only the image content but also the question in order to generate appropriate linguistic answers. However, current limitations in cross-modal understanding often result in models that struggle to accurately capture the complex relationships between images and questions, leading to inaccurate or ambiguous answers. This research aims to address this challenge through a multifaceted approach that combines the strengths of vision and language processing. By introducing the innovative LIUS framework, a specialized vision module was built to process image information and fuse features using multiple scales. The insights gained from the vision module are integrated with a “reasoning module” (LLM) to generate answers.
Language: English
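The two-stage pipeline the abstract outlines, a vision module that fuses multi-scale image features and an LLM acting as a reasoning module, can be sketched as follows; every name, dimension, and the soft-token conditioning trick are assumptions for illustration, not LIUS's actual design.

```python
# Illustrative two-stage VQA pipeline: multi-scale visual fusion feeding
# an LLM "reasoning module". All details are assumptions.
import torch
import torch.nn as nn

class MultiScaleVisionModule(nn.Module):
    def __init__(self, channels=(256, 512, 1024), dim=768):
        super().__init__()
        # Project feature maps from several backbone stages (scales)
        # into one embedding space, then pool and fuse them.
        self.proj = nn.ModuleList([nn.Linear(c, dim) for c in channels])
        self.fuse = nn.Linear(len(channels) * dim, dim)

    def forward(self, feature_maps):
        # feature_maps: list of (B, N_i, C_i) token grids, one per scale
        pooled = [p(f).mean(dim=1) for p, f in zip(self.proj, feature_maps)]
        return self.fuse(torch.cat(pooled, dim=-1))  # (B, dim)

def answer_question(vision_module, llm, tokenizer, feature_maps, question):
    """Condition a HuggingFace-style decoder LLM on the fused visual
    embedding by prepending it as a single soft token (one common design;
    assumes dim matches the LLM hidden size)."""
    visual = vision_module(feature_maps)                    # (B, dim)
    ids = tokenizer(question, return_tensors="pt")["input_ids"]
    text_emb = llm.get_input_embeddings()(ids)              # (B, Lt, dim)
    inputs = torch.cat([visual.unsqueeze(1), text_emb], dim=1)
    out = llm.generate(inputs_embeds=inputs, max_new_tokens=32)
    return tokenizer.decode(out[0], skip_special_tokens=True)
```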
Flexible margins and multiple samples learning to enhance lexical semantic similarity
Engineering Applications of Artificial Intelligence, Journal Year: 2024, Volume and Issue: 133, P. 108275 - 108275
Published: March 18, 2024
Language: English
A Hybrid Deep BiLSTM-CNN for Hate Speech Detection in Multi-social media
ACM Transactions on Asian and Low-Resource Language Information Processing, Journal Year: 2024, Volume and Issue: 23(8), P. 1 - 22
Published: May 6, 2024
Nowadays, the means of communication among people have changed due to advancements in information technology and the rise of online multi-social media. Many people express their feelings, ideas, and emotions on social media sites such as Instagram, Twitter, Gab, Reddit, Facebook, and YouTube. However, these platforms are also misused to send hateful messages to specific individuals or groups and create chaos. For various governance authorities, manually identifying hate speech on these platforms is a difficult task. To avoid this, in this study a hybrid deep-learning model, where bidirectional long short-term memory (BiLSTM) and a convolutional neural network (CNN) are used to classify textual data, is proposed. This model incorporates a GloVe-based word embedding approach, dropout, L2 regularization, and global max pooling to get impressive results. Further, the proposed BiLSTM-CNN model has been evaluated on various datasets to achieve state-of-the-art performance that is superior to traditional existing machine learning methods in terms of accuracy, precision, recall, and F1-score.
Language: English
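As a rough illustration of the architecture described above, here is a minimal Keras sketch combining GloVe-initialized embeddings, a BiLSTM, a CNN layer with L2 regularization, global max pooling, and dropout; the layer sizes and ordering are assumptions, not the paper's exact configuration.

```python
# Minimal BiLSTM-CNN text classifier sketch with the components the
# abstract lists; hyperparameters are illustrative assumptions.
import numpy as np
from tensorflow.keras import initializers, layers, models, regularizers

VOCAB_SIZE, EMB_DIM = 20_000, 100

# Placeholder for a matrix built from pretrained GloVe vectors
# (e.g. glove.6B.100d.txt); random here so the sketch runs standalone.
glove_matrix = np.random.normal(size=(VOCAB_SIZE, EMB_DIM)).astype("float32")

model = models.Sequential([
    layers.Embedding(VOCAB_SIZE, EMB_DIM,
                     embeddings_initializer=initializers.Constant(glove_matrix),
                     trainable=False),
    layers.Bidirectional(layers.LSTM(64, return_sequences=True)),
    layers.Conv1D(128, kernel_size=3, activation="relu",
                  kernel_regularizer=regularizers.l2(1e-4)),
    layers.GlobalMaxPooling1D(),            # strongest n-gram response
    layers.Dropout(0.5),
    layers.Dense(1, activation="sigmoid"),  # hate vs. non-hate
])
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=["accuracy"])
```

Precision, recall, and F1-score can then be computed on held-out predictions, e.g. with sklearn.metrics.classification_report.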