Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, Journal Year: 2022, Volume and Issue: unknown, Published: Jan. 1, 2022
Multimodal sentiment analysis (MSA) and emotion recognition in conversation (ERC) are key research topics for computers to understand human behaviors. From a psychological perspective, emotions are the expression of affect or feelings during a short period, while sentiments are formed and held over a longer period. However, most existing works study them separately and do not fully exploit the complementary knowledge behind the two. In this paper, we propose a multimodal knowledge-sharing framework (UniMSE) that unifies the MSA and ERC tasks from the perspectives of features, labels, and models. We perform modality fusion at the syntactic and semantic levels and introduce contrastive learning between modalities and samples to better capture the difference and consistency between sentiments and emotions. Experiments on four public benchmark datasets, MOSI, MOSEI, MELD, and IEMOCAP, demonstrate the effectiveness of the proposed method, which achieves consistent improvements compared with state-of-the-art methods.
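The abstract names contrastive learning between modalities and samples without giving the formulation here; purely as a rough, hypothetical illustration (not the authors' implementation), the PyTorch sketch below shows a standard InfoNCE-style loss that pulls paired representations of the same utterance from two modalities together and pushes apart those from different utterances. Tensor names and the temperature value are assumptions.

import torch
import torch.nn.functional as F

def infonce_loss(z_a: torch.Tensor, z_b: torch.Tensor, temperature: float = 0.07) -> torch.Tensor:
    """InfoNCE loss between two modality embeddings of the same batch of utterances.

    z_a, z_b: (batch, dim) embeddings (e.g., textual vs. acoustic features) where
    row i of z_a and row i of z_b come from the same utterance (a positive pair).
    """
    z_a = F.normalize(z_a, dim=-1)
    z_b = F.normalize(z_b, dim=-1)
    logits = z_a @ z_b.t() / temperature                  # (batch, batch) similarity matrix
    targets = torch.arange(z_a.size(0), device=z_a.device)
    # Symmetric cross-entropy: each embedding should match its own counterpart.
    return 0.5 * (F.cross_entropy(logits, targets) + F.cross_entropy(logits.t(), targets))

# Usage: loss = infonce_loss(text_emb, audio_emb) with two (batch, dim) tensors.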
IEEE Geoscience and Remote Sensing Magazine, Journal Year: 2021, Volume and Issue: 9(2), P. 52 - 87, Published: April 6, 2021
Hyperspectral (HS) imaging, also known as image spectrometry, is a landmark technique in geoscience and remote sensing (RS). In the past decade, enormous efforts have been made to process and analyze these HS products, mainly by seasoned experts. However, with an ever-growing volume of data, the bulk costs in manpower and material resources pose new challenges for reducing the burden of manual labor and improving efficiency. For this reason, it is urgent that more intelligent and automatic approaches for various RS applications be developed. Machine learning (ML) tools with convex optimization have successfully undertaken the tasks of numerous artificial intelligence (AI)-related applications; however, their ability to handle complex practical problems remains limited, particularly due to the effects of spectral variabilities in imaging and the complexity and redundancy of higher-dimensional signals. Compared to convex models, nonconvex modeling, which is capable of characterizing real scenes and providing model interpretability technically and theoretically, has proven to be a feasible solution that reduces the gap between challenging vision tasks and currently advanced data processing models.
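The article surveys nonconvex modeling for HS data processing in general terms; purely as a toy illustration of the idea (not a method taken from the article), the NumPy sketch below solves a sparse spectral-unmixing-style problem min_x ||y - Dx||^2 + lam*||x||_0 by iterative hard thresholding, a simple nonconvex alternative to the convex l1 relaxation. The dictionary D, observation y, and parameter choices are hypothetical.

import numpy as np

def iterative_hard_thresholding(D, y, lam, step=None, n_iter=200):
    """Minimize ||y - D x||^2 + lam * ||x||_0 by proximal gradient descent.

    The l0 penalty makes the problem nonconvex; its proximal operator is hard
    thresholding, which keeps only coefficients with z_i^2 > 2 * lam * step.
    D: (m, n) dictionary of endmember spectra, y: (m,) observed pixel spectrum.
    """
    m, n = D.shape
    if step is None:
        # 1 / Lipschitz constant of the gradient of the quadratic term (2 * ||D||_2^2).
        step = 1.0 / (2.0 * np.linalg.norm(D, 2) ** 2)
    x = np.zeros(n)
    for _ in range(n_iter):
        grad = 2.0 * D.T @ (D @ x - y)                  # gradient of the data term
        z = x - step * grad                             # gradient step
        x = np.where(z ** 2 > 2.0 * lam * step, z, 0.0) # hard-thresholding proximal step
    return x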
IEEE/ACM Transactions on Audio, Speech, and Language Processing, Journal Year: 2021, Volume and Issue: 29, P. 1368 - 1396, Published: Jan. 1, 2021
Speech enhancement and speech separation are two related tasks, whose purpose is to extract either one or more target signals, respectively, from a mixture of sounds generated by several sources. Traditionally, these tasks have been tackled using signal processing and machine learning techniques applied to the available acoustic signals. Since the visual aspect of speech is essentially unaffected by the acoustic environment, visual information from the target speakers, such as lip movements and facial expressions, has also been used for these systems. In order to efficiently fuse acoustic and visual information, researchers have exploited the flexibility of data-driven approaches, specifically deep learning, achieving strong performance. The ceaseless proposal of a large number of techniques to extract features and fuse multimodal information has highlighted the need for an overview that comprehensively describes and discusses audio-visual speech enhancement and separation based on deep learning. In this paper, we provide a systematic survey of this research topic, focusing on the main elements that characterise the systems in the literature: acoustic and visual features; deep learning methods; fusion techniques; and training targets and objective functions. In addition, we review deep-learning-based methods for speech reconstruction from silent videos and audio-visual sound source separation for non-speech signals, since these can be more or less directly applied to speech enhancement and separation. Finally, we survey the commonly employed datasets, given their central role in the development and evaluation of methods, because they are generally used to compare different systems and determine their performance.
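Among the fusion techniques such surveys cover, feature-level concatenation of audio and visual embeddings followed by time-frequency mask estimation is a common pattern; the PyTorch sketch below is a minimal, hypothetical instance of that pattern, with layer sizes and names chosen for illustration rather than taken from any surveyed system.

import torch
import torch.nn as nn

class AVMaskEstimator(nn.Module):
    """Toy audio-visual fusion network that predicts a time-frequency mask."""

    def __init__(self, n_freq=257, audio_dim=256, visual_dim=128, hidden=256):
        super().__init__()
        self.audio_enc = nn.Linear(n_freq, audio_dim)   # per-frame spectral features
        self.visual_enc = nn.Linear(512, visual_dim)    # per-frame lip/face embedding
        self.fusion = nn.LSTM(audio_dim + visual_dim, hidden, batch_first=True)
        self.mask_head = nn.Sequential(nn.Linear(hidden, n_freq), nn.Sigmoid())

    def forward(self, noisy_spec, visual_feat):
        # noisy_spec: (batch, frames, n_freq) magnitude spectrogram of the mixture
        # visual_feat: (batch, frames, 512) visual embedding synchronized to the audio frames
        a = self.audio_enc(noisy_spec)
        v = self.visual_enc(visual_feat)
        fused, _ = self.fusion(torch.cat([a, v], dim=-1))   # feature-level (early) fusion
        mask = self.mask_head(fused)                        # values in [0, 1]
        return mask * noisy_spec                            # masked estimate of the target

# Usage: est = AVMaskEstimator()(noisy_spec, visual_feat); a common training target is the
# clean speech spectrogram, optimized with, e.g., an MSE loss.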
IEEE Transactions on Knowledge and Data Engineering, Journal Year: 2016, Volume and Issue: 28(8), P. 2027 - 2040, Published: April 14, 2016
Domain adaptation generalizes a learning model across a source domain and a target domain that are sampled from different distributions. It is widely applied to cross-domain data mining for reusing labeled information and mitigating labeling consumption. Recent studies reveal that deep neural networks can learn abstract feature representations, which can reduce, but not remove, the cross-domain discrepancy. To enhance the invariance of deep representation and make it more transferable across domains, we propose a unified framework for jointly learning transferable representation and classifier to enable scalable domain adaptation, by taking the advantages of both deep learning and optimal two-sample matching. The framework constitutes two inter-dependent paradigms: unsupervised pre-training for effective training of deep models using denoising autoencoders, and supervised fine-tuning for effective exploitation of discriminative information in deep networks, with the learned representations embedded into reproducing kernel Hilbert spaces (RKHSs) for optimally matching the domain distributions. To enable scalable learning, we develop a linear-time algorithm for an unbiased estimate of the matching criterion that scales linearly to large samples. Extensive empirical results show that the proposed framework significantly outperforms state-of-the-art methods on diverse adaptation tasks: sentiment polarity prediction, email spam filtering, newsgroup content categorization, and visual object recognition.
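The linear-time unbiased estimate mentioned in the abstract is consistent with the standard linear-time MMD statistic, which averages a kernel-based term over consecutive sample pairs; the NumPy sketch below shows that generic estimator with an RBF kernel chosen for illustration, not the paper's exact criterion or implementation.

import numpy as np

def rbf(a, b, gamma=1.0):
    """RBF kernel evaluated on paired rows of a and b."""
    return np.exp(-gamma * np.sum((a - b) ** 2, axis=1))

def linear_time_mmd2(X, Y, gamma=1.0):
    """Unbiased linear-time estimate of the squared MMD between samples X and Y.

    X: (n, d) source-domain features, Y: (n, d) target-domain features.
    Samples are consumed in consecutive pairs, so each point is touched once
    and the cost is O(n) kernel evaluations instead of O(n^2).
    """
    n = min(len(X), len(Y)) // 2 * 2
    x1, x2 = X[0:n:2], X[1:n:2]
    y1, y2 = Y[0:n:2], Y[1:n:2]
    h = rbf(x1, x2, gamma) + rbf(y1, y2, gamma) - rbf(x1, y2, gamma) - rbf(x2, y1, gamma)
    return h.mean()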
IEEE Transactions on Robotics, Journal Year: 2020, Volume and Issue: 36(3), P. 582 - 596, Published: March 20, 2020
Contact-rich manipulation tasks in unstructured environments often require both haptic and visual feedback. It is nontrivial to manually design a robot controller that combines these modalities, which have very different characteristics. While deep reinforcement learning has shown success in learning control policies for high-dimensional inputs, these algorithms are generally intractable to train directly on real robots due to sample complexity. In this article, we use self-supervision to learn a compact multimodal representation of our sensory inputs, which can then be used to improve the sample efficiency of policy learning. Evaluating our method on a peg insertion task, we show that it generalizes over varying geometries, configurations, and clearances, while being robust to external perturbations. We also systematically study different self-supervised objectives and architectures. Results are presented in simulation and on a physical robot.
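The article learns its multimodal representation with its own self-supervised objectives; the PyTorch sketch below is a deliberately simplified, hypothetical stand-in that fuses image and force/torque encodings into one latent vector and trains it with a binary "are these streams time-aligned?" objective. Architecture, dimensions, and names are assumptions, not the authors' design.

import torch
import torch.nn as nn

class MultimodalEncoder(nn.Module):
    """Toy fusion of visual and haptic streams into a compact latent state."""

    def __init__(self, latent_dim=128):
        super().__init__()
        self.vision = nn.Sequential(           # encodes a small RGB image
            nn.Conv2d(3, 16, 5, stride=2), nn.ReLU(),
            nn.Conv2d(16, 32, 5, stride=2), nn.ReLU(),
            nn.Flatten(), nn.LazyLinear(latent_dim))
        self.haptic = nn.Sequential(           # encodes a force/torque window (6 axes x 32 steps)
            nn.Flatten(), nn.Linear(6 * 32, 64), nn.ReLU(), nn.Linear(64, latent_dim))
        self.fuse = nn.Linear(2 * latent_dim, latent_dim)
        self.aligned_head = nn.Linear(latent_dim, 1)   # self-supervised alignment logit

    def forward(self, image, wrench):
        z = self.fuse(torch.cat([self.vision(image), self.haptic(wrench)], dim=-1))
        return z, self.aligned_head(z)

# Self-supervised training: feed matched (image, wrench) pairs with label 1 and randomly
# shuffled pairs with label 0, optimize BCEWithLogitsLoss on the alignment logit, then
# reuse the latent z as the observation for downstream policy learning.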