IEEE Transactions on Knowledge and Data Engineering,
Journal year: 2022, Issue: unknown, pp. 1 - 1
Published: Jan. 1, 2022
As a powerful expression of human knowledge in a structural form, the knowledge graph (KG) has drawn great attention from both academia and industry, and a large number of construction and application technologies have been proposed. Large-scale knowledge graphs such as DBpedia, YAGO and Wikidata are published and widely used in various tasks. However, most of them are far from perfect and have many quality issues. For example, they may contain inaccurate or outdated entries and do not cover enough facts, which limits their credibility and further utility. Data quality has a long research history in the field of traditional relational data and has recently attracted more knowledge graph experts. In this paper, we provide a systematic and comprehensive review of quality management on knowledge graphs, covering overall research topics about not only quality issues, dimensions and metrics, but also quality management processes from quality assessment and error detection, to error correction and KG completion. We categorize existing works in terms of target goals and methods used for better understanding. In the end, we discuss some key issues and possible directions of knowledge graph quality management for further research.
Lots of learning tasks require dealing with graph data which contains rich relation information among elements. Modeling physics systems, learning molecular fingerprints, predicting protein interface, and classifying diseases demand a model to learn from graph inputs. In other domains such as learning from non-structural data like texts and images, reasoning on extracted structures (like the dependency trees of sentences and the scene graphs of images) is an important research topic which also needs graph reasoning models. Graph neural networks (GNNs) are neural models that capture the dependence of graphs via message passing between the nodes of graphs. In recent years, variants of GNNs such as graph convolutional network (GCN), graph attention network (GAT), and graph recurrent network (GRN) have demonstrated ground-breaking performances on many deep learning tasks. In this survey, we propose a general design pipeline for GNN models and discuss the variants of each component, systematically categorize the applications, and propose four open problems for future research.
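The message passing idea mentioned in the abstract above can be illustrated with a minimal sketch. It assumes mean aggregation over neighbors and a fixed 50/50 mix with the node's own state, with no learned weights; real GNN layers such as GCN or GAT use learned transformations and different normalizations. The function name `message_passing_round` and the toy graph are hypothetical and chosen only for illustration.

```python
from typing import Dict, List


def message_passing_round(
    features: Dict[int, List[float]],
    neighbors: Dict[int, List[int]],
) -> Dict[int, List[float]]:
    """Run one round of mean-aggregation message passing (no learned weights)."""
    updated: Dict[int, List[float]] = {}
    for node, own in features.items():
        # Collect the current feature vectors of all neighbors.
        messages = [features[n] for n in neighbors.get(node, [])]
        if messages:
            # Average the neighbor vectors dimension by dimension.
            aggregated = [sum(vals) / len(messages) for vals in zip(*messages)]
        else:
            # Isolated node: nothing to aggregate.
            aggregated = [0.0] * len(own)
        # Assumed update rule: equal mix of own state and aggregated message.
        updated[node] = [0.5 * a + 0.5 * b for a, b in zip(own, aggregated)]
    return updated


# Toy graph: node 0 is connected to nodes 1 and 2.
features = {0: [1.0, 0.0], 1: [0.0, 1.0], 2: [1.0, 1.0]}
neighbors = {0: [1, 2], 1: [0], 2: [0]}
h1 = message_passing_round(features, neighbors)
print(h1)
```

Stacking several such rounds lets information travel along longer paths, which is one common way to describe how a multi-layer GNN sees a larger neighborhood.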
Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval,
Journal year: 2021, Issue: unknown, pp. 726 - 735
Published: July 11, 2021
Representation learning on user-item graph for recommendation has evolved from using single ID or interaction history to exploiting higher-order neighbors. This leads to the success of graph convolution networks (GCNs) for recommendation such as PinSage and LightGCN. Despite effectiveness, we argue that they suffer from two limitations: (1) high-degree nodes exert larger impact on the representation learning, deteriorating the recommendations of low-degree (long-tail) items; and (2) representations are vulnerable to noisy interactions, as the neighborhood aggregation scheme further enlarges the impact of observed edges.
arXiv (Cornell University),
Journal year: 2020, Issue: unknown
Published: Jan. 1, 2020
Generalizable, transferable, and robust representation learning on graph-structured data remains a challenge for current graph neural networks (GNNs). Unlike what has been developed for convolutional neural networks (CNNs) for image data, self-supervised learning and pre-training are less explored for GNNs. In this paper, we propose a graph contrastive learning (GraphCL) framework for learning unsupervised representations of graph data. We first design four types of graph augmentations to incorporate various priors. We then systematically study the impact of various combinations of graph augmentations on multiple datasets, in four different settings: semi-supervised, unsupervised, and transfer learning as well as adversarial attacks. The results show that, even without tuning augmentation extents or using sophisticated GNN architectures, our GraphCL framework can produce graph representations of similar or better generalizability, transferability, and robustness compared to state-of-the-art methods. We also investigate the impact of parameterized graph augmentation extents and patterns, and observe further performance gains in preliminary experiments. Our codes are available at https://github.com/Shen-Lab/GraphCL.
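As a rough illustration of contrastive learning over augmented graph views, the sketch below pairs one augmentation with a generic InfoNCE-style loss. The abstract does not list the four augmentation types or give the loss formula, so node dropping and this particular loss form are illustrative assumptions rather than the authors' implementation; all function names (`drop_nodes`, `cosine`, `contrastive_loss`) are hypothetical.

```python
import math
import random
from typing import List, Tuple

Edge = Tuple[int, int]


def drop_nodes(
    nodes: List[int], edges: List[Edge], ratio: float, rng: random.Random
) -> Tuple[List[int], List[Edge]]:
    """Augmentation: remove a random fraction of nodes and their incident edges."""
    dropped = set(rng.sample(nodes, int(len(nodes) * ratio)))
    kept_nodes = [v for v in nodes if v not in dropped]
    kept_edges = [(u, v) for u, v in edges if u not in dropped and v not in dropped]
    return kept_nodes, kept_edges


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity; assumes both vectors are non-zero."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def contrastive_loss(
    view_a: List[List[float]], view_b: List[List[float]], temperature: float = 0.5
) -> float:
    """Generic InfoNCE-style loss over graph embeddings from two views.

    Embeddings at the same index are treated as a positive pair; every other
    embedding in view_b acts as a negative for that anchor.
    """
    total = 0.0
    for i, anchor in enumerate(view_a):
        logits = [cosine(anchor, other) / temperature for other in view_b]
        log_denominator = math.log(sum(math.exp(value) for value in logits))
        total += log_denominator - logits[i]
    return total / len(view_a)


# Augment a toy 5-node path graph.
rng = random.Random(0)
nodes = [0, 1, 2, 3, 4]
edges = [(0, 1), (1, 2), (2, 3), (3, 4)]
aug_nodes, aug_edges = drop_nodes(nodes, edges, ratio=0.4, rng=rng)

# Hand-written stand-ins for encoder outputs of two augmented views.
aligned = [[1.0, 0.0], [0.0, 1.0]]
swapped = [[0.0, 1.0], [1.0, 0.0]]
loss_matched = contrastive_loss(aligned, aligned)
loss_mismatched = contrastive_loss(aligned, swapped)
print(aug_nodes, aug_edges, loss_matched, loss_mismatched)
```

In a full pipeline the embeddings would come from a GNN encoder applied to each augmented graph; here they are hand-written vectors so that the loss behavior (lower when views of the same graph agree) is easy to inspect.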
IEEE Transactions on Knowledge and Data Engineering,
Journal year: 2021, Issue: unknown, pp. 1 - 1
Published: Jan. 1, 2021
Deep supervised learning has achieved great success in the last decade. However, its dependence on manual labels and its vulnerability to attacks have driven people to explore a better solution. As an alternative, self-supervised learning has attracted many researchers for its soaring performance on representation learning in the last several years. Self-supervised representation learning leverages input data itself as supervision and benefits almost all types of downstream tasks. In this survey, we take a look into new self-supervised learning methods for representation in computer vision, natural language processing, and graph learning. We comprehensively review the existing empirical methods and summarize them into three main categories according to their objectives: generative, contrastive, and generative-contrastive (adversarial). We further investigate related theoretical analysis work to provide deeper thoughts on how self-supervised learning works. Finally, we briefly discuss open problems and future directions for self-supervised learning. An outline slide for the survey is provided.
IEEE Transactions on Knowledge and Data Engineering,
Journal year: 2022, Issue: unknown, pp. 1 - 1
Published: Jan. 1, 2022
Deep learning on graphs has attracted significant interest recently. However, most of the works have focused on (semi-)supervised learning, resulting in shortcomings including heavy label reliance, poor generalization, and weak robustness. To address these issues, self-supervised learning (SSL), which extracts informative knowledge through well-designed pretext tasks without relying on manual labels, has become a promising and trending learning paradigm for graph data. Different from SSL on other domains like computer vision and natural language processing, SSL on graphs has an exclusive background, design ideas, and taxonomies. Under the umbrella of graph self-supervised learning, we present a timely and comprehensive review of the existing approaches which employ SSL techniques for graph data. We construct a unified framework that mathematically formalizes the paradigm of graph SSL. According to the objectives of pretext tasks, we divide these approaches into four categories: generation-based, auxiliary property-based, contrast-based, and hybrid approaches. We further describe the applications of graph SSL across various research fields and summarize the commonly used datasets, evaluation benchmark, performance comparison and open-source codes of graph SSL. Finally, we discuss the remaining challenges and potential future directions in this research field.
arXiv (Cornell University),
Journal year: 2020, Issue: unknown
Published: Jan. 1, 2020
How to obtain informative representations of molecules is a crucial prerequisite in AI-driven drug design and discovery. Recent research abstracts molecules as graphs and employs Graph Neural Networks (GNNs) for molecular representation learning. Nevertheless, two issues impede the usage of GNNs in real scenarios: (1) insufficient labeled molecules for supervised training; (2) poor generalization capability to newly synthesized molecules. To address them both, we propose a novel framework, GROVER, which stands for Graph Representation frOm self-superVised mEssage passing tRansformer. With carefully designed self-supervised tasks at the node, edge and graph level, GROVER can learn rich structural and semantic information of molecules from enormous unlabelled molecular data. Moreover, to encode such complex information, GROVER integrates Message Passing Networks into the Transformer-style architecture to deliver a class of more expressive encoders of molecules. The flexibility of GROVER allows it to be trained efficiently on large-scale molecular dataset without requiring any supervision, thus being immune to the two issues mentioned above. We pre-train GROVER with 100 million parameters on 10 million unlabelled molecules, the biggest GNN and the largest training dataset in molecular representation learning. We then leverage the pre-trained GROVER for molecular property prediction followed by task-specific fine-tuning, where we observe a huge improvement (more than 6% on average) over current state-of-the-art methods on 11 challenging benchmarks. The insights we gained are that well-designed self-supervision losses and largely-expressive models enjoy significant potential for performance boosting.
IEEE Signal Processing Magazine,
Journal year: 2022, Issue: 39(3), pp. 42 - 62
Published: May 1, 2022
Self-supervised representation learning methods aim to provide powerful deep feature learning without the requirement of large annotated datasets, thus alleviating the annotation bottleneck that is one of the main barriers to practical deployment of deep learning today. These methods have advanced rapidly in recent years, with their efficacy approaching and sometimes surpassing fully supervised pre-training alternatives across a variety of data modalities including image, video, sound, text and graphs. This article introduces this vibrant area including key concepts, the four main families of approach and associated state of the art, and how self-supervised methods are applied to diverse modalities of data. We further discuss practical considerations including workflows, representation transferability, and compute cost. Finally, we survey the major open challenges in the field that provide fertile ground for future work.
arXiv (Cornell University),
Journal year: 2020, Issue: unknown
Published: Jan. 1, 2020
GNNs and chemical fingerprints are the predominant approaches to representing molecules for property prediction. However, in NLP, transformers have become the de-facto standard for representation learning thanks to their strong downstream task transfer. In parallel, the software ecosystem around transformers is maturing rapidly, with libraries like HuggingFace and BertViz enabling streamlined training and introspection. In this work, we make one of the first attempts to systematically evaluate transformers on molecular property prediction tasks via our ChemBERTa model. ChemBERTa scales well with pretraining dataset size, offering competitive downstream performance on MoleculeNet and useful attention-based visualization modalities. Our results suggest that transformers offer a promising avenue of future work for molecular representation learning and property prediction. To facilitate these efforts, we release a curated dataset of 77M SMILES from PubChem suitable for large-scale self-supervised pretraining.
Angewandte Chemie International Edition,
Journal year: 2019, Issue: 59(52), pp. 23414 - 23436
Published: Sept. 25, 2019
This two-part review examines how automation has contributed to different aspects of discovery in the chemical sciences. In this second part, we reflect on a selection of exemplary studies. It is increasingly important to articulate what the role of automation and computation has been in the scientific process and how that has or has not accelerated discovery. One can argue that even the best automated systems have yet to "discover" despite being incredibly useful as laboratory assistants. We must carefully consider how they have been and can be applied to future problems of chemical discovery in order to effectively design and interact with future autonomous platforms. The majority of this article defines a large set of open research directions, including improving our ability to work with complex data, build empirical models, automate both physical and computational experiments for validation, select experiments, and evaluate whether we are making progress toward the ultimate goal of autonomous discovery. Addressing these practical and methodological challenges will greatly advance the extent to which autonomous systems can make meaningful discoveries.