2021 IEEE/CVF International Conference on Computer Vision (ICCV), 2021, pp. 15818–15827. Published: Oct. 1, 2021.
Semantic labelling is highly correlated with geometry and radiance reconstruction, as scene entities of similar shape and appearance are more likely to come from similar classes. Recent implicit neural reconstruction techniques are appealing as they do not require prior training data, but the same fully self-supervised approach is not possible for semantics because labels are human-defined properties. We extend neural radiance fields (NeRF) to jointly encode semantics with appearance and geometry, so that complete and accurate 2D semantic labels can be achieved using a small amount of in-place annotations specific to the scene. The intrinsic multi-view consistency and smoothness of NeRF benefit semantics by enabling sparse labels to efficiently propagate. We show the benefit of this approach when labels are either sparse or very noisy in room-scale scenes. We demonstrate its advantageous properties in various interesting applications such as an efficient scene labelling tool, novel semantic view synthesis, label denoising, super-resolution, label interpolation, and multi-view semantic label fusion in visual semantic mapping systems.
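To make the joint encoding concrete, below is a minimal PyTorch-style sketch of a radiance field with an added semantic head, where per-sample class logits are composited with the same volume-rendering weights as colour. The module names, layer sizes, and positional-encoding dimension are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch (not the authors' code): a NeRF-style MLP with an extra
# semantic head, whose logits are composited with the same volume-rendering
# weights used for colour. Sizes and names are illustrative.
import torch
import torch.nn as nn

class SemanticNeRF(nn.Module):
    def __init__(self, num_classes: int, pos_dim: int = 63, hidden: int = 256):
        super().__init__()
        self.trunk = nn.Sequential(
            nn.Linear(pos_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.sigma_head = nn.Linear(hidden, 1)               # density
        self.rgb_head = nn.Linear(hidden, 3)                 # colour (view dirs omitted here)
        self.semantic_head = nn.Linear(hidden, num_classes)  # view-independent logits

    def forward(self, x):
        h = self.trunk(x)
        return self.sigma_head(h), self.rgb_head(h), self.semantic_head(h)

def composite(sigma, values, deltas):
    """Standard volume-rendering weights applied to per-sample values.
    sigma, deltas: [N_rays, N_samples]; values: [N_rays, N_samples, C]."""
    alpha = 1.0 - torch.exp(-torch.relu(sigma) * deltas)
    trans = torch.cumprod(torch.cat(
        [torch.ones_like(alpha[:, :1]), 1.0 - alpha + 1e-10], -1), -1)[:, :-1]
    weights = alpha * trans
    return (weights.unsqueeze(-1) * values).sum(dim=1)       # [N_rays, C]

# Sparse or noisy 2D labels would supervise composite(sigma, logits, deltas)
# with a cross-entropy loss alongside the usual photometric loss.
```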
IEEE Transactions on Pattern Analysis and Machine Intelligence, 2021, pp. 1–1. Published: Jan. 1, 2021.
The field of meta-learning, or learning-to-learn, has seen a dramatic rise in interest in recent years. Contrary to conventional approaches to AI, where a given task is solved from scratch using a fixed learning algorithm, meta-learning aims to improve the learning algorithm itself, given the experience of multiple learning episodes. This paradigm provides an opportunity to tackle many conventional challenges of deep learning, including data and computation bottlenecks, as well as the fundamental issue of generalization. In this survey we describe the contemporary meta-learning landscape. We first discuss definitions of meta-learning and position it with respect to related fields, such as transfer learning, multi-task learning, and hyperparameter optimization. We then propose a new taxonomy that provides a more comprehensive breakdown of the space of meta-learning methods today. We survey promising applications and successes of meta-learning, such as few-shot learning, reinforcement learning, and architecture search. Finally, we discuss outstanding challenges and promising areas for future research.
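As a concrete illustration of "improving the learning algorithm itself over multiple episodes," here is a toy optimization-based meta-learning sketch with MAML-like inner and outer loops on a synthetic regression task; the task distribution, model, and learning rates are assumptions chosen only to keep the example small and runnable.

```python
# Toy sketch of learning-to-learn: an outer loop adapts a shared initialisation
# so that a single inner gradient step solves each new (random) task well.
import torch

def make_task(n=20):
    """Toy regression task: y = a*x + b with random a, b."""
    a, b = torch.randn(2)
    x = torch.randn(n, 1)
    return x, a * x + b

theta = torch.zeros(2, requires_grad=True)   # meta-learned init: [slope, intercept]
meta_opt = torch.optim.SGD([theta], lr=1e-2)
inner_lr = 0.1

for episode in range(1000):
    x, y = make_task()
    # Inner loop: one gradient step from the shared initialisation.
    pred = theta[0] * x + theta[1]
    inner_loss = ((pred - y) ** 2).mean()
    grads = torch.autograd.grad(inner_loss, theta, create_graph=True)[0]
    adapted = theta - inner_lr * grads
    # Outer loop: update the initialisation by the post-adaptation loss.
    x_q, y_q = x, y   # a held-out query set would normally be used here
    outer_loss = ((adapted[0] * x_q + adapted[1] - y_q) ** 2).mean()
    meta_opt.zero_grad()
    outer_loss.backward()
    meta_opt.step()
```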
2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2021, pp. 4576–4585. Published: June 1, 2021.
We propose pixelNeRF, a learning framework that predicts a continuous neural scene representation conditioned on one or few input images. The existing approach for constructing neural radiance fields [27] involves optimizing the representation to every scene independently, requiring many calibrated views and significant compute time. We take a step towards resolving these shortcomings by introducing an architecture that conditions a NeRF on image inputs in a fully convolutional manner. This allows the network to be trained across multiple scenes to learn a scene prior, enabling it to perform novel view synthesis in a feed-forward manner from a sparse set of views (as few as one). Leveraging the volume rendering approach of NeRF, our model can be trained directly from images with no explicit 3D supervision. We conduct extensive experiments on ShapeNet benchmarks for single-image novel view synthesis tasks with held-out objects as well as entire unseen categories. We further demonstrate the flexibility of pixelNeRF by demonstrating it on multi-object ShapeNet scenes and real scenes from the DTU dataset. In all cases, pixelNeRF outperforms current state-of-the-art baselines for novel view synthesis and single-image 3D reconstruction. For the video and code, please visit the project website: https://alexyu.net/pixelnerf.
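The conditioning idea can be sketched as follows: project each 3D query point into the input view, bilinearly sample a CNN feature map at that pixel, and feed the sampled feature together with the point into the field MLP. The helper names, feature dimensions, and single-view setup below are illustrative assumptions, not the released code.

```python
# Sketch of pixel-aligned conditioning for a NeRF-style field (assumed names).
import torch
import torch.nn as nn
import torch.nn.functional as F

def pixel_aligned_features(feat_map, pts_cam, K):
    """feat_map: [1, C, H, W]; pts_cam: [N, 3] points in the input camera frame;
    K: [3, 3] intrinsics. Returns [N, C] features sampled at the projections."""
    uv = (K @ pts_cam.T).T                      # [N, 3]
    uv = uv[:, :2] / uv[:, 2:].clamp(min=1e-6)  # perspective divide -> pixel coords
    H, W = feat_map.shape[-2:]
    grid = torch.stack([uv[:, 0] / (W - 1), uv[:, 1] / (H - 1)], dim=-1) * 2 - 1
    grid = grid.view(1, -1, 1, 2)               # grid_sample expects [-1, 1] coords
    samp = F.grid_sample(feat_map, grid, align_corners=True)   # [1, C, N, 1]
    return samp[0, :, :, 0].T                   # [N, C]

class ConditionedField(nn.Module):
    def __init__(self, feat_dim=64, pos_dim=3, hidden=128):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(pos_dim + feat_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 4),                # RGB + density per sample
        )

    def forward(self, pts_cam, feat_map, K):
        f = pixel_aligned_features(feat_map, pts_cam, K)
        return self.mlp(torch.cat([pts_cam, f], dim=-1))
```

Because the image encoder is fully convolutional, the same network can be trained across many scenes and applied feed-forward to a new scene from one or a few posed views.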
2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2022, pp. 16102–16112. Published: June 1, 2022.
Unsupervised generation of high-quality multi-view-consistent images and 3D shapes using only collections of single-view 2D photographs has been a long-standing challenge. Existing 3D GANs are either compute-intensive or make approximations that are not 3D-consistent; the former limits the quality and resolution of the generated images and the latter adversely affects multi-view consistency and shape quality. In this work, we improve the computational efficiency and image quality of 3D GANs without overly relying on these approximations. We introduce an expressive hybrid explicit-implicit network architecture that, together with other design choices, not only synthesizes high-resolution multi-view-consistent images in real time but also produces high-quality 3D geometry. By decoupling feature generation and neural rendering, our framework is able to leverage state-of-the-art 2D CNN generators, such as StyleGAN2, and inherit their efficiency and expressiveness. We demonstrate state-of-the-art 3D-aware synthesis with FFHQ and AFHQ Cats, among other experiments.
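A hybrid explicit-implicit representation of the kind described above is commonly realized as a tri-plane feature volume decoded by a small MLP; the sketch below shows that lookup step under that assumption, with illustrative names and sizes rather than the paper's exact architecture.

```python
# Sketch of a tri-plane-style hybrid explicit-implicit lookup: features live on
# three axis-aligned 2D planes (produced by a 2D CNN generator), and a small
# MLP decodes the summed samples into colour and density for volume rendering.
import torch
import torch.nn as nn
import torch.nn.functional as F

def sample_plane(plane, coords_2d):
    """plane: [1, C, R, R]; coords_2d: [N, 2] in [-1, 1]. Returns [N, C]."""
    grid = coords_2d.view(1, -1, 1, 2)
    out = F.grid_sample(plane, grid, align_corners=True)   # [1, C, N, 1]
    return out[0, :, :, 0].T

class TriPlaneDecoder(nn.Module):
    def __init__(self, feat_dim=32, hidden=64):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(feat_dim, hidden), nn.Softplus(),
            nn.Linear(hidden, 4),        # colour feature + density
        )

    def forward(self, planes, pts):
        """planes: three [1, C, R, R] tensors (xy, xz, yz); pts: [N, 3] in [-1, 1]^3."""
        f = (sample_plane(planes[0], pts[:, [0, 1]]) +
             sample_plane(planes[1], pts[:, [0, 2]]) +
             sample_plane(planes[2], pts[:, [1, 2]]))
        return self.mlp(f)
```

The key design point is that the expensive 2D generator runs once per image to produce the planes, while the per-sample decoder stays tiny, which is what makes real-time, view-consistent rendering feasible.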
arXiv (Cornell University), 2019. Published: Jan. 1, 2019.
We propose an approach to self-supervised representation learning based on maximizing mutual information between features extracted from multiple views of a shared context. For example, one could produce multiple views of a local spatio-temporal context by observing it from different locations (e.g., camera positions within a scene) and via different modalities (e.g., tactile, auditory, or visual). Or, an ImageNet image could provide a context from which one produces multiple views by repeatedly applying data augmentation. Maximizing mutual information between features extracted from these views requires capturing information about high-level factors whose influence spans multiple views -- e.g., the presence of certain objects or the occurrence of certain events. Following our proposed approach, we develop a model which learns representations that significantly outperform prior methods on the tasks we consider. Most notably, using self-supervised learning, we achieve 68.1% accuracy on ImageNet using standard linear evaluation. This beats prior results by over 12% and concurrent results by 7%. When we extend our model to use mixture-based representations, segmentation behaviour emerges as a natural side-effect. Our code is available online: https://github.com/Philip-Bachman/amdim-public.
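A common way to operationalize mutual-information maximization between views is an InfoNCE-style contrastive bound; the sketch below shows that generic objective for two augmented views of the same batch, not the paper's exact multi-scale loss, and the encoder and augmentation are assumed placeholders.

```python
# Generic InfoNCE-style objective between features of two views of the same images.
import torch
import torch.nn.functional as F

def info_nce(z1, z2, temperature=0.1):
    """z1, z2: [B, D] features of two views of the same B contexts.
    Positive pairs sit on the diagonal of the similarity matrix."""
    z1 = F.normalize(z1, dim=-1)
    z2 = F.normalize(z2, dim=-1)
    logits = z1 @ z2.T / temperature          # [B, B]
    targets = torch.arange(z1.size(0), device=z1.device)
    return F.cross_entropy(logits, targets)

# Usage pattern (encoder and aug are placeholders for a CNN and a data
# augmentation pipeline): loss = info_nce(encoder(aug(x)), encoder(aug(x)))
```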
Trends in Cognitive Sciences, 2019, 23(5), pp. 408–422. Published: April 17, 2019.
Recent AI research has given rise to powerful techniques for deep reinforcement learning. In their combination of representation learning with reward-driven behavior, these methods would appear to have inherent interest for psychology and neuroscience. One reservation has been that deep reinforcement learning procedures demand large amounts of training data, suggesting that these algorithms may differ fundamentally from those underlying human learning. While this concern applies to the initial wave of deep RL techniques, subsequent work has established methods that allow deep RL systems to learn more quickly and efficiently. Two particularly interesting and promising techniques center, respectively, on episodic memory and meta-learning. Alongside their interest as AI techniques, deep RL methods leveraging episodic memory and meta-learning have direct implications for psychology and neuroscience. A subtle but critically important insight which these techniques bring into focus is the fundamental connection between fast and slow forms of learning.
Deep reinforcement learning (RL) methods have driven impressive advances in artificial intelligence in recent years, exceeding human performance in domains ranging from Atari to Go to no-limit poker. This progress has drawn the attention of cognitive scientists interested in understanding human learning. However, the concern has been raised that deep RL may be too sample-inefficient – that is, it may simply be too slow – to provide a plausible model of how humans learn. In the present review, we counter this critique by describing recently developed deep RL methods that operate more nimbly, solving problems much faster than previous methods. Although these methods were developed in an AI context, we propose that they have rich implications for psychology and neuroscience. A key insight, arising from these methods, concerns the fundamental connection between fast RL and slower, more incremental forms of learning.
Over just the past few years, revolutionary advances have occurred in artificial intelligence (AI) research, where a resurgence in neural network or 'deep learning' methods [1,2] has fueled breakthroughs in image understanding [3,4], natural language processing [5,6], and many other areas. These developments have attracted growing interest from psychologists, psycholinguists, and neuroscientists, curious about whether developments in AI might suggest new hypotheses concerning cognition and brain function [7–11]. One area of AI research that appears particularly inviting from this perspective is deep reinforcement learning (Box 1). Deep RL marries neural network modeling (see Glossary) with reinforcement learning, a set of methods for learning from rewards and punishments rather than from explicit instruction [12]. After decades as an aspirational rather than practical idea, within the past 5 years deep RL has exploded into one of the most intense areas of AI research, generating super-human performance in tasks ranging from video games [13] and poker [14] to multiplayer contests [15] and complex board games, including go and chess [16–19].
Box 1. Deep Reinforcement Learning. RL centers on the problem of learning a behavioral policy, a mapping from states or situations to actions, which maximizes cumulative long-term reward. In simple settings, the policy can be represented as a look-up table, listing the appropriate action for any state. In richer environments, however, this kind of approach becomes infeasible, and the policy must instead be encoded implicitly, as a parameterized function. Pioneering work in the 1990s showed how such a function could be approximated using a multilayer (or deep) neural network ([78]; L.J. Lin, PhD Thesis, Carnegie Mellon University, 1993), allowing gradient-descent learning to discover rich, nonlinear mappings from perceptual inputs to actions (Figure IA). However, technical challenges prevented this approach from succeeding at scale until 2015, when a breakthrough demonstration was made with the Atari-playing DQN system (Figure IB). Since then, rapid progress has been made toward improving and scaling deep RL [79] and extending its application to tasks such as Capture the Flag [80]. In many cases, later successes have involved integrating deep RL with architectural and algorithmic complements, such as tree search and slot-based, episodic-like memory [52] (Figure IC). Other work has focused on the goal of learning speed, the topic of the observations we make in the main text.

Figure I illustrates the evolution of deep RL, starting with Tesauro's groundbreaking backgammon-playing system 'TD-gammon' [78]. Panel A: TD-gammon centered on a network that took the board position as input and learned to output an estimate of 'state value,' defined as expected cumulative future rewards, here equal to the estimated probability of eventually winning from the current position. Panel B shows the Atari-playing DQN system reported by Mnih and colleagues [13]; here, a deep network takes screen pixels as input and learns to output joystick actions. Panel C shows a schematic of the state-of-the-art agent described by Wayne et al. [51]; a full description of the detailed 'wiring' of this agent is beyond the scope of the present paper (but can be found in [51]). As the schematic indicates, the architecture comprises multiple modules, one of which leverages memory to predict upcoming events and 'speaks' to a reinforcement-learning module that selects actions based on the predictor module's outputs. The agent learns, among other tasks, to perform goal-directed navigation in maze-like environments, as shown in the main text.
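As a concrete illustration of Box 1's point that the policy or value function is encoded as a parameterized function adjusted by small gradient steps, here is a minimal action-value sketch; the environment dimensions, network, and learning rate are placeholders, not any specific published agent.

```python
# Sketch: when a look-up table over states is infeasible, action values are
# approximated by a neural network and nudged by small gradient steps on a
# temporal-difference error, one transition at a time.
import torch
import torch.nn as nn

obs_dim, n_actions, gamma = 8, 4, 0.99
q_net = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(), nn.Linear(64, n_actions))
opt = torch.optim.SGD(q_net.parameters(), lr=1e-3)   # note the small step size

def td_update(s, a, r, s_next, done):
    """One incremental update on a single transition (s, a, r, s_next)."""
    q_sa = q_net(s)[a]
    with torch.no_grad():
        target = r + gamma * q_net(s_next).max() * (1.0 - done)
    loss = (q_sa - target) ** 2
    opt.zero_grad()
    loss.backward()
    opt.step()
```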
Beyond its relevance to this topic, deep RL holds special interest for psychology and neuroscience. The mechanisms that drive learning in deep RL were originally inspired by animal conditioning research [20] and are believed to relate closely to neural mechanisms for reward-based learning centering on dopamine [21]. At the same time, deep RL leverages neural networks to learn representations that support generalization and transfer, key abilities of biological brains. Given these connections, deep RL would appear to offer a rich source of ideas for researchers interested in learning, at both the behavioral and neuroscientific levels. And indeed, researchers have started to take notice. However, the first commentary has also sounded a note of caution. On first blush, deep RL systems learn in a fashion quite different from humans. The hallmark of this difference, it has been argued, lies in the sample efficiency of human learning versus deep RL. Sample efficiency refers to the amount of data required for a learning system to attain any chosen target level of performance. On this measure, the initial wave of deep RL systems indeed differed drastically from human learners. To attain expert human-level performance on tasks such as Atari video games, deep RL systems have required many orders of magnitude more training data than human experts themselves [22]. In short, deep RL, at least in its initial incarnation, appeared much too slow to provide a plausible model of human learning. Or so the argument has gone [23,24]. Importantly, however, this critique applies to the first wave of deep RL methods, beginning around 2013 (e.g., [25]). Even in the short time since then, innovations have occurred which show how the sample efficiency of deep RL can be dramatically increased. These methods mitigate the original demands of deep RL for huge amounts of training data, allowing it to work, effectively, much faster. The emergence of these computational techniques revives deep RL as a candidate model of human learning. Here, we consider two of the key methods addressing the sample-efficiency problem: episodic deep RL and meta-RL. We examine how these techniques enable fast deep RL and consider their potential implications for psychology and neuroscience. A natural starting point is to consider why the first wave of deep RL was in fact so slow. We describe two primary sources of this sample inefficiency. At the end of the paper, we will circle back to the constellation of issues described here, showing how the relevant concepts are connected.
The first source of slowness is the requirement for incremental parameter adjustment. Initial deep RL methods (which are still very widely used in current research) employed gradient descent to sculpt the connectivity of a deep neural network mapping from perceptual inputs to action outputs. As has been discussed not only in AI but also in psychology and neuroscience [26], the adjustments made during this form of learning must be small, in order to maximize generalization [27] and avoid overwriting the effects of earlier learning (an effect sometimes referred to as 'catastrophic interference'). This need for small step-sizes is one source of slowness. The second source proposed here is weak inductive bias. A basic lesson of learning theory is that any learning procedure necessarily faces a bias–variance trade-off: the stronger the assumptions the procedure makes about the patterns to be learned (i.e., the stronger the inductive bias of the procedure), the less data is required for learning to be accomplished (assuming the inductive bias matches what's in the data!). A learning procedure with weaker inductive bias will be able to master a wider range of patterns (greater variance), but will in general be less sample-efficient [28]. In effect, a strong inductive bias is what allows fast learning. A learner that considers only a narrow range of hypotheses when interpreting incoming data will, perforce, hone in on the correct hypothesis more rapidly than a learner with weaker biases (again, assuming the correct hypothesis falls within that narrow range). Importantly, generic neural networks are extremely low-bias learning systems; they have many parameters (connection weights) and are capable of using them to fit a wide range of data. As dictated by the bias–variance trade-off, this means that generic neural networks (as used in the first wave of deep RL, Box 1) tend to be sample-inefficient, requiring large amounts of data to learn. Together, these two factors, incremental parameter adjustment and weak inductive bias, explain the slowness of first-generation deep RL models. Subsequent research has made clear, however, that both factors can be mitigated, allowing deep RL to proceed in a more sample-efficient manner. In what follows, we consider a specific technique that confronts each of these problems. In addition to their implications within the AI field, both techniques bear suggestive links with psychology and neuroscience, as we shall detail.
If incremental parameter adjustment is one source of slowness, then one way to learn faster might be to avoid such incremental updating. Naively increasing the learning rate governing gradient-descent optimization leads to catastrophic interference. However, there is another way to accomplish the same goal: keep an explicit record of past events, and use this record directly as a reference in making new decisions. This approach, referred to as episodic RL [29,30,42], parallels 'non-parametric' approaches in machine learning and resembles 'instance-' or 'exemplar-based' theories of learning in psychology [31,32]. When a new situation is encountered and a decision must be made concerning what action to take, an internal representation of the current situation is compared with stored representations of past situations. The action chosen is the one associated with the highest value, based on the outcomes of the past situations most similar to the present one. When the internal state representation is computed by a multilayer neural network, we refer to the resulting algorithm as 'episodic deep RL'. A fuller explanation of its mechanics is presented in Box 2.
Box 2. Episodic RL. Episodic RL estimates the value of actions and states based on memories of past experiences [30,43,44]. Consider, for example, the valuation scheme depicted in Figure I, wherein the agent stores each state it encounters along with the discounted sum of the rewards obtained over the next n steps. Together these items comprise a record of the states visited and the returns that followed. When a new state is encountered, the agent computes a similarity-weighted (sim.) average of the values associated with the stored states. The scheme is extended to action values by recording the action taken along with each reward sum in the store, and querying the store for states similar to the to-be-evaluated state in which that action was taken. In fact, Blundell et al. [81] used an approach of this kind to achieve strong performance on Atari games. The success of episodic RL depends on the representations used to compute similarity; in follow-up work, Pritzel et al. improved on this approach by gradually shaping the state representation through learning, improving results across 57 Atari games. [Figure I: environment and agent schematic showcasing the benefits of combining slow (representation) learning with fast (value) learning, and episodic deep RL performance on Atari games.]

Importantly, unlike the standard incremental approach, the information gained through each experienced event can be leveraged immediately to guide behavior. However, whereas episodic deep RL is 'fast' where earlier methods went 'slow,' there is a twist in the story: this fast learning itself depends critically on slow, incremental learning.
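The similarity-weighted valuation described in Box 2 can be sketched as follows; the data structures and similarity measure are illustrative assumptions rather than the cited implementations.

```python
# Sketch of episodic valuation: each action keeps a store of (state embedding,
# observed return) pairs, and a new state is valued by a similarity-weighted
# average over that store.
import torch

class EpisodicValueStore:
    def __init__(self):
        self.keys, self.returns = [], []

    def add(self, embedding, discounted_return):
        self.keys.append(embedding)
        self.returns.append(discounted_return)

    def value(self, query, temperature=1.0):
        """Similarity-weighted average of stored returns for a query embedding."""
        if not self.keys:
            return 0.0
        K = torch.stack(self.keys)                            # [M, D]
        R = torch.tensor(self.returns)                        # [M]
        sim = torch.cosine_similarity(K, query.unsqueeze(0), dim=-1)
        w = torch.softmax(sim / temperature, dim=0)
        return float((w * R).sum())

# One store per action; the agent picks the action whose store returns the
# highest estimate for the current state embedding. The embedding itself can be
# produced by a slowly trained network, which is where slow, incremental
# learning re-enters the picture.
```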
2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2021. Published: June 1, 2021.
Deep generative models allow for photorealistic image synthesis at high resolutions. But for many applications, this is not enough: content creation also needs to be controllable. While several recent works investigate how to disentangle underlying factors of variation in the data, most of them operate in 2D and hence ignore that our world is three-dimensional. Further, only few works consider the compositional nature of scenes. Our key hypothesis is that incorporating a compositional 3D scene representation into the generative model leads to more controllable image synthesis. Representing scenes as compositional generative neural feature fields allows us to disentangle one or multiple objects from the background, as well as individual objects' shapes and appearances, while learning from unstructured and unposed image collections without any additional supervision. Combining this scene representation with a neural rendering pipeline yields a fast and realistic image synthesis model. As evidenced by our experiments, our model is able to disentangle individual objects and allows for translating and rotating them in the scene as well as changing the camera pose.
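A compositional field of this kind is often combined by density-weighted averaging of the per-object features before rendering; the sketch below shows that combination step under that assumption, with illustrative shapes and names.

```python
# Sketch: compose several object feature fields with a background field by a
# density-weighted combination at each sample point along a ray.
import torch

def compose_fields(sigmas, feats, eps=1e-8):
    """sigmas: list of [N] densities, feats: list of [N, C] features,
    one entry per object (plus background). Returns combined density/feature."""
    sigma_stack = torch.stack(sigmas, dim=0)            # [K, N]
    feat_stack = torch.stack(feats, dim=0)              # [K, N, C]
    sigma_total = sigma_stack.sum(dim=0)                # [N]
    weights = sigma_stack / (sigma_total + eps)         # [K, N]
    feat_total = (weights.unsqueeze(-1) * feat_stack).sum(dim=0)   # [N, C]
    return sigma_total, feat_total

# Each object's field is evaluated in its own (scaled, rotated, translated)
# coordinate frame, which is why translating or rotating an object reduces to
# changing that object's pose parameters.
```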
2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2019, pp. 2432–2441. Published: June 1, 2019.
In this work, we address the lack of 3D understanding of generative neural networks by introducing a persistent 3D feature embedding for view synthesis. To this end, we propose DeepVoxels, a learned representation that encodes the view-dependent appearance of a 3D scene without having to explicitly model its geometry. At its core, our approach is based on a Cartesian 3D grid of persistent embedded features that learn to make use of the underlying 3D scene structure. Our approach combines insights from 3D geometric computer vision with recent advances in learning image-to-image mappings based on adversarial loss functions. DeepVoxels is supervised, without requiring a 3D reconstruction of the scene, using a 2D re-rendering loss, and enforces perspective and multi-view geometry in a principled manner. We apply our persistent 3D feature embedding to the problem of novel view synthesis, demonstrating high-quality results for a variety of challenging scenes.
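The persistent Cartesian feature grid can be sketched as a learnable 3D tensor queried by trilinear interpolation; the snippet below is an illustrative reading of that representation, not the released pipeline.

```python
# Sketch: a persistent, learnable 3D feature grid sampled by trilinear
# interpolation at query points.
import torch
import torch.nn as nn
import torch.nn.functional as F

class FeatureVolume(nn.Module):
    def __init__(self, channels=16, resolution=32):
        super().__init__()
        # Persistent grid of features: [1, C, D, H, W].
        self.grid = nn.Parameter(
            0.01 * torch.randn(1, channels, resolution, resolution, resolution))

    def forward(self, pts):
        """pts: [N, 3] points in [-1, 1]^3 -> [N, C] interpolated features."""
        g = pts.view(1, -1, 1, 1, 3)                            # grid_sample 3D layout
        out = F.grid_sample(self.grid, g, align_corners=True)   # [1, C, N, 1, 1]
        return out[0, :, :, 0, 0].T

# A 2D rendering network would decode per-pixel features (gathered along each
# camera ray according to perspective geometry) into colours, and the whole
# model would be trained with a 2D re-rendering loss against observed images.
```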
2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2021, pp. 5795–5805. Published: June 1, 2021.
We have witnessed rapid progress on 3D-aware image synthesis, leveraging recent advances in generative visual models and neural rendering. Existing approaches, however, fall short in two ways: first, they may lack an underlying 3D representation or rely on view-inconsistent rendering, hence synthesizing images that are not multi-view consistent; second, they often depend upon network architectures that are not expressive enough, and their results thus lack in quality. We propose a novel generative model, named Periodic Implicit Generative Adversarial Networks (π-GAN or pi-GAN), for high-quality 3D-aware image synthesis. π-GAN leverages neural representations with periodic activation functions and volumetric rendering to represent scenes as view-consistent radiance fields. The proposed approach obtains state-of-the-art results for 3D-aware image synthesis on multiple real and synthetic datasets.
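Periodic activations of the kind the abstract refers to are sine layers (SIREN-style); below is a minimal implicit field built from such layers, with initialisation constants and sizes that follow common practice and are assumptions here.

```python
# Minimal implicit field with periodic (sine) activations.
import math
import torch
import torch.nn as nn

class SineLayer(nn.Module):
    def __init__(self, in_dim, out_dim, w0=30.0, first=False):
        super().__init__()
        self.linear, self.w0 = nn.Linear(in_dim, out_dim), w0
        bound = 1.0 / in_dim if first else math.sqrt(6.0 / in_dim) / w0
        nn.init.uniform_(self.linear.weight, -bound, bound)

    def forward(self, x):
        return torch.sin(self.w0 * self.linear(x))

class RadianceField(nn.Module):
    def __init__(self, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            SineLayer(3, hidden, first=True),
            SineLayer(hidden, hidden),
            nn.Linear(hidden, 4),        # RGB + density, composited by volume rendering
        )

    def forward(self, pts):
        return self.net(pts)
```

In the full generative setting, such layers would additionally be modulated by a latent code and trained adversarially on the rendered images.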
2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2021, pp. 6494–6504. Published: June 1, 2021.
We present a method to perform novel view and time synthesis of dynamic scenes, requiring only a monocular video with known camera poses as input. To do this, we introduce Neural Scene Flow Fields, a new representation that models the dynamic scene as a time-variant continuous function of appearance, geometry, and 3D scene motion. Our representation is optimized through a neural network to fit the observed input views. We show that our representation can be used for varieties of in-the-wild scenes, including those with thin structures, view-dependent effects, and complex degrees of motion. We conduct a number of experiments that demonstrate our approach significantly outperforms recent monocular view synthesis methods, and show qualitative results of space-time view synthesis on a variety of real-world videos.
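The time-variant function described above can be sketched as an MLP over position and time that returns appearance, density, and 3D offsets to neighbouring time steps; the names and sizes below are illustrative assumptions, not the authors' code.

```python
# Sketch of a time-variant field returning colour, density, and scene flow.
import torch
import torch.nn as nn

class SceneFlowField(nn.Module):
    def __init__(self, hidden=128):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(4, hidden), nn.ReLU(),     # input: (x, y, z, t)
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.rgb_sigma = nn.Linear(hidden, 4)    # colour + density at (x, t)
        self.flow = nn.Linear(hidden, 6)         # 3D offsets to t-1 and t+1

    def forward(self, xyz, t):
        h = self.mlp(torch.cat([xyz, t], dim=-1))
        rgb_sigma = self.rgb_sigma(h)
        flow_fwd, flow_bwd = self.flow(h).chunk(2, dim=-1)
        return rgb_sigma, flow_fwd, flow_bwd

# Rendering at an intermediate time warps samples along the predicted flow, so
# consistency between neighbouring frames can supervise motion from monocular video.
```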
2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2019, pp. 1438–1447. Published: June 1, 2019.
Most image completion methods produce only one result for each masked input, although there may be many reasonable possibilities. In this paper, we present an approach for pluralistic image completion - the task of generating multiple and diverse plausible solutions for image completion. A major challenge faced by learning-based approaches is that there is usually only one ground truth training instance per label. As such, sampling from conditional VAEs still leads to minimal diversity. To overcome this, we propose a novel and probabilistically principled framework with two parallel paths. One is a reconstructive path that utilizes the given ground truth to get a prior distribution of missing parts and rebuild the original image from this distribution. The other is a generative path for which the conditional prior is coupled to the distribution obtained in the reconstructive path. Both are supported by GANs. We also introduce a new short+long term attention layer that exploits distant relations among decoder and encoder features, improving appearance consistency. When tested on datasets with buildings (Paris), faces (CelebA-HQ), and natural images (ImageNet), our method not only generated higher-quality completion results, but also produced multiple and diverse plausible outputs.
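The practical effect of the two-path design is that, at test time, different latent codes for the same masked input yield different completions; the sketch below shows that sampling pattern with a generic conditional-latent generator, which is an assumption for illustration rather than the paper's exact architecture.

```python
# Sketch: a generator conditioned on the masked image plus a latent code;
# sampling several codes produces multiple plausible completions.
import torch
import torch.nn as nn

class MaskedCompleter(nn.Module):
    def __init__(self, z_dim=64, ch=32):
        super().__init__()
        self.encode = nn.Sequential(           # encode visible pixels + mask
            nn.Conv2d(4, ch, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(ch, ch, 4, stride=2, padding=1), nn.ReLU(),
        )
        self.decode = nn.Sequential(
            nn.ConvTranspose2d(ch + z_dim, ch, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(ch, 3, 4, stride=2, padding=1), nn.Tanh(),
        )

    def forward(self, image, mask, z):
        x = self.encode(torch.cat([image * mask, mask], dim=1))   # condition on context
        z_map = z.view(z.size(0), -1, 1, 1).expand(-1, -1, *x.shape[-2:])
        return self.decode(torch.cat([x, z_map], dim=1))

# Different samples z ~ N(0, I) for the same (image, mask) give diverse outputs;
# training couples the latent distribution to the missing regions and adds
# adversarial losses, along the lines described in the abstract above.
completer = MaskedCompleter()
img, msk = torch.rand(1, 3, 64, 64), torch.ones(1, 1, 64, 64)
outs = [completer(img, msk, torch.randn(1, 64)) for _ in range(3)]
```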