2021 IEEE/CVF International Conference on Computer Vision (ICCV)
Journal Year: 2021, Volume and Issue: unknown, P. 15818 - 15827
Published: Oct. 1, 2021
Semantic labelling is highly correlated with geometry and radiance reconstruction, as scene entities of similar shape and appearance are more likely to come from similar classes. Recent implicit neural reconstruction techniques are appealing because they do not require prior training data, but the same fully self-supervised approach is not possible for semantics, because labels are human-defined properties. We extend neural radiance fields (NeRF) to jointly encode semantics with appearance and geometry, so that complete and accurate 2D semantic labels can be achieved using a small amount of in-place annotations specific to the scene. The intrinsic multi-view consistency and smoothness of NeRF benefit semantics by enabling sparse labels to efficiently propagate. We show the benefit of this approach when labels are either sparse or very noisy in room-scale scenes. We demonstrate its advantageous properties in various interesting applications such as an efficient scene labelling tool, novel semantic view synthesis, label denoising, super-resolution, label interpolation, and multi-view semantic label fusion in visual semantic mapping systems.
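
To make the joint encoding concrete, here is a minimal NumPy sketch of the core idea: the same volume-rendering weights that composite radiance along a ray also composite per-point semantic logits, which is what lets sparse 2D annotations supervise a multi-view-consistent 3D semantic field. Function name and tensor shapes are illustrative assumptions, not the authors' code.

```python
import numpy as np

def composite_along_ray(sigma, deltas, rgb, sem_logits):
    """Composite color and semantic logits with shared NeRF weights.

    sigma:      (N,)   volume density at each of N samples along a ray
    deltas:     (N,)   distances between adjacent samples
    rgb:        (N, 3) radiance at each sample
    sem_logits: (N, C) semantic logits at each sample (C classes)
    """
    alpha = 1.0 - np.exp(-sigma * deltas)                          # per-sample opacity
    trans = np.cumprod(np.concatenate(([1.0], 1.0 - alpha[:-1])))  # transmittance
    w = alpha * trans                                              # rendering weights
    color = (w[:, None] * rgb).sum(axis=0)                         # rendered pixel color
    logits = (w[:, None] * sem_logits).sum(axis=0)                 # rendered semantic logits
    return color, logits
```

A cross-entropy loss on `logits` against whatever sparse or noisy 2D labels exist trains the semantic head, while the usual photometric loss trains density and color; because the weights are shared, labels propagate across views.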
2021 IEEE/CVF International Conference on Computer Vision (ICCV)
Journal Year: 2021, Volume and Issue: unknown, P. 5721 - 5731
Published: Oct. 1, 2021
Neural Radiance Fields (NeRF) [31] have recently gained a surge of interest within the computer vision community for its power to synthesize photorealistic novel views of real-world scenes. One limitation of NeRF, however, is its requirement of accurate camera poses to learn the scene representations. In this paper, we propose Bundle-Adjusting Neural Radiance Fields (BARF) for training NeRF from imperfect (or even unknown) camera poses: the joint problem of learning neural 3D representations and registering camera frames. We establish a theoretical connection to classical image alignment and show that coarse-to-fine registration is also applicable to NeRF. Furthermore, we show that naïvely applying positional encoding in NeRF has a negative impact on registration with a synthesis-based objective. Experiments on synthetic and real-world data show that BARF can effectively optimize the neural scene representations and resolve large camera pose misalignment at the same time. This enables view synthesis and localization of video sequences from unknown camera poses, opening up new avenues for visual localization systems (e.g. SLAM) and potential applications for dense 3D mapping and reconstruction.
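
The coarse-to-fine registration rests on annealing NeRF's positional encoding so that low frequencies dominate early optimization and pose gradients stay smooth. A small sketch of that schedule, assuming BARF-style cosine easing of each frequency band (variable names and shapes are ours):

```python
import numpy as np

def coarse_to_fine_encoding(x, num_freqs, alpha):
    """Positional encoding with BARF-style frequency annealing.

    x:         (..., D) input coordinates
    num_freqs: L, the number of frequency bands
    alpha:     anneal parameter in [0, L], ramped up during training
    """
    feats = [x]
    for k in range(num_freqs):
        # Smoothly gate band k on as alpha passes k (0 = off, 1 = fully on).
        w = 0.5 * (1.0 - np.cos(np.pi * np.clip(alpha - k, 0.0, 1.0)))
        feats.append(w * np.sin((2.0 ** k) * np.pi * x))
        feats.append(w * np.cos((2.0 ** k) * np.pi * x))
    return np.concatenate(feats, axis=-1)
```

Ramping `alpha` from 0 to `num_freqs` over training first registers poses on a smooth signal, then restores full high-frequency detail.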
arXiv (Cornell University)
Journal Year: 2019, Volume and Issue: unknown
Published: Jan. 1, 2019
Learned world models summarize an agent's experience to facilitate learning complex behaviors. While learning world models from high-dimensional sensory inputs is becoming feasible through deep learning, there are many potential ways for deriving behaviors from them. We present Dreamer, a reinforcement learning agent that solves long-horizon tasks from images purely by latent imagination. We efficiently learn behaviors by propagating analytic gradients of learned state values back through trajectories imagined in the compact state space of a learned world model. On 20 challenging visual control tasks, Dreamer exceeds existing approaches in data-efficiency, computation time, and final performance.
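
The value targets Dreamer maximizes over imagined trajectories can be illustrated with a lambda-return computation. A minimal sketch, assuming predicted rewards and critic values for one imagined rollout (names are ours); in a real implementation this whole computation is differentiable, so actor gradients flow analytically back through the world model:

```python
import numpy as np

def lambda_returns(rewards, values, gamma=0.99, lam=0.95):
    """V_lambda targets over an imagined latent trajectory.

    rewards: (H,)   rewards predicted by the world model for H imagined steps
    values:  (H+1,) critic values for the imagined states (last one bootstraps)
    """
    returns = np.empty(len(rewards))
    next_ret = values[-1]                       # bootstrap from the final value
    for t in reversed(range(len(rewards))):
        # Blend the one-step bootstrap with the longer-horizon return.
        next_ret = rewards[t] + gamma * ((1 - lam) * values[t + 1] + lam * next_ret)
        returns[t] = next_ret
    return returns
```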
Computer Graphics Forum
Journal Year: 2020, Volume and Issue: 39(2), P. 701 - 727
Published: May 1, 2020
Abstract
Efficient rendering of photo-realistic virtual worlds is a long-standing effort of computer graphics. Modern graphics techniques have succeeded in synthesizing photo-realistic images from hand-crafted scene representations. However, the automatic generation of shape, materials, lighting, and other aspects of scenes remains a challenging problem that, if solved, would make photo-realistic computer graphics more widely accessible. Concurrently, progress in computer vision and machine learning has given rise to a new approach to image synthesis and editing, namely deep generative models. Neural rendering is a new and rapidly emerging field that combines generative machine learning techniques with physical knowledge from computer graphics, e.g., by the integration of differentiable rendering into network training. With a plethora of applications in computer graphics and vision, neural rendering is poised to become a new area in the graphics community, yet no survey of this emerging field exists. This state-of-the-art report summarizes the recent trends and applications of neural rendering. We focus on approaches that combine classic computer graphics models with deep generative models to obtain controllable and photorealistic outputs. Starting with an overview of the underlying concepts, we discuss critical aspects of neural rendering approaches. Specifically, our emphasis is on the type of control, i.e., how the control is provided, which parts of the pipeline are learned, explicit vs. implicit control, generalization, and stochastic vs. deterministic synthesis. The second half of the report is focused on the many important use cases for the described algorithms, such as novel view synthesis, semantic photo manipulation, facial and body reenactment, relighting, free-viewpoint video, and the creation of avatars for augmented reality and telepresence. Finally, we conclude with a discussion of the social implications of such technology and investigate open research problems.
2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
Journal Year: 2020, Volume and Issue: unknown
Published: June 1, 2020
View synthesis allows for the generation of new views of a scene given one or more images. This is challenging; it requires comprehensively understanding the 3D scene from images. As a result, current methods typically use multiple images, train on ground-truth depth, or are limited to synthetic data. We propose a novel end-to-end model for this task using a single image at test time; it is trained on real images without any ground-truth 3D information. To this end, we introduce a novel differentiable point cloud renderer that is used to transform a latent 3D point cloud of features into the target view. The projected features are decoded by our refinement network to inpaint missing regions and generate a realistic output image. The 3D component inside our generative model allows for interpretable manipulation of the latent feature space at test time, e.g. we can animate trajectories from a single image. Additionally, we can generate high-resolution images and generalise to other input resolutions. We outperform baselines and prior work on the Matterport, Replica, and RealEstate10K datasets.
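
For intuition about the renderer, the sketch below re-projects per-pixel latent features into a target view using predicted depth. It uses a hard z-buffer for clarity; SynSin's point cloud renderer instead splats points softly so gradients flow, and the holes left here are exactly what the refinement network inpaints. All names and conventions are assumptions:

```python
import numpy as np

def reproject_features(feats, depth, K, T_src_to_tgt, out_hw):
    """Hard z-buffer re-projection of per-pixel features into a target view.

    feats: (H, W, F)     latent features predicted for the source image
    depth: (H, W)        predicted source-view depth
    K:     (3, 3)        pinhole intrinsics (shared by both views here)
    T_src_to_tgt: (4, 4) relative camera pose
    out_hw: (H', W')     target image size
    """
    H, W, F = feats.shape
    ys, xs = np.mgrid[0:H, 0:W]
    pix = np.stack([xs, ys, np.ones_like(xs)], -1).reshape(-1, 3).astype(float)
    cam = (np.linalg.inv(K) @ pix.T).T * depth.reshape(-1, 1)   # back-project
    cam_h = np.concatenate([cam, np.ones((cam.shape[0], 1))], 1)
    tgt = (T_src_to_tgt @ cam_h.T).T[:, :3]                     # move to target frame
    proj = (K @ tgt.T).T                                        # project to pixels
    out = np.zeros((*out_hw, F))
    zbuf = np.full(out_hw, np.inf)
    for p, f in zip(proj, feats.reshape(-1, F)):
        if p[2] <= 1e-6:
            continue                                            # behind the camera
        u, v = int(round(p[0] / p[2])), int(round(p[1] / p[2]))
        if 0 <= v < out_hw[0] and 0 <= u < out_hw[1] and p[2] < zbuf[v, u]:
            zbuf[v, u] = p[2]                                   # keep the nearest point
            out[v, u] = f
    return out   # holes remain where the source view saw nothing
```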
2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
Journal Year: 2019, Volume and Issue: unknown
Published: June 1, 2019
We explore the problem of view synthesis from a narrow-baseline pair of images, and focus on generating high-quality view extrapolations with plausible disocclusions. Our method builds upon prior work in predicting a multiplane image (MPI), which represents scene content as a set of RGBA planes within a reference view frustum and renders novel views by projecting this content into the target viewpoints. We present a theoretical analysis showing how the range of views that can be rendered from an MPI increases linearly with the MPI disparity sampling frequency, as well as a novel MPI prediction procedure that theoretically enables view extrapolations of up to 4 times the lateral viewpoint movement allowed by prior work. Our method ameliorates two specific issues that limit the range of views renderable by prior methods: 1) we expand the range of novel views that can be rendered without depth discretization artifacts by using a 3D convolutional network architecture along with a randomized-resolution training procedure to allow our model to predict MPIs with increased disparity sampling frequency; 2) we reduce the repeated texture artifacts seen in disocclusions by enforcing a constraint that the appearance of hidden content at any depth must be drawn from visible content at or behind that depth.
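
An MPI renders a novel view by homography-warping each RGBA plane into the target camera and alpha-compositing the warped planes back to front. A minimal sketch of the compositing step (warping omitted; shapes are assumptions):

```python
import numpy as np

def composite_mpi(rgba_planes):
    """Back-to-front 'over' compositing of multiplane-image layers.

    rgba_planes: (D, H, W, 4) RGBA planes ordered near-to-far; for a novel
                 view, each plane is first homography-warped into the
                 target camera, then composited exactly like this.
    """
    out = np.zeros(rgba_planes.shape[1:3] + (3,))
    for plane in rgba_planes[::-1]:                # iterate far -> near
        rgb, a = plane[..., :3], plane[..., 3:]
        out = rgb * a + out * (1.0 - a)            # standard over operator
    return out
```

The paper's theoretical result concerns the depth spacing of these planes: the extrapolatable viewpoint range grows linearly with how finely disparity is sampled across them.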
2021 IEEE/CVF International Conference on Computer Vision (ICCV)
Journal Year: 2021, Volume and Issue: unknown, P. 12939 - 12950
Published: Oct. 1, 2021
We present Non-Rigid Neural Radiance Fields (NR-NeRF), a reconstruction and novel view synthesis approach for general non-rigid dynamic scenes. Our approach takes RGB images of a dynamic scene as input (e.g., from a monocular video recording), and creates a high-quality space-time geometry and appearance representation. We show that a single handheld consumer-grade camera is sufficient to synthesize sophisticated renderings of a dynamic scene from novel virtual camera views, e.g. a 'bullet-time' video effect. NR-NeRF disentangles the dynamic scene into a canonical volume and its deformation. Scene deformation is implemented as ray bending, where straight rays are deformed non-rigidly. We also propose a novel rigidity network to better constrain rigid regions of the scene, leading to more stable results. The ray bending and rigidity network are trained without explicit supervision. Our formulation enables dense correspondence estimation across views and time, and compelling video editing applications such as motion exaggeration. Our code will be open sourced.
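
Ray bending can be pictured as deforming the straight-ray sample points into the canonical volume before any density or color lookup, with the rigidity network suppressing deformation in rigid regions. A minimal sketch under those assumptions; in the paper, the offset network is also conditioned on a per-frame latent code, omitted here:

```python
import numpy as np

def bend_ray_samples(pts, offset_fn, rigidity_fn):
    """Bend straight-ray samples into the canonical volume (sketch).

    pts:         (N, 3) samples along a straight camera ray at some frame
    offset_fn:   deformation network, (N, 3) -> (N, 3) offsets
    rigidity_fn: rigidity network, (N, 3) -> (N, 1) values in [0, 1]
    """
    r = rigidity_fn(pts)              # ~0 in rigid regions: offsets suppressed
    return pts + r * offset_fn(pts)   # bent points, queried in the canonical NeRF
```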
2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
Journal Year: 2021, Volume and Issue: unknown, P. 9416 - 9426
Published: June 1, 2021
We present a method that learns a spatiotemporal neural irradiance field for dynamic scenes from a single video. Our learned representation enables free-viewpoint rendering of the input video and builds upon recent advances in implicit representations. Learning a spatiotemporal irradiance field from a single video poses significant challenges because the video contains only one observation of the scene at any point in time. The 3D geometry of a scene can be legitimately represented in numerous ways since varying geometry (motion) can be explained with varying appearance and vice versa. We address this ambiguity by constraining the time-varying geometry of our dynamic scene representation using depth estimated from video depth estimation methods, aggregating contents from individual frames into a single global representation. We provide an extensive quantitative evaluation and demonstrate compelling free-viewpoint rendering results.
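
One way to picture the depth constraint is as a penalty between the expected ray termination depth under the volume-rendering weights and the per-frame depth from a video depth-estimation method. A hedged sketch only; the paper's exact loss formulation may differ:

```python
import numpy as np

def depth_consistency_loss(w, t_vals, est_depth):
    """Penalize disagreement between rendered and estimated depth (sketch).

    w:         (R, N) volume-rendering weights for R rays, N samples each
    t_vals:    (R, N) sample depths along each ray
    est_depth: (R,)   per-pixel depth from a video depth-estimation method
    """
    rendered = (w * t_vals).sum(axis=1)         # expected ray termination depth
    return np.mean((rendered - est_depth) ** 2)
```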