2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Journal Year: 2023, Volume and Issue: 25, P. 16610 - 16620, Published: June 1, 2023
Volumetric scene representations enable photorealistic view synthesis for static scenes and form the basis of several existing 6-DoF video techniques. However, the volume rendering procedures that drive these representations necessitate careful trade-offs in terms of quality, rendering speed, and memory efficiency. In particular, existing methods fail to simultaneously achieve real-time performance, a small memory footprint, and high-quality rendering for challenging real-world scenes. To address these issues, we present HyperReel, a novel 6-DoF video representation. The two core components of HyperReel are: (1) a ray-conditioned sample prediction network that enables high-fidelity, high-frame-rate rendering at high resolutions and (2) a compact and memory-efficient dynamic volume representation. Our 6-DoF video pipeline achieves the best performance compared to prior and contemporary approaches in terms of visual quality with small memory requirements, while also rendering at up to 18 frames-per-second at megapixel resolution without any custom CUDA code.
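
For illustration, here is a minimal sketch of the idea behind a ray-conditioned sample prediction network: a small MLP maps a ray (origin and direction) to a handful of ordered sample depths, so the volume only needs to be queried at those few points. The class name, layer sizes, and ray parameterization are assumptions for this sketch, not the HyperReel architecture.

    import torch
    import torch.nn as nn

    class RaySamplePredictor(nn.Module):
        """Toy ray-conditioned sample prediction: map a ray to K sample depths."""
        def __init__(self, n_samples=8, hidden=128, near=0.1, far=10.0):
            super().__init__()
            self.near, self.far = near, far
            self.mlp = nn.Sequential(
                nn.Linear(6, hidden), nn.ReLU(),
                nn.Linear(hidden, hidden), nn.ReLU(),
                nn.Linear(hidden, n_samples),
            )

        def forward(self, origins, directions):
            # origins, directions: (B, 3) per-ray inputs
            raw = self.mlp(torch.cat([origins, directions], dim=-1))
            # Map raw outputs to monotonically increasing depths in [near, far].
            fractions = torch.cumsum(torch.softmax(raw, dim=-1), dim=-1)
            depths = self.near + (self.far - self.near) * fractions        # (B, K)
            points = origins[:, None, :] + depths[..., None] * directions[:, None, :]
            return depths, points  # query the volume only at these K points

    rays_o = torch.randn(4, 3)
    rays_d = torch.nn.functional.normalize(torch.randn(4, 3), dim=-1)
    depths, pts = RaySamplePredictor()(rays_o, rays_d)
    print(depths.shape, pts.shape)  # torch.Size([4, 8]) torch.Size([4, 8, 3])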

2021 IEEE/CVF International Conference on Computer Vision (ICCV), Journal Year: 2023, Volume and Issue: unknown, P. 9264 - 9275, Published: Oct. 1, 2023
We introduce Zero-1-to-3, a framework for changing the camera viewpoint of an object given just a single RGB image. To perform novel view synthesis in this under-constrained setting, we capitalize on the geometric priors that large-scale diffusion models learn about natural images. Our conditional diffusion model uses a synthetic dataset to learn controls of the relative camera viewpoint, which allow new images of the same object to be generated under a specified camera transformation. Even though it is trained on a synthetic dataset, our model retains a strong zero-shot generalization ability to out-of-distribution datasets as well as in-the-wild images, including impressionist paintings. Our viewpoint-conditioned diffusion approach can further be used for the task of 3D reconstruction from a single image. Qualitative and quantitative experiments show that our method significantly outperforms state-of-the-art single-view 3D reconstruction and novel view synthesis models by leveraging Internet-scale pre-training.
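
A rough sketch of what viewpoint conditioning can look like: a relative camera change is encoded as a small vector and fused with an image embedding before it conditions the generator. The specific parameterization, names, and fusion layer below are assumptions for illustration, not the Zero-1-to-3 implementation.

    import torch
    import torch.nn as nn

    def relative_pose_embedding(d_elevation, d_azimuth, d_radius):
        """Encode a relative camera change as a small conditioning vector."""
        return torch.stack([
            d_elevation,
            torch.sin(d_azimuth),
            torch.cos(d_azimuth),
            d_radius,
        ], dim=-1)

    class ViewConditioner(nn.Module):
        """Fuse an image embedding with the relative-pose vector (hypothetical layer)."""
        def __init__(self, img_dim=768, out_dim=768):
            super().__init__()
            self.proj = nn.Linear(img_dim + 4, out_dim)

        def forward(self, img_emb, pose_vec):
            return self.proj(torch.cat([img_emb, pose_vec], dim=-1))

    img_emb = torch.randn(2, 768)  # stand-in for a frozen image-encoder output
    pose = relative_pose_embedding(torch.tensor([0.1, -0.2]),
                                   torch.tensor([0.5, 1.0]),
                                   torch.tensor([0.0, 0.1]))
    cond = ViewConditioner()(img_emb, pose)
    print(cond.shape)  # torch.Size([2, 768])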

2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Journal Year: 2023, Volume and Issue: unknown, P. 12479 - 12488, Published: June 1, 2023
We introduce k-planes, a white-box model for radiance fields in arbitrary dimensions. Our model uses d choose 2 planes to represent a d-dimensional scene, providing a seamless way to go from static (d = 3) to dynamic (d = 4) scenes. This planar factorization makes adding dimension-specific priors easy, e.g. temporal smoothness and multi-resolution spatial structure, and induces a natural decomposition of static and dynamic components of the scene. We use a linear feature decoder with a learned color basis that yields similar performance as a nonlinear black-box MLP decoder. Across a range of synthetic and real, static and dynamic, fixed and varying appearance scenes, k-planes yields competitive and often state-of-the-art reconstruction fidelity with low memory usage, achieving 1000x compression over a full 4D grid, and fast optimization with a pure PyTorch implementation. For video results and code, please see sarafridov.github.io/K-Planes.
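
A minimal sketch of the planar feature lookup this factorization implies for d = 4: a point (x, y, z, t) is projected onto six axis-aligned planes, a feature is bilinearly interpolated from each, and the features are fused by element-wise multiplication before a decoder (not shown). The function and shapes are hypothetical single-resolution stand-ins, not the released k-planes code.

    import torch
    import torch.nn.functional as F

    def kplanes_features(planes, coords):
        """Query a planar-factorized 4D field at normalized coords in [-1, 1].

        planes: dict of six (1, C, H, W) feature grids, one per pair of axes.
        coords: (N, 4) points ordered (x, y, z, t).
        """
        pairs = {"xy": (0, 1), "xz": (0, 2), "yz": (1, 2),
                 "xt": (0, 3), "yt": (1, 3), "zt": (2, 3)}
        feat = None
        for name, (a, b) in pairs.items():
            grid = coords[:, [a, b]].view(1, -1, 1, 2)              # (1, N, 1, 2)
            sampled = F.grid_sample(planes[name], grid,
                                    mode="bilinear", align_corners=True)
            sampled = sampled.view(planes[name].shape[1], -1).t()   # (N, C)
            feat = sampled if feat is None else feat * sampled      # Hadamard fusion
        return feat

    C, R = 16, 32
    planes = {k: torch.randn(1, C, R, R) for k in ["xy", "xz", "yz", "xt", "yt", "zt"]}
    pts = torch.rand(100, 4) * 2 - 1
    print(kplanes_features(planes, pts).shape)  # torch.Size([100, 16])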

2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Journal Year: 2023, Volume and Issue: unknown, P. 12619 - 12629, Published: June 1, 2023
A diffusion model learns to predict a vector field of gradients. We propose to apply the chain rule on the learned gradients, and back-propagate the score of a diffusion model through the Jacobian of a differentiable renderer, which we instantiate to be a voxel radiance field. This setup aggregates 2D scores at multiple camera viewpoints into a 3D score, and re-purposes a pretrained 2D model for 3D data generation. We identify a technical challenge of distribution mismatch that arises in this application, and propose a novel estimation mechanism to resolve it. We run our algorithm on several off-the-shelf diffusion image generative models, including the recently released Stable Diffusion trained on the large-scale LAION 5B dataset.
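
The chain-rule step can be sketched as follows: render an image from the 3D parameters, evaluate a 2D score on it, and back-propagate that score through the renderer's Jacobian so it accumulates as a gradient on the 3D parameters across viewpoints. The renderer and the score network below are trivial placeholders (assumptions for the sketch), not the paper's voxel renderer or a real diffusion model, and the distribution-mismatch correction is omitted.

    import torch

    voxels = torch.zeros(8, 8, 8, 4, requires_grad=True)  # toy voxel radiance field

    def render(voxels, viewpoint):
        # placeholder differentiable "renderer": any op mapping voxels -> image
        return voxels.mean(dim=2)[..., :3].permute(2, 0, 1) * viewpoint  # (3, 8, 8)

    def score_2d(image):
        # placeholder for a pretrained 2D model's score of the rendered image
        return -image

    optimizer = torch.optim.Adam([voxels], lr=1e-2)
    viewpoints = [0.5, 1.0, 1.5]  # toy camera parameters

    for step in range(100):
        optimizer.zero_grad()
        for v in viewpoints:
            image = render(voxels, v)
            with torch.no_grad():
                s = score_2d(image)              # 2D score at this viewpoint
            # Chain rule: back-propagating the 2D score through the renderer's
            # Jacobian accumulates a 3D score on the voxel parameters.
            image.backward(gradient=-s)
        optimizer.step()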

2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Journal Year: 2023, Volume and Issue: unknown, Published: June 1, 2023
Neural Radiance Fields (NeRFs) have demonstrated amazing ability to synthesize images of 3D scenes from novel views. However, they rely upon specialized volumetric rendering algorithms based on ray marching that are mismatched to the capabilities of widely deployed graphics hardware. This paper introduces a new NeRF representation based on textured polygons that can be rendered efficiently with standard rendering pipelines. The NeRF is represented as a set of polygons with textures representing binary opacities and feature vectors. Traditional rendering of the polygons with a z-buffer yields an image with features at every pixel, which are interpreted by a small, view-dependent MLP running in a fragment shader to produce a final pixel color. This approach enables NeRFs to be rendered with the traditional polygon rasterization pipeline, which provides massive pixel-level parallelism, achieving interactive frame rates on a wide range of compute platforms, including mobile phones. Project page: https://mobile-nerf.github.io
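
The deferred-shading step can be sketched outside the shader: given a rasterized feature image and per-pixel view directions, a tiny view-dependent MLP maps each pixel to a color. The sizes, activations, and class name below are assumptions; the real model runs an equivalent network inside a fragment shader.

    import torch
    import torch.nn as nn

    class DeferredShadingMLP(nn.Module):
        """Tiny view-dependent MLP applied per pixel to a rasterized feature image."""
        def __init__(self, feat_dim=8, hidden=16):
            super().__init__()
            self.mlp = nn.Sequential(
                nn.Linear(feat_dim + 3, hidden), nn.ReLU(),
                nn.Linear(hidden, hidden), nn.ReLU(),
                nn.Linear(hidden, 3), nn.Sigmoid(),
            )

        def forward(self, feature_image, view_dirs):
            # feature_image: (H, W, F) output of z-buffered rasterization
            # view_dirs:     (H, W, 3) per-pixel viewing direction
            x = torch.cat([feature_image, view_dirs], dim=-1)
            return self.mlp(x)  # (H, W, 3) final pixel colors

    H, W, F_DIM = 64, 64, 8
    feats = torch.rand(H, W, F_DIM)
    dirs = torch.nn.functional.normalize(torch.randn(H, W, 3), dim=-1)
    rgb = DeferredShadingMLP(F_DIM)(feats, dirs)
    print(rgb.shape)  # torch.Size([64, 64, 3])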

2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Journal Year: 2023, Volume and Issue: unknown, Published: June 1, 2023
Modeling and re-rendering dynamic 3D scenes is a challenging task in 3D vision. Prior approaches build on NeRF and rely on implicit representations. This is slow since it requires many MLP evaluations, constraining real-world applications. We show that dynamic 3D scenes can instead be explicitly represented by six planes of learned features, leading to an elegant solution we call HexPlane. A HexPlane computes features for points in spacetime by fusing vectors extracted from each plane, which is highly efficient. Pairing a HexPlane with a tiny MLP to regress output colors and training via volume rendering gives impressive results for novel view synthesis on dynamic scenes, matching the image quality of prior work but reducing training time by more than 100×. Extensive ablations confirm our HexPlane design and show that it is robust to different feature fusion mechanisms, coordinate systems, and decoding mechanisms. HexPlane is a simple and effective representation for 4D volumes, and we hope it can broadly contribute to modeling dynamic scenes. Project page: https://caoang327.github.io/HexPlane.
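
A simplified sketch of the six-plane fusion: each spatial plane is paired with the complementary spatio-temporal plane, the two sampled features are multiplied element-wise, and the three products are concatenated before a tiny MLP decoder (not shown). The pairing, shapes, and function below follow the spirit of the description above, not the paper's exact code.

    import torch
    import torch.nn.functional as F

    def hexplane_features(planes, coords):
        """Fuse features from six planes for 4D points (x, y, z, t) in [-1, 1]."""
        def sample(plane, uv):
            grid = uv.view(1, -1, 1, 2)
            out = F.grid_sample(plane, grid, mode="bilinear", align_corners=True)
            return out.view(plane.shape[1], -1).t()                 # (N, C)

        x, y, z, t = coords.unbind(-1)
        f_xy = sample(planes["xy"], torch.stack([x, y], -1))
        f_zt = sample(planes["zt"], torch.stack([z, t], -1))
        f_xz = sample(planes["xz"], torch.stack([x, z], -1))
        f_yt = sample(planes["yt"], torch.stack([y, t], -1))
        f_yz = sample(planes["yz"], torch.stack([y, z], -1))
        f_xt = sample(planes["xt"], torch.stack([x, t], -1))
        return torch.cat([f_xy * f_zt, f_xz * f_yt, f_yz * f_xt], dim=-1)  # (N, 3C)

    C, R = 8, 32
    planes = {k: torch.randn(1, C, R, R) for k in ["xy", "zt", "xz", "yt", "yz", "xt"]}
    pts = torch.rand(50, 4) * 2 - 1
    print(hexplane_features(planes, pts).shape)  # torch.Size([50, 24])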

Neural radiance fields (NeRF) have shown great success in modeling 3D scenes and synthesizing novel-view images. However, most previous NeRF methods take much time to optimize one single scene. Explicit data structures, e.g. voxel features, show great potential to accelerate the training process. However, voxel features face two big challenges when applied to dynamic scenes, i.e. modeling temporal information and capturing different scales of point motions. We propose a radiance field framework by representing scenes with time-aware voxel features, named TiNeuVox. A tiny coordinate deformation network is introduced to model coarse motion trajectories, and temporal information is further enhanced in the radiance network. A multi-distance interpolation method is proposed and applied on voxel features to model both small and large motions. Our framework significantly accelerates the optimization of dynamic radiance fields while maintaining high rendering quality. Empirical evaluation is performed on both synthetic and real scenes. Our TiNeuVox completes training with only 8 minutes and 8-MB storage cost while showing similar or even better rendering performance than previous dynamic NeRF methods.
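
A rough sketch of the two ideas named above: a tiny coordinate deformation network warps a point by a coarse motion estimate, and voxel features are then interpolated at several spatial scales and concatenated so that both small and large motions are captured. The pooling-based multi-scale scheme, names, and sizes here are assumptions for illustration only.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class DeformNet(nn.Module):
        """Tiny coordinate deformation network: (x, t) -> displacement."""
        def __init__(self, hidden=64):
            super().__init__()
            self.mlp = nn.Sequential(nn.Linear(4, hidden), nn.ReLU(),
                                     nn.Linear(hidden, 3))

        def forward(self, xyz, t):
            return self.mlp(torch.cat([xyz, t], dim=-1))

    def multi_distance_features(volume, xyz, scales=(1, 2, 4)):
        """Interpolate voxel features at several spatial scales and concatenate."""
        grid = xyz.view(1, -1, 1, 1, 3)                        # (1, N, 1, 1, 3)
        feats = []
        for s in scales:
            vol = volume if s == 1 else F.avg_pool3d(volume, kernel_size=s)
            out = F.grid_sample(vol, grid, mode="bilinear", align_corners=True)
            feats.append(out.view(vol.shape[1], -1).t())       # (N, C)
        return torch.cat(feats, dim=-1)

    volume = torch.randn(1, 8, 32, 32, 32)                     # time-aware voxel features
    xyz = torch.rand(10, 3) * 2 - 1
    t = torch.rand(10, 1) * 2 - 1
    canonical = xyz + DeformNet()(xyz, t)                      # warp by coarse motion
    print(multi_distance_features(volume, canonical.clamp(-1, 1)).shape)  # (10, 24)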

IEEE Transactions on Visualization and Computer Graphics, Journal Year: 2023, Volume and Issue: 29(5), P. 2732 - 2742, Published: Feb. 22, 2023
Visually exploring a real-world 4D spatiotemporal space freely in VR has been a long-term quest. The task is especially appealing when only a few or even a single RGB camera is used for capturing the dynamic scene. To this end, we present an efficient framework capable of fast reconstruction, compact modeling, and streamable rendering. First, we propose to decompose the 4D spatiotemporal space according to temporal characteristics. Points in the 4D space are associated with probabilities of belonging to three categories: static, deforming, and new areas. Each area is represented and regularized by a separate neural field. Second, we propose a feature streaming scheme based on hybrid representations for efficiently modeling the neural fields. Our approach, coined NeRFPlayer, is evaluated on dynamic scenes captured by single hand-held cameras and multi-camera arrays, achieving rendering performance comparable or superior to recent state-of-the-art methods in terms of quality and speed, with reconstruction in 10 seconds per frame and interactive rendering. Project website: https://bit.ly/nerfplayer.
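
A schematic sketch of the decomposition idea: each spatiotemporal point gets predicted probabilities of being static, deforming, or newly appearing, and the output is a probability-weighted blend of three separate fields. The MLP fields below are placeholders (assumptions for this sketch); the actual NeRFPlayer fields are hybrid grid representations with a streaming feature scheme.

    import torch
    import torch.nn as nn

    class DecomposedDynamicField(nn.Module):
        """Blend three fields (static / deforming / new) by predicted probabilities."""
        def __init__(self, hidden=64, out_dim=4):
            super().__init__()
            def mlp(in_dim):
                return nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU(),
                                     nn.Linear(hidden, out_dim))
            self.static_field = mlp(3)       # depends on position only
            self.deform_field = mlp(4)       # position + time
            self.new_field = mlp(4)          # position + time
            self.category = nn.Sequential(nn.Linear(4, hidden), nn.ReLU(),
                                          nn.Linear(hidden, 3))

        def forward(self, xyz, t):
            xyzt = torch.cat([xyz, t], dim=-1)
            p = torch.softmax(self.category(xyzt), dim=-1)     # (N, 3) probabilities
            outs = torch.stack([self.static_field(xyz),
                                self.deform_field(xyzt),
                                self.new_field(xyzt)], dim=1)  # (N, 3, out_dim)
            return (p.unsqueeze(-1) * outs).sum(dim=1)         # probability-weighted blend

    field = DecomposedDynamicField()
    print(field(torch.rand(5, 3), torch.rand(5, 1)).shape)     # torch.Size([5, 4])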

2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Journal Year: 2023, Volume and Issue: unknown, P. 9223 - 9232, Published: June 1, 2023
Modern methods for vision-centric autonomous driving perception widely adopt the bird's-eye-view (BEV) representation to describe a 3D scene. Despite its better efficiency than voxel representation, it has difficulty describing the fine-grained 3D structure of a scene with a single plane. To address this, we propose a tri-perspective view (TPV) representation which accompanies BEV with two additional perpendicular planes. We model each point in the 3D space by summing its projected features on the three planes. To lift image features to the 3D TPV space, we further propose a transformer-based TPV encoder (TPVFormer) to obtain the TPV features effectively. We employ an attention mechanism to aggregate the image features corresponding to each query in each TPV plane. Experiments show that our model trained with sparse supervision effectively predicts the semantic occupancy for all voxels. We demonstrate for the first time that using only camera inputs can achieve comparable performance with LiDAR-based methods on the LiDAR segmentation task on nuScenes. Code: https://github.com/wzzheng/TPVFormer.
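
The point-feature rule above ("sum the projected features on the three planes") can be sketched directly; the transformer encoder that produces the planes is omitted. The plane naming, axis pairing, and function below are simplified assumptions, not the released TPVFormer code.

    import torch
    import torch.nn.functional as F

    def tpv_point_features(tpv_hw, tpv_dh, tpv_wd, xyz):
        """Sum features projected onto three perpendicular planes for points in [-1, 1].

        tpv_hw, tpv_dh, tpv_wd: (1, C, *, *) feature planes from a TPV encoder.
        """
        def sample(plane, uv):
            grid = uv.view(1, -1, 1, 2)
            out = F.grid_sample(plane, grid, mode="bilinear", align_corners=True)
            return out.view(plane.shape[1], -1).t()              # (N, C)

        x, y, z = xyz.unbind(-1)
        return (sample(tpv_hw, torch.stack([x, y], -1)) +
                sample(tpv_dh, torch.stack([y, z], -1)) +
                sample(tpv_wd, torch.stack([z, x], -1)))

    C, R = 32, 64
    planes = [torch.randn(1, C, R, R) for _ in range(3)]
    pts = torch.rand(200, 3) * 2 - 1
    feats = tpv_point_features(*planes, pts)                     # (200, 32), fed to a head
    print(feats.shape)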

2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Journal Year: 2023, Volume and Issue: unknown, Published: June 1, 2023
We propose SparseFusion, a sparse-view 3D reconstruction approach that unifies recent advances in neural rendering and probabilistic image generation. Existing approaches typically build on neural rendering with reprojected features but fail to generate unseen regions or handle uncertainty under large viewpoint changes. Alternate methods treat this as a (probabilistic) 2D synthesis task, and while they can generate plausible 2D images, they do not infer a consistent underlying 3D. However, we find that this trade-off between 3D consistency and probabilistic image generation does not need to exist. In fact, we show that geometric consistency and generative inference can be complementary in a mode-seeking behavior. By distilling a 3D consistent scene representation from a view-conditioned latent diffusion model, we are able to recover a plausible 3D representation whose renderings are both accurate and realistic. We evaluate our approach across 51 categories in the CO3D dataset and show that it outperforms existing methods, in both distortion and perception metrics, for sparse-view novel view synthesis.

2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Journal Year: 2023, Volume and Issue: unknown, Published: June 1, 2023
Training a Neural Radiance Field (NeRF) without precomputed camera poses is challenging. Recent advances in this direction demonstrate the possibility of jointly optimising a NeRF and camera poses in forward-facing scenes. However, these methods still face difficulties during dramatic camera movement. We tackle this challenging problem by incorporating undistorted monocular depth priors. These priors are generated by correcting scale and shift parameters during training, with which we are then able to constrain the relative poses between consecutive frames. This constraint is achieved using our proposed novel loss functions. Experiments on real-world indoor and outdoor scenes show that our method can handle challenging camera trajectories and outperforms existing methods in terms of novel view rendering quality and pose estimation accuracy. Our project page is https://nope-nerf.active.vision.
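
To make the scale-and-shift correction of monocular depth concrete, here is a minimal sketch that aligns a raw depth map to a reference via closed-form least squares. This is an illustration of the general idea only: in the method above, the scale and shift parameters are optimized jointly during training with dedicated losses rather than solved in closed form, and the reference depth here is a toy stand-in.

    import torch

    def align_scale_shift(mono_depth, target_depth):
        """Solve least-squares scale s and shift t so that s * mono_depth + t ≈ target_depth."""
        d = mono_depth.flatten()
        g = target_depth.flatten()
        A = torch.stack([d, torch.ones_like(d)], dim=-1)         # (N, 2)
        sol = torch.linalg.lstsq(A, g.unsqueeze(-1)).solution    # (2, 1)
        scale, shift = sol[0, 0], sol[1, 0]
        return scale * mono_depth + shift, scale, shift

    mono = torch.rand(64, 64) * 0.5 + 0.2                        # raw monocular depth
    ref = 3.0 * mono + 0.7 + 0.01 * torch.randn(64, 64)          # toy reference depth
    corrected, s, t = align_scale_shift(mono, ref)
    print(round(s.item(), 2), round(t.item(), 2))                # approximately 3.0 and 0.7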