We present an end-to-end system for the high-fidelity capture, model reconstruction, and real-time rendering of walkable spaces in virtual reality using neural radiance fields. To this end, we designed and built a custom multi-camera rig to densely capture walkable spaces in high fidelity and with multi-view high dynamic range images in unprecedented quality and density. We extend instant neural graphics primitives with a novel perceptual color space for learning accurate HDR appearance, and an efficient mip-mapping mechanism for level-of-detail rendering with anti-aliasing, while carefully optimizing the trade-off between quality and speed. Our multi-GPU renderer enables high-fidelity volume rendering of our neural radiance field model at the full VR resolution of dual 2K$\times$2K at 36 Hz on our custom demo machine. We demonstrate the quality of our results on our challenging high-fidelity datasets, and compare our method and datasets to existing baselines. We release our dataset on our project website.
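The abstract does not spell out the perceptual color space, but PQ-style (SMPTE ST 2084) transfer functions are the standard way to make losses on HDR radiance perceptually uniform. The following is a minimal sketch of that idea, our illustration rather than the authors' implementation; the toy radiance arrays are placeholders:

    import numpy as np

    # SMPTE ST 2084 (PQ) constants
    M1 = 2610 / 16384
    M2 = 2523 / 4096 * 128
    C1 = 3424 / 4096
    C2 = 2413 / 4096 * 32
    C3 = 2392 / 4096 * 32

    def pq_encode(linear_nits, peak_nits=10000.0):
        """Map linear HDR radiance to a perceptually uniform code value in [0, 1]."""
        y = np.clip(linear_nits / peak_nits, 0.0, 1.0)
        return ((C1 + C2 * y**M1) / (1.0 + C3 * y**M1)) ** M2

    # Toy radiance values in nits (placeholders, not the paper's data).
    pred_hdr = np.array([5.0, 120.0, 4000.0])
    gt_hdr = np.array([6.0, 100.0, 3500.0])
    # A photometric loss in PQ space weights dark and bright regions far more
    # evenly than the same loss computed on raw linear radiance.
    loss = np.mean((pq_encode(pred_hdr) - pq_encode(gt_hdr)) ** 2)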
2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 13142–13153. Published: June 1, 2023.
Massive data corpora like WebText, Wikipedia, Conceptual Captions, WebImageText, and LAION have propelled recent dramatic progress in AI. Large neural models trained on such datasets produce impressive results and top many of today's benchmarks. A notable omission within this family of large-scale datasets is 3D data. Despite considerable interest and potential applications in 3D vision, datasets of high-fidelity 3D models continue to be mid-sized with limited diversity of object categories. Addressing this gap, we present Objaverse 1.0, a large dataset of objects with 800K+ (and growing) 3D models with descriptive captions, tags, and animations. Objaverse improves upon present-day 3D repositories in terms of scale, number of categories, and the visual diversity of instances within a category. We demonstrate the large potential of Objaverse via four diverse applications: training generative 3D models, improving tail category segmentation on the LVIS benchmark, training open-vocabulary object-navigation models for Embodied AI, and creating a new benchmark for robustness analysis of vision models. Objaverse can open new directions for research and enable new applications across the field of AI.
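For readers who want to try the dataset: the authors distribute a Python loader, the objaverse package on PyPI. A minimal usage sketch, assuming the load_uids / load_annotations / load_objects helpers documented in the Objaverse repository:

    import objaverse  # pip install objaverse

    # Enumerate object UIDs, then pull metadata and a handful of meshes.
    uids = objaverse.load_uids()                       # ~800K identifiers
    sample = uids[:5]
    annotations = objaverse.load_annotations(sample)   # captions, tags, ...
    paths = objaverse.load_objects(sample)             # downloads .glb files
    for uid in sample:
        print(uid, annotations[uid].get("name"), paths[uid])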
2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 4150–4159. Published: June 1, 2023.
This paper presents a novel grid-based NeRF called F$^2$-NeRF (Fast-Free-NeRF) for novel view synthesis, which enables arbitrary input camera trajectories and costs only a few minutes for training. Existing fast grid-based NeRF training frameworks, like Instant-NGP, Plenoxels, DVGO, or TensoRF, are mainly designed for bounded scenes and rely on space warping to handle unbounded scenes. The two widely-used space-warping methods are designed only for the forward-facing trajectory or the 360° object-centric trajectory, and cannot process arbitrary trajectories. In this paper, we delve deep into the mechanism of space warping for handling unbounded scenes. Based on our analysis, we further propose a novel space-warping method called perspective warping, which allows us to handle arbitrary trajectories in the grid-based NeRF framework. Extensive experiments demonstrate that F$^2$-NeRF is able to use the same perspective warping to render high-quality images on two standard datasets and a new free trajectory dataset collected by us. Project page: totoro97.github.io/projects/f2-nerf.
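As the paper describes it, perspective warping maps a 3D point to the concatenation of its 2D projections in nearby cameras, then reduces that vector back to a low-dimensional warped coordinate. A minimal sketch under our own simplifying assumptions (plain pinhole projections, a PCA basis fit on sampled points, centering omitted at query time); the helper names are ours:

    import numpy as np

    def project(point, pose, f=1.0):
        """Pinhole projection of a world point into one camera (pose: 4x4 world-to-camera)."""
        p_cam = pose[:3, :3] @ point + pose[:3, 3]
        return f * p_cam[:2] / p_cam[2]

    def perspective_warp(point, poses, basis):
        """Concatenate 2D projections in nearby cameras, then reduce to 3D via PCA basis."""
        feats = np.concatenate([project(point, T) for T in poses])  # shape (2K,)
        return basis @ feats                                        # shape (3,)

    # Toy setup: four cameras spread along x, all seeing points near the origin.
    poses = [np.eye(4) for _ in range(4)]
    for i, T in enumerate(poses):
        T[:3, 3] = [0.2 * i, 0.0, 2.0]

    # Fit the 3 x 2K basis once from points sampled in the region (PCA via SVD).
    samples = np.random.randn(256, 3) * 0.5
    F = np.stack([np.concatenate([project(p, T) for T in poses]) for p in samples])
    F -= F.mean(axis=0)
    _, _, vt = np.linalg.svd(F, full_matrices=False)
    basis = vt[:3]                                # top-3 principal directions

    warped = perspective_warp(np.array([0.1, 0.2, 1.0]), poses, basis)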
2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Published: June 1, 2023.
We present an algorithm for reconstructing the radiance field of a large-scale scene from a single casually captured video. The task poses two core challenges. First, most existing radiance field reconstruction approaches rely on accurate pre-estimated camera poses from Structure-from-Motion algorithms, which frequently fail on in-the-wild videos. Second, using a single, global radiance field with finite representational capacity does not scale to longer trajectories in an unbounded scene. For handling unknown poses, we jointly estimate the camera poses and radiance field in a progressive manner. We show that progressive optimization significantly improves the robustness of the reconstruction. For handling large unbounded scenes, we dynamically allocate new local radiance fields trained with frames within a temporal window. This further improves robustness (e.g., it performs well even under moderate pose drifts) and allows us to scale to large scenes. Our extensive evaluation on the TANKS AND TEMPLES dataset and our collected outdoor dataset, STATIC HIKES, shows that our approach compares favorably to the state-of-the-art.
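The allocation logic can be pictured as a loop over frames: keep appending frames to the active local field while jointly refining poses, and spawn a new local field once the camera leaves the current field's domain. A minimal sketch under our own simplified assumptions (a 1D trajectory, a uniform field extent, and the joint optimization step left as a placeholder):

    from dataclasses import dataclass, field

    @dataclass
    class LocalField:
        center: float               # 1D stand-in for the field's spatial anchor
        frames: list = field(default_factory=list)

    def reconstruct(trajectory, extent=5.0):
        """Progressively assign frames to local radiance fields along a trajectory."""
        fields = [LocalField(center=trajectory[0])]
        for t, cam_pos in enumerate(trajectory):
            current = fields[-1]
            if abs(cam_pos - current.center) > extent:
                # Camera left the current field's domain: allocate a new local field.
                fields.append(LocalField(center=cam_pos))
                current = fields[-1]
            current.frames.append(t)
            # ... jointly optimize the pose of frame t and the current field here ...
        return fields

    fields = reconstruct(trajectory=[0.5 * i for i in range(40)])
    print(len(fields), [f.frames[:3] for f in fields])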
2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), vol. 35, pp. 12375–12385. Published: June 1, 2023.
We extend neural radiance fields (NeRFs) to dynamic large-scale urban scenes. Prior work tends to reconstruct single video clips of short durations (up to 10 seconds). Two reasons are that such methods (a) tend to scale linearly with the number of moving objects and input videos, because a separate model is built for each, and (b) tend to require supervision via 3D bounding boxes and panoptic labels, obtained manually or via category-specific models. As a step towards truly open-world reconstructions of dynamic cities, we introduce two key innovations: (a) we factorize the scene into three separate hash table data structures to efficiently encode static, dynamic, and far-field radiance fields, and (b) we make use of unlabeled target signals consisting of RGB images, sparse LiDAR, off-the-shelf self-supervised 2D descriptors, and most importantly, 2D optical flow. Operationalizing such inputs via photometric, geometric, and feature-metric reconstruction losses enables SUDS to decompose dynamic scenes into the static background, individual objects, and their motions. When combined with our multi-branch table representation, such reconstructions can be scaled to tens of thousands of objects across 1.2 million frames from 1700 videos spanning geospatial footprints of hundreds of kilometers, (to our knowledge) the largest dynamic NeRF built to date. We present qualitative initial results on a variety of tasks enabled by our representations, including novel-view synthesis of dynamic urban scenes, unsupervised 3D instance segmentation, and unsupervised 3D cuboid detection. To compare against prior work, we also evaluate on KITTI and Virtual KITTI 2, surpassing state-of-the-art methods that rely on ground truth 3D bounding box annotations while being 10x quicker to train.
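The factorization in (a) amounts to routing each sample to one of three hash-encoded branches and combining their outputs. A heavily simplified sketch of that routing, with our own toy hash lookup, a crude distance-based far-field switch, and feature addition standing in for the paper's per-branch density/color compositing:

    import numpy as np

    def hash_encode(x, table, n_bins=64):
        """Toy hash lookup: quantize coordinates and index a feature table."""
        idx = np.floor(x * n_bins).astype(int) % len(table)
        return table[idx].mean(axis=0)

    rng = np.random.default_rng(0)
    static_table   = rng.normal(size=(2**14, 8))   # static branch: f(x)
    dynamic_table  = rng.normal(size=(2**14, 8))   # dynamic branch: f(x, t)
    farfield_table = rng.normal(size=(2**14, 8))   # far-field branch: f(direction)

    def query(x, t, direction, near_limit=100.0):
        if np.linalg.norm(x) > near_limit:
            return hash_encode(direction, farfield_table)     # env-map-like branch
        feat_static  = hash_encode(x, static_table)
        feat_dynamic = hash_encode(np.append(x, t), dynamic_table)
        return feat_static + feat_dynamic   # heads would decode each branch separately

    feat = query(x=np.array([1.0, 2.0, 0.5]), t=0.1,
                 direction=np.array([0.0, 0.0, 1.0]))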
We present Loc-NeRF, a real-time vision-based robot localization approach that combines Monte Carlo localization and Neural Radiance Fields (NeRF). Our system uses a pre-trained NeRF model as the map of an environment and can localize itself in real-time using an RGB camera as the only exteroceptive sensor onboard the robot. While neural radiance fields have seen significant recent applications in computer vision and graphics for visual rendering, they have found limited use in robotics. Existing approaches for NeRF-based localization require both a good initial pose guess and significant computation, making them impractical for real-time robotics applications. By using Monte Carlo localization as a workhorse to estimate poses with a NeRF map model, Loc-NeRF is able to perform localization faster than the state of the art without relying on an initial pose estimate. In addition to testing on synthetic data, we also run our system using real data collected by a Clearpath Jackal UGV, and demonstrate for the first time the ability to perform global localization (albeit over a small workspace) with neural radiance fields. We make our code publicly available at https://github.com/MIT-SPARK/Loc-NeRF.
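The combination boils down to a particle filter whose measurement model compares the camera image against a render of the NeRF at each particle's pose. A minimal 1D sketch, with a stand-in render function in place of a trained NeRF (a real system would render from the map and use a robust image distance):

    import numpy as np

    rng = np.random.default_rng(1)

    def nerf_render(pose):
        """Stand-in for rendering the NeRF map at a pose; returns a tiny 'image'."""
        return np.sin(pose + np.arange(4))

    def mcl_step(particles, motion, observation, noise=0.05):
        # 1. Motion update: propagate particles with odometry plus noise.
        particles = particles + motion + rng.normal(0.0, noise, size=len(particles))
        # 2. Measurement update: weight by photometric agreement with the render.
        errors = np.array([np.sum((nerf_render(p) - observation) ** 2)
                           for p in particles])
        weights = np.exp(-errors)
        weights /= weights.sum()
        # 3. Resample proportionally to the weights.
        return particles[rng.choice(len(particles), size=len(particles), p=weights)]

    # Global localization: particles start spread over the whole (1D) workspace.
    particles = rng.uniform(-3.0, 3.0, size=200)
    true_pose = 1.2
    for _ in range(10):
        true_pose += 0.1
        particles = mcl_step(particles, motion=0.1, observation=nerf_render(true_pose))
    print(particles.mean())   # the estimate concentrates near the true pose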
2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 8296–8306. Published: June 1, 2023.
Purely MLP-based neural radiance fields (NeRF-based methods) often suffer from underfitting with blurred renderings on large-scale scenes due to limited model capacity. Recent approaches propose to geographically divide the scene and adopt multiple sub-NeRFs to model each region individually, leading to a linear scale-up in training costs and in the number of sub-NeRFs as the scene expands. An alternative solution is to use a feature grid representation, which is computationally efficient and can naturally scale to a large scene with increased grid resolutions. However, the feature grid tends to be less constrained and often reaches suboptimal solutions, producing noisy artifacts in renderings, especially in regions with complex geometry and texture. In this work, we present a new framework that realizes high-fidelity rendering on large urban scenes while being computationally efficient. We propose to use a compact multi-resolution ground feature plane representation to coarsely capture the scene, and complement it with positional encoding inputs through another NeRF branch for rendering in a joint learning fashion. We show that such an integration can utilize the advantages of the two alternative solutions: a light-weighted NeRF is sufficient, under the guidance of the feature grid representation, to render photorealistic novel views with fine details; and the jointly optimized ground feature planes can meanwhile gain further refinements, forming a more accurate and compact feature space and outputting much more natural rendering results.
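The grid branch reduces to sampling 2D feature planes at a point's ground (x, y) coordinates across resolutions and handing the concatenated features to the NeRF branch. A minimal sketch of the bilinear plane lookup, with dimensions and plane contents of our own choosing:

    import numpy as np

    def sample_plane(plane, x, y):
        """Bilinearly sample a (H, W, C) ground feature plane at (x, y) in [0, 1]."""
        h, w, _ = plane.shape
        fx, fy = x * (w - 1), y * (h - 1)
        x0, y0 = int(fx), int(fy)
        x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
        tx, ty = fx - x0, fy - y0
        top = (1 - tx) * plane[y0, x0] + tx * plane[y0, x1]
        bot = (1 - tx) * plane[y1, x0] + tx * plane[y1, x1]
        return (1 - ty) * top + ty * bot

    rng = np.random.default_rng(0)
    # Multi-resolution ground planes: coarse layout through fine texture detail.
    planes = [rng.normal(size=(r, r, 4)) for r in (32, 128, 512)]

    def ground_features(point3d):
        x, y, _z = point3d                     # height is handled by the NeRF branch
        return np.concatenate([sample_plane(p, x, y) for p in planes])

    feat = ground_features(np.array([0.3, 0.7, 0.1]))   # 12-dim guidance feature
    # The NeRF branch would consume feat alongside positional encodings of point3d.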
2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Published: June 1, 2023.
Reconstruction and intrinsic decomposition of scenes from captured imagery would enable many applications such as relighting and virtual object insertion. Recent NeRF-based methods achieve impressive fidelity of 3D reconstruction, but bake the lighting and shadows into the radiance field, while mesh-based methods that facilitate intrinsic decomposition through differentiable rendering have not yet scaled to the complexity and scale of outdoor scenes. We present a novel inverse rendering framework for large urban scenes capable of jointly reconstructing the scene geometry, spatially-varying materials, and HDR lighting from a set of posed RGB images with optional depth. Specifically, we use a neural field to account for the primary rays, and use an explicit mesh (reconstructed from the underlying neural field) for modeling secondary rays that produce higher-order lighting effects such as cast shadows. By faithfully disentangling complex geometry and materials from lighting effects, our method enables photorealistic relighting with specular and shadow effects on several outdoor datasets. Moreover, it supports physics-based scene manipulations such as virtual object insertion with ray-traced shadow casting.
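The hybrid design can be pictured as: shade each primary-ray surface point from material and lighting estimates, but answer visibility queries for secondary (shadow) rays against the extracted explicit geometry. A minimal sketch with a toy sphere standing in for the extracted mesh and a single directional light; all names and values are ours:

    import numpy as np

    def mesh_occluded(origin, direction, sphere_c, sphere_r):
        """Secondary-ray visibility test against stand-in geometry (a sphere)."""
        oc = origin - sphere_c
        b = np.dot(oc, direction)
        disc = b * b - (np.dot(oc, oc) - sphere_r**2)
        return disc > 0 and (-b - np.sqrt(disc)) > 1e-4   # hit in front of origin

    def shade(point, normal, albedo, light_dir, light_rgb, sphere_c, sphere_r):
        """Primary-ray shading: diffuse term masked by a ray-traced shadow query."""
        if mesh_occluded(point + 1e-3 * normal, light_dir, sphere_c, sphere_r):
            return np.zeros(3)                            # in cast shadow
        return albedo * light_rgb * max(np.dot(normal, light_dir), 0.0)

    # A ground point shaded under a sphere 'inserted' above it: the same shadow
    # query also supports virtual object insertion with ray-traced casting.
    rgb = shade(point=np.array([0.0, 0.0, 0.0]),
                normal=np.array([0.0, 0.0, 1.0]),
                albedo=np.array([0.6, 0.5, 0.4]),         # from the material field
                light_dir=np.array([0.0, 0.0, 1.0]),      # toward an HDR sky sample
                light_rgb=np.array([2.0, 2.0, 1.8]),
                sphere_c=np.array([0.0, 0.0, 1.0]), sphere_r=0.3)
    print(rgb)   # zeros: the sphere blocks the light, so the point is shadowed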