HyperReel: High-Fidelity 6-DoF Video with Ray-Conditioned Sampling
Benjamin Attal, Jia‐Bin Huang, Christian Richardt

et al.

2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Journal Year: 2023, Volume and Issue: 25, P. 16610 - 16620

Published: June 1, 2023

Volumetric scene representations enable photorealistic view synthesis for static scenes and form the basis of several existing 6-DoF video techniques. However, the volume rendering procedures that drive these representations necessitate careful trade-offs in terms of quality, rendering speed, and memory efficiency. In particular, existing methods fail to simultaneously achieve real-time performance, a small memory footprint, and high-quality rendering for challenging real-world scenes. To address these issues, we present HyperReel, a novel 6-DoF video representation. The two core components of HyperReel are: (1) a ray-conditioned sample prediction network that enables high-fidelity, high-frame-rate rendering at high resolutions and (2) a compact and memory-efficient dynamic volume representation. Our 6-DoF video pipeline achieves the best performance compared to prior and contemporary approaches in terms of visual quality with small memory requirements, while also rendering at up to 18 frames per second at megapixel resolution without any custom CUDA code.
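
The central mechanism is easiest to see in code. Below is a minimal PyTorch sketch of a ray-conditioned sample prediction network, assuming a plain MLP that maps each ray's origin and direction to sorted sample distances; the actual HyperReel network predicts sample locations via geometric primitives, so the names and layer sizes here are illustrative only.

    import torch
    import torch.nn as nn

    class SamplePredictionNet(nn.Module):
        """Maps a ray (origin, direction) to n_samples distances along the ray.

        Hypothetical architecture; HyperReel's network predicts geometric
        primitives rather than raw distances."""
        def __init__(self, n_samples=32, hidden=128):
            super().__init__()
            self.mlp = nn.Sequential(
                nn.Linear(6, hidden), nn.ReLU(),
                nn.Linear(hidden, hidden), nn.ReLU(),
                nn.Linear(hidden, n_samples),
            )

        def forward(self, origins, directions, near=0.1, far=10.0):
            rays = torch.cat([origins, directions], dim=-1)        # (B, 6)
            # Sigmoid keeps predicted depths inside [near, far]; sorting keeps them ordered.
            t = near + (far - near) * torch.sigmoid(self.mlp(rays))
            t, _ = torch.sort(t, dim=-1)                           # (B, n_samples)
            # Sample points along each ray, to be fed into the volume representation.
            return origins[:, None, :] + t[..., None] * directions[:, None, :]

    # Example usage with a batch of 4 rays.
    net = SamplePredictionNet()
    pts = net(torch.zeros(4, 3), torch.randn(4, 3))
    print(pts.shape)  # torch.Size([4, 32, 3])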

Language: English

Zero-1-to-3: Zero-shot One Image to 3D Object
Ruoshi Liu, Rundi Wu, Basile Van Hoorick

et al.

2023 IEEE/CVF International Conference on Computer Vision (ICCV), Journal Year: 2023, Volume and Issue: unknown, P. 9264 - 9275

Published: Oct. 1, 2023

We introduce Zero-1-to-3, a framework for changing the camera viewpoint of an object given just a single RGB image. To perform novel view synthesis in this under-constrained setting, we capitalize on the geometric priors that large-scale diffusion models learn about natural images. Our conditional diffusion model is trained on a synthetic dataset to learn controls of the relative camera viewpoint, which allow new images of the same object to be generated under a specified camera transformation. Even though it is trained on a synthetic dataset, our model retains a strong zero-shot generalization ability to out-of-distribution datasets as well as in-the-wild images, including impressionist paintings. Our viewpoint-conditioned approach can further be used for the task of 3D reconstruction from a single image. Qualitative and quantitative experiments show that our method significantly outperforms state-of-the-art single-view 3D reconstruction and novel view synthesis models by leveraging Internet-scale pre-training.
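
To make the viewpoint conditioning concrete, here is a small Python sketch of how a relative camera viewpoint between two views of the same object can be encoded; the spherical (polar, azimuth, radius) parameterisation follows the paper's description, but the helper name and the exact embedding fed to the diffusion model are assumptions here.

    import numpy as np

    def relative_viewpoint(theta1, phi1, r1, theta2, phi2, r2):
        """Relative camera viewpoint between two views on a sphere around the object.

        Hypothetical helper: the deltas of polar angle, azimuth, and radius are
        packed into a small conditioning vector for the diffusion model."""
        d_theta = theta2 - theta1
        d_phi = (phi2 - phi1) % (2 * np.pi)          # azimuth wraps around
        d_r = r2 - r1
        # Feed sin/cos of the azimuth delta so the embedding is continuous at the wrap.
        return np.array([d_theta, np.sin(d_phi), np.cos(d_phi), d_r])

    cond = relative_viewpoint(np.pi / 3, 0.0, 1.5, np.pi / 4, np.pi / 2, 1.5)
    print(cond)  # 4-D conditioning vector appended to the model's input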

Language: English

Citations

316

K-Planes: Explicit Radiance Fields in Space, Time, and Appearance

Sara Fridovich-Keil, Giacomo Meanti, Frederik Warburg

et al.

2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Journal Year: 2023, Volume and Issue: unknown, P. 12479 - 12488

Published: June 1, 2023

We introduce k-planes, a white-box model for radiance fields in arbitrary dimensions. Our model uses d-choose-2 planes to represent a d-dimensional scene, providing a seamless way to go from static (d = 3) to dynamic (d = 4) scenes. This planar factorization makes adding dimension-specific priors easy, e.g. temporal smoothness and multi-resolution spatial structure, and induces a natural decomposition of static and dynamic components of a scene. We use a linear feature decoder with a learned color basis that yields performance similar to a nonlinear black-box MLP decoder. Across a range of synthetic and real, static and dynamic, fixed and varying appearance scenes, k-planes yields competitive and often state-of-the-art reconstruction fidelity with low memory usage, achieving 1000x compression over a full 4D grid, and fast optimization with a pure PyTorch implementation. For video results and code, please see sarafridov.github.io/K-Planes.
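
A minimal PyTorch sketch of the planar factorization for the dynamic case (d = 4, hence C(4, 2) = 6 planes): a point's feature vector is the elementwise product of bilinear lookups on the six planes. Plane resolution and channel count below are placeholders, not the paper's settings.

    import torch
    import torch.nn.functional as F

    dims = ['xy', 'xz', 'xt', 'yz', 'yt', 'zt']
    planes = {k: torch.randn(1, 32, 64, 64) for k in dims}   # (1, C, H, W) feature planes
    axes = {'xy': (0, 1), 'xz': (0, 2), 'xt': (0, 3),
            'yz': (1, 2), 'yt': (1, 3), 'zt': (2, 3)}

    def kplane_features(xyzt):
        """xyzt: (N, 4) points with coordinates already normalised to [-1, 1]."""
        feat = torch.ones(xyzt.shape[0], 32)
        for k in dims:
            i, j = axes[k]
            grid = xyzt[:, [i, j]].view(1, -1, 1, 2)                # (1, N, 1, 2) sample locations
            f = F.grid_sample(planes[k], grid, align_corners=True)  # (1, C, N, 1)
            feat = feat * f[0, :, :, 0].t()                         # Hadamard product of plane features
        return feat                                                 # (N, 32), fed to a linear decoder

    print(kplane_features(torch.rand(8, 4) * 2 - 1).shape)          # torch.Size([8, 32])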

Language: English

Citations

211

Score Jacobian Chaining: Lifting Pretrained 2D Diffusion Models for 3D Generation
Haochen Wang, Xiaodan Du, Jiahao Li

et al.

2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Journal Year: 2023, Volume and Issue: unknown, P. 12619 - 12629

Published: June 1, 2023

A diffusion model learns to predict a vector field of gradients. We propose to apply the chain rule on the learned gradients, and back-propagate the score of a diffusion model through the Jacobian of a differentiable renderer, which we instantiate to be a voxel radiance field. This setup aggregates 2D scores at multiple camera viewpoints into a 3D score, and re-purposes a pretrained 2D model for 3D data generation. We identify a technical challenge of distribution mismatch that arises in this application, and propose a novel estimation mechanism to resolve it. We run our algorithm on several off-the-shelf image generative models, including the recently released Stable Diffusion trained on the large-scale LAION-5B dataset.
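
The chaining step can be summarised in a few lines of PyTorch: render an image from the scene parameters, query the 2D score at that image, and push it back through the renderer's Jacobian as a vector-Jacobian product. The renderer and score function below are toy stand-ins, not the released implementation.

    import torch

    # Schematic of score Jacobian chaining: x = g(theta), then
    # grad_theta = J_g(theta)^T * score_2d(x), accumulated via autograd.
    theta = torch.randn(16, 16, 16, 4, requires_grad=True)   # toy voxel radiance field

    def render(theta):
        # Stand-in for differentiable volume rendering from one viewpoint.
        return theta.mean(dim=2)[..., :3]                     # (16, 16, 3) "image"

    def score_2d(x):
        # Stand-in for the diffusion model's score; pulls the image toward zero here.
        return -x

    x = render(theta)
    s = score_2d(x).detach()
    x.backward(gradient=s)               # vector-Jacobian product: theta.grad = J^T * score
    with torch.no_grad():
        theta += 1e-2 * theta.grad       # one toy ascent step on the aggregated 3D score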

Language: English

Citations

197

MobileNeRF: Exploiting the Polygon Rasterization Pipeline for Efficient Neural Field Rendering on Mobile Architectures
Zhiqin Chen, Thomas Funkhouser, Peter Hedman

et al.

2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Journal Year: 2023, Volume and Issue: unknown

Published: June 1, 2023

Neural Radiance Fields (NeRFs) have demonstrated an amazing ability to synthesize images of 3D scenes from novel views. However, they rely upon specialized volumetric rendering algorithms based on ray marching that are mismatched to the capabilities of widely deployed graphics hardware. This paper introduces a new NeRF representation based on textured polygons that can be rendered efficiently with standard graphics pipelines. The NeRF is represented as a set of polygons with textures representing binary opacities and feature vectors. Traditional rendering of the polygons with a z-buffer yields an image with features at every pixel, which are interpreted by a small, view-dependent MLP running in a fragment shader to produce the final pixel color. This approach enables NeRFs to be rendered with the traditional polygon rasterization pipeline, which provides massive pixel-level parallelism, achieving interactive frame rates on a wide range of compute platforms, including mobile phones. Project page: https://mobile-nerf.github.io
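
The deferred-shading step is small enough to sketch directly; the version below uses PyTorch in place of a GLSL fragment shader, with illustrative feature and layer sizes.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    # The rasterizer writes a per-pixel feature vector (plus binary opacity); a tiny
    # view-dependent MLP then converts features + view direction to RGB, which is the
    # work MobileNeRF moves into the fragment shader. Sizes here are illustrative.
    H, W, C = 480, 640, 8
    feature_image = torch.rand(H, W, C)                  # output of z-buffered rasterization
    view_dirs = F.normalize(torch.rand(H, W, 3) - 0.5, dim=-1)

    shader_mlp = nn.Sequential(                          # small enough to run per fragment
        nn.Linear(C + 3, 16), nn.ReLU(),
        nn.Linear(16, 16), nn.ReLU(),
        nn.Linear(16, 3), nn.Sigmoid(),
    )

    rgb = shader_mlp(torch.cat([feature_image, view_dirs], dim=-1))   # (H, W, 3)
    print(rgb.shape)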

Language: English

Citations

161

HexPlane: A Fast Representation for Dynamic Scenes

Ang Cao, Justin Johnson

2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Journal Year: 2023, Volume and Issue: unknown

Published: June 1, 2023

Modeling and re-rendering dynamic 3D scenes is a challenging task in 3D vision. Prior approaches build on NeRF and rely on implicit representations. This is slow since it requires many MLP evaluations, constraining real-world applications. We show that dynamic 3D scenes can be explicitly represented by six planes of learned features, leading to an elegant solution we call HexPlane. A HexPlane computes features for points in spacetime by fusing vectors extracted from each plane, which is highly efficient. Pairing a HexPlane with a tiny MLP to regress output colors and training via volume rendering gives impressive results for novel view synthesis on dynamic scenes, matching the image quality of prior work but reducing training time by more than 100×. Extensive ablations confirm our HexPlane design and show that it is robust to different feature fusion mechanisms, coordinate systems, and decoding mechanisms. HexPlane is a simple and effective representation for 4D volumes, and we hope it broadly contributes to modeling dynamic scenes. Project page: https://caoang327.github.io/HexPlane.
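
A toy PyTorch sketch of the six-plane lookup, assuming the pairing of spatial and spatio-temporal planes whose features are multiplied and concatenated; channel counts, resolutions, and the fusion choice are illustrative rather than the paper's settings.

    import torch
    import torch.nn.functional as F

    # Spacetime (x, y, z, t) features come from three plane pairs, (XY, ZT), (XZ, YT),
    # (YZ, XT); paired features are multiplied, then the three products concatenated.
    C, R = 16, 48
    pairs = [(('x', 'y'), ('z', 't')), (('x', 'z'), ('y', 't')), (('y', 'z'), ('x', 't'))]
    idx = {'x': 0, 'y': 1, 'z': 2, 't': 3}
    planes = {p: torch.randn(1, C, R, R) for pair in pairs for p in pair}

    def lookup(plane_key, coords):               # coords in [-1, 1], shape (N, 2)
        grid = coords.view(1, -1, 1, 2)
        out = F.grid_sample(planes[plane_key], grid, align_corners=True)   # (1, C, N, 1)
        return out[0, :, :, 0].t()               # (N, C)

    def hexplane_features(xyzt):                 # (N, 4) points in spacetime
        feats = []
        for a, b in pairs:
            fa = lookup(a, xyzt[:, [idx[a[0]], idx[a[1]]]])
            fb = lookup(b, xyzt[:, [idx[b[0]], idx[b[1]]]])
            feats.append(fa * fb)                # fuse the spatial and spatio-temporal plane
        return torch.cat(feats, dim=-1)          # (N, 3 * C), decoded by a tiny MLP

    print(hexplane_features(torch.rand(5, 4) * 2 - 1).shape)   # torch.Size([5, 48])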

Language: English

Citations

154

Fast Dynamic Radiance Fields with Time-Aware Neural Voxels
Jiemin Fang, Taoran Yi, Xinggang Wang

et al.

Published: Nov. 29, 2022

Neural radiance fields (NeRF) have shown great success in modeling 3D scenes and synthesizing novel-view images. However, most previous NeRF methods take much time to optimize one single scene. Explicit data structures, e.g. voxel features, show great potential to accelerate the training process. However, voxel features face two big challenges when applied to dynamic scenes, i.e. modeling temporal information and capturing different scales of point motions. We propose a radiance field framework that represents scenes with time-aware voxel features, named TiNeuVox. A tiny coordinate deformation network is introduced to model coarse motion trajectories, and temporal information is further enhanced in the radiance network. A multi-distance interpolation method is proposed and applied on voxel features to model both small and large motions. Our framework significantly accelerates the optimization of dynamic radiance fields while maintaining high rendering quality. Empirical evaluation is performed on both synthetic and real scenes. TiNeuVox completes training with only 8 minutes and 8-MB storage cost while showing similar or even better rendering performance than previous methods.
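
A rough PyTorch sketch of the two ingredients described above; the deformation network and the multi-distance interpolation (approximated here by pooling the voxel grid to coarser copies before trilinear lookup) are my simplifications, not the released code.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    voxels = torch.randn(1, 8, 64, 64, 64)                   # (1, C, D, H, W) feature grid
    deform = nn.Sequential(nn.Linear(4, 64), nn.ReLU(), nn.Linear(64, 3))

    def query(xyz_t):                                         # (N, 4) points, coords in [-1, 1]
        # (1) tiny deformation network warps each point according to time.
        xyz = xyz_t[:, :3] + 0.1 * torch.tanh(deform(xyz_t))
        grid = xyz.view(1, -1, 1, 1, 3)                       # grid_sample wants (1, N, 1, 1, 3)
        feats = []
        # (2) interpolate voxel features at several "distances" (scales).
        for scale in (1, 2, 4):
            g = F.avg_pool3d(voxels, scale) if scale > 1 else voxels
            f = F.grid_sample(g, grid, align_corners=True)    # (1, C, N, 1, 1)
            feats.append(f[0, :, :, 0, 0].t())                # (N, C)
        return torch.cat(feats, dim=-1)                       # (N, 3 * C) -> radiance network

    print(query(torch.rand(6, 4) * 2 - 1).shape)              # torch.Size([6, 24])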

Language: English

Citations

135

NeRFPlayer: A Streamable Dynamic Scene Representation with Decomposed Neural Radiance Fields
Liangchen Song, Anpei Chen, Zhong Li

et al.

IEEE Transactions on Visualization and Computer Graphics, Journal Year: 2023, Volume and Issue: 29(5), P. 2732 - 2742

Published: Feb. 22, 2023

Visually exploring a real-world 4D spatiotemporal space freely in VR has been a long-term quest. The task is especially appealing when only a few or even a single RGB camera is used for capturing the dynamic scene. To this end, we present an efficient framework capable of fast reconstruction, compact modeling, and streamable rendering. First, we propose to decompose the 4D spatiotemporal space according to temporal characteristics. Points in the 4D space are associated with probabilities of belonging to three categories: static, deforming, and new areas. Each area is represented and regularized by a separate neural field. Second, we propose a hybrid-representation-based feature streaming scheme for efficiently modeling the neural fields. Our approach, coined NeRFPlayer, is evaluated on dynamic scenes captured by single hand-held cameras and multi-camera arrays, achieving rendering performance comparable or superior in quality and speed to recent state-of-the-art methods, with reconstruction in 10 seconds per frame and interactive rendering. Project website: https://bit.ly/nerfplayer.
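
The decomposition can be sketched as a probability-weighted blend of three fields; the networks below are a hypothetical minimal version under that reading, not the paper's exact architecture.

    import torch
    import torch.nn as nn

    # A decomposition field assigns each spatio-temporal point probabilities of being
    # static / deforming / new, and the output feature is the weighted blend of three
    # separately represented (and separately regularized) fields.
    decomposition = nn.Sequential(nn.Linear(4, 64), nn.ReLU(), nn.Linear(64, 3))
    static_field = nn.Sequential(nn.Linear(3, 64), nn.ReLU(), nn.Linear(64, 16))
    deform_field = nn.Sequential(nn.Linear(4, 64), nn.ReLU(), nn.Linear(64, 16))
    new_field = nn.Sequential(nn.Linear(4, 64), nn.ReLU(), nn.Linear(64, 16))

    def blended_features(xyzt):                          # (N, 4) points in spacetime
        p = torch.softmax(decomposition(xyzt), dim=-1)   # (N, 3) category probabilities
        f = torch.stack([static_field(xyzt[:, :3]),      # static branch ignores time
                         deform_field(xyzt),
                         new_field(xyzt)], dim=1)        # (N, 3, 16)
        return (p.unsqueeze(-1) * f).sum(dim=1)          # (N, 16), decoded to density/colour

    print(blended_features(torch.rand(10, 4)).shape)     # torch.Size([10, 16])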

Language: English

Citations

111

Tri-Perspective View for Vision-Based 3D Semantic Occupancy Prediction
Yuanhui Huang, Wenzhao Zheng, Yunpeng Zhang

et al.

2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Journal Year: 2023, Volume and Issue: unknown, P. 9223 - 9232

Published: June 1, 2023

Modern methods for vision-centric autonomous driving perception widely adopt the bird's-eye-view (BEV) representation to describe a 3D scene. Despite its better efficiency than a voxel representation, it has difficulty describing the fine-grained 3D structure of a scene with a single plane. To address this, we propose a tri-perspective view (TPV) representation which accompanies BEV with two additional perpendicular planes. We model each point in the 3D space by summing its projected features on the three planes. To lift image features to the 3D TPV space, we further propose a transformer-based TPV encoder (TPVFormer) to obtain the TPV features effectively. We employ an attention mechanism to aggregate the image features corresponding to each query in each TPV plane. Experiments show that our model trained with sparse supervision effectively predicts the semantic occupancy of all voxels. We demonstrate for the first time that using only camera inputs can achieve performance comparable to LiDAR-based methods on the LiDAR segmentation task on nuScenes. Code: https://github.com/wzzheng/TPVFormer.
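
A minimal PyTorch sketch of the TPV query itself: a 3D point's feature is the sum of bilinear samples from three mutually perpendicular planes. Plane sizes and channel counts below are illustrative.

    import torch
    import torch.nn.functional as F

    C, R = 32, 100
    tpv_hw, tpv_dh, tpv_wd = (torch.randn(1, C, R, R) for _ in range(3))   # three TPV planes

    def sample_plane(plane, uv):                       # uv in [-1, 1], shape (N, 2)
        out = F.grid_sample(plane, uv.view(1, -1, 1, 2), align_corners=True)
        return out[0, :, :, 0].t()                     # (N, C)

    def tpv_features(xyz):                             # (N, 3) normalised points
        x, y, z = xyz[:, 0:1], xyz[:, 1:2], xyz[:, 2:3]
        # Project the point onto each plane and sum the three sampled features.
        return (sample_plane(tpv_hw, torch.cat([x, y], dim=-1)) +
                sample_plane(tpv_dh, torch.cat([y, z], dim=-1)) +
                sample_plane(tpv_wd, torch.cat([z, x], dim=-1)))   # (N, C)

    print(tpv_features(torch.rand(7, 3) * 2 - 1).shape)             # torch.Size([7, 32])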

Language: English

Citations

100

SparseFusion: Distilling View-Conditioned Diffusion for 3D Reconstruction

Zhizhuo Zhou, Shubham Tulsiani

2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Journal Year: 2023, Volume and Issue: unknown

Published: June 1, 2023

We propose SparseFusion, a sparse-view 3D reconstruction approach that unifies recent advances in neural rendering and probabilistic image generation. Existing approaches typically build on neural rendering with reprojected features but fail to generate unseen regions or handle uncertainty under large viewpoint changes. Alternate methods treat this as a (probabilistic) 2D synthesis task, and while they can generate plausible images, they do not infer a consistent underlying 3D. However, we find that this trade-off between 3D consistency and probabilistic image generation does not need to exist. In fact, we show that geometric consistency and generative inference can be complementary in a mode-seeking behavior. By distilling a 3D-consistent scene representation from a view-conditioned latent diffusion model, we are able to recover a 3D representation whose renderings are both accurate and realistic. We evaluate our approach across 51 categories in the CO3D dataset, where it outperforms existing methods, in both distortion and perception metrics, for sparse-view novel view synthesis.
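
The distillation idea can be caricatured as the loop below, where renderings of the scene representation are pulled toward what a view-conditioned diffusion model predicts for that view; both the renderer and the diffusion sampler are placeholders, and the loss is a simplification of the paper's objective.

    import torch

    def render(theta, view):                    # stand-in differentiable renderer
        return (theta * view).sum(dim=0)

    def diffusion_sample(image, view):          # stand-in for the view-conditioned model
        return torch.zeros_like(image)          # pretend the model's preferred mode is 0

    theta = torch.randn(4, 32, 32, 3, requires_grad=True)   # toy scene parameters
    views = torch.rand(8, 4, 1, 1, 1)                        # toy view conditioning
    opt = torch.optim.Adam([theta], lr=1e-2)

    for step in range(100):
        v = views[step % len(views)]
        pred = render(theta, v)
        with torch.no_grad():
            target = diffusion_sample(pred, v)  # "what this view should look like"
        loss = ((pred - target) ** 2).mean()    # mode-seeking reconstruction-style loss
        opt.zero_grad()
        loss.backward()
        opt.step()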

Language: English

Citations

98

NoPe-NeRF: Optimising Neural Radiance Field with No Pose Prior

Wenjing Bian, Zirui Wang, Kejie Li

et al.

2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Journal Year: 2023, Volume and Issue: unknown

Published: June 1, 2023

Training a Neural Radiance Field (NeRF) without precomputed camera poses is challenging. Recent advances in this direction demonstrate the possibility of jointly optimising a NeRF and camera poses in forward-facing scenes. However, these methods still face difficulties during dramatic camera movement. We tackle this challenging problem by incorporating undistorted monocular depth priors. These priors are generated by correcting scale and shift parameters during training, with which we are then able to constrain the relative poses between consecutive frames. This constraint is achieved using our proposed novel loss functions. Experiments on real-world indoor and outdoor scenes show that our method can handle challenging camera trajectories and outperforms existing methods in terms of novel view rendering quality and pose estimation accuracy. Our project page is https://nope-nerf.active.vision.
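
A condensed PyTorch sketch of the inter-frame constraint: monocular depths are undistorted with learnable per-frame scale and shift, back-projected to point clouds, and consecutive clouds are aligned under the estimated relative pose with a Chamfer-style penalty. The intrinsics, function names, and exact loss form are my simplifications of the paper's losses.

    import torch

    def backproject(depth, scale, shift, fx=500.0, fy=500.0, cx=32.0, cy=32.0):
        d = scale * depth + shift                          # undistorted monocular depth
        v, u = torch.meshgrid(torch.arange(depth.shape[0], dtype=torch.float32),
                              torch.arange(depth.shape[1], dtype=torch.float32),
                              indexing='ij')
        pts = torch.stack([(u - cx) / fx * d, (v - cy) / fy * d, d], dim=-1)
        return pts.reshape(-1, 3)                          # (H*W, 3) point cloud

    def relative_pose_loss(depth_i, depth_j, scale, shift, R, t):
        p_i = backproject(depth_i, scale[0], shift[0]) @ R.t() + t   # move frame i into frame j
        p_j = backproject(depth_j, scale[1], shift[1])
        d = torch.cdist(p_i, p_j)                                    # pairwise distances
        return d.min(dim=1).values.mean() + d.min(dim=0).values.mean()

    # scale/shift and the relative pose (R, t) are optimised jointly with the NeRF.
    loss = relative_pose_loss(torch.rand(64, 64), torch.rand(64, 64),
                              torch.ones(2), torch.zeros(2),
                              torch.eye(3), torch.zeros(3))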

Language: English

Citations

93