In-Place Scene Labelling and Understanding with Implicit Scene Representation
Shuaifeng Zhi, Tristan Laidlow, Stefan Leutenegger

et al.

2021 IEEE/CVF International Conference on Computer Vision (ICCV), Journal Year: 2021, Volume and Issue: unknown, P. 15818 - 15827

Published: Oct. 1, 2021

Semantic labelling is highly correlated with geometry and radiance reconstruction, as scene entities with similar shape and appearance are more likely to come from similar classes. Recent implicit neural reconstruction techniques are appealing as they do not require prior training data, but the same fully self-supervised approach is not possible for semantics because labels are human-defined properties. We extend neural radiance fields (NeRF) to jointly encode semantics with appearance and geometry, so that complete and accurate 2D semantic labels can be achieved using a small amount of in-place annotations specific to the scene. The intrinsic multi-view consistency and smoothness of NeRF benefit semantics by enabling sparse labels to efficiently propagate. We show the benefit of this approach when labels are either sparse or very noisy in room-scale scenes. We demonstrate its advantageous properties in various interesting applications such as an efficient scene labelling tool, novel semantic view synthesis, label denoising, super-resolution, label interpolation and multi-view semantic label fusion in visual semantic mapping systems.
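A minimal PyTorch sketch of the idea of attaching a view-invariant semantic head to a NeRF-style MLP alongside the density and colour branches; layer sizes, encodings and the class count are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class SemanticNeRF(nn.Module):
    """Illustrative NeRF-style MLP with an extra semantic branch.

    The semantic logits depend only on position (not viewing direction),
    mirroring the idea that class labels should be view-invariant.
    """
    def __init__(self, pos_dim=63, dir_dim=27, hidden=256, num_classes=28):
        super().__init__()
        self.trunk = nn.Sequential(
            nn.Linear(pos_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.sigma = nn.Linear(hidden, 1)               # volume density
        self.semantic = nn.Linear(hidden, num_classes)  # view-invariant class logits
        self.rgb = nn.Sequential(                       # view-dependent colour
            nn.Linear(hidden + dir_dim, hidden // 2), nn.ReLU(),
            nn.Linear(hidden // 2, 3), nn.Sigmoid(),
        )

    def forward(self, x_enc, d_enc):
        h = self.trunk(x_enc)
        return self.rgb(torch.cat([h, d_enc], dim=-1)), self.sigma(h), self.semantic(h)

# Per-ray semantic predictions can then be composited with the same
# volume-rendering weights as colour and trained with a cross-entropy loss
# on the sparse or noisy 2D annotations.
```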

Language: English

Meta-Learning in Neural Networks: A Survey
Timothy M. Hospedales, Antreas Antoniou, Paul Micaelli

et al.

IEEE Transactions on Pattern Analysis and Machine Intelligence, Journal Year: 2021, Volume and Issue: unknown, P. 1 - 1

Published: Jan. 1, 2021

The field of meta-learning, or learning-to-learn, has seen a dramatic rise in interest in recent years. Contrary to conventional approaches to AI, where a given task is solved from scratch using a fixed learning algorithm, meta-learning aims to improve the learning algorithm itself, given the experience of multiple learning episodes. This paradigm provides an opportunity to tackle many conventional challenges of deep learning, including data and computation bottlenecks, as well as the fundamental issue of generalization. In this survey we describe the contemporary meta-learning landscape. We first discuss definitions of meta-learning and position it with respect to related fields, such as transfer learning, multi-task learning, and hyperparameter optimization. We then propose a new taxonomy that provides a more comprehensive breakdown of the space of meta-learning methods today. We survey promising applications and successes of meta-learning, including few-shot learning, reinforcement learning, and architecture search. Finally, we discuss outstanding challenges and promising areas for future research.
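To make the learning-over-episodes idea concrete, here is a minimal Reptile-style meta-learning loop in PyTorch, one family among the many methods such a taxonomy covers; the task sampler, step counts and learning rates are placeholder assumptions.

```python
import copy
import torch

def reptile_step(model, sample_task, inner_steps=5, inner_lr=1e-2, meta_lr=1e-1):
    """One meta-update: adapt a clone of the model on a sampled task, then move
    the meta-parameters a fraction of the way toward the adapted weights."""
    task_loss_fn, task_data = sample_task()          # placeholder task sampler
    fast = copy.deepcopy(model)
    opt = torch.optim.SGD(fast.parameters(), lr=inner_lr)
    for _ in range(inner_steps):                     # inner-loop adaptation
        opt.zero_grad()
        task_loss_fn(fast, task_data).backward()
        opt.step()
    with torch.no_grad():                            # outer (meta) update
        for p, q in zip(model.parameters(), fast.parameters()):
            p.add_(meta_lr * (q - p))
```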

Language: English

Citations

1328

pixelNeRF: Neural Radiance Fields from One or Few Images
Alex Yu, Vickie Ye, Matthew Tancik

et al.

2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Journal Year: 2021, Volume and Issue: unknown, P. 4576 - 4585

Published: June 1, 2021

We propose pixelNeRF, a learning framework that predicts a continuous neural scene representation conditioned on one or few input images. The existing approach for constructing neural radiance fields [27] involves optimizing the representation to every scene independently, requiring many calibrated views and significant compute time. We take a step towards resolving these shortcomings by introducing an architecture that conditions a NeRF on image inputs in a fully convolutional manner. This allows the network to be trained across multiple scenes to learn a scene prior, enabling it to perform novel view synthesis in a feed-forward manner from a sparse set of views (as few as one). Leveraging the volume rendering approach of NeRF, our model can be trained directly from images with no explicit 3D supervision. We conduct extensive experiments on ShapeNet benchmarks for single-image novel view synthesis tasks with held-out objects as well as entire unseen categories. We further demonstrate the flexibility of pixelNeRF by demonstrating it on multi-object scenes and real scenes from the DTU dataset. In all cases, pixelNeRF outperforms current state-of-the-art baselines for novel view synthesis and single-image 3D reconstruction. For the video and code, please visit the project website: https://alexyu.net/pixelnerf
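A rough sketch, under simplifying assumptions, of pixel-aligned image conditioning: a CNN feature map is sampled at the projected locations of 3D query points and concatenated with the point encoding before the field MLP. The tiny encoder, the dimensions and the pre-computed projected coordinates `uv` are stand-ins, not the authors' architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ImageConditionedField(nn.Module):
    """Sketch of pixel-aligned conditioning: per-point image features are
    sampled from a CNN feature map and concatenated with the point encoding."""
    def __init__(self, feat_dim=64, pos_dim=63, hidden=128):
        super().__init__()
        self.encoder = nn.Sequential(              # stand-in for a ResNet backbone
            nn.Conv2d(3, feat_dim, 7, stride=2, padding=3), nn.ReLU(),
            nn.Conv2d(feat_dim, feat_dim, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.mlp = nn.Sequential(
            nn.Linear(pos_dim + feat_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 4),                  # RGB + density
        )

    def forward(self, image, x_enc, uv):
        """image: (B,3,H,W); x_enc: (B,N,pos_dim); uv: (B,N,2) projected point
        coordinates in [-1,1] (the camera projection is omitted for brevity)."""
        feat_map = self.encoder(image)                              # (B,C,h,w)
        grid = uv.unsqueeze(2)                                      # (B,N,1,2)
        feats = F.grid_sample(feat_map, grid, align_corners=True)   # (B,C,N,1)
        feats = feats.squeeze(-1).permute(0, 2, 1)                  # (B,N,C)
        return self.mlp(torch.cat([x_enc, feats], dim=-1))          # (B,N,4)
```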

Language: English

Citations

1012

Efficient Geometry-aware 3D Generative Adversarial Networks
Eric R. Chan, Connor Z. Lin, Matthew A. Chan

et al.

2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Journal Year: 2022, Volume and Issue: unknown, P. 16102 - 16112

Published: June 1, 2022

Unsupervised generation of high-quality multi-view-consistent images and 3D shapes using only collections of single-view 2D photographs has been a long-standing challenge. Existing 3D GANs are either compute intensive or make approximations that are not 3D-consistent; the former limits the quality and resolution of the generated images, and the latter adversely affects multi-view consistency and shape quality. In this work, we improve the computational efficiency and image quality of 3D GANs without overly relying on these approximations. We introduce an expressive hybrid explicit-implicit network architecture that, together with other design choices, not only synthesizes high-resolution multi-view-consistent images in real time but also produces high-quality 3D geometry. By decoupling feature generation and neural rendering, our framework is able to leverage state-of-the-art 2D CNN generators, such as StyleGAN2, and inherit their efficiency and expressiveness. We demonstrate state-of-the-art 3D-aware synthesis with FFHQ and AFHQ Cats, among other experiments.
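A sketch of the hybrid explicit-implicit idea using a tri-plane representation: a 2D generator (not shown) produces three axis-aligned feature planes, 3D points gather features by projection onto each plane, and a small MLP decodes density and colour features for volume rendering followed by 2D super-resolution. Channel counts and the decoder are illustrative assumptions, not the authors' code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def sample_triplane(planes, pts):
    """planes: (B,3,C,H,W) feature planes for the xy, xz and yz planes.
    pts: (B,N,3) points in [-1,1]^3. Returns (B,N,C) aggregated features."""
    coords = [pts[..., [0, 1]], pts[..., [0, 2]], pts[..., [1, 2]]]
    feats = 0
    for i, uv in enumerate(coords):
        grid = uv.unsqueeze(2)                                     # (B,N,1,2)
        f = F.grid_sample(planes[:, i], grid, align_corners=True)  # (B,C,N,1)
        feats = feats + f.squeeze(-1).permute(0, 2, 1)             # (B,N,C)
    return feats / 3.0

class TriplaneDecoder(nn.Module):
    """Tiny MLP turning aggregated plane features into density and colour features."""
    def __init__(self, c=32, hidden=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(c, hidden), nn.Softplus(),
                                 nn.Linear(hidden, 1 + 32))  # density + feature colour

    def forward(self, planes, pts):
        h = self.net(sample_triplane(planes, pts))
        return h[..., :1], h[..., 1:]   # (sigma, features) fed to volume rendering
```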

Language: English

Citations

670

Learning Representations by Maximizing Mutual Information Across Views
Philip Bachman, R Devon Hjelm, William Buchwalter

et al.

arXiv (Cornell University), Journal Year: 2019, Volume and Issue: unknown

Published: Jan. 1, 2019

We propose an approach to self-supervised representation learning based on maximizing mutual information between features extracted from multiple views of a shared context. For example, one could produce multiple views of a local spatio-temporal context by observing it from different locations (e.g., camera positions within a scene), and via different modalities (e.g., tactile, auditory, or visual). Or, an ImageNet image could provide a context from which one produces multiple views by repeatedly applying data augmentation. Maximizing mutual information between features extracted from these views requires capturing information about high-level factors whose influence spans multiple views -- e.g., the presence of certain objects or the occurrence of certain events. Following our proposed approach, we develop a model which learns image representations that significantly outperform prior methods on the tasks we consider. Most notably, using self-supervised learning, we achieve 68.1% accuracy on ImageNet using standard linear evaluation. This beats prior results by over 12% and concurrent results by 7%. When we extend our model to use mixture-based representations, segmentation behaviour emerges as a natural side-effect. Our code is available online: https://github.com/Philip-Bachman/amdim-public
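An illustrative InfoNCE-style objective, a common lower bound on mutual information between features of two augmented views; the paper's actual objective works over local and global features with an NCE-based bound, so this global-feature-only version is a simplification under stated assumptions.

```python
import torch
import torch.nn.functional as F

def info_nce(z1, z2, temperature=0.1):
    """z1, z2: (B, D) features from two augmented views of the same batch.
    Pairs sharing a batch index are positives; all other pairs act as negatives."""
    z1, z2 = F.normalize(z1, dim=-1), F.normalize(z2, dim=-1)
    logits = z1 @ z2.t() / temperature          # (B, B) similarity matrix
    targets = torch.arange(z1.size(0), device=z1.device)
    return F.cross_entropy(logits, targets)

# Training sketch: two augmentations of each image go through the encoder, and
# minimizing info_nce(encoder(aug1(x)), encoder(aug2(x))) maximizes a lower
# bound on the mutual information between the two views' features.
```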

Language: English

Citations

663

Reinforcement Learning, Fast and Slow
Matthew Botvinick, Sam Ritter, Jane X. Wang

et al.

Trends in Cognitive Sciences, Journal Year: 2019, Volume and Issue: 23(5), P. 408 - 422

Published: April 17, 2019

Recent AI research has given rise to powerful techniques for deep reinforcement learning. In their combination of representation learning with reward-driven behavior, these methods would appear to have inherent interest for psychology and neuroscience. One reservation has been that deep reinforcement learning procedures demand large amounts of training data, suggesting that these algorithms may differ fundamentally from those underlying human learning. While this concern applies to the initial wave of deep RL techniques, subsequent AI work has established methods that allow deep RL systems to learn more quickly and efficiently. Two particularly interesting and promising techniques center, respectively, on episodic memory and meta-learning. Alongside their interest as AI techniques, deep RL methods leveraging episodic memory and meta-learning have direct implications for psychology and neuroscience. One subtle but critically important insight which these techniques bring into focus is the fundamental connection between fast and slow forms of learning.

Deep reinforcement learning (RL) has driven impressive advances in artificial intelligence in recent years, exceeding human performance in domains ranging from Atari to Go to no-limit poker. This progress has drawn the attention of cognitive scientists interested in understanding human learning. However, the concern has been raised that deep RL may be too sample-inefficient – that is, simply too slow – to provide a plausible model of how humans learn. In this review, we counter that critique by describing two recently developed families of techniques, episodic deep RL and meta-reinforcement learning, that allow deep RL to operate more nimbly, solving problems much more quickly than previous methods. Although these techniques were developed in an AI context, we propose that they may have rich implications for psychology and neuroscience. A key insight arising from them concerns the fundamental connection between fast RL and slower, more incremental forms of learning.
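One of the fast mechanisms the review discusses is episodic RL, in which embeddings of past states are stored together with the returns that followed them, and a new state is valued by a similarity-weighted average over the store. A minimal illustrative sketch; the buffer, similarity kernel and temperature are assumptions, not any specific published agent.

```python
import torch

class EpisodicValueStore:
    """Illustrative episodic memory: keys are state embeddings, values are the
    (discounted) returns observed after visiting those states."""
    def __init__(self):
        self.keys, self.returns = [], []

    def write(self, embedding, ret):
        self.keys.append(embedding.detach())
        self.returns.append(float(ret))

    def value(self, embedding, temperature=0.1):
        """Similarity-weighted average of stored returns for a query state."""
        keys = torch.stack(self.keys)                       # (N, D)
        rets = torch.tensor(self.returns)                   # (N,)
        sims = torch.cosine_similarity(keys, embedding.unsqueeze(0), dim=-1)
        weights = torch.softmax(sims / temperature, dim=0)
        return (weights * rets).sum()
```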

Language: English

Citations

609

GIRAFFE: Representing Scenes as Compositional Generative Neural Feature Fields
Michael Niemeyer, Andreas Geiger

2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Journal Year: 2021, Volume and Issue: unknown

Published: June 1, 2021

Deep generative models allow for photorealistic image synthesis at high resolutions. But for many applications, this is not enough: content creation also needs to be controllable. While several recent works investigate how to disentangle the underlying factors of variation in the data, most of them operate in 2D and hence ignore that our world is three-dimensional. Further, only few works consider the compositional nature of scenes. Our key hypothesis is that incorporating a compositional 3D scene representation into the generative model leads to more controllable image synthesis. Representing scenes as compositional generative neural feature fields allows us to disentangle one or multiple objects from the background, as well as individual objects' shapes and appearances, while learning from unstructured and unposed image collections without any additional supervision. Combining this scene representation with a neural rendering pipeline yields a fast and realistic image synthesis model. As evidenced by our experiments, our model is able to disentangle individual objects and allows for translating and rotating them in the scene as well as changing the camera pose.
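A sketch of the compositional step: per-object densities and features are combined by density-weighted averaging into a single scene field before volume rendering into a low-resolution feature image and 2D neural rendering. The shapes and the small epsilon are illustrative, and the per-object field modules themselves are omitted.

```python
import torch

def composite_fields(sigmas, feats):
    """sigmas: list of (B,N,1) densities, feats: list of (B,N,C) features,
    one entry per object (plus background). Densities add; features are
    averaged with density weights, so opaque objects dominate where they overlap."""
    sigma_stack = torch.stack(sigmas, dim=0)            # (K,B,N,1)
    feat_stack = torch.stack(feats, dim=0)              # (K,B,N,C)
    sigma_total = sigma_stack.sum(dim=0)                # (B,N,1)
    weights = sigma_stack / (sigma_total + 1e-8)        # (K,B,N,1)
    feat_total = (weights * feat_stack).sum(dim=0)      # (B,N,C)
    return sigma_total, feat_total
```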

Language: English

Citations

592

DeepVoxels: Learning Persistent 3D Feature Embeddings
Vincent Sitzmann, Justus Thies, Felix Heide

et al.

2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Journal Year: 2019, Volume and Issue: unknown, P. 2432 - 2441

Published: June 1, 2019

In this work, we address the lack of 3D understanding of generative neural networks by introducing a persistent 3D feature embedding for view synthesis. To this end, we propose DeepVoxels, a learned representation that encodes the view-dependent appearance of a 3D scene without having to explicitly model its geometry. At its core, our approach is based on a Cartesian 3D grid of persistent embedded features that learn to make use of the underlying 3D scene structure. Our approach combines insights from 3D geometric computer vision with recent advances in learning image-to-image mappings with adversarial loss functions. DeepVoxels is supervised, without requiring a 3D reconstruction of the scene, using a 2D re-rendering loss, and enforces perspective and multi-view geometry in a principled manner. We apply our persistent 3D scene representation to the problem of novel view synthesis, demonstrating high-quality results for a variety of challenging scenes.
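A minimal sketch of the central data structure: a persistent, learnable Cartesian grid of features queried by trilinear interpolation at 3D sample points, so that gradients from a 2D re-rendering loss flow back into the grid. Resolution, channel count and initialization are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PersistentFeatureVolume(nn.Module):
    """A learnable 3D feature grid; features at arbitrary points are obtained
    by trilinear interpolation, and gradients from a 2D re-rendering loss
    flow back into the grid."""
    def __init__(self, channels=64, resolution=32):
        super().__init__()
        self.volume = nn.Parameter(
            0.01 * torch.randn(1, channels, resolution, resolution, resolution))

    def forward(self, pts):
        """pts: (B,N,3) coordinates in [-1,1]^3 -> (B,N,C) features."""
        grid = pts.view(pts.shape[0], -1, 1, 1, 3)                    # (B,N,1,1,3)
        vol = self.volume.expand(pts.shape[0], -1, -1, -1, -1)
        feats = F.grid_sample(vol, grid, align_corners=True)          # (B,C,N,1,1)
        return feats.squeeze(-1).squeeze(-1).permute(0, 2, 1)         # (B,N,C)
```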

Language: English

Citations

565

pi-GAN: Periodic Implicit Generative Adversarial Networks for 3D-Aware Image Synthesis
Eric R. Chan, Marco Aurélio Alvarenga Monteiro, Petr Kellnhofer

et al.

2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Journal Year: 2021, Volume and Issue: unknown, P. 5795 - 5805

Published: June 1, 2021

We have witnessed rapid progress on 3D-aware image synthesis, leveraging recent advances in generative visual models and neural rendering. Existing approaches, however, fall short in two ways: first, they may lack an underlying 3D representation or rely on view-inconsistent rendering, hence synthesizing images that are not multi-view consistent; second, they often depend upon representation network architectures that are not expressive enough, and their results thus lack in image quality. We propose a novel generative model, named Periodic Implicit Generative Adversarial Networks (π-GAN or pi-GAN), for high-quality 3D-aware image synthesis. π-GAN leverages neural representations with periodic activation functions and volumetric rendering to represent scenes as view-consistent radiance fields. The proposed approach obtains state-of-the-art results for 3D-aware image synthesis with multiple real and synthetic datasets.
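A sketch of one of the named ingredients: a periodic (sine) activation layer of the kind used in such radiance-field MLPs, with a SIREN-style initialization. The frequency `w0` and the init constants follow common practice and are not necessarily the paper's exact values.

```python
import math
import torch
import torch.nn as nn

class SineLayer(nn.Module):
    """Linear layer followed by a periodic sin activation (SIREN-style)."""
    def __init__(self, in_dim, out_dim, w0=30.0, first=False):
        super().__init__()
        self.w0 = w0
        self.linear = nn.Linear(in_dim, out_dim)
        bound = 1.0 / in_dim if first else math.sqrt(6.0 / in_dim) / w0
        nn.init.uniform_(self.linear.weight, -bound, bound)

    def forward(self, x):
        return torch.sin(self.w0 * self.linear(x))

# A π-GAN-style generator stacks such layers (conditioned on a latent code,
# e.g. via FiLM modulation) to map 3D positions to colour and density, which
# are then volume-rendered into a multi-view-consistent image.
```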

Language: English

Citations

559

Neural Scene Flow Fields for Space-Time View Synthesis of Dynamic Scenes
Zhengqi Li, Simon Niklaus, Noah Snavely

et al.

2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Journal Year: 2021, Volume and Issue: unknown, P. 6494 - 6504

Published: June 1, 2021

We present a method to perform novel view and time synthesis of dynamic scenes, requiring only a monocular video with known camera poses as input. To do this, we introduce Neural Scene Flow Fields, a new representation that models the dynamic scene as a time-variant continuous function of appearance, geometry, and 3D scene motion. Our representation is optimized through a neural network to fit the observed input views. We show that our representation can be used for a variety of in-the-wild scenes, including thin structures, view-dependent effects, and complex degrees of motion. We conduct a number of experiments that demonstrate our approach significantly outperforms recent monocular view synthesis methods, and show qualitative results of space-time view synthesis on a variety of real-world videos.
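A minimal sketch of the representation described above: an MLP over encoded position and time that outputs colour, density and 3D scene flow to the neighbouring time steps. Encoding sizes and head widths are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class SceneFlowField(nn.Module):
    """Time-variant field: (position, time) -> colour, density and the 3D
    offsets to the corresponding points at the previous/next time steps."""
    def __init__(self, pos_dim=63, time_dim=9, hidden=256):
        super().__init__()
        self.trunk = nn.Sequential(
            nn.Linear(pos_dim + time_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.rgb_sigma = nn.Linear(hidden, 4)   # RGB + density at time t
        self.flow = nn.Linear(hidden, 6)        # forward and backward 3D scene flow

    def forward(self, x_enc, t_enc):
        h = self.trunk(torch.cat([x_enc, t_enc], dim=-1))
        return self.rgb_sigma(h), self.flow(h)

# The predicted flows let the model warp radiance between neighbouring frames,
# which is what enables consistent space-time interpolation from a single video.
```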

Language: English

Citations

483

Pluralistic Image Completion
Chuanxia Zheng, Tat‐Jen Cham, Jianfei Cai

et al.

2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Journal Year: 2019, Volume and Issue: unknown, P. 1438 - 1447

Published: June 1, 2019

Most image completion methods produce only one result for each masked input, although there may be many reasonable possibilities. In this paper, we present an approach for pluralistic image completion - the task of generating multiple and diverse plausible solutions for image completion. A major challenge faced by learning-based approaches is that there is usually only one ground truth training instance per label. As such, sampling from conditional VAEs still leads to minimal diversity. To overcome this, we propose a novel and probabilistically principled framework with two parallel paths. One is a reconstructive path that utilizes the given ground truth to get a prior distribution of the missing parts and rebuild the original image from this distribution. The other is a generative path for which the conditional prior is coupled to the distribution obtained in the reconstructive path. Both are supported by GANs. We also introduce a new short+long term attention layer that exploits distant relations among decoder and encoder features, improving appearance consistency. When tested on datasets with buildings (Paris), faces (CelebA-HQ), and natural images (ImageNet), our method not only generated higher-quality completion results, but also produced multiple and diverse plausible outputs.
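A heavily simplified sketch of the dual-path coupling at the latent level: the reconstructive path encodes the complete image into a posterior, the generative path encodes only the masked image into a conditional prior, and a KL term pulls the two together so that sampling from the prior yields diverse yet plausible completions. The encoders are placeholders, and the paper's GAN losses and short+long term attention layer are omitted.

```python
import torch
import torch.nn as nn

class GaussianEncoder(nn.Module):
    """Placeholder encoder producing the mean and log-variance of a Gaussian."""
    def __init__(self, in_ch, z_dim=128):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(in_ch, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, 2 * z_dim))

    def forward(self, x):
        mu, logvar = self.conv(x).chunk(2, dim=-1)
        return mu, logvar

def kl_between(mu_q, logvar_q, mu_p, logvar_p):
    """KL( N(mu_q, var_q) || N(mu_p, var_p) ), summed over latent dims."""
    return 0.5 * (logvar_p - logvar_q
                  + (logvar_q.exp() + (mu_q - mu_p) ** 2) / logvar_p.exp()
                  - 1).sum(dim=-1).mean()

# Training couples the paths: z is sampled from the reconstructive posterior
# (full image) to rebuild the original, while the KL term trains the generative
# path's conditional prior (masked image) to cover that posterior, so samples
# drawn from the prior at test time remain plausible yet diverse.
```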

Language: English

Citations

476