Deep Declarative Networks
The past several years have seen an explosion of interest in generative modeling: unsupervised models which learn to synthesize new elements from the training data domain. Such models have been used to breathtaking effect for generating realistic images, especially of human faces, which are in some cases indistinguishable from reality. The unsupervised latent representations learned by these models can also prove powerful when used as feature sets for supervised learning tasks.
Thus far, the vision community’s attention has mostly focused on generative models of 2D images. However, in computer graphics, there has been a recent surge of activity in generative models of three-dimensional content: learnable models which can synthesize novel 3D objects, or even larger scenes composed of multiple objects. As the vision community turns from passive internet-images based vision toward more embodied vision tasks, these kinds of 3D generative models become increasingly important: as unsupervised feature learners, as training data synthesizers, as a platform to study 3D representations for 3D vision tasks, and as a way of equipping an embodied agent with a 3D `imagination’ about the kinds of objects and scenes it might encounter.
With this workshop, we aim to bring together researchers working on generative models of 3D shapes and scenes with researchers and practitioners who can use these generative models to improve embodied vision tasks. For our purposes, we define “generative model” to include methods that synthesize geometry unconditionally as well as from sensory inputs (e.g. images), language, or other high-level specifications. Vision tasks that can benefit from such models include scene classification and segmentation, 3D reconstruction, human activity recognition, robotic visual navigation, question answering, and more.