Neural Information Processing Systems Conference
By Alina Beygelzimer, Emily Fox, Florence d’Alché-Buc and Hugo Larochelle
NeurIPS 2019 Program Chairs
With this blog post, it is our pleasure to unveil the NeurIPS paper awards for 2019, and share more information on the selection process for these awards.
Outstanding Paper Awards
We’re continuing the tradition of highlighting some of the most notable papers accepted at the conference. The NeurIPS 2019 Outstanding Paper Committee consisted of Bob Williamson, Michele Sebag, Samuel Kaski, Brian Kingsbury and Andreas Krause. They made their recommendations as follows.
We asked the Outstanding Paper Committee to choose from the set of papers that had been selected for oral presentation. Before looking at the papers, they agreed on a set of criteria to guide their selection, as well as on criteria they explicitly wanted to avoid relying on.
Finally, they deemed it appropriate to introduce an additional Outstanding New Directions Paper Award, to highlight work that distinguished itself by charting a novel avenue for future research.
They had access to the papers, the reviewer reports and comments from the (senior) area chairs.
They then performed an initial triage to arrive at a short list of three papers and an extended list of eight. Each committee member evaluated the eight papers independently and ranked them, and the rankings were then shared within the committee. In one case, the committee sought an additional expert opinion and took it into account in its decision making.
Ultimately, the members’ independent rankings were in strong agreement, and after a brief discussion they reached the following consensus recommendation for the outstanding paper awards:
Outstanding Paper Award
Distribution-Independent PAC Learning of Halfspaces with Massart Noise, by Ilias Diakonikolas, Themis Gouleakis and Christos Tzamos
The paper studies the learning of linear threshold functions for binary classification in the presence of unknown, bounded label noise in the training data, and makes tremendous progress on a long-standing open problem at the heart of machine learning: efficiently learning halfspaces under Massart noise. To give a simple example highlighted in the paper, even weakly learning disjunctions (to error 49%) under 1% Massart noise was open. By deriving an efficient algorithm for this setting, the paper shows how to achieve excess risk equal to the Massart noise level plus epsilon, in time poly(1/epsilon), as desired. The algorithmic approach is sophisticated and the results are technically challenging to establish. The remaining goal is to efficiently achieve excess risk equal to epsilon itself (in time poly(1/epsilon)).
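For readers new to the model, here is a compact statement in my own notation (not taken from the announcement): in the Massart model, the label of each example x is flipped independently with some probability η(x) ≤ η < 1/2, and the paper gives a learner, running in time polynomial in the dimension and 1/ε, whose halfspace ŵ satisfies

$$ \Pr_{(x,y)\sim\mathcal{D}}\big[\operatorname{sign}(\langle \hat{w}, x\rangle) \neq y\big] \;\le\; \eta + \epsilon. $$

Since the best halfspace can have error OPT ≤ η, closing the gap to OPT + ε is the open goal mentioned above.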
Outstanding New Directions Paper Award
Uniform Convergence May Be Unable to Explain Generalization in Deep Learning, by Vaishnavh Nagarajan and J. Zico Kolter
The paper presents what are essentially negative results, showing that many existing (norm-based) bounds on the performance of deep learning algorithms don’t do what they claim. The authors go on to argue that these bounds cannot do what they claim as long as they lean on the machinery of two-sided uniform convergence. While the paper does not solve (nor pretend to solve) the question of generalization in deep neural nets, it is an “instance of the fingerpost” (to use Francis Bacon’s phrase), pointing the community to look in a different place.
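To pin down the object being critiqued (my notation, not the paper’s): a two-sided uniform convergence bound controls the gap between empirical and true risk simultaneously over a hypothesis class H,

$$ \sup_{h \in \mathcal{H}} \big| L_{\mathcal{D}}(h) - \hat{L}_S(h) \big| \;\le\; \epsilon(\mathcal{H}, n, \delta) \quad \text{with probability at least } 1 - \delta, $$

and the paper exhibits learning setups where any bound of this form is provably vacuous, even when H is shrunk to just the hypotheses the training algorithm actually outputs.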
The committee members also wanted to highlight the following papers for honorable mentions:
Honorable Mention Outstanding Paper Award
Nonparametric Density Estimation & Convergence Rates for GANs under Besov IPM Losses, by Ananya Uppal, Shashank Singh and Barnabás Póczos
The paper shows, in a rigorous theoretical manner, that GANs can outperform linear methods in density estimation (in terms of rates of convergence). Leveraging prior results on wavelet shrinkage, the paper offers new insight into the representational power of GANs. Specifically, the authors derive minimax convergence rates for non-parametric density estimation under a large class of losses (so-called integral probability metrics) within a large function class (Besov spaces). Reviewers felt this paper would have significant impact for researchers working on non-parametric estimation and GANs.
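For reference, the losses in question are integral probability metrics (IPMs); for a discriminator class F, two distributions are compared via

$$ d_{\mathcal{F}}(P, Q) \;=\; \sup_{f \in \mathcal{F}} \Big| \mathbb{E}_{X \sim P}[f(X)] - \mathbb{E}_{X \sim Q}[f(X)] \Big|, $$

and taking F to be a ball of a Besov space yields a family that includes Wasserstein-1 (Lipschitz F) and Sobolev-type losses as special cases.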
Fast and Accurate Least-Mean-Squares Solvers, by Alaa Maalouf, Ibrahim Jubran and Dan Feldman
Least-mean-squares solvers operate at the core of many ML algorithms, from linear and Lasso regression to singular value decomposition and Elastic net. The paper shows how to reduce their computational complexity by one or two orders of magnitude, with no precision loss and improved numerical stability. The approach relies on the Caratheodory theorem, which establishes that a coreset of d² + 1 points in dimension d is sufficient to characterize all n points of a convex hull. The novelty lies in the divide-and-conquer algorithm proposed to extract a coreset with affordable complexity (O(nd + d⁵ log n), granted that d ≪ n). Reviewers emphasized the importance of the approach, both for practitioners, since the method can be easily implemented to improve existing algorithms, and as a direction for extension to other algorithms, since the recursive partitioning principle of the approach lends itself to generalization.
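To make the Caratheodory step concrete, here is a minimal NumPy sketch of the classical (non-accelerated) reduction: it repeatedly finds an affine dependency among the points and shifts weight along it until one point drops out, preserving the weighted mean exactly. The paper’s contribution is the divide-and-conquer booster that makes this fast; the code below is my illustration of the underlying theorem, not the paper’s implementation.

```python
import numpy as np

def caratheodory(P, w, tol=1e-10):
    """Classical Caratheodory reduction (illustrative, not the paper's fast version).

    Given points P (n, D) with nonnegative weights w (n,) summing to 1, return at
    most D + 1 weighted points with exactly the same weighted mean.
    """
    P, w = P.astype(float).copy(), w.astype(float).copy()
    while P.shape[0] > P.shape[1] + 1:
        # Find coefficients a with sum(a) = 0 and sum(a_i * p_i) = 0: a null
        # vector of the differences (p_i - p_0), padded so the a_i sum to zero.
        A = (P[1:] - P[0]).T                 # (D, n-1); rank < n-1 here
        v = np.linalg.svd(A)[2][-1]          # null vector: A @ v ~ 0
        a = np.concatenate([[-v.sum()], v])
        # Shift weights along a until the first weight hits zero.
        pos = a > tol
        alpha = np.min(w[pos] / a[pos])
        w -= alpha * a
        keep = w > tol
        P, w = P[keep], w[keep]
    return P, w

# For LMS problems: apply the reduction to the flattened outer products
# x_i x_i^T (dimension d^2), so the coreset has at most d^2 + 1 points and
# reproduces X^T X exactly.
```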
Honorable Mention Outstanding New Directions Paper Award
Putting an End to End-to-End: Gradient-Isolated Learning of Representations, by Sindy Löwe, Peter O’Connor and Bastiaan S. Veeling
The paper revisits the layer-wise building of deep networks, using self-supervised criteria inspired by van den Oord et al. (2018), specifically the mutual information between the representation of the current input and inputs close in space or time. As noted by reviewers, such self-organization in perceptual networks might give food for thought at the crossroads of algorithmic perspectives (sidestepping end-to-end optimization, with its huge memory footprint and computational issues) and cognitive perspectives (exploiting the notion of so-called slow features and moving toward more “biologically plausible” learning processes).
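A minimal PyTorch sketch of the gradient-isolation idea (my simplification, with a generic contrastive loss standing in for the paper’s InfoNCE objective): each module is trained on its own local loss, and detach() between modules prevents gradients from flowing end to end.

```python
import torch
import torch.nn as nn

# Three stacked modules, each with its own optimizer and local loss.
modules = nn.ModuleList([nn.Sequential(nn.Linear(32, 32), nn.ReLU()) for _ in range(3)])
opts = [torch.optim.Adam(m.parameters(), lr=1e-3) for m in modules]

def local_contrastive_loss(z, z_pos):
    # Stand-in for InfoNCE: pull representations of "nearby" inputs together
    # against in-batch negatives.
    logits = z @ z_pos.T                         # (B, B) similarity matrix
    labels = torch.arange(z.shape[0])            # positives on the diagonal
    return nn.functional.cross_entropy(logits, labels)

def train_step(x, x_pos):
    h, h_pos = x, x_pos
    for m, opt in zip(modules, opts):
        z, z_pos = m(h), m(h_pos)
        loss = local_contrastive_loss(z, z_pos)  # greedy, module-local objective
        opt.zero_grad(); loss.backward(); opt.step()
        # Gradient isolation: the next module sees values only, no gradients.
        h, h_pos = z.detach(), z_pos.detach()

train_step(torch.randn(8, 32), torch.randn(8, 32))
```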
Scene Representation Networks: Continuous 3D-Structure-Aware Neural Scene Representations, by Vincent Sitzmann, Michael Zollhöfer and Gordon Wetzstein
The paper presents an elegant synthesis of two broad approaches in computer vision: multi-view geometry and deep representations. Specifically, the paper makes three contributions: 1) a per-voxel neural renderer, which enables resolution-free rendering of a scene in a 3D-aware manner; 2) a differentiable ray-marching algorithm, which solves the difficult problem of finding surface intersections along rays cast from a camera; and 3) a latent scene representation, which uses auto-encoders and hyper-networks to regress the parameters of the scene representation network.
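A toy sketch of differentiable ray marching in the spirit of contribution 2 (all module names and sizes are my own; the paper uses an LSTM step predictor and a richer scene MLP): a scene network maps 3D points to features, a step network reads the feature at the current point and predicts how far to march along the ray, and because every operation is differentiable, the estimated surface point can be trained from 2D reconstruction losses.

```python
import torch
import torch.nn as nn

scene_mlp = nn.Sequential(nn.Linear(3, 64), nn.ReLU(), nn.Linear(64, 64))
step_net = nn.Sequential(nn.Linear(64, 32), nn.ReLU(), nn.Linear(32, 1),
                         nn.Softplus())  # Softplus keeps steps positive (march forward only)

def march(origins, dirs, n_steps=10):
    """origins, dirs: (B, 3) camera-ray origins and unit directions."""
    t = torch.full((origins.shape[0], 1), 0.05)  # initial depth along each ray
    for _ in range(n_steps):
        x = origins + t * dirs                   # current 3D point on each ray
        feat = scene_mlp(x)                      # scene representation at x
        t = t + step_net(feat)                   # differentiable step length
    return origins + t * dirs                    # estimated surface intersection

# usage:
xyz = march(torch.zeros(4, 3), nn.functional.normalize(torch.randn(4, 3), dim=-1))
```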
Congratulations to all authors for their great contribution to our research community! We also thank the members of the selection committee for taking on what certainly was a very difficult task.
Test of Time Award
As in previous years, we created a committee to select a paper published at NeurIPS 10 years ago that was deemed to have had a particularly significant and lasting impact on our community.
Amir Globerson, Antoine Bordes, Francis Bach and Iain Murray agreed to take on this task, and we are extremely thankful for their meticulous and arduous work.
They started from a list of the 18 papers accepted to NeurIPS 2009 that have had the most citations since their publication. They then focused their search on papers that have enjoyed sustained impact, meaning that recent papers still meaningfully refer to and build on the work in question. The committee also wanted to be able to identify a precise contribution to the field that made the selected paper stand out, and for the paper to be well-written enough to remain accessible to most of the community today. With these goals in mind, each member of the committee championed a short list of papers.
Ultimately, they identified the following paper that they felt struck the best balance of an important contribution, lasting impact, and broad appeal:
Dual Averaging Method for Regularized Stochastic Learning and Online Optimization, by Lin Xiao
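For context (my summary, not from the announcement), the paper’s regularized dual averaging (RDA) method keeps a running average ḡ_t of past (sub)gradients and solves, at each step,

$$ w_{t+1} \;=\; \arg\min_{w} \left\{ \langle \bar{g}_t, w \rangle \;+\; \Psi(w) \;+\; \frac{\beta_t}{t}\, h(w) \right\}, $$

where Ψ is the regularizer (e.g., λ‖w‖₁), h is a strongly convex auxiliary function, and β_t is a nondecreasing step-size sequence. For ℓ₁ regularization this minimization has a closed-form soft-thresholding solution, which is what lets RDA produce genuinely sparse iterates in online and stochastic learning.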
Congratulations to Lin Xiao for single-handedly having had such an enduring impact on our community!
Alina Beygelzimer, Emily Fox, Florence d’Alché-Buc, Hugo Larochelle
NeurIPS 2019 Program Chairs