zavrel8
- Jul 22, 2022

Can AI help us understand ICML 2022?

An automated conference overview with Zeta Alpha.

For some of us “Deep Learning is All We Need”, for others it’s a bit more diverse. But for all of us, ICML (International Conference on Machine Learning) is one of the highlights of the AI R&D calendar.

This week ICML 2022 is live in Baltimore. With over 1200 papers, picking the most important ones to read is a challenging, almost impossible task for any AI researcher. At Zeta Alpha, the full set of ICML 2022 papers is available for discovery, so you can explore your favorite topics. But how can we make sense of the whole conference and get an impression of what is going on in Baltimore? What is trending and impactful at ICML this year? Can modern AI actually help with that?

First, we embed all of the documents as vectors using Transformer-based large language models. This alone already lets us build a semantic map of the conference.
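The post does not say which clustering algorithm produces the map. As a hedged illustration, assume each paper has already been turned into an embedding vector by some Transformer encoder (not shown). The sketch below groups such vectors with spherical k-means, i.e. k-means that measures similarity by cosine. The function names and the toy 3-dimensional vectors are hypothetical placeholders; real embeddings have hundreds of dimensions.

```python
import math


def normalize(v):
    """Scale a vector to unit length so that dot products equal cosine similarity."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm else list(v)


def cosine(a, b):
    """Cosine similarity, assuming both inputs are already unit length."""
    return sum(x * y for x, y in zip(a, b))


def spherical_kmeans(vectors, k, iterations=20):
    """Cluster embeddings by cosine similarity; returns one cluster label per vector."""
    points = [normalize(v) for v in vectors]
    # Deterministic farthest-point init: start from the first paper, then
    # repeatedly add the paper least similar to every centroid chosen so far.
    centroids = [points[0]]
    while len(centroids) < k:
        far = min(points, key=lambda p: max(cosine(p, c) for c in centroids))
        centroids.append(far)
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each paper joins its most similar centroid.
        labels = [max(range(k), key=lambda c: cosine(p, centroids[c])) for p in points]
        # Update step: each centroid becomes the normalized mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                mean = [sum(dim) / len(members) for dim in zip(*members)]
                centroids[c] = normalize(mean)
    return labels


# Toy "embeddings": two papers lean towards one direction, two towards another.
embeddings = [[0.9, 0.1, 0.0], [0.8, 0.2, 0.1], [0.0, 0.1, 0.9], [0.1, 0.0, 0.8]]
print(spherical_kmeans(embeddings, k=2))  # → [0, 0, 1, 1]
```

Cosine similarity is a common choice for text embeddings because it ignores vector length and compares only direction; a production system would more likely use an established library and a smarter initialization.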

The clusters in the visualization are based on the content of the papers and their similarity, which lets us identify topical groups and browse related work. Each node in the graph is an individual paper, and the size of a node reflects our prediction of the paper's future impact (more on that below). But how do we find out what the different clusters in the map stand for?

GPT-3 to the rescue: labeling the clusters

This is where we enlisted the help of GPT-3. Because this 175B-parameter language model is quite good at generating coherent text, we used prompt engineering to get it to write a short summary of each cluster. Here are a few examples, where we give GPT-3 some titles from a cluster and it generates a pretty reasonable summary:


Cluster 2
Modeling Irregular Time Series with Continuous Recurrent Units
NOMU: Neural Optimization-based Model Uncertainty
3D Infomax improves GNNs for Molecular Property Prediction
Input Dependent Sparse Gaussian Processes
Training Discrete Deep Generative Models via Gapped Straight-Through Estimator
Estimating the Optimal Covariance with Imperfect Mean in Diffusion Probabilistic Models
Generative Coarse-Graining of Molecular Conformations
Structure-preserving GANs
SQ-VAE: Variational Bayes on Discrete Representation with Self-annealed Stochastic Quantization
Reconstructing Nonlinear Dynamical Systems from Multi-Modal Time Series
GPT-3 Summary: Variational methods for deep generative models.

Cluster 1
Generic Coreset for Scalable Learning of Monotonic Kernels: Logistic Regression, Sigmoid and more
Causal structure-based root cause analysis of outliers
Information Discrepancy in Strategic Learning
Nonparametric Sparse Tensor Factorization with Hierarchical Gamma Processes
Entropic Gromov-Wasserstein between Gaussian Distributions
Improving Mini-batch Optimal Transport via Partial Transportation
Streaming Inference for Infinite Feature Models
IDYNO: Learning Nonparametric DAGs from Interventional Dynamic Data
Easy Variational Inference for Categorical Models via an Independent Binary Approximation
Instrumental Variable Regression with Confounder Balancing
GPT-3 Summary: Methods for learning from data with different structures.

Cluster 7
Analyzing and Mitigating Interference in Neural Architecture Search
Position Prediction as an Effective Pretraining Strategy
Self-Supervised Models of Audio Effectively Explain Human Cortical Responses to Speech
Learning Multiscale Transformer Models for Sequence Generation
Dialog Inpainting: Turning Documents into Dialogs
Unraveling Attention via Convex Duality: Analysis and Interpretations of Vision Transformers
data2vec: A General Framework for Self-supervised Learning in Speech, Vision and Language
Label-Free Explainability for Unsupervised Models
What Language Model Architecture and Pretraining Objective Works Best for Zero-Shot Generalization?
Generalization and Robustness Implications in Object-Centric Learning
GPT-3 Summary: These machine learning paper titles have in common that they all focus on ways to improve models, either by making them more robust or by increasing their data efficiency.

Cluster 3
DRIBO: Robust Deep Reinforcement Learning via Multi-View Information Bottleneck
Align-RUDDER: Learning From Few Demonstrations by Reward Redistribution
Large Batch Experience Replay
Learning Pseudometric-based Action Representations for Offline Reinforcement Learning
Off-Policy Evaluation for Large Action Spaces via Embeddings
Lyapunov Density Models: Constraining Distribution Shift in Learning-Based Control
Robust Imitation Learning against Variations in Environment Dynamics
Why Should I Trust You, Bellman? The Bellman Error is a Poor Replacement for Value Error
On the Role of Discount Factor in Offline Reinforcement Learning
Flow-based Recurrent Belief State Learning for POMDPs
GPT-3 Summary: Reinforcement learning methods.
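The exact prompt is not given in the post. A minimal sketch, assuming a plain instruction-style prompt: the hypothetical helper build_cluster_prompt below lists a handful of titles from one cluster and ends with a cue for the model to complete. Sending the string to GPT-3 is deliberately left out, since that call depends on the API client in use.

```python
def build_cluster_prompt(titles, max_titles=10):
    """Build a prompt asking a language model to summarize one topic cluster.

    Hypothetical wording for illustration; not the prompt used in the post.
    """
    selected = titles[:max_titles]  # keep the prompt short: only the first few titles
    lines = ["The following are titles of machine learning papers from one topic cluster:", ""]
    lines += [f"- {title}" for title in selected]
    lines += ["", "In one short sentence, what do these papers have in common?", "Summary:"]
    return "\n".join(lines)


titles = [
    "Large Batch Experience Replay",
    "On the Role of Discount Factor in Offline Reinforcement Learning",
    "Flow-based Recurrent Belief State Learning for POMDPs",
]
print(build_cluster_prompt(titles))
```

Ending the prompt with "Summary:" nudges a completion model to answer with a single short line, which is the kind of output shown above; a human can then edit the labels.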

A human in the loop goes a long way: with a few minutes of editing, you can get a pretty good sense of the topics. That surely beats reading through the original list of all papers published at https://proceedings.mlr.press/v162/, ordered alphabetically by the last name of the first author.

Predicting Impact

But how do we know which papers are worth our attention when our time is very limited? Setting aside the power of serendipity for a moment, we care about the conference because it helps us pick out the papers that are most likely to have an impact on our own work, so we would like an automated estimate of that impact. To compute it, we look at a number of signals. Most papers at a conference have already appeared on arXiv before the conference, while the conference versions are not published until the conference itself. These arXiv versions have already been collecting mentions on social media (Twitter) and citations from other papers. In addition, we estimate the influence of the authors from their h-index. Finally, the program committee announces the Outstanding Paper awards at the time of the conference, and we add those to the mix as well. A simple combination of these four signals gives an impact score, which lets us rank the papers and pick out those that stand out in each topic cluster. Together, these steps make an automated conference overview generator: another step in the journey towards an automated research assistant with Zeta Alpha.
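The post says only that the four signals are combined simply; it does not give the formula. The sketch below is one hypothetical way to do it: log-scale the counts so that a single viral paper does not dominate, add a weighted average author h-index, add a fixed bonus for an Outstanding Paper award, and keep the top k papers per cluster. The weights, the Paper type and both function names are assumptions for illustration. The counts in the usage example are taken from the "Different ways of training Neural Networks" list in this post.

```python
import math
from dataclasses import dataclass


@dataclass
class Paper:
    title: str
    cluster: str
    tweets: int          # Twitter mentions of the arXiv version
    citations: int       # citations of the arXiv version
    h_index_avg: float   # average h-index of the authors
    outstanding: bool    # ICML Outstanding Paper award


# Hypothetical weights: NOT the ones used for the rankings in this post.
WEIGHTS = {"tweets": 1.0, "citations": 1.0, "h_index": 0.5, "outstanding": 3.0}


def impact_score(p: Paper) -> float:
    """Weighted sum of log-scaled signals plus an award bonus (assumed formula)."""
    return (WEIGHTS["tweets"] * math.log1p(p.tweets)
            + WEIGHTS["citations"] * math.log1p(p.citations)
            + WEIGHTS["h_index"] * math.log1p(p.h_index_avg)
            + WEIGHTS["outstanding"] * (1.0 if p.outstanding else 0.0))


def top_per_cluster(papers, k=5):
    """Group papers by cluster and keep the k highest-scoring papers in each."""
    by_cluster = {}
    for p in papers:
        by_cluster.setdefault(p.cluster, []).append(p)
    return {c: sorted(ps, key=impact_score, reverse=True)[:k] for c, ps in by_cluster.items()}


papers = [
    Paper("Prioritized Training on Points that are Learnable, Worth Learning, and not yet Learnt",
          "training", 159, 2, 3.8, False),
    Paper("Wide Neural Networks Forget Less Catastrophically", "training", 16, 0, 5.0, False),
    Paper("Bayesian Model Selection, the Marginal Likelihood, and Generalization",
          "training", 0, 0, 4.2, True),
]
for p in top_per_cluster(papers, k=2)["training"]:
    print(f"{impact_score(p):.2f}  {p.title}")
```

With these made-up weights the order need not match the lists below; the point is the shape of the computation. Log scaling keeps 159 tweets from outweighing every other signal, and the award bonus lets Outstanding Papers, which often have no arXiv signals yet in this listing, still surface.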

So here are our top-rated papers per cluster:

Graph Neural Networks

“G-Mixup: Graph Data Augmentation for Graph Classification” by Xiaotian Han, Zhimeng Jiang, Ninghao Liu, Xia Hu | (ICML Outstanding Paper) | 0 Twitter mentions | 0 Citations | Authors’ H-index avg 3.6

“Local Augmentation for Graph Neural Networks” by Songtao Liu, Rex Ying, Hanze Dong, Lanqing Li, Tingyang Xu, Yu Rong, Peilin Zhao, Junzhou Huang, Dinghao Wu | 0 Twitter mentions | 2 Citations | Authors’ H-index avg 3.9

“Knowledge Base Question Answering by Case-based Reasoning over Subgraphs” by Rajarshi Das, Ameya Godbole, Ankita Naik, Elliot Tower, Manzil Zaheer, Hannaneh Hajishirzi, Robin Jia, Andrew McCallum | 0 Twitter mentions | 0 Citations | Authors’ H-index avg 7.6

“The CLRS Algorithmic Reasoning Benchmark” by Petar Veličković, Adrià Puigdomènech Badia, David Budden, Razvan Pascanu, Andrea Banino, Misha Dashevskiy, Raia Hadsell, Charles Blundell | 0 Twitter mentions | 0 Citations | Authors’ H-index avg 6.7

“p-Laplacian Based Graph Neural Networks” by Guoji Fu, Peilin Zhao, Yatao Bian | 2 Twitter mentions | 0 Citations | Authors’ H-index avg 3.0

Deep Generative Models

“3D Infomax improves GNNs for Molecular Property Prediction” by Hannes Stärk, Dominique Beaini, Gabriele Corso, Prudencio Tossou, Christian Dallago, Stephan Günnemann, Pietro Lió | 83 Twitter mentions | 2 Citations | Authors’ H-index avg 1.8

“Design-Bench: Benchmarks for Data-Driven Offline Model-Based Optimization” by Brandon Trabucco, Xinyang Geng, Aviral Kumar, Sergey Levine | 32 Twitter mentions | 0 Citations | Authors’ H-index avg 8.7

“Scaling-up Diverse Orthogonal Convolutional Networks by a Paraunitary Framework” by Jiahao Su, Wonmin Byeon, Furong Huang | 27 Twitter mentions | 0 Citations | Authors’ H-index avg 2.3

“A Neural Tangent Kernel Perspective of GANs” by Jean-Yves Franceschi, Emmanuel De Bézenac, Ibrahim Ayed, Mickael Chen, Sylvain Lamprier, Patrick Gallinari | 23 Twitter mentions | 0 Citations | Authors’ H-index avg 1.7

“Generative Modeling for Multi-task Visual Learning” by Zhipeng Bao, Martial Hebert, Yu-Xiong Wang | 3 Twitter mentions | 0 Citations | Authors’ H-index avg 12.5

Different ways of training Neural Networks

“Prioritized Training on Points that are Learnable, Worth Learning, and not yet Learnt” by Sören Mindermann, Jan M Brauner, Muhammed T Razzak, Mrinank Sharma, Andreas Kirsch, Winnie Xu, Benedikt Höltgen, Aidan N Gomez, Adrien Morisot, Sebastian Farquhar, Yarin Gal | 159 Twitter mentions | 2 Citations | Authors’ H-index avg 3.8

“Wide Neural Networks Forget Less Catastrophically” by Seyed Iman Mirzadeh, Arslan Chaudhry, Dong Yin, Huiyi Hu, Razvan Pascanu, Dilan Gorur, Mehrdad Farajtabar | 16 Twitter mentions | 0 Citations | Authors’ H-index avg 5.0

“Secure Quantized Training for Deep Learning” by Marcel Keller, Ke Sun | 22 Twitter mentions | 0 Citations | Authors’ H-index avg 2.7

“Bayesian Model Selection, the Marginal Likelihood, and Generalization” by Sanae Lotfi, Pavel Izmailov, Gregory Benton, Micah Goldblum, Andrew Gordon Wilson | (ICML Outstanding Paper) | 0 Twitter mentions | 0 Citations | Authors’ H-index avg 4.2

“Structured Stochastic Gradient MCMC” by Antonios Alexos, Alex J Boyd, Stephan Mandt | 4 Twitter mentions | 0 Citations | Authors’ H-index avg 5.4

Federated Learning and optimization

“Proximal and Federated Random Reshuffling” by Konstantin Mishchenko, Ahmed Khaled, Peter Richtarik | 13 Twitter mentions | 6 Citations | Authors’ H-index avg 5.3

“Improved Rates for Differentially Private Stochastic Convex Optimization with Heavy-Tailed Data” by Gautam Kamath, Xingtu Liu, Huanyu Zhang | 11 Twitter mentions | 1 Citation | Authors’ H-index avg 3.5

“Stochastic Reweighted Gradient Descent” by Ayoub El Hanchi, David Stephens, Chris Maddison | 7 Twitter mentions | 0 Citations | Authors’ H-index avg 5.2

“Privacy for Free: How does Dataset Condensation Help Privacy?” by Tian Dong, Bo Zhao, Lingjuan Lyu | (ICML Outstanding Paper) | 0 Twitter mentions | 0 Citations | Authors’ H-index avg 4.0

“FedNL: Making Newton-Type Methods Applicable to Federated Learning” by Mher Safaryan, Rustem Islamov, Xun Qian, Peter Richtarik | 7 Twitter mentions | 0 Citations | Authors’ H-index avg 4.5

Contrastive & Self-supervised learning, Representations

“A Study of Face Obfuscation in ImageNet” by Kaiyu Yang, Jacqueline H. Yau, Li Fei-Fei, Jia Deng, Olga Russakovsky | 26 Twitter mentions | 8 Citations | Authors’ H-index avg 5.7

“Combining Diverse Feature Priors” by Saachi Jain, Dimitris Tsipras, Aleksander Madry | 21 Twitter mentions | 0 Citations | Authors’ H-index avg 4.8

“Learning Stable Classifiers by Transferring Unstable Features” by Yujia Bao, Shiyu Chang, Regina Barzilay | 18 Twitter mentions | 0 Citations | Authors’ H-index avg 5.2

“When and How Mixup Improves Calibration” by Linjun Zhang, Zhun Deng, Kenji Kawaguchi, James Zou | 37 Twitter mentions | 1 Citation | Authors’ H-index avg 2.3

“Frustratingly Easy Transferability Estimation” by Long-Kai Huang, Junzhou Huang, Yu Rong, Qiang Yang, Ying Wei | 6 Twitter mentions | 2 Citations | Authors’ H-index avg 6.0

Large Language Models, Video Transformers, Pre-training

“NLP From Scratch Without Large-Scale Pretraining: A Simple and Efficient Framework” by Xingcheng Yao, Yanan Zheng, Xiaocong Yang, Zhilin Yang | 90 Twitter mentions | 0 Citations | Authors’ H-index avg 8.4

“Generalization and Robustness Implications in Object-Centric Learning” by Andrea Dittadi, Samuele S Papa, Michele De Vita, Bernhard Schölkopf, Ole Winther, Francesco Locatello | 65 Twitter mentions | 0 Citations | Authors’ H-index avg 4.5

“Multi-Grained Vision Language Pre-Training: Aligning Texts with Visual Concepts” by Yan Zeng, Xinsong Zhang, Hang Li | 30 Twitter mentions | 0 Citations | Authors’ H-index avg 4.8

“Understanding Dataset Difficulty with $\mathcal{V}$-Usable Information” by Kawin Ethayarajh, Yejin Choi, Swabha Swayamdipta | (ICML Outstanding Paper) | 0 Twitter mentions | 0 Citations | Authors’ H-index avg 7.3

“Analyzing and Mitigating Interference in Neural Architecture Search” by Jin Xu, Xu Tan, Kaitao Song, Renqian Luo, Yichong Leng, Tao Qin, Tie-Yan Liu, Jian Li | 4 Twitter mentions | 2 Citations | Authors’ H-index avg 4.8

Machine Learning Theory

“Approximate Frank-Wolfe Algorithms over Graph-structured Support Sets” by Baojian Zhou, Yifan Sun | 13 Twitter mentions | 0 Citations | Authors’ H-index avg 3.2

“Entropic Gromov-Wasserstein between Gaussian Distributions” by Khang Le, Dung Q Le, Huy Nguyen, Dat Do, Tung Pham, Nhat Ho | 11 Twitter mentions | 2 Citations | Authors’ H-index avg 1.8

“Convergence of Uncertainty Sampling for Active Learning” by Anant Raj, Francis Bach | 3 Twitter mentions | 0 Citations | Authors’ H-index avg 11.2

“Nonparametric Sparse Tensor Factorization with Hierarchical Gamma Processes” by Conor Tillinghast, Zheng Wang, Shandian Zhe | 4 Twitter mentions | 2 Citations | Authors’ H-index avg 3.7

“Generic Coreset for Scalable Learning of Monotonic Kernels: Logistic Regression, Sigmoid and more” by Elad Tolochinksy, Ibrahim Jubran, Dan Feldman | 0 Twitter mentions | 12 Citations | Authors’ H-index avg 2.5

Robustness, Adversarial Attacks

“Detecting Adversarial Examples Is (Nearly) As Hard As Classifying Them” by Florian Tramer | 5 Twitter mentions | 2 Citations | Authors’ H-index avg 4.9

“Robust Models Are More Interpretable Because Attributions Look Normal” by Zifan Wang, Matt Fredrikson, Anupam Datta | 3 Twitter mentions | 2 Citations | Authors’ H-index avg 2.8

“Building Robust Ensembles via Margin Boosting” by Dinghuai Zhang, Hongyang Zhang, Aaron Courville, Yoshua Bengio, Pradeep Ravikumar, Arun Sai Suggala | 0 Twitter mentions | 0 Citations | Authors’ H-index avg 15.0

“Robustness and Accuracy Could Be Reconcilable by (Proper) Definition” by Tianyu Pang, Min Lin, Xiao Yang, Jun Zhu, Shuicheng Yan | 0 Twitter mentions | 0 Citations | Authors’ H-index avg 11.0

“Metric-Fair Classifier Derandomization” by Jimmy Wu, Yatong Chen, Yang Liu | 0 Twitter mentions | 0 Citations | Authors’ H-index avg 10.0

Multi-agent RL, Bandits, and learning from preferences

“Thresholded Lasso Bandit” by Kaito Ariu, Kenshi Abe, Alexandre Proutiere | 3 Twitter mentions | 2 Citations | Authors’ H-index avg 4.4

“Stabilizing Q-learning with Linear Architectures for Provable Efficient Learning” by Andrea Zanette, Martin Wainwright | 0 Twitter mentions | 0 Citations | Authors’ H-index avg 10.7

“A Simple Unified Framework for High Dimensional Bandit Problems” by Wenjie Li, Adarsh Barik, Jean Honorio | 1 Twitter mention | 2 Citations | Authors’ H-index avg 5.2

“No-Regret Learning in Partially-Informed Auctions” by Wenshuo Guo, Michael Jordan, Ellen Vitercik | 0 Twitter mentions | 0 Citations | Authors’ H-index avg 10.3

“Near-Optimal Algorithms for Autonomous Exploration and Multi-Goal Stochastic Shortest Path” by Haoyuan Cai, Tengyu Ma, Simon Du | 0 Twitter mentions | 0 Citations | Authors’ H-index avg 10.2

Reinforcement Learning Methods

“Offline Meta-Reinforcement Learning with Online Self-Supervision” by Vitchyr H Pong, Ashvin V Nair, Laura M Smith, Catherine Huang, Sergey Levine | 16 Twitter mentions | 0 Citations | Authors’ H-index avg 9.0

“Koopman Q-learning: Offline Reinforcement Learning via Symmetries of Dynamics” by Matthias Weissenbacher, Samarth Sinha, Animesh Garg, Kawahara Yoshinobu | 26 Twitter mentions | 0 Citations | Authors’ H-index avg 3.5

“Do Differentiable Simulators Give Better Policy Gradients?” by Hyung Ju Suh, Max Simchowitz, Kaiqing Zhang, Russ Tedrake | (ICML Outstanding Paper) | 0 Twitter mentions | 0 Citations | Authors’ H-index avg 5.0

“The Importance of Non-Markovianity in Maximum State Entropy Exploration” by Mirco Mutti, Riccardo De Santi, Marcello Restelli | (ICML Outstanding Paper) | 0 Twitter mentions | 0 Citations | Authors’ H-index avg 2.2

“Interactive Inverse Reinforcement Learning for Cooperative Games” by Thomas Kleine Büning, Anne-Marie George, Christos Dimitrakakis | 5 Twitter mentions | 0 Citations | Authors’ H-index avg 3.4

Now, of course, this method of quickly grouping and ranking a list of over 1200 ICML papers has many obvious shortcomings, and there is plenty of room for further improvements and refinements. Still, we think it is a good start towards a quick guide to the conference that takes only a small amount of work. More importantly, you can now do it yourself with Zeta Alpha, for any filtered collection of papers. We believe this is the direction in which future AI research assistants will develop: guiding people to allocate their time more effectively, helping them avoid missing important information, and supporting better decisions through cognitive augmentation.