• castella2

The Future will be Self-supervised: how to learn from unlabeled data @ NeurIPS 2020

Updated: Dec 17, 2020

The main conference at NeurIPS 2020 has wrapped up: more than 20 thousand people attended and around 1,900 papers were presented. We loved Gather Town; we loved the global time zones much less, since they required staying up all night in Europe to attend the keynotes and oral sessions. Overall, the conference was an amazing online experience, and the innovations introduced there should keep us inspired and busy for months to come. And after the main conference, a stunning 62 more workshops brought even more good stuff.

In this blog post, we want to write a few words about one of the topics that most captured our attention at NeurIPS 2020: Self-supervised learning (SSL). It is becoming increasingly clear that SSL is key to the next wave of progress in AI.

The subfield aims to exploit vast amounts of unlabeled data while learning better and more robust internal representations, and it seems to be exploding. A rough count on the Zeta Alpha platform shows that 399 papers at NeurIPS (21% of the total) deal with some form of SSL. The basic idea of SSL is to take an unlabeled object (an image, a video, or a text) and do one of two things. One option is to hide part of the object by masking it and use that hidden part as the target to predict. The other is to use a natural relation between multiple related objects as a constraint, for example the next frame in a video or the next word in a sentence.
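To make the two recipes above concrete, here is a minimal Python sketch. It assumes the text has already been split into tokens. The names `mask_each_position` and `next_item_pairs` and the `[MASK]` placeholder are hypothetical and were chosen for illustration. The sketch only shows how training pairs fall out of unlabeled data; it is not any particular paper's pipeline.

```python
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")
MASK = "[MASK]"  # hypothetical placeholder token for the hidden part

def mask_each_position(tokens: List[str]) -> List[Tuple[List[str], int, str]]:
    """Recipe 1: hide one token at a time; the hidden token becomes the target."""
    examples = []
    for i, target in enumerate(tokens):
        masked = tokens[:i] + [MASK] + tokens[i + 1:]
        examples.append((masked, i, target))
    return examples

def next_item_pairs(sequence: Sequence[T]) -> List[Tuple[T, T]]:
    """Recipe 2: use the natural order of the data; each item predicts the next one."""
    return [(sequence[i], sequence[i + 1]) for i in range(len(sequence) - 1)]

if __name__ == "__main__":
    sentence = "the cat sat on the mat".split()
    print(mask_each_position(sentence)[2])
    # -> (['the', 'cat', '[MASK]', 'on', 'the', 'mat'], 2, 'sat')
    print(next_item_pairs(["frame0", "frame1", "frame2"]))
    # -> [('frame0', 'frame1'), ('frame1', 'frame2')]
```

In practice, methods such as BERT mask a random subset of tokens rather than every position, and a model is then trained to predict the hidden targets. The point here is only that no human labels are involved.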

SSL notably started in Computer Vision and has grown to shape the last two years of both Computer Vision and NLP. It has enabled drastic improvements in label efficiency for supervised learning. Yann LeCun voiced this strongly in his ICLR 2020 keynote when he said “the future will be self-supervised”. These words have stuck with us, so we figured we’d share some resources on the topic. Without further ado, here’s a collection of the papers on the topic that we loved at the conference!

Computer Vision

Self-supervised learning enables the construction of representations that are invariant to certain transformations, and such representations generalize more robustly. For example, SSL lets object recognition models cope with changes in pose, zoom, lighting levels, or color. As a result, SSL pretraining followed by a small amount of fine-tuning on a dataset like ImageNet is now able to beat fully supervised models.

Self-supervision is often used to learn these invariances from large datasets through simple data augmentation techniques, such as presenting multiple different crops of the same image. The learned representations are later fine-tuned on a specific downstream task, such as classification or segmentation. For instance, SimCLR and other methods trained with the InfoNCE loss pull augmented versions of the same image closer together in the embedding space, while pushing so-called 'negatives' (other images) apart. Here's a list of our top Self-supervised Computer Vision papers at NeurIPS:
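As a rough illustration of the contrastive objective described above, the sketch below computes an InfoNCE-style loss for a single anchor embedding in plain Python. It makes several assumptions:

- Embeddings are already given as lists of floats.
- Similarity is measured with cosine similarity.
- The temperature of 0.1 is purely illustrative.
- The function names are hypothetical.

Real implementations such as SimCLR's compute this loss batch-wise on a GPU and use the other images in the batch as negatives.

```python
import math
from typing import List, Sequence

def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def info_nce(anchor: Sequence[float], positive: Sequence[float],
             negatives: List[Sequence[float]], temperature: float = 0.1) -> float:
    """Cross-entropy of picking the positive out of {positive} + negatives."""
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, neg) / temperature for neg in negatives]
    # log-sum-exp, shifted by the max logit for numerical stability
    m = max(logits)
    log_denominator = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_denominator - logits[0]

if __name__ == "__main__":
    anchor = [1.0, 0.0]  # embedding of one crop of an image
    close = [0.9, 0.1]   # embedding of another crop of the same image
    far = [0.0, 1.0]     # embedding of a different image (a negative)
    print(info_nce(anchor, close, [far]) < info_nce(anchor, far, [close]))  # -> True
```

The loss is small when the two crops agree. It is large when a negative looks more similar to the anchor than the positive does, which is exactly the "pull together, push apart" behaviour described in the text.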

1. Multimodal self-supervised learning:

2. Learning pixel-wise representations on images:

3. Self-supervised learning on 3D data:

Language and Speech

The last few years in NLP have been heavily influenced by self-supervision in the form of Language Modelling and its variations. This year’s NeurIPS included some notable examples, such as “Language Models are Few-Shot Learners” (GPT-3), one of the best paper award winners. Learning representations for audio data also benefits greatly from self-supervised Contrastive Learning, so we include audio here as well. Check out our list of favorite papers on the topic.
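Language Modelling counts as self-supervised because the training targets are simply the next words of raw text. The toy count-based bigram model below is only a stand-in. It shows where the labels come from and what objective is measured: the average negative log-likelihood of each next word. GPT-3 instead trains a large Transformer with gradient descent. The function names and the add-one smoothing are our own illustrative assumptions.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List

def train_bigram_lm(corpus: List[str]) -> Dict[str, Counter]:
    """Count (previous word -> next word) transitions in unlabeled sentences."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for sentence in corpus:
        words = ["<s>"] + sentence.split() + ["</s>"]
        for prev, nxt in zip(words, words[1:]):
            counts[prev][nxt] += 1  # the "label" is just the word that follows
    return counts

def next_word_prob(counts: Dict[str, Counter], prev: str, nxt: str,
                   vocab_size: int, alpha: float = 1.0) -> float:
    """Add-alpha smoothed estimate of P(nxt | prev)."""
    c = counts.get(prev, Counter())
    return (c[nxt] + alpha) / (sum(c.values()) + alpha * vocab_size)

def avg_nll(counts: Dict[str, Counter], sentence: str, vocab_size: int) -> float:
    """Average negative log-likelihood per predicted word (lower is better)."""
    words = ["<s>"] + sentence.split() + ["</s>"]
    pairs = list(zip(words, words[1:]))
    return -sum(math.log(next_word_prob(counts, p, n, vocab_size))
                for p, n in pairs) / len(pairs)

if __name__ == "__main__":
    corpus = ["the cat sat", "the cat ran", "a dog sat"]
    vocab_size = len({w for s in corpus for w in s.split()} | {"</s>"})  # 7
    lm = train_bigram_lm(corpus)
    print(lm["the"]["cat"])  # -> 2
    print(avg_nll(lm, "the cat sat", vocab_size) < avg_nll(lm, "dog the a", vocab_size))  # -> True
```

A fluent sentence gets a lower negative log-likelihood than a scrambled one, even though nobody labeled any data.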

Conceptual and Theoretical Advancements

As is almost a tradition in Deep Learning, empirical results often come first and theoretical understanding arrives later (if it arrives at all!); self-supervision is no exception. Many theoretical results are now emerging that link self-supervised techniques such as Contrastive Learning to Information Theory. Meanwhile, new techniques that we don’t really understand seem to work surprisingly well. This not-so-niche-anymore topic has grown to deserve its own NeurIPS workshop: "Self-Supervised Learning -- Theory and Practice". Here are our suggestions for keeping up to speed with the topic:
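One frequently cited example of such a link is the InfoNCE bound from Contrastive Predictive Coding (van den Oord et al., 2018), sketched here in its standard form. Take a context c and a set X = {x_1, ..., x_N} that contains one positive sample x_+ and N − 1 negatives. Let f be a positive scoring function; in practice it is often the exponential of a temperature-scaled similarity. The InfoNCE loss and the mutual-information bound it yields are:

```latex
\mathcal{L}_N = -\,\mathbb{E}_X\!\left[\log \frac{f(x_{+}, c)}{\sum_{x_j \in X} f(x_j, c)}\right],
\qquad
I(x_{+}; c) \;\ge\; \log N - \mathcal{L}_N
```

Minimizing the contrastive loss therefore maximizes a lower bound on the mutual information between the positive and its context. Because the loss is non-negative, the bound can never exceed log N, which is one reason the number of negatives matters.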


Because they need less human labeling, Self-supervised approaches can be much cheaper to build than supervised ones. This makes them interesting candidates for companies building machine learning applications where labeled data is scarce. We call particular attention to the medical domain, where annotating data is often expensive and time-consuming, when it is possible at all. If you are interested in this area, here are some publications you should look at:

NeurIPS 2020 Workshops

On Friday and Saturday of NeurIPS, two workshops were fully dedicated to Self-supervised learning:

  • Self-Supervised Learning for Speech and Audio Processing
    Organized by Pengtao Xie, Shanghang Zhang, Pulkit Agrawal, Ishan Misra, Cynthia Rudin, Abdelrahman Mohamed, Wenzhen Yuan, Barret Zoph, Laurens van der Maaten, Xingyi Yang, and Eric Xing
    Fri, Dec 11th @ 9:50 EST – 19:25 EST
    With keynotes by Mirco Ravanelli, Luke Zettlemoyer, Chelsea Finn, Dong Yu, Mark Hasegawa-Johnson, and Bhuvana Ramabhadran, among others, this was a fantastic workshop. In speech and audio processing, unlabeled data is clearly abundant, and all the benefits of SSL can be put to work. In particular, wav2vec 2.0 has rapidly gained attention and is already a de facto standard method. UW's Luke Zettlemoyer gave a great closing keynote with an overview of RoBERTa, BART, and the more recently introduced MARGE. He also nicely showed some of the thinking on why SSL works so well for seq2seq models in NLP.

  • Self-Supervised Learning -- Theory and Practice
    Organized by Pengtao Xie, Shanghang Zhang, Pulkit Agrawal, Ishan Misra, Cynthia Rudin, Abdelrahman Mohamed, Wenzhen Yuan, Barret Zoph, Laurens van der Maaten, Xingyi Yang, and Eric Xing
    Sat, Dec 12th @ 9:00 PST – 18:25 PST
    With amazing keynotes by Yejin Choi, Alyosha Efros, Jitendra Malik, Chelsea Finn (again), Quoc V. Le, Yann LeCun, Ruslan Salakhutdinov, and Oriol Vinyals, among others, this was an intellectual tour de force through the very fundamentals and future of SSL itself. Notably, Efros argued that the popular contrastive self-supervision is not quite truly self-supervised, because the chosen data augmentation heuristics introduce hidden bias. In his view, making it more naturally self-supervised is the path forward. A true highlight of the workshop was Yann LeCun's improvised whiteboard lecture, in which he explained how the various self-supervised models can all be unified under an Energy-Based view. Here are some screenshots from the presentation, to share the atmosphere:

Unfortunately, the panel discussion that followed was too late for even the most diehard Europeans like us to stay up for, so hopefully the workshop organizers will make the recording available soon.

At Zeta Alpha

Self-supervised learning promises to launch a wave of progress in AI, but we also care so much about self-supervision because it’s at the core of what we do. We rely on self-supervision to improve our models for information retrieval, question answering, and recommendations. We also use it to adapt NLP and search to domains where little labeled data is available. If you want to know more about the inner workings of our platform, check out the paper we published at the EMNLP workshop on Scholarly Document Processing: "A New Neural Search and Insights Platform for Navigating and Organizing AI Research".

We hope you find these collections useful and that you enjoyed this year’s edition of NeurIPS as much as we did. Make sure to follow us on Twitter at @zetavector to keep track of this space! And let us know if there are cool papers on SSL that we should add to this discussion.


© 2020 Zeta Alpha
