Time to look back at a great conference and gather what we learned. Here is a brief reflection on our experience, along with some personal recommendations from the team about our favourite papers.
Last week (Nov. 16–20) we, Zeta Alpha, attended the EMNLP 2020 conference as sponsors. In a previous blog post, we presented an overview of what the conference had to offer in 2020, diving into which topics were gaining traction and which deserved attention. Now that the conference is behind us, here are some of our favorites.
Firstly, it helped that every attendee was sitting in front of a computer: visitors could try our product hands-on, and it was great to see their enthusiasm for it. However, attracting the attention of a casual wanderer is far more challenging online than it is in a physical setting.
About the Content
All three main keynotes sparked interest and discussion among attendees: Claire Cardie’s talk on the big picture around Information Extraction, Rich Caruana on interpretable ML, and Janet Pierrehumbert on Linguistic Behaviour and NLP. To us, Rich’s talk stood out as both packed with valuable insight and entertaining: “Why Friends Don’t Let Friends Deploy Black-Box Models: The Importance of Intelligibility in Machine Learning”. In this keynote, Rich walked us through trends in ML interpretability, along with several illustrative examples of how some interpretable models can be engineered to make better predictions. The talk also spoke to a broader trend in the ML community: a growing interest in interpretability and an acknowledgement that this facet is key if models are to make the jump from research to production. In terms of workshops, one of the most exciting bits was our participation in the new Scholarly Document Processing Workshop, where we presented our work “A New Neural Search and Insights Platform for Navigating and Organizing AI Research”, in which we detail the workings of our platform for navigating AI research.
Our personal Top 10
Finally, we want to share some recommendations from our team on the works presented at EMNLP. Here’s a selection of our favourite 10 in no particular order, followed by a short comment on what makes them special. This is a solid start if you missed the conference and want to catch up!
An Embedding Model for Estimating Legislative Preferences from the Frequency and Sentiment of Tweets — Victor: “The hypothesis and the method are quite simple, but the application was surprising and fascinating to me...”
Dense Passage Retrieval for Open-Domain Question Answering — Victor: “This paper brings very important insights which are relevant for dense retrieval, performing a wide empirical analysis on different approaches for training, very useful to us!”
What Do Models Learn from Question Answering Datasets? — Olga: “If you look at current QA leaderboards, it seems like QA is almost a ‘solved’ task: models are doing great. But then you check QA on real-world data and it’s still a bit of a mess, so why’s that? This paper provides some answers.”
Masking as an Efficient Alternative to Finetuning for Pretrained Language Models — Carsten: “Masking taken to a whole new level. Cool idea, even cooler that it actually works.”
How Much Knowledge Can You Pack Into the Parameters of a Language Model? — Jakub: “It’s really interesting to peek into language models and see what they memorize from pre-training. It turns out… they can answer quite a lot of questions without explicit access to external knowledge!”
Information-Theoretic Probing with Minimum Description Length — Sergi: “Such a simple yet powerful tool to better understand what’s encoded in representations: compress a message with them and see how long it is. Farewell classifier accuracy probing!”
Embedding Words in Non-Vector Space with Unsupervised Graph Learning — Sergi: “While the applicability is still a big question mark here, it’s inspiring to see works trying to learn representations beyond the usual dense distributed embeddings.”
Flexible retrieval with NMSLIB and FlexNeuART — Marzieh: “NLP open source tools are the backbone of many industry applications, and it’s great to see them getting a spotlight at conferences such as EMNLP”.
Modularized Transfomer-based Ranking Framework — Jakub: “Transformers do well at ranking, but at what cost? Taking into account computation constraints is just as important as performance, and this work addresses the problem.”
SLM: Learning a Discourse Language Representation with Sentence Unshuffling — Marzieh: “Shuffle sentences, have a hierarchical Transformer recover the original order and learn better discourse-level representations. These pre-training tasks keep getting more interesting.”
Our wrap-up comes to an end here, but our presence at conferences does not. Next up, we’ll be at NeurIPS in a couple of weeks, so make sure to catch up with us there! In the meantime, follow us on Twitter @ZetaVector so you don’t miss a thing.