From Monday 16th of November through Friday 20th, EMNLP 2020 is taking place.
The Conference on Empirical Methods in Natural Language Processing comes packed with 750+ papers in the main conference and 400+ in workshops. Last week we added the new papers to our platform for you to enjoy. Get started with this overview to navigate the program.
Provincia La Altagracia, Dominican Republic, by Justin Aikin on Unsplash
Originally the conference was meant to take place in Punta Cana in the Dominican Republic. To make up for the loss, Zeta Alpha is organizing the EMNLP 2020 Tropical Beach Party on Wednesday evening (CET). Contact us if you want to join and mingle with the EMNLP crowd.
As the Zeta Alpha team counts down to the start of the conference, we have written this overview to help you navigate the program.
Paper count evolution in the past decade.
This year, EMNLP is once again breaking its own records for volume. The conference has done nothing but grow in the past decade, and this year is no exception: 752 papers in the main conference and 535 in workshops and tutorials.
In addition, the companion conference Findings of EMNLP has made its debut with 447 papers. According to the organizing committee, the purpose of this venue is to present works that were not selected to appear in the main conference, yet were strong enough to be worthy of publication.
Perhaps the most interesting trend in the last few years has been the explosion of workshops: this year there are 24 workshops and tutorials, 7 of them in their first edition. Here are some of the ones we like the most:
First Workshop on Scholarly Document Processing: practical works on processing academic literature. Zeta Alpha is present in two publications: A New Neural Search and Insights Platform for Navigating and Organizing AI Research and Effective distributed representations for academic expert search.
System Demonstrations: including the Transformers library by 🤗 Hugging Face and many more, such as Wikipedia2Vec or OpenUE: An Open Toolkit of Universal Extraction from Text.
Sixth Workshop on Noisy User-generated Text (W-NUT 2020): NLP on real-world noisy data sources such as Twitter or Reddit and analyzing how and why some otherwise powerful models fail in this regime.
We compiled a list of the top 10 organizations, ranked by the number of papers on which they appear as an author affiliation. Carnegie Mellon University stands out in first place, and geographically speaking the US dominates: the only non-US organizations are Tsinghua University (China) and the University of Edinburgh (UK).
Organizations ranked by papers accepted at EMNLP.
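A ranking like this boils down to counting, for each organization, the number of papers that list it at least once. Here is a minimal Python sketch of that idea. It is an illustration rather than our actual pipeline: the function name `rank_organizations` and the placeholder affiliations are made up for the example.

```python
from collections import Counter

def rank_organizations(papers, top_n=10):
    """Rank organizations by the number of papers listing them as an affiliation.

    `papers` is a list where each item is the list of affiliations on one paper.
    An organization is counted once per paper, even if several authors share it.
    """
    counts = Counter()
    for affiliations in papers:
        counts.update(set(affiliations))  # de-duplicate within a single paper
    # Sort by descending count, then alphabetically so ties are deterministic.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:top_n]

# Placeholder data (hypothetical, not real conference figures).
example_papers = [
    ["Org A", "Org B"],
    ["Org A", "Org A", "Org C"],
    ["Org B", "Org A"],
]

for name, n_papers in rank_organizations(example_papers, top_n=2):
    print(f"{name}: {n_papers} papers")
```

Counting once per paper rather than once per author keeps large author lists from a single lab from inflating that lab's rank.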
With increased publication volume also comes a harder time identifying which work to focus on as an attendee. We did some keyword analysis on paper titles and compared it to last year’s edition. Here’s what we found.
Relative change in keyword frequency in titles.
Terms like ‘classification’ and plain ‘neural network(s)’ or ‘deep’ are clearly dropping in popularity, and the reason might be simple: they’ve become so ubiquitous that they can often be omitted, and researchers choose to emphasize other aspects of their work. Somewhat ironically, ‘attention’ is getting less of it: explicit mentions are declining as it becomes less of a defining feature of new publications.
On the flip side, absolute mentions of ‘transformer’ have tripled, and ‘language model’ is also increasing in popularity. Interestingly, the stem ‘train’ (including ‘training’ and ‘pre-training’) is making a huge jump, as more and more publications focus on developing better techniques to train models instead of changing the models themselves. ‘Knowledge graphs’ are also increasing in popularity while, interestingly, ‘knowledge base’ is losing ground year over year in favor of the former.
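For readers who want to try a comparison like this on their own paper lists, here is a minimal Python sketch. It is not the exact analysis behind the chart: it assumes simple case-insensitive substring matching on titles, the function names `keyword_share` and `relative_change` are hypothetical, and the titles are invented placeholders rather than real papers.

```python
def keyword_share(titles, keywords):
    """Fraction of titles mentioning each keyword (case-insensitive substring match)."""
    total = len(titles)
    shares = {}
    for kw in keywords:
        hits = sum(1 for title in titles if kw in title.lower())
        shares[kw] = hits / total if total else 0.0
    return shares

def relative_change(old_titles, new_titles, keywords):
    """Relative change in keyword share between two editions.

    Returns None for a keyword that never appeared in the old edition,
    since the relative change is undefined in that case.
    """
    old = keyword_share(old_titles, keywords)
    new = keyword_share(new_titles, keywords)
    return {
        kw: None if old[kw] == 0 else (new[kw] - old[kw]) / old[kw]
        for kw in keywords
    }

# Invented placeholder titles, for illustration only.
titles_last_year = [
    "Deep neural network for text classification",
    "Attention for neural machine translation",
    "A transformer language model",
    "Training classification models with attention",
]
titles_this_year = [
    "Pre-training a transformer language model",
    "Training transformer encoders efficiently",
    "Knowledge graph question answering with transformer",
    "Text classification revisited",
]

keywords = ["classification", "attention", "transformer", "train"]
changes = relative_change(titles_last_year, titles_this_year, keywords)
for kw, change in changes.items():
    print(kw, "n/a" if change is None else f"{change:+.0%}")
```

Working with shares of titles rather than raw counts matters here, because the total number of papers changes from year to year. Matching on a stem such as ‘train’ picks up both ‘training’ and ‘pre-training’, at the cost of occasional false positives on unrelated words.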
According to our data, which combines the main conference and workshops, 691 of the 1263 papers (55%) published at EMNLP have been available as pre-prints on arXiv for a while, which means that some of them have already had an impact before the conference even started. Here are the top 10 most cited papers, along with their current citation counts:
1. Transformers: State-of-the-Art Natural Language Processing by Thomas Wolf et al. (213 citations) 🖥 Virtual Poster
2. How Much Knowledge Can You Pack Into the Parameters of a Language Model? by Adam Roberts, Colin Raffel and Noam Shazeer (31 citations) 🖥 Virtual Poster
3. Dense Passage Retrieval for Open-Domain Question Answering by Vladimir Karpukhin, Barlas Oguz et al. (25 citations) 🖥 Virtual Poster
4. Hierarchical Graph Network for Multi-hop Question Answering by Yuwei Fang et al. (22 citations) 🖥 Virtual Poster
5. On Extractive and Abstractive Neural Document Summarization with Transformer Language Models by Jonathan Pilault, Raymond Li, Sandeep Subramanian et al. (20 citations) 🖥 Virtual Poster
6. An information theoretic view on selecting linguistic probes by Zining Zhu and Frank Rudzicz (20 citations) 🖥 Virtual Poster
7. Evaluating the Factual Consistency of Abstractive Text Summarization by Wojciech Kryscinski et al. (19 citations) 🖥 Virtual Poster
8. Scalable Zero-shot Entity Linking with Dense Entity Retrieval by Ledell Wu et al. (18 citations) 🖥 Virtual Poster
9. Experience Grounds Language by Yonatan Bisk, Ari Holtzman, Jesse Thomason et al. (16 citations) 🖥 Virtual Poster
10. Statistical Power and Translationese in Machine Translation Evaluation by Yvette Graham et al. (15 citations) 🖥 Virtual Poster
Some of these are already classics by now! We recommend that you check out these rising star papers as they present at the conference.
Papers on Topic
Finally, to make your life a bit easier, we’ve curated a few topical lists in AI Research Navigator, combining the main EMNLP conference and workshops and tutorials.
Question Answering: abstractive, extractive, knowledge-aware… you name it. Here’s a list of papers around it.
Transformers: a mixtape of transformers, new architectures, training on new datasets, studies, etc.
Knowledge Graphs: knowledge-aware question answering, entity linking, graph embeddings, etc.
Interpretability: (and explainability!) about building models and designing techniques that we humans can actually understand.
Summarization: extractive and abstractive summarization, new evaluation strategies for the task, etc.
COVID-19: most of the papers in this list belong to workshops on useful processing of text data in the context of COVID-19.
EMNLP is not the first (nor the last) conference to go fully online this year, and luckily for us, this means that the organizers have learned from previous conferences and the online experience is becoming more refined with each iteration. An early preview of the virtual conference shows a promising workflow for discovering publications and attending live sessions, and we’ll report more thoroughly on this aspect when we share our recap. Zeta Alpha will be attending the conference as a Gold sponsor and we’ll also be sharing our experience live on Twitter @zetavector, so make sure to follow us if you don’t want to miss a thing! And if you're attending, make sure to drop by our booth and chat. We'd love to learn what you are interested in!