September 16, 2022 | Science Park Amsterdam
TRANSFORMERS AT WORK
Transformers have evolved far beyond their original niche in Natural Language Processing: they have caused a paradigm shift in Search Technology, and they are currently the leading approach in Machine Learning thanks to their versatility and scalability.
In the 3rd edition of our workshop you will learn about the progress in Transformers from world-renowned researchers pushing the boundaries in their respective subfields, such as Graph Neural Networks, Neural Information Retrieval, large-scale Language Models, Computer Vision, and Multimodality.
After the workshop, food, drinks, and live music (Turboguru) create a unique opportunity to meet and have fun with fellow AI R&D folks from Industry and Academia at the beautiful Startup Village Amsterdam!
[NEW!: Talk Titles and Abstracts below]
[UPDATE 8 Sept. 2022: In-person registration now at capacity, sign up for online or join the waiting list]
Meet our speakers:
New edition, all the more value
We are proud to welcome researchers and leaders from industry and academia. Our speakers will show you the latest developments & most promising advances in the field. The starring role is of course reserved for Transformer models.
Advances in NLP and Search are changing the way AI systems can process unstructured data, which makes understanding the state of the art and its implications for applications critical to getting ahead in virtually any industry.
Make valuable connections and meet experts and peers not only in the AI industry, but also in all others touched by tech. "If you want to go somewhere, it is best to find someone who has already been there".
To celebrate the amazing progress in our field and kickstart the academic year with some fun, we can't conclude the evening without some entertainment. Drinks, food, & live music included, good company guaranteed!
What you can expect:
12:00-13:00 Arrival and registration
13:00 Opening: Jakub Zavrel and Sergi Castella
13:30 - 14:45 Talks Session 1: Marzieh Fadaee, Gautier Izacard, and Rodrigo Nogueira
14:45 - 15:15 Coffee Break
15:15 - 16:30 Talks Session 2: Ahmet Üstün, Auke Wiggers and Cees Snoek
16:30 - 16:45 Break
16:45 - 17:30 Talks Session 3: Thomas Kipf and Andrew Jaegle
17:30 - 18:00 Panel "Transformers at Work"
18:00 Drinks, Food & Networking
19:30 Live music: Turboguru
Find the full list of titles and abstracts below.
Speakers, Titles and Abstracts
Marzieh Fadaee (Zeta Alpha)
From Transformers to Work: Advances in Neural Search
Transformers have powered a wave of progress in Information Retrieval, but when we translate this into building production-ready systems, many questions arise. Can we find a smarter and more cost-effective way to harness the power of large language models for domain-specific neural search? What do we know about the generalization capability of retrieval models, and why is in-domain effectiveness not a good indicator of zero-shot effectiveness? Can we combine multiple datasets and task objectives to train a more robust multi-purpose retriever? And how do we move towards non-English retrieval? We addressed these questions in several research projects at Zeta Alpha, and in this talk we will present their outcomes.
Gautier Izacard (Meta AI)
Retrieval-augmented Language Models
Large language models are able to learn to solve tasks from only a few examples, a sign of their generalization capabilities. A key feature of these abilities is that they improve significantly with model size. Hypothetically, scaling enables both more complex reasoning and better memorization. The idea behind retrieval-augmented language models is to outsource the knowledge stored in the model weights to an external memory. Retrieval-augmented models have shown promising results both for knowledge-intensive tasks and for language modeling. In this talk, I'll give an overview of retrieval-augmented language models, their promise and challenges, with a focus on our recent model Atlas.
Rodrigo Nogueira (UNICAMP, Zeta Alpha)
The Prompting Power of Large Language Models
With a sufficient number of parameters and enough training data, large language models (LLMs) show the emergent property of "in-context learning": the ability to perform tasks never seen before by relying only on instructions and examples in natural language provided as input to the model. In this talk, I will show how this remarkable ability of LLMs is allowing us to solve tasks at an increasing pace. Among these, recent solutions to reasoning and mathematical tasks are emblematic, as these problems were known for decades to expose the limitations of neural models. A key insight from these solutions is the use of step-by-step instructions, which I will argue bridges the gap between the neural and symbolic worlds. I will also show that the use of symbolic representations leads to better out-of-domain information retrieval models. Finally, I will discuss the limitations of retrieval-augmented language models, and argue that specialized models will allow us to efficiently retrieve accurate information.
Ahmet Üstün (University of Groningen)
Adapters and Hyper-networks for NLP
To adapt pre-trained language models (PLMs) to downstream tasks or target languages, the standard approach is to fine-tune all model parameters on the available datasets. However, this approach incurs a relatively large parameter cost. Moreover, standard fine-tuning is prone to catastrophic forgetting or interference, especially in multi-task or multilingual scenarios. Adapters have been proposed as a parameter-efficient alternative to standard fine-tuning. Besides parameter efficiency, they also facilitate better learning across multiple tasks or languages when used with hyper-networks or other modular frameworks. In this talk, I will give a brief introduction to adapters and hyper-networks, and present the benefits of these methods by highlighting particular use cases such as multi-task learning and cross-lingual knowledge transfer.
Auke Wiggers (Qualcomm)
Efficient Transformers: Towards Deployment on Edge Devices
Many transformer models are famously trained at enormous scale. While their efficiency has been improved through better design of key operators and architectural refinements, they are often more expensive than comparable convolutional models, and deploying them to edge devices poses new challenges. In this talk, we will discuss how these models are used at Qualcomm, with a focus on efficient computation. We first cover two applications where transformer-based approaches improve over convolutional ones in terms of compute: video super-resolution and neural data compression. We then cover how we improve inference efficiency via quantization, and give an outlook on the challenges of deploying these models to edge devices.
Thomas Kipf (Google Brain)
Slot Attention: Towards Object-centric Perception
The world around us — and our understanding of it — is rich in compositional structure: from atoms and their interactions to physical objects in our everyday environments. How can we learn models of the world that take this structure into account and generalize to new compositions in systematic ways? This talk focuses on an emerging class of slot-based neural architectures that utilize attention mechanisms to perform perceptual grouping of scenes into objects and abstract entities without direct supervision. I will briefly introduce the Slot Attention mechanism as a core representative for this class of models and show how slot-based architectures can be used for self-supervised object discovery in video and 3D scenes.
Andrew (Drew) Jaegle (DeepMind)
Long-context Anymodal Generation with Perceivers
A central goal of artificial intelligence is the development of systems that flexibly process data from any modality for any task. Perceivers are a family of architectures that scale well to very large inputs in many modalities by encoding data to a latent bottleneck. But latent-space encoding handles all elements in a single pass, while autoregressive generation, which has become the go-to tool for generation in language and many other domains, assumes processing happens one element at a time. I will describe Perceiver AR, a recently proposed long-context autoregressive model that resolves this tension by carefully restructuring the Perceiver latent space. Perceiver AR obtains state-of-the-art performance on generation benchmarks for images, language, and music, while scaling to inputs several orders of magnitude longer than Transformer-XL, even when using very deep architectures. Perceiver AR's long context window allows it to easily support data without a natural left-to-right ordering, and its latent structure allows the compute budget to be adapted at evaluation time for either improved performance or reduced generation time.
Cees Snoek (University of Amsterdam)
Transformers for Computer Vision
Get Yer Chakras out and Shake your Booty
The five gurus of Turboguru take you on a musical journey to different countries. Accompanied by catchy Balkan beats, Arabic trap and danceable influences from all over the world, Turboguru creates an atmosphere in which the entire audience is guaranteed to encounter enlightenment dancing together!