Rama Akkiraju (NVIDIA)
FACTS about building Generative AI-based Chatbots: Lessons and Best Practices
Enterprise chatbots, powered by generative AI, are rapidly emerging as among the most explored initial applications of this technology in industry, aimed at enhancing employee productivity. Retrieval Augmented Generation (RAG), Large Language Models (LLMs), and frameworks such as LangChain serve as key technological components in building generative-AI-based chatbots. However, leveraging generative AI for enterprise chatbots presents numerous challenges and considerations. Crafting a successful enterprise chatbot demands meticulous engineering of RAG pipelines, fine-tuning of LLMs, prompt engineering, ensuring the relevance and accuracy of enterprise knowledge, honoring document access control permissions, providing concise responses with pertinent references, and safeguarding personal information. In this talk, we present our recipes for optimizing RAG performance across various control points, drawn from three case studies: enterprise-grade chatbots for answering questions about IT and HR benefits, about company financial earnings, and about all enterprise content. Each of these domains exposed us to different concerns that must be addressed in RAG-based chatbots, including handling data that mixes structured, unstructured, and multi-modal content. Our key findings are that 1) document metadata enrichment plays a critical role in retrieval relevancy; 2) retrievers struggle with complex and multi-part queries, necessitating more complex agent architectures; and 3) guardrails play a critical role in securing sensitive documents when building enterprise chatbots. Notably, all of these issues require LLM-based solutions themselves, making RAG pipeline optimization a recursive process. We conclude with best practices, distilled from our work, for building enterprise-grade chatbots that shed light on techniques for dealing with chatbot FACTS: content freshness (F), architectures (A), cost economics of LLMs (C), testing cycles (T), and security (S).
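To make the first finding concrete, here is a minimal Python sketch of metadata enrichment before indexing. It is not the authors' pipeline: `generate_metadata` is a hypothetical stand-in for an LLM call that extracts a title, keywords, and a summary, and the enriched string, rather than the raw chunk text, is what gets embedded.

```python
# Minimal sketch of document metadata enrichment before indexing.
# `generate_metadata` stands in for any LLM call (hypothetical helper);
# the technique is simply: derive metadata, prepend it to the chunk
# text, and embed the combined string so the retriever can match on it.

from dataclasses import dataclass, field

@dataclass
class Chunk:
    doc_id: str
    text: str
    metadata: dict = field(default_factory=dict)

def generate_metadata(text: str) -> dict:
    """Placeholder for an LLM call that extracts title/keywords/summary."""
    # In practice, prompt an LLM: "Extract a title, five keywords, and a
    # one-sentence summary from the passage below." Here we stub it out.
    return {"title": text.split(".")[0][:80], "keywords": [], "summary": ""}

def enrich_for_indexing(chunk: Chunk) -> str:
    """Prepend metadata to the chunk text so embeddings capture both."""
    chunk.metadata = generate_metadata(chunk.text)
    header = f"Title: {chunk.metadata['title']}\n"
    return header + chunk.text  # embed this string, not the raw text

chunk = Chunk("hr-001", "Employees accrue 20 PTO days per year. Unused days roll over.")
print(enrich_for_indexing(chunk))
```

The design point is that dense retrieval can only match on whatever text is embedded, so surfacing metadata in that text gives the retriever more signal for relevancy.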
Julia Kiseleva (MultiOn)
Evaluating Interactive Autonomous Agents
Seamless interaction between AI agents and humans through natural language remains a critical objective in AI research. This talk addresses the difficulties of developing interactive autonomous agents that can understand and execute grounded natural language instructions. Despite significant progress, challenges such as the scarcity of suitable datasets and the need for robust evaluation platforms persist.
Natalia Vassilieva (Cerebras)
Training and Inference Trade-offs for Domain-Specific and Multilingual LLMs
This talk explores the trade-offs between training large language models (LLMs) from scratch and adapting existing generalist models for specific domains or languages. While the quality of large-scale foundational “generalist” models has steadily improved, this progress often comes with increased model size and higher serving costs. Moreover, even the most powerful models today may struggle with the nuances of specialized fields like medicine or finance, lack fluency in low-resource languages and dialects, and exhibit a bias toward Western cultures.
In many cases, it is more efficient to develop specialized models that are finely tuned to the unique vocabulary, context, and nuances of a particular field or language, leading to better-quality outputs and more cost-effective inference. However, the questions remain: should you train this specialized model from scratch or adapt an existing, high-quality, English-centric generalist model? Which model size should you pick? How much data is required? And how do you avoid catastrophic forgetting during adaptation?
We will address these and other questions, supported by case studies, while emphasizing the importance of efficient training techniques and scaling laws as critical tools in this process.
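As a rough illustration of the scaling-law reasoning the talk refers to, the sketch below sizes a model for a fixed compute budget using the common approximations C ≈ 6ND training FLOPs and a compute-optimal ratio of roughly 20 training tokens per parameter (Chinchilla-style); real choices also depend on architecture, data quality, and the inference budget.

```python
# Back-of-the-envelope, Chinchilla-style compute-optimal sizing:
# training compute C ≈ 6 * N * D FLOPs, with a compute-optimal
# token-to-parameter ratio D/N ≈ 20 (Hoffmann et al., 2022).

import math

def compute_optimal(c_flops: float, tokens_per_param: float = 20.0):
    """Return (params, tokens) that roughly balance a FLOP budget."""
    # C = 6 * N * D and D = r * N  =>  N = sqrt(C / (6 * r))
    n = math.sqrt(c_flops / (6.0 * tokens_per_param))
    return n, tokens_per_param * n

for budget in (1e21, 1e22, 1e23):
    n, d = compute_optimal(budget)
    print(f"C={budget:.0e} FLOPs -> ~{n/1e9:.1f}B params, ~{d/1e9:.0f}B tokens")
```

Note that serving cost pushes in the opposite direction: a smaller model trained past the compute-optimal point is often cheaper over its deployed lifetime, which is exactly the training/inference trade-off the talk examines.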
Raza Habib (Humanloop)
Best Practices for building high-retention, LLM-powered products
Through his work as the CEO of Humanloop, Raza has personally worked with dozens of companies to build and deploy LLM-powered products. He has also interviewed many of the best engineering leaders building AI products through his podcast, High Agency. In this talk, Raza will summarise best practices for building with AI, covering questions such as: how to build reliable agents in practice, how best to evaluate your AI systems, and what skills your team needs.
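A minimal version of the evaluation practice discussed here can be sketched in a few lines of Python: run the product over a fixed dataset and aggregate a simple metric. This is a generic sketch, not Humanloop's API, and `call_app` is a hypothetical stand-in for the system under test.

```python
# Minimal offline evaluation harness: score an LLM app over a fixed,
# versioned dataset so regressions are caught before deployment.

from typing import Callable

def call_app(question: str) -> str:
    """Placeholder for the LLM-powered product under test."""
    return "Paris" if "capital of France" in question else "I don't know"

def exact_match(output: str, expected: str) -> float:
    return float(output.strip().lower() == expected.strip().lower())

def run_eval(dataset: list[dict], metric: Callable[[str, str], float]) -> float:
    scores = [metric(call_app(row["input"]), row["expected"]) for row in dataset]
    return sum(scores) / len(scores)

dataset = [
    {"input": "What is the capital of France?", "expected": "Paris"},
    {"input": "What is the capital of Peru?", "expected": "Lima"},
]
print(f"exact-match accuracy: {run_eval(dataset, exact_match):.2f}")
```

In practice the metric is often an LLM-as-judge rather than exact match, but the loop, dataset, and aggregation stay the same.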
Douwe Kiela (Contextual AI)
RAG on the edge: GRIT and OLMoE for hyper-efficient retrieval and generation
Retrieval-augmented generation (RAG) has become the dominant paradigm for allowing language models to work on external data sources. As generative AI becomes increasingly important, a natural question is how we can make it feasible to deploy RAG systems directly to edge devices. In this talk, I will cover two research contributions that push the frontier on hyper-efficient RAG on the edge. First, I will discuss generative representational instruction tuning (GRIT), where we demonstrate that the weights between a retriever and generator can be shared via instruction tuning, allowing us to cache representations for much faster RAG. Second, I’ll present OLMoE, the first fully open source Mixture-of-Experts (MoE) language model that outperforms much larger models while being an order of magnitude more efficient.
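A toy sketch of the weight-sharing idea behind GRIT: a single model exposes both an embedding mode and a generation mode, so document representations computed offline can be cached and reused at query time instead of maintaining a separate retriever. The class below is a hypothetical stand-in (a real GRIT model is an instruction-tuned transformer, and the paper's speedups also come from reusing cached representations inside generation, which is not shown).

```python
# Hypothetical sketch: one set of weights serves as both retriever
# (embedding mode) and generator (text mode), with an offline
# embedding cache standing in for the representations GRIT reuses.

import numpy as np

class GritLikeModel:
    """Stand-in interface; real GRIT models switch modes by instruction."""
    def embed(self, text: str) -> np.ndarray:
        rng = np.random.default_rng(abs(hash(text)) % 2**32)  # fake embedding
        v = rng.standard_normal(8)
        return v / np.linalg.norm(v)
    def generate(self, prompt: str) -> str:
        return f"[generated answer grounded in: {prompt[:60]}...]"

model = GritLikeModel()
docs = ["MoE layers route tokens to experts.", "RAG retrieves before generating."]
cache = {d: model.embed(d) for d in docs}   # computed once, offline

query = "How does retrieval-augmented generation work?"
q = model.embed(query)                      # same weights as generation
best = max(cache, key=lambda d: float(q @ cache[d]))
print(model.generate(f"Context: {best}\nQuestion: {query}"))
```

On an edge device this matters twice over: only one model needs to fit in memory, and cached representations avoid re-encoding documents on every query.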
Michael Ryan (Stanford University)
DSPy: Prompt Optimization for LM Programs
It has never been easier to build amazing LLM-powered applications. Unfortunately, engineering reliable and trustworthy LLM systems remains challenging. Instead, practitioners should build LM Programs composed of several composable calls to LLMs, which can be rigorously tested, audited, and optimized like other software systems. In this talk I will introduce the idea of LM Programs in DSPy, the library for programming rather than prompting LMs. I will demonstrate how the LM Program abstraction allows the creation of automatic optimizers that can tune both the prompts and the weights in an LM Program. I will conclude with an introduction to MIPROv2, our latest and highest-performing prompt optimization algorithm for LM Programs.
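For orientation, here is a short example in the spirit of DSPy's public API: declare a signature, wrap it in a module to get an LM program, and compile the program with MIPROv2. Exact names and defaults vary across DSPy versions, and a realistic trainset needs far more examples than shown.

```python
# Declare an LM program in DSPy, then optimize its prompts with MIPROv2.
# Based on DSPy's public API; details may differ across library versions.

import dspy
from dspy.teleprompt import MIPROv2

dspy.configure(lm=dspy.LM("openai/gpt-4o-mini"))  # any supported backend

class AnswerQuestion(dspy.Signature):
    """Answer the question concisely."""
    question: str = dspy.InputField()
    answer: str = dspy.OutputField()

program = dspy.ChainOfThought(AnswerQuestion)  # a one-module LM program

def exact_match(example, prediction, trace=None):
    return example.answer.lower() == prediction.answer.lower()

# Illustrative only; MIPROv2 needs a substantially larger trainset.
trainset = [
    dspy.Example(question="What is 2 + 2?", answer="4").with_inputs("question"),
    dspy.Example(question="Capital of Peru?", answer="Lima").with_inputs("question"),
]

optimizer = MIPROv2(metric=exact_match, auto="light")
optimized = optimizer.compile(program, trainset=trainset)
print(optimized(question="What is the capital of France?").answer)
```

The point of the abstraction is that `program` is ordinary software: it can be unit-tested and audited, and the optimizer rewrites its prompts (and optionally its weights) against the metric rather than relying on hand-tuning.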
Zhuyun Dai (Google DeepMind)
LLM-Powered Retrieval: From Distillation to New Architectures
Information retrieval systems are essential for accessing the vast knowledge stored in large corpora, but current models often fall short when it comes to reasoning, following instructions, and generalizing to new distributions. This talk delves into our research aimed at enhancing retrieval models by harnessing the power of large language models (LLMs).
We first tackle the challenge of generalizing neural retrievers across different domains, showing how LLM distillation can be leveraged to achieve this and enable versatile neural retrievers such as the Gecko text embeddings API. We then introduce XTR, a novel multi-vector retrieval model that brings the architectures of LLMs and retrievers closer together, improving retriever generalization while remaining efficient.
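For context on multi-vector scoring, here is a NumPy sketch of the "sum of max similarities" rule used by ColBERT-style retrievers, which XTR builds on: each query token vector is matched to its best document token vector and the per-token maxima are summed. XTR's actual contribution, retrieving directly with token-level scores so that no separate gathering and rescoring stage is needed, is not shown here.

```python
# Multi-vector MaxSim scoring: score(q, d) = sum over query tokens of
# the maximum dot product with any document token.

import numpy as np

def maxsim_score(q_vecs: np.ndarray, d_vecs: np.ndarray) -> float:
    """q_vecs: (num_q_tokens, dim); d_vecs: (num_d_tokens, dim)."""
    sim = q_vecs @ d_vecs.T              # token-to-token similarities
    return float(sim.max(axis=1).sum())  # best doc token per query token

rng = np.random.default_rng(0)
query = rng.standard_normal((4, 16))     # 4 query token embeddings
doc_a = rng.standard_normal((10, 16))    # 10 document token embeddings
doc_b = rng.standard_normal((7, 16))
print("doc_a:", maxsim_score(query, doc_a), "doc_b:", maxsim_score(query, doc_b))
```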
In the second part of this talk, we explore the potential of long-context LLMs to revolutionize the future of retrieval by digesting an entire corpus as a prompt. To evaluate this exciting frontier, we introduce LOFT, a new benchmark specifically designed to assess the impact of long-context models on retrieval, retrieval-augmented generation (RAG), and database querying.
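The corpus-in-the-prompt setup that LOFT evaluates can be illustrated with a toy sketch: rather than running a retriever, the (small) corpus is serialized into a long-context model's prompt and the model is asked to do the retrieval itself. `long_context_llm` is a hypothetical placeholder, not the actual LOFT harness.

```python
# Toy "corpus in context" setup: the whole corpus goes into the prompt
# and a long-context model performs retrieval directly.

def long_context_llm(prompt: str) -> str:
    """Placeholder for a call to a long-context LLM."""
    return "doc_2"

corpus = {
    "doc_1": "The Gecko API produces compact text embeddings.",
    "doc_2": "LOFT benchmarks long-context retrieval, RAG, and SQL-like querying.",
}

prompt = "\n".join(f"[{doc_id}] {text}" for doc_id, text in corpus.items())
prompt += "\n\nWhich document describes the LOFT benchmark? Answer with its id."
print(long_context_llm(prompt))
```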