Neural Search Talks [9] — Generating Training Data with Large Language Models w/ Marzieh Fadaee

Jakub Zavrel
Dec 13, 2022
Updated: Mar 30, 2023

Marzieh Fadaee, NLP Research Lead at Zeta Alpha, joins Andrew Yates and Sergi Castella to chat about her work using large language models like GPT-3 to generate domain-specific training data for retrieval models with little-to-no human input. The two papers discussed are "InPars: Data Augmentation for Information Retrieval using Large Language Models" and "Promptagator: Few-shot Dense Retrieval From 8 Examples".

The conversation touches on the details of prompting and the costs of generating domain-specific datasets for information retrieval.
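To make the idea of few-shot prompting concrete, here is a minimal sketch of how a prompt for query generation might be assembled from a handful of document/query pairs. The `build_few_shot_prompt` helper, the "Document:/Query:" template, and the example texts are all hypothetical illustrations, not the prompts used in InPars or Promptagator, and the call to an actual language model is left out.

```python
from typing import List, Tuple


def build_few_shot_prompt(examples: List[Tuple[str, str]], document: str) -> str:
    """Build a few-shot prompt asking a language model to write a query for `document`.

    The "Document:/Query:" template is a hypothetical illustration, not the
    exact prompt from either paper.
    """
    parts = []
    for i, (doc, query) in enumerate(examples, start=1):
        parts.append(f"Example {i}:\nDocument: {doc}\nQuery: {query}\n")
    # Leave the final query empty so the model completes it.
    parts.append(f"Example {len(examples) + 1}:\nDocument: {document}\nQuery:")
    return "\n".join(parts)


# Made-up document/query pairs, used only to show the prompt layout.
examples = [
    ("Aspirin reduces fever and relieves mild pain.", "what is aspirin used for"),
    ("The Eiffel Tower was completed in 1889.", "when was the eiffel tower built"),
]
prompt = build_few_shot_prompt(
    examples, "Dense retrievers encode queries and documents as vectors."
)
print(prompt)
```

In the setting the episode describes, the model's completion for the last "Query:" would be paired with its document to form one synthetic training example for a retrieval model.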

https://www.youtube.com/watch?v=MlxZI_bFD8U

📄 InPars: https://arxiv.org/abs/2202.05144

📄 Promptagator: https://arxiv.org/abs/2209.11755

Timestamps:

00:00 Introduction

02:00 Background and journey of Marzieh Fadaee

03:10 Challenges of leveraging Large LMs in Information Retrieval

05:20 InPars, motivation and method

14:30 Vanilla vs GBQ (Guided by Bad Questions) prompting

24:40 Evaluation and Benchmark

26:30 Baselines