Neural Search Talks [2] — The Curse of Dense Low-Dimensional Information Retrieval

In this second episode of the Neural Search Talks podcast, Andrew Yates and Sergi Castella discuss the paper "The Curse of Dense Low-Dimensional Information Retrieval for Large Index Sizes".


This paper investigates what happens when dense vector search indexes are scaled up and shows that the representational capacity of such indexes is limited. As index size grows, the chance of retrieving false positives grows faster for a dense index than for a sparse one, hinting at a possible fundamental limitation of the approach.
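To build some intuition for the index-size effect, here is a minimal toy simulation. It is not the paper's experiment: it assumes documents are random unit vectors, that the relevant document scores a cosine similarity of 0.7 against the query (a made-up value), and it only contrasts a low-dimensional with a higher-dimensional dense space rather than dense with sparse. The function names and parameters are hypothetical.

```python
import math
import random


def has_false_positive(dim, index_size, threshold, rng):
    """Return True if any random distractor scores above `threshold`."""
    for _ in range(index_size):
        vec = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        norm = math.sqrt(sum(x * x for x in vec))
        # By rotational symmetry the query can be fixed to the first basis
        # vector, so the cosine similarity is just vec[0] / norm.
        if vec[0] / norm > threshold:
            return True
    return False


def false_positive_rate(dim, index_size, threshold=0.7, trials=20, seed=0):
    """Fraction of trials in which some distractor outranks the relevant doc.

    `threshold` stands in for the relevant document's score (an assumption).
    """
    rng = random.Random(seed)
    hits = sum(
        has_false_positive(dim, index_size, threshold, rng)
        for _ in range(trials)
    )
    return hits / trials


# Compare a low-dimensional space (4) with a higher-dimensional one (32)
# as the number of distractors in the index grows.
rates = {
    dim: {n: false_positive_rate(dim, n) for n in (10, 100, 1000)}
    for dim in (4, 32)
}

for dim, by_size in rates.items():
    for n, rate in by_size.items():
        print(f"dim={dim:>2} index_size={n:>4} false-positive rate={rate:.2f}")
```

Under these assumptions, the low-dimensional space should fill up with distractors that outrank the relevant document much sooner than the higher-dimensional one. Real embeddings are not uniformly random, so treat the numbers as illustrative only.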

Resources:



Timestamps:

00:00 Co-host introduction

00:26 Paper introduction

02:18 Dense vs. Sparse retrieval

05:46 Theoretical analysis of false positives (1)

08:17 What are low- vs. high-dimensional representations?

11:49 Theoretical analysis of false positives (2)

20:10 First results: growing the MS MARCO index

28:35 Adding random strings to the index

39:17 Discussion, takeaways

44:26 Will dense retrieval replace or coexist with sparse methods?

50:50 Sparse, Dense and Attentional Representations for Text Retrieval


Referenced work:
