In this second episode of the Neural Search Talks podcast, Andrew Yates and Sergi Castella discuss the paper "The Curse of Dense Low-Dimensional Information Retrieval for Large Index Sizes".
This paper investigates what happens when dense vector search indexes are scaled up and shows that there are limitations in the representational capacity of such indexes. It turns out that as the index size grows, the chance of retrieving 'false positives' from a dense index grows faster than from a sparse one, hinting at a possible fundamental limitation of the approach.
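As a rough intuition for this effect (not from the episode or the paper), here is a minimal toy simulation in Python: it models documents as random low-dimensional unit vectors and measures how often some unrelated "distractor" outscores a noisy copy of the query as the index grows. All parameters (embedding dimension, noise level, index sizes) are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
DIM = 32        # assumed low embedding dimension (illustrative)
N_TRIALS = 100  # Monte Carlo repetitions per index size

def false_positive_rate(index_size: int) -> float:
    """Fraction of trials where a random distractor outranks the relevant doc."""
    hits = 0
    for _ in range(N_TRIALS):
        query = rng.normal(size=DIM)
        query /= np.linalg.norm(query)
        # Relevant doc: a noisy copy of the query (assumed relevance model).
        relevant = query + 0.5 * rng.normal(size=DIM)
        relevant /= np.linalg.norm(relevant)
        # Distractors: random unit vectors standing in for unrelated docs.
        distractors = rng.normal(size=(index_size, DIM))
        distractors /= np.linalg.norm(distractors, axis=1, keepdims=True)
        if (distractors @ query).max() > relevant @ query:
            hits += 1  # a false positive beat the relevant document
    return hits / N_TRIALS

for n in (1_000, 10_000, 100_000):
    print(f"index size {n:>7,}: false-positive rate ~ {false_positive_rate(n):.2f}")
```

In this toy setting the false-positive rate climbs as the index grows, because the maximum cosine similarity among N random vectors in a fixed low dimension increases with N, which is the qualitative trend the paper studies.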
Timestamps:
00:00 Co-host introduction
00:26 Paper introduction
02:18 Dense vs. Sparse retrieval
05:46 Theoretical analysis of false positives (1)
08:17 Low- vs. high-dimensional representations
11:49 Theoretical analysis of false positives (2)
20:10 First results: growing the MS MARCO index
28:35 Adding random strings to the index
39:17 Discussion, takeaways
44:26 Will dense retrieval replace or coexist with sparse methods?
50:50 Sparse, Dense and Attentional Representations for Text Retrieval
Referenced work:
Sparse, Dense and Attentional Representations for Text Retrieval by Yi Luan et al., 2020.