In this second episode of the Neural Search Talks podcast, Andrew Yates and Sergi Castella discuss the paper "The Curse of Dense Low-Dimensional Information Retrieval for Large Index Sizes".
This paper investigates what happens when dense vector search indexes are scaled up and shows that there are limitations in the representational capacity of such indexes. It turns out that as the index size grows, the chance of retrieving 'false positives' from a dense index grows faster than from a sparse one, hinting at a possible fundamental limitation of the approach.
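As a rough intuition for this effect (not from the episode or the paper), here is a minimal toy simulation in Python: it models documents as random low-dimensional unit vectors and measures how often some unrelated "distractor" outscores a noisy copy of the query as the index grows. All parameters (embedding dimension, noise level, index sizes) are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
DIM = 32        # assumed low embedding dimension (illustrative)
N_TRIALS = 100  # Monte Carlo repetitions per index size

def false_positive_rate(index_size: int) -> float:
    """Fraction of trials where a random distractor outranks the relevant doc."""
    hits = 0
    for _ in range(N_TRIALS):
        query = rng.normal(size=DIM)
        query /= np.linalg.norm(query)
        # Relevant doc: a noisy copy of the query (assumed relevance model).
        relevant = query + 0.5 * rng.normal(size=DIM)
        relevant /= np.linalg.norm(relevant)
        # Distractors: random unit vectors standing in for unrelated docs.
        distractors = rng.normal(size=(index_size, DIM))
        distractors /= np.linalg.norm(distractors, axis=1, keepdims=True)
        if (distractors @ query).max() > relevant @ query:
            hits += 1  # a false positive beat the relevant document
    return hits / N_TRIALS

for n in (1_000, 10_000, 100_000):
    print(f"index size {n:>7,}: false-positive rate ~ {false_positive_rate(n):.2f}")
```

In this toy setting the false-positive rate climbs as the index grows, because the maximum cosine similarity among N random vectors in a fixed low dimension increases with N, which is the qualitative trend the paper studies.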
Timestamps:
00:00 Co-host introduction
00:26 Paper introduction
02:18 Dense vs. Sparse retrieval
05:46 Theoretical analysis of false positives (1)
08:17 Low- vs. high-dimensional representations
11:49 Theoretical analysis of false positives (2)
20:10 First results: growing the MS MARCO index
28:35 Adding random strings to the index
39:17 Discussion, takeaways
44:26 Will dense retrieval replace or coexist with sparse methods?
50:50 Sparse, Dense and Attentional Representations for Text Retrieval
Referenced work:
Sparse, Dense and Attentional Representations for Text Retrieval by Yi Luan et al., 2020.