Why GenAI Pilots Fail: Common Challenges with Enterprise RAG - Trends in AI: September 2025
- Dinos Papakostas
Retrieval-Augmented Generation (RAG) is powering the current wave of AI applications, from multimodal question answering and technical document summarization to streamlining knowledge-intensive workflows. But bringing these applications to production quickly, effectively, and with a clear path to ROI is hard.
In this post, we outline key challenges in Enterprise RAG, move beyond the basics towards agentic systems, and explore how automatic optimization can adapt these systems to domain-specific needs.

*** Join our Luma community to stay up to date with our upcoming events! ***
Are GenAI pilots failing in production?
A recent MIT report made headlines with the claim that "95% of generative AI pilots at companies are failing". Read closely, however, and "failing" mostly refers to a lack of measurable ROI rather than a failure to ship. The report also notes that "67% of externally partnered deployments succeed vs. 33% of internal builds" and stresses the importance of tight collaboration between domain experts and implementation teams. The takeaway: the barriers are less about core technology and more about strategy and execution.
On the flip side, a Google Cloud survey of 3,000+ senior enterprise leaders reports that 88% of agentic AI early adopters see ROI on at least one GenAI use case. Still, internal blockers persist: data privacy and security rank as the top criteria when evaluating AI model providers, and they are often the difference between a green light and a no-go.
The Open-Source Model Landscape in Summer '25
As always, we keep a close eye on the open-source landscape. Some notable recent releases:
- Moonshot AI's Kimi K2 September update: better performance in agentic use cases and support for longer-context tasks
- DeepSeek V3.1: unifies V3 and R1 into a hybrid reasoning LLM, with a focus on agentic use
- ByteDance's Seed-OSS-36B: signaling that their Seed series is here to stay
- Zhipu AI's GLM-4.5V: a multimodal extension of GLM-4.5
- Alibaba's Qwen3-VL (vision-language model) and Qwen3-Next (sparse MoE with efficiency-focused inference improvements)
- Google DeepMind's EmbeddingGemma: an open-weight dense bi-encoder that is SOTA in its weight class, multilingual, and trained with quantization-aware training, making it viable even for CPU-only workloads
What Makes Enterprise RAG Challenging?
A recurring theme in this blog has been the contrast between the potential upside of a successful, production-ready GenAI implementation with a clear path to ROI, and the difficulties that many large organizations face in developing such an application. But why is delivering a working solution built on enterprise search so challenging?
It turns out that the paradox with Enterprise RAG lies at its core: enterprise data. A RAG solution is only as effective as the knowledge that is contained within the "information silos" of the enterprise - cloud drives, chats, emails, wikis, websites, code repos, CRMs, and more. As Jakub Zavrel, founder of Zeta Alpha, mentions in Ben Lorica's latest edition of the Gradient Flow newsletter:
"Enterprise search is never going to be turnkey out of the box. It requires deep customization."
After helping our partners at BASF, Festo, and Deloitte bring highly specialized RAG systems to production, these are the hurdles we've encountered most often:
1. Enterprise RAG is Unstructured
Document collections in large organizations are never uniform. You'll find clean, single-column Microsoft Word files alongside multi-column PDFs where OCR struggles with layout boundaries; technical manuals packed with figures, graphs, schematics, and tables; websites and other web resources; massive spreadsheets with thousands of rows; (No)SQL databases; and slide decks that combine all of the above.
If your pipeline only ingests "nice" text, it will miss the knowledge that actually matters. An effective RAG solution must handle this diversity end to end, parsing all these formats, supporting embedding modalities such as images and plots, and making the resulting knowledge seamlessly accessible to users.
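As a sketch of what that routing layer can look like, consider the dispatch registry below; the parser functions are placeholders for real layout-aware extractors, OCR, and table/slide parsing:

```python
from pathlib import Path

# Placeholder parsers: real implementations would wrap layout-aware PDF
# parsing, OCR for scanned pages, table extraction, slide parsing, etc.
def parse_pdf(path: Path) -> str:
    return f"<text, tables, and figure captions from {path.name}>"

def parse_docx(path: Path) -> str:
    return f"<headings and body text from {path.name}>"

def parse_xlsx(path: Path) -> str:
    return f"<row- and sheet-level chunks from {path.name}>"

PARSERS = {".pdf": parse_pdf, ".docx": parse_docx, ".xlsx": parse_xlsx}

def ingest(path: Path) -> str:
    """Route each file to a format-specific parser; failing loudly on an
    unsupported format beats silently dropping its knowledge."""
    parser = PARSERS.get(path.suffix.lower())
    if parser is None:
        raise ValueError(f"no parser registered for '{path.suffix}'")
    return parser(path)
```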
2. Enterprise RAG is Decentralized
This large volume of content doesn't live in a single source. Office documents may sit in OneDrive or SharePoint; team communications in Microsoft Teams or Slack; source code in GitHub; tasks and boards on Jira or Trello; files scattered across Dropbox or S3; customer information in HubSpot or Salesforce; wikis on Confluence; and more in internal portals and legacy systems. Ultimately, users shouldn't care where the data resides - they just need a unified interface that satisfies their information needs. That's why broad and reliable connector coverage is essential, so information can flow into the system regardless of origin.
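A minimal connector abstraction, assuming a generic Document record and a change-feed method, might look like this:

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import Iterator, Optional

@dataclass
class Document:
    id: str
    source: str                    # e.g. "sharepoint", "slack", "github"
    text: str
    allowed_groups: list[str] = field(default_factory=list)  # ACLs captured at ingestion

class Connector(ABC):
    """One uniform interface per source, so indexing and retrieval code
    never need to know where a document came from."""

    name: str

    @abstractmethod
    def list_changes(self, since: Optional[str]) -> Iterator[Document]:
        """Yield documents created or updated after the given cursor."""

# A SharePoint connector would implement list_changes() on top of the
# Microsoft Graph delta API, a Slack connector on top of conversations.history,
# and so on - everything downstream stays unchanged.
```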
3. Enterprise RAG Must Respect Access Roles
In big teams, not everyone has the same access. Some files are private to their owners, others are shared with specific groups, and some are visible to everyone. Role-based access control (RBAC), typically enforced via SSO and identity providers, must be honored by all downstream apps, and GenAI is no exception - leaking confidential content would be a serious breach. As a result, a RAG system must integrate with the org's identity provider and enforce permissions end to end.
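At query time, this means keeping documents a user cannot see out of ranking and generation entirely. A simplified sketch, where `index.knn` stands in for your search backend:

```python
def search_as_user(query_vec, user_groups: set[str], index, k: int = 10):
    """Filter candidates by ACL before they reach ranking or generation.
    `index.knn` is a placeholder; engines such as OpenSearch can apply
    the ACL filter natively inside the k-NN query, which is preferable
    to the post-filtering shown here for clarity."""
    candidates = index.knn(query_vec, k=10 * k)   # over-fetch, then filter
    visible = [d for d in candidates if user_groups & set(d.allowed_groups)]
    return visible[:k]
```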
4. Enterprise RAG is not Static
A major differentiator between PoC implementations and production-grade apps is data freshness. Document collections are constantly changing: documents are updated to new versions, new items are added, old ones are deleted, and permissions evolve. RAG systems need to sync frequently and efficiently, always reflecting the latest state to avoid serving stale information.
This becomes even harder when combined with all of the above: large numbers of documents across many sources must be checked at short intervals, while still tracking per-user access regardless of where the document originated.
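Building on the connector interface sketched earlier, a hypothetical incremental sync loop might look like this:

```python
import time

def sync_loop(connectors, index, interval_s: int = 300):
    """Poll each source for deltas instead of re-crawling everything;
    per-connector cursors keep every pass cheap. `index.upsert`/`delete`
    and `connector.latest_cursor()` are assumed interfaces."""
    cursors = {c.name: None for c in connectors}
    while True:
        for connector in connectors:
            for doc in connector.list_changes(since=cursors[connector.name]):
                if getattr(doc, "deleted", False):
                    index.delete(doc.id)   # drop removed or stale content
                else:
                    index.upsert(doc)      # add or refresh, ACL metadata included
            cursors[connector.name] = connector.latest_cursor()
        time.sleep(interval_s)
```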
5. Enterprise RAG is not Just Vector Search
A persistent misconception is that embeddings are all you need for RAG - or, conversely, that a system without embeddings isn't RAG at all. In reality, the "R" (Retrieval) in RAG can be implemented in multiple ways that don't involve vector search, and the most effective systems combine several retrieval methods and stages.
A common architecture we've implemented repeatedly is shown below: the user input is first pre-processed by a language model to perform intent recognition and extract appropriate filters and search parameters; it is then rewritten for each search backend - typically including both keyword and vector search. The results are fused and optionally re-ranked in a second stage, so that the generative model receives only the most relevant results.
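As an illustration of the fusion step, here is a minimal reciprocal rank fusion (RRF) sketch; RRF is one common choice, though the exact fusion and re-ranking methods vary per deployment:

```python
from collections import defaultdict

def reciprocal_rank_fusion(result_lists: list[list[str]], k: int = 60) -> list[str]:
    """Fuse ranked doc-id lists from multiple backends (e.g. keyword and
    vector search). RRF uses only ranks, so scores on incompatible
    scales can be combined without calibration."""
    scores: dict[str, float] = defaultdict(float)
    for results in result_lists:
        for rank, doc_id in enumerate(results):
            scores[doc_id] += 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.__getitem__, reverse=True)

# Example: top-3 hits from keyword search and from vector search
print(reciprocal_rank_fusion([["d1", "d2", "d3"], ["d3", "d1", "d4"]]))
# d1 and d3 surface first; a cross-encoder re-ranker can then refine the top-k
```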

Related research papers:
On the Theoretical Limitations of Embedding-Based Retrieval - O. Weller et al. (Google DeepMind, JHU) - Aug. 28th 2025
6. Enterprise RAG is Large-Scale
Moving to production increases the ingestion scale from hundreds of documents to thousands or even millions. When vector search is used, embedding models' finite context length limits how much information can be encoded per vector, so text chunking is often employed - multiplying the number of embeddings by one or two orders of magnitude (and even more with late interaction models like ColBERT or ColPali).
Due to the nature of approximate nearest neighbor (ANN) indices and traversal algorithms like HNSW, the required data structures can demand prohibitive amounts of RAM. For example, indexing 1 billion BERT-base embeddings (768 dimensions at full precision) corresponds to roughly 3.5TB of RAM, or around $22K/month at typical cloud rates - before considering replicas for redundancy.
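As a sanity check on that estimate, here is the back-of-the-envelope arithmetic, with the HNSW connectivity (M = 32) and 4-byte neighbor ids as assumptions:

```python
# Rough RAM estimate for 1B full-precision BERT-base embeddings in HNSW.
n, dims = 1_000_000_000, 768
raw = n * dims * 4            # float32 vectors: ~3.07 TB
m = 32                        # assumed HNSW connectivity parameter
links = n * 2 * m * 4         # level-0 neighbor ids (2*M per vector): ~0.26 TB
print(f"{raw / 1e12:.2f} TB vectors + {links / 1e12:.2f} TB graph links "
      f"= {(raw + links) / 1e12:.2f} TB, before higher layers and replicas")
```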
If you're interested in this topic, we've covered how binary quantization in OpenSearch helps bring this back to a reasonable cost in a previous blog post, and our CTO Fernando Rejon Barrera shared more details in an OpenSearchCon Europe 2025 talk.
7. Enterprise RAG is not Single-Shot (Agentic RAG)
In large organizations, generic, single-pass RAG often yields answers that are technically correct but incomplete. Advanced enterprise use cases - like Deep Research that produces full-fledged research reports on topics or projects - require Agentic RAG. Instead of a single index lookup stuffed into the prompt, the system executes a dynamic plan across components: enterprise retrieval and web search, memory, external tools/APIs (e.g., via MCP), and often multi-agent orchestration with planners, specialist sub-agents, and reasoning models. Rigid, monolithic pipelines are hard to adapt and tend to underperform as they limit iterative control.
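To make the contrast with single-shot RAG concrete, here is a minimal plan-act-observe sketch; the `llm.decide_next_action`/`summarize` interface and the tool registry are assumptions, not a specific framework's API:

```python
def deep_research(question: str, llm, tools: dict, max_steps: int = 8) -> str:
    """Minimal plan-act-observe loop. `llm` (with decide_next_action and
    summarize methods) and the tool callables - enterprise_search,
    web_search, an MCP client, ... - are placeholders, not a real API."""
    notes: list[str] = []
    for _ in range(max_steps):
        action = llm.decide_next_action(question, notes)   # plan the next step
        if action.name == "final_answer":
            return action.argument                         # synthesize and stop
        observation = tools[action.name](action.argument)  # act, e.g. a retrieval call
        notes.append(f"{action.name}({action.argument!r}) -> {observation}")
    return llm.summarize(question, notes)                  # step budget exhausted
```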

Related research papers:
SFR-DeepResearch: Towards Effective Reinforcement Learning for Autonomously Reasoning Single Agents - X. Nguyen et al. (Salesforce AI Research) - Sept. 9th 2025
BrowseComp-Plus: A More Fair and Transparent Evaluation Benchmark of Deep-Research Agent - Z. Chen et al. (U. Waterloo, CSIRO, CMU) - Aug. 8th 2025
8. Enterprise RAG must be Grounded
Trustworthiness and transparency are critical to adoption. Outputs should point back to the exact sources used to synthesize the answer, so that users can verify its correctness quickly and effortlessly. Common patterns include in-line citations and snippet previews on hover, revealing the relevant evidence without disrupting the knowledge discovery workflow.
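One simple way to enable such citations is to number the retrieved passages in the prompt and ask the model to emit markers that the UI can resolve back to snippets; a minimal sketch (the prompt wording is illustrative):

```python
def build_grounded_prompt(question: str, passages: list[dict]) -> str:
    """Number each retrieved passage so the model can cite it inline as
    [1], [2], ...; the UI later resolves markers back to source snippets."""
    sources = "\n".join(
        f"[{i + 1}] ({p['source']}) {p['text']}" for i, p in enumerate(passages)
    )
    return (
        "Answer using ONLY the sources below, and cite every claim "
        "with its marker, e.g. [1].\n\n"
        f"Sources:\n{sources}\n\nQuestion: {question}\nAnswer:"
    )
```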
Related research papers:
Why Language Models Hallucinate - A. Kalai et al. (OpenAI) - Sept. 4th 2025
What's Next for Enterprise RAG?
An integral part of the development lifecycle for any production-grade AI application is rigorous evaluation, usually followed by a few iteration cycles to iron out issues and reach the target accuracy. Recently, we've seen growing interest in prompt/program optimization for RAG & multi-agent systems, with frameworks like DSPy and TextGrad that iterate using LLM-generated critiques in natural language to refine behavior.
This has inspired a newer line of work that tackles optimization in an end-to-end manner, using evolutionary (genetic) approaches to explore improvements across prompts, tools, and agent graphs, scored against objectives like accuracy, number of tool calls, latency, and overall complexity. We are optimistic about this approach: in our experiments, automatic agent optimization has matched - and at times surpassed - the performance of systems such as Zeta Alpha's Deep Research agent that were refined over many months, while completing the optimization cycle within a few hours and for a total cost < $100 in LLM calls.
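For intuition, here is a toy genetic loop over prompt candidates; the `evaluate` (dev-set scorer) and `mutate` (LLM critique-and-rewrite) callables are assumptions standing in for what frameworks like GEPA implement far more carefully:

```python
import random

def evolve_prompt(seed: str, evaluate, mutate,
                  generations: int = 10, population_size: int = 8) -> str:
    """Toy genetic loop over prompt candidates. `evaluate` scores a prompt
    on a dev set; `mutate` asks an LLM to critique a prompt's failures and
    rewrite it (the reflective step). Both callables are assumptions."""
    population = [seed] + [mutate(seed) for _ in range(population_size - 1)]
    for _ in range(generations):
        ranked = sorted(population, key=evaluate, reverse=True)
        survivors = ranked[: population_size // 2]        # keep the best half
        offspring = [mutate(random.choice(survivors))     # refill the population
                     for _ in range(population_size - len(survivors))]
        population = survivors + offspring
    return max(population, key=evaluate)
```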
Related research papers:
GEPA: Reflective Prompt Evolution Can Outperform Reinforcement Learning - L. Agrawal et al. (UC Berkeley, Stanford, Databricks, MIT) - July 25th 2025
Maestro: Joint Graph & Config Optimization for Reliable AI Agents - W. Wang et al. (RELAI.ai) - Sept. 4th 2025
In this blog post, we identified common Enterprise RAG challenges and shared practical ways to address them without reinventing the wheel. If you are leading knowledge-intensive work and exploring how to integrate AI and Enterprise RAG into your processes and workflows, reach out to us for an initial conversation on turning your internal expertise into a valuable asset.
For more insights and detailed coverage, watch the full webinar recording below, and join our Luma community for upcoming discussions and events.
Until next time, enjoy discovery!