Zeta Alpha: The Neural Discovery Platform
Zeta Alpha is a customizable Neural Discovery Platform for Enterprise. Leverage Neural Search on your private content in any domain, at scale and with secure access control. Connect people with knowledge, unlock existing information silos, and find expertise inside and outside your organization. Our infrastructure and engineering provide the best ingredients for a state-of-the-art managed neural search solution today. Zeta Alpha’s experts help you to quickly benefit from neural search and natural language understanding technologies, and the way we combine search and knowledge tools helps you create a complete solution to discover, organize, and share the knowledge to make better decisions, in AI, in R&D, and beyond.
Neural Search and Discovery
Neural Search is a new breakthrough in how information can be explored and discovered.
Recent deep-learning-based neural network models, and in particular very large language models based on the Transformer architecture, have allowed a complete rethinking of the basis of document understanding and search (Lin, Nogueira and Yates, 2020). This leads to a new paradigm, Neural Search, which breaks the existing limits of keyword-based search systems.
In Neural Search, user queries and information needs are encoded as vectors in high-dimensional semantic vector spaces, and information in documents is encoded in the same way. Passage and document retrieval is then performed by quickly finding the most similar among billions of vectors. This approach transforms search from matching keywords to truly meaning-based (also known as semantic) search. The first generation of semantic search still required manual maintenance of taxonomies and synonym sets. The most recent innovations in neural search make such manual maintenance obsolete. This approach overcomes two key challenges in search:
Bridging the lexical gap. In keyword-based systems, users must use words in their query that are also used by the authors of the needed documents. This makes it very difficult to discover what is unknown. Because neural network models are pre-trained on very large document collections, they capture the a priori semantic relationships between words and phrases. This frees the user from formulating queries using the exact words that the target documents contain. A query for “self-driving cars” will, for example, also find documents about “autonomous vehicles”.
Understanding word order, relationships, and context. When longer passages are used as input, neural encoding methods largely preserve the meaning implied by the sentence structure in the encoded vector space. They can therefore retrieve the most similar and highly relevant passages in cases where keyword-based methods have a very hard time retrieving anything relevant at all. This allows discovery and user profiling using natural language questions, or query by example from documents that the user has tagged.
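To make the idea of vector-based retrieval concrete, here is a minimal sketch in Python. It is not Zeta Alpha's implementation: the three-dimensional vectors are hand-written stand-ins for what a neural encoder would produce, and the names `DOC_VECTORS`, `search`, and `keyword_overlap` are hypothetical. It shows a query being matched to a document by vector similarity even though the two share no keywords.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def keyword_overlap(query, title):
    """Number of exact words shared by query and title (keyword-style matching)."""
    return len(set(query.lower().split()) & set(title.lower().split()))

# Hypothetical embeddings, hand-written for illustration only.
# A real encoder would produce vectors with hundreds of dimensions.
DOC_VECTORS = {
    "Safety of autonomous vehicles": [0.9, 0.1, 0.0],
    "Car loan interest rates": [0.2, 0.9, 0.1],
    "Sourdough baking basics": [0.0, 0.1, 0.9],
}

def search(query_vector, doc_vectors, top_k=2):
    """Return the titles of the top_k documents most similar to the query vector."""
    scored = [(cosine(query_vector, vec), title) for title, vec in doc_vectors.items()]
    scored.sort(reverse=True)
    return [title for _, title in scored[:top_k]]

query = "self-driving cars"
query_vector = [0.8, 0.2, 0.0]  # assumed encoding of the query text
results = search(query_vector, DOC_VECTORS)
shared_words = keyword_overlap(query, results[0])
```

In this toy setup the best match is the document about autonomous vehicles, while `shared_words` is zero: an exact keyword match would not have found it.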
These changes transform the realm of search engines into the domain of discovery or insight engines (Gartner 2021) that allow people to connect to relevant knowledge without actively using search queries and without already knowing what they are searching for (i.e. unknown unknowns).
Moreover, neural search opens the road to effortless multi-lingual and multi-modal discovery engines.
Figure 1. Information discovery without exact keyword matches using Zeta Alpha Transformer Powered Search.
The space of pre-trained and domain-specific neural language understanding models is evolving very rapidly at the moment. The architecture of the Zeta Alpha platform allows our customers to plug in the latest available open source (e.g. through the Hugging Face Transformers Library) or proprietary models for embedding text in vector spaces. The expertise of our PhD-level research team helps us to select the right models for the best performance or to train models based on customer and domain-specific data sets.
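One way to picture a pluggable embedding model is a small interface that any encoder can satisfy. The sketch below is an assumption about how such a design could look, not a description of the Zeta Alpha architecture. `TextEncoder`, `HashingEncoder`, and `Indexer` are hypothetical names, and the hashing encoder is a toy stand-in that captures no semantics; in practice it would be replaced by a wrapper around a real model.

```python
import hashlib
import math
from typing import Dict, List, Protocol

class TextEncoder(Protocol):
    """Anything with an encode() method returning a vector can be plugged in."""
    def encode(self, text: str) -> List[float]: ...

class HashingEncoder:
    """Toy stand-in for a neural encoder: hashes tokens into a fixed-size vector."""

    def __init__(self, dim: int = 16) -> None:
        self.dim = dim

    def encode(self, text: str) -> List[float]:
        vec = [0.0] * self.dim
        for token in text.lower().split():
            digest = hashlib.md5(token.encode("utf-8")).digest()
            vec[digest[0] % self.dim] += 1.0
        # Normalize to unit length so vectors are comparable by dot product.
        norm = math.sqrt(sum(x * x for x in vec))
        return [x / norm for x in vec] if norm else vec

class Indexer:
    """Stores one vector per document, using whichever encoder it was given."""

    def __init__(self, encoder: TextEncoder) -> None:
        self.encoder = encoder
        self.vectors: Dict[str, List[float]] = {}

    def add(self, doc_id: str, text: str) -> None:
        self.vectors[doc_id] = self.encoder.encode(text)

indexer = Indexer(HashingEncoder(dim=16))
indexer.add("doc-1", "neural search for enterprise content")
```

Because `Indexer` depends only on the `encode` method, swapping in a different model means passing a different encoder object; the indexing code itself does not change.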
Discovery. Discovery of existing and new knowledge is provided primarily through the Search API. Zeta Alpha’s search layer provides both classical keyword-based search (with the usual boolean, phrase and field operators) and modern neural vector search. We call the latter Transformer Powered Search. Search is offered at the Document, Passage and Sentence level, using a variety of filters and ranking functions. External data sources, such as social media mentions, author influence, or code popularity, can be used as signals in the ranking.
The results of the search layer also power most of the other modules in the system.
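As an illustration of how keyword scores, vector scores, and external signals might be combined into one ranking, here is a minimal sketch using a weighted sum. The weighting scheme, the weights themselves, and the names `Candidate`, `WEIGHTS`, and `rank` are assumptions made for this example; the actual ranking functions of the platform are not described here.

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    doc_id: str
    keyword_score: float  # assumed normalized lexical match score in [0, 1]
    vector_score: float   # assumed normalized vector similarity in [0, 1]
    popularity: float     # assumed external signal, normalized to [0, 1]

# Hypothetical weights; a real system would tune these per use case.
WEIGHTS = {"keyword": 0.3, "vector": 0.6, "popularity": 0.1}

def combined_score(c, weights=WEIGHTS):
    """Weighted sum of the three signals for one candidate."""
    return (weights["keyword"] * c.keyword_score
            + weights["vector"] * c.vector_score
            + weights["popularity"] * c.popularity)

def rank(candidates, weights=WEIGHTS):
    """Sort candidates from highest to lowest combined score."""
    return sorted(candidates, key=lambda c: combined_score(c, weights), reverse=True)

candidates = [
    Candidate("a", keyword_score=0.9, vector_score=0.2, popularity=0.1),
    Candidate("b", keyword_score=0.1, vector_score=0.9, popularity=0.5),
    Candidate("c", keyword_score=0.5, vector_score=0.5, popularity=0.9),
]
ranked_ids = [c.doc_id for c in rank(candidates)]
```

With these illustrative weights, document "b" ranks first because its strong vector similarity outweighs its weak keyword match.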
One of the most powerful knowledge discovery modes is ‘Find Similar’, which allows the user to query by example from documents or collections they have identified, while still using all the other search operators and filters.
Figure 2. Query by example. ‘Find Similar’ search in Zeta Alpha.
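Query by example can be sketched as follows, under the assumption that the example documents are summarized by the average of their vectors and that filters are applied before ranking. This is one simple way to realize the idea, not necessarily how ‘Find Similar’ works internally; `INDEX`, `find_similar`, and the `min_year` filter are hypothetical, and the two-dimensional vectors are toy values.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def centroid(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]

def find_similar(example_ids, index, min_year=None, top_k=3):
    """Rank documents by similarity to the centroid of the example documents."""
    query = centroid([index[doc_id]["vector"] for doc_id in example_ids])
    hits = []
    for doc_id, doc in index.items():
        if doc_id in example_ids:
            continue  # do not return the examples themselves
        if min_year is not None and doc["year"] < min_year:
            continue  # ordinary filters still apply
        hits.append((cosine(query, doc["vector"]), doc_id))
    hits.sort(reverse=True)
    return [doc_id for _, doc_id in hits[:top_k]]

# Toy index with hand-written vectors, for illustration only.
INDEX = {
    "p1": {"vector": [1.0, 0.0], "year": 2021},
    "p2": {"vector": [0.9, 0.1], "year": 2022},
    "p3": {"vector": [0.8, 0.3], "year": 2019},
    "p4": {"vector": [0.0, 1.0], "year": 2022},
    "p5": {"vector": [0.7, 0.2], "year": 2023},
}
similar = find_similar({"p1", "p2"}, INDEX, min_year=2020)
```

Here "p3" is close to the examples but is removed by the year filter, so the closest remaining document, "p5", comes first.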
Visual Analytics. When a user wants to get a quick overview of a large body of documents, a list of search results is not the optimal way of navigating knowledge. For such use cases, we provide a number of Visual Analytics modules to map the search results to two-dimensional semantic maps, time series, or other aggregate statistics.
Figure 3. A Semantic Map of a document collection in Zeta Alpha.
Expert Search. In many cases the goal of an information seeker is not to read existing information, but to connect with internal or external experts who can provide answers or guide the user to insights, better understanding and hence better decision-making. For this purpose, Zeta Alpha incorporates an Expert Search module. Based on a user query, the people most related to the topic are identified (Berger et al., 2020).
Figure 4. Expert Search Module in Zeta Alpha.
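A simple way to think about expert search is to aggregate document relevance per author. The sketch below assumes that each retrieved document comes with a relevance score and an author list, and that an author's expertise is the sum of the scores of their documents. This is an illustrative baseline, not necessarily the method of the cited work or of the Expert Search module; `rank_experts` and the sample data are hypothetical.

```python
from collections import defaultdict

def rank_experts(scored_docs, top_k=2):
    """Rank authors by the summed relevance of their documents for one query.

    scored_docs: list of (relevance_score, [author, ...]) pairs.
    """
    totals = defaultdict(float)
    for score, authors in scored_docs:
        for author in authors:
            totals[author] += score
    # Highest total first; ties are broken alphabetically for stable output.
    return sorted(totals, key=lambda author: (-totals[author], author))[:top_k]

# Made-up search results for a single query, for illustration only.
scored_docs = [
    (0.9, ["Author A", "Author B"]),
    (0.7, ["Author B"]),
    (0.4, ["Author C"]),
]
experts = rank_experts(scored_docs)
```

In this example "Author B" ranks first because two relevant documents contribute to their total.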
Question Answering. In the long term, decision support systems will aim to provide direct answers to users’ questions in natural language rather than a list of search results in the form of documents.
Figure 5. Question Answering Module in Zeta Alpha.
The Zeta Alpha platform is built with this future development in mind and includes both extractive and abstractive question answering, which synthesizes answers from retrieved documents and passages while providing explainability via pointers to the relevant documents and passages. We consider this functionality ‘in Beta’.
Teamwork & knowledge management: Tagging, Notes and Sharing. The Zeta Alpha platform combines concepts of search and insight engines with common productivity tools for organizing search results into collections (called Tags), for note taking, and for sharing collections and notes in teams. These functions aim to promote knowledge reuse and learning among colleagues. Tags and notes can be used for topical or project-related collections, and are created on the fly by users as needed. Users get alerts when tags they follow are updated by others.
Figure 6. Users’ private and shared tags for organizing knowledge in Zeta Alpha.
Staying Up-to-Date: Recommendations. Once users define what they are working on in their system of tags and stored documents, the Zeta Alpha platform will start filtering new documents that are added to the system on a daily basis and sending highly relevant recommendations as alerts. This makes it easier for busy knowledge workers to stay up-to-date in a highly dynamic knowledge environment.
Figure 7. Daily recommendations via email and in the Zeta Alpha platform.
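The daily filtering step can be sketched as comparing each newly added document against a profile vector per tag and keeping only matches above a threshold. The profile representation, the threshold value, and the names `daily_recommendations`, `tag_profiles`, and `new_docs` are assumptions for illustration; the vectors are toy values rather than real model output.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def daily_recommendations(tag_profiles, new_docs, threshold=0.8):
    """Map each tag to the new documents whose similarity meets the threshold."""
    alerts = {}
    for tag, profile in tag_profiles.items():
        matches = [doc_id for doc_id, vec in new_docs.items()
                   if cosine(profile, vec) >= threshold]
        if matches:
            alerts[tag] = sorted(matches)
    return alerts

# Hypothetical tag profiles and one day's worth of new documents.
tag_profiles = {"retrieval": [1.0, 0.0], "vision": [0.0, 1.0]}
new_docs = {"n1": [0.9, 0.1], "n2": [0.5, 0.5], "n3": [0.1, 0.9]}
alerts = daily_recommendations(tag_profiles, new_docs)
```

Document "n2" sits between the two topics and falls below the threshold for both, so it triggers no alert; raising or lowering the threshold trades precision against coverage.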
Our Services: building custom Neural Search and Discovery solutions.
Do you have large amounts of unstructured data and does keyword search not give you the best results? Do you feel your whole knowledge management process needs modernization? Let Zeta Alpha build a state-of-the-art neural discovery solution for your company.
There are many open-source neural search solutions. Do you really want to build your own expertise and development team in this area? With Zeta Alpha you will not reinvent the wheel. Our experts can advise, and you start with a turn-key solution and a fully featured, user-friendly neural search platform that works.
Our fully featured neural search platform, tailored for you:
Indexing of all enterprise content using state-of-the-art neural search
Your private search schema and knowledge base
Customized processing workflows
Guaranteed 99.9% availability
Fine grained access controls
SSO and API integration
Customization to New Domains
All components of the system, such as crawlers, connectors to third-party systems, document processing steps, or neural encoders, can be fully configured to new schemas and knowledge domains as needed. The Zeta Alpha platform can give access to a mix of public and private documents and respects access rights by design.
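Respecting access rights when mixing public and private documents can be illustrated by filtering results against the groups a user belongs to. This is a minimal sketch under the assumption of group-based permissions where an empty group set means public; the platform's actual access control model is not specified here, and `Document`, `allowed_groups`, and `visible_to` are hypothetical names.

```python
from dataclasses import dataclass
from typing import FrozenSet

@dataclass(frozen=True)
class Document:
    doc_id: str
    allowed_groups: FrozenSet[str]  # assumption: an empty set means public

def visible_to(user_groups, documents):
    """Keep only documents that are public or shared with one of the user's groups."""
    user_groups = frozenset(user_groups)
    return [doc for doc in documents
            if not doc.allowed_groups or doc.allowed_groups & user_groups]

docs = [
    Document("public-paper", frozenset()),
    Document("rd-report", frozenset({"r&d"})),
    Document("hr-memo", frozenset({"hr"})),
]
visible = [doc.doc_id for doc in visible_to({"r&d"}, docs)]
```

A user in the "r&d" group sees the public paper and the R&D report but not the HR memo. Applying such a filter inside the retrieval step, rather than after ranking, keeps restricted documents from influencing what a user sees.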
InPars-v2: Large Language Models as Efficient Dataset Generators for Information Retrieval
2023 | Vitor Jeronymo, Luiz Bonifacio, Hugo Abonizio, Marzieh Fadaee, Roberto Lotufo, Jakub Zavrel, Rodrigo Nogueira
Multi-objective Representation Learning for Scientific Document Retrieval
2022 | Mathias Parisot and Jakub Zavrel (accepted at SDP workshop COLING 2022)
No Parameter Left Behind: How Distillation and Model Size Affect Zero-Shot Retrieval
2022 | Guilherme Moraes Rosa, Luiz Bonifacio, Vitor Jeronymo, Hugo Abonizio, Marzieh Fadaee, Roberto Lotufo and Rodrigo Nogueira
Dense Neural Retrieval for Scientific Documents at Zeta Alpha
2022 | Jakub Zavrel, Marzieh Fadaee, Artem Grotov and Rodrigo Nogueira
InPars: Data Augmentation for Information Retrieval using Large Language Models
2022 | Luiz Bonifacio, Hugo Abonizio, Marzieh Fadaee and Rodrigo Nogueira
Building a Platform for Ensemble-based Personalized Research Literature Recommendations for AI and Data Science at Zeta Alpha
2021 | Jakub Zavrel, Artem Grotov and Jonathan Mitnik
mMARCO: A Multilingual Version of the MS MARCO Passage Ranking Dataset
2021 | Luiz Bonifacio, Vitor Jeronymo, Hugo Queiroz Abonizio, Israel Campiotti, Marzieh Fadaee, Roberto Lotufo and Rodrigo Nogueira
Pretrained Transformers for Text Ranking: BERT and Beyond
2021 | Jimmy Lin, Rodrigo Nogueira and Andrew Yates
A New Neural Search and Insights Platform for Navigating and Organizing AI Research
2020 | Marzieh Fadaee, Olga Gureenkova, Fernando Rejon Barrera, Carsten Schnober, Wouter Weerkamp, Jakub Zavrel