01 Challenges in Landing AI Search
Over the past year, rapid iteration of base large language model technology has driven the evolution of AI search in the following ways:
Refactoring of the search technology chain
Comprehensive reconstruction around large language models is reshaping the AI search technology chain. Every link, from data collection, document parsing, and vector retrieval to query analysis, intent recognition, ranking models, and knowledge graphs, is undergoing profound change. New interaction methods such as conversational search, answer summarization, intelligent customer service, enterprise digital employees, and digital humans are gradually becoming mainstream. These not only improve the user experience but also open up more application scenarios.
AI search as infrastructure
AI search has become one of the foundational technologies for AI applications. As a popular AI-native application, it not only drives the development of knowledge-based AI applications but is also gradually becoming a built-in capability of major foundation models. For example, techniques such as vector retrieval, retrieval-augmented generation (RAG), and semantic search are now widely used across many domains. This integration trend enhances the adaptability and flexibility of AI search in different scenarios.
Bottlenecks Facing Effectiveness Improvement
Although AI search has made significant progress in effectiveness, the hallucination problem remains a major constraint on its wide application, especially in business scenarios that demand high knowledge accuracy. In addition, high cost and limited control over privacy and security are important challenges during implementation.
To address these issues, Alibaba Cloud Elasticsearch has introduced an innovative AI search solution that strengthens every stage of the RAG pipeline, and deeply integrates the Enterprise Edition AI Assistant, applying RAG technology to the field of AIOps.
02 5X Improvement in Elasticsearch Vector Performance
The Elasticsearch vector engine continues to be optimized, with particular focus on performance and cost. A common perception held that the Elasticsearch vector engine, while feature-rich, had performance shortcomings, especially for applications in the Java ecosystem; its technical evolution is gradually overturning this view. From the initial 8.0 release to the current 8.15 release, Elasticsearch has iterated steadily and made significant progress in performance optimization, including the effective integration of hardware acceleration technologies.
Elasticsearch uses hardware acceleration to achieve significant performance leaps in vector retrieval, especially for complex similarity computation tasks. This innovation is not merely theoretical: the deep integration of hardware accelerators has sped up some computationally intensive operations severalfold or more. Benchmark data collected since September 2022 shows query response time dropping from roughly 100 ms initially to about 20 ms today, highlighting the performance gains delivered by the iterative upgrades to Elasticsearch's vector retrieval.
Elasticsearch's memory optimization is also noteworthy. Through vector quantization, the memory required drops to a quarter of the original demand, greatly improving resource utilization. In the latest version, BBQ (Better Binary Quantization) brings a further quantization leap to Elasticsearch, reducing float32 dimensions to bits and cutting memory by about 95% while maintaining high ranking quality. BBQ outperforms traditional methods such as Product Quantization (PQ) in indexing speed (20-30x reduction in quantization time) and query speed (2-5x improvement), with no additional loss of accuracy.
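The intuition behind binary quantization can be sketched in a few lines. This is an illustrative toy, not Elastic's actual BBQ implementation (BBQ adds correction terms to preserve ranking quality): each 32-bit float component collapses to a single sign bit, which is where the roughly 95% memory reduction comes from, and distance becomes a cheap Hamming computation.

```python
# Toy sketch of binary quantization: keep only the sign of each
# float32 component (32 bits -> 1 bit per dimension).

def binary_quantize(vector):
    """Quantize a float vector to bits by sign."""
    return [1 if x > 0 else 0 for x in vector]

def hamming_distance(a, b):
    """Distance between two bit vectors; XOR + popcount in practice."""
    return sum(x != y for x, y in zip(a, b))

q1 = binary_quantize([0.12, -0.53, 0.88, -0.07])
q2 = binary_quantize([0.10, -0.60, -0.20, 0.30])
print(q1)                        # [1, 0, 1, 0]
print(hamming_distance(q1, q2))  # 2
```

Real systems rescore the top binary-quantized candidates against higher-precision vectors, which is how accuracy is recovered despite the aggressive compression.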
03 A Comprehensive Look at Elasticsearch Enterprise AI Capabilities
Semantic expansion and sparse vector representation:
Elasticsearch uses sparse-encoding techniques that index not only the original vocabulary but also expand it to related concepts and terms, each with a model-computed weight, deepening and broadening semantic understanding. Because sparse vectors store only non-zero terms, they keep a low memory footprint compared with dense vectors, which must be held fully in memory for indexing, significantly improving resource efficiency.
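A minimal sketch of what such a sparse expansion looks like (the terms and weights below are invented for illustration, not the output of a real model like ELSER): both query and document become term-to-weight maps, and relevance is a dot product over the terms they share.

```python
# Hypothetical sparse expansions: each text is expanded to related
# terms with model-assigned weights, stored as a sparse map.
doc_expansion = {"laptop": 2.1, "computer": 1.4, "notebook": 1.1, "portable": 0.6}
query_expansion = {"computer": 1.8, "pc": 1.2, "laptop": 0.9}

def sparse_dot(q, d):
    """Score = dot product over the terms both sparse vectors contain."""
    return sum(w * d[t] for t, w in q.items() if t in d)

score = sparse_dot(query_expansion, doc_expansion)
print(round(score, 2))  # computer: 1.8*1.4 + laptop: 0.9*2.1 = 4.41
```

Note that "pc" contributes nothing because the document expansion lacks it; only overlapping terms are touched, which is why an inverted index can evaluate this efficiently.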
Query efficiency and resource optimization:
The query process benefits from the inverted index structure, which avoids the overhead of vector similarity matching and accelerates the retrieval speed. In addition, Elasticsearch's sparse vectors reduce memory requirements, further optimizing resource utilization.
Hybrid Search Strategy:
Modern search requirements have motivated Elasticsearch to support multimodal queries, combining full-text search, vector search, and RRF (Reciprocal Rank Fusion) hybrid ranking to improve the relevance and coverage of results. This hybrid search strategy recalls more diverse data and enhances the user experience.
Ranking and Relevance Adjustment:
To accurately select the most relevant results from the large volume of recalled data, Elasticsearch first applies a ranking function such as BM25, which weighs factors such as term frequency and document frequency to determine initial scores. The initially filtered documents are then re-ranked by ensemble methods or a more refined model (the rerank stage) to ensure that the top results are highly relevant.
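For reference, the single-term BM25 contribution can be written out directly. This is the standard textbook formula with the usual defaults (k1 = 1.2, b = 0.75); the example numbers are invented, and production scoring differs in details.

```python
# BM25 contribution of one query term to one document's score.
import math

def bm25_term(tf, doc_len, avg_doc_len, n_docs, df, k1=1.2, b=0.75):
    # Inverse document frequency: rare terms weigh more.
    idf = math.log(1 + (n_docs - df + 0.5) / (df + 0.5))
    # Saturating term-frequency component with document-length normalization.
    norm = tf * (k1 + 1) / (tf + k1 * (1 - b + b * doc_len / avg_doc_len))
    return idf * norm

# A term appearing 3 times in a slightly longer-than-average document,
# occurring in 10 of 1000 documents:
score = bm25_term(tf=3, doc_len=120, avg_doc_len=100, n_docs=1000, df=10)
print(round(score, 2))  # roughly 6.87
```

The first stage scores cheaply over millions of documents; the rerank stage then spends a heavier model only on the few hundred survivors.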
Model integration and native support:
Elasticsearch offers strong model integration: users can load custom models directly into the cluster and run them there, enabling end-to-end processing from input to output (e.g., generating embeddings) without external preprocessing steps. This simplifies workflows and lets machine learning models integrate seamlessly with the search engine, strengthening the system's intelligence and adaptability.
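A typical way to wire an in-cluster model into indexing is an ingest pipeline with an inference processor. The sketch below builds such a request body as a plain dict; the field names (`body`, `body_expansion`) are hypothetical, and `.elser_model_2` assumes Elastic's hosted ELSER model is deployed. Sending it requires a running cluster, so only the structure is shown here.

```python
# Hedged sketch: ingest pipeline that runs a deployed model at index
# time via the inference processor, so embeddings are generated
# in-cluster with no external preprocessing step.
import json

pipeline = {
    "description": "Generate sparse embeddings while indexing",
    "processors": [
        {
            "inference": {
                "model_id": ".elser_model_2",  # assumes ELSER is deployed
                "input_output": [
                    # Hypothetical field names for this example.
                    {"input_field": "body", "output_field": "body_expansion"}
                ],
            }
        }
    ],
}

print(json.dumps(pipeline, indent=2))
```

Documents indexed through this pipeline would arrive with their expansion already attached, ready for sparse-vector querying.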
04 Alibaba Cloud Elasticsearch Accuracy Increased to 95 Percent
Relying on the powerful Elasticsearch foundation and built on the Alibaba Cloud AI search open platform, the Alibaba Cloud Elasticsearch AI search product integrates diverse models and hybrid retrieval technologies to realize the leap from traditional search to AI semantic search. The solution forms a complete, efficient application framework for RAG scenarios through fine-grained data preprocessing, intelligent vectorization, multi-way search recall, and large-model-assisted generation.
Document parsing and slicing: Use self-developed models to identify unstructured data, extract key information, and ensure content integrity and semantic coherence.
Efficient vectorization: Uses parameter-optimized vector models to reduce costs while preserving effectiveness, making the vectorization process efficient.
RRF Hybrid Retrieval Strategy: Combining text, sparse and dense vector indexes to realize multi-way recall and significantly improve retrieval precision and efficiency.
Intent Understanding and Re-ranking Optimization: Understands user intent through a query analysis model and works with the re-ranking model to fine-sort results, ensuring content relevance.
Comprehensive Evaluation and Flexible Configuration: The AI search open platform provides a one-stop service with multiple model components, compatible with the open-source ecosystem, helping enterprises quickly build customized search systems.
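The steps above compose into a single RAG flow. The sketch below stubs each stage with placeholder functions; every name here is hypothetical and stands in for the real components (hybrid recall, re-ranking model, LLM call), not an Alibaba Cloud API.

```python
# Hypothetical end-to-end RAG flow: recall -> re-rank -> generate.

def retrieve(query):
    # In practice: RRF hybrid recall over text, sparse, and dense indexes.
    return ["chunk about topic A", "chunk about topic B", "unrelated chunk"]

def rerank(query, chunks, top_k=2):
    # In practice: a re-ranking model fine-sorts the recalled chunks;
    # this stub just keeps the first top_k.
    return chunks[:top_k]

def generate(query, context):
    # In practice: a large model answers, grounded on the context.
    return f"Answer to {query!r} based on {len(context)} chunks."

def rag_answer(query):
    chunks = retrieve(query)
    context = rerank(query, chunks)
    return generate(query, context)

print(rag_answer("What is topic A?"))
```

The division of labor matters: recall casts a wide net cheaply, the re-ranker trims it to what is truly relevant, and only that trimmed context reaches the model, which is what keeps generated answers grounded.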
Through the comprehensive application of Alibaba Cloud Elasticsearch AI search, the customer saw remarkable results in the knowledge base Q&A scenario, with accuracy rising from an initial 48% to over 95%. In addition, the combination of three-way hybrid retrieval and a re-ranking model further improves search accuracy and guarantees a high-quality search experience.
05 AI Assistant Integrates Qwen LLM to Enable AIOps
The Elasticsearch Enterprise Edition AI Assistant incorporates RAG technology and Alibaba Cloud's large language model services to provide an AIOps assistant for enterprises. This tool shows strong application potential across areas such as universal search, observability, and security. It helps developers make significant progress in anomaly monitoring, alert handling, problem identification and diagnosis, data analysis and modeling, and query performance optimization, and it greatly improves work efficiency through a more intuitive, easy-to-use interactive interface.
In observability in particular, the AI Assistant can efficiently query, analyze, and visualize your data through an automated function-calling mechanism, turning it into information with practical operational value. In addition, a knowledge base backed by the Elastic Learned Sparse EncodeR (ELSER) further enriches contextual information and suggestions from private datasets, while RAG combined with general-purpose large models ensures more accurate data understanding and expression.
After integrating the Qwen model on the Alibaba Cloud AI Search Open Platform, the Elasticsearch AI Assistant pays special attention to simulated function calling to ensure seamless compatibility between different systems. This lets users flexibly switch among multiple connectors according to specific needs, enabling efficient information retrieval and processing. In microservice operations and maintenance scenarios in particular, the AI Assistant plays a crucial role: it monitors anomalies and potential failure points in real time, analyzes detailed error logs, and quickly locates root causes in conjunction with existing operations manuals. It can also consolidate alert information of all kinds, comprehensively analyze security attack chains, and propose practical defense strategies, significantly improving the speed and quality of problem resolution.
By calling APIs and automatically generating ES|QL queries, the AI Assistant can perform complex data analysis tasks and produce intuitive, easy-to-understand statistical charts, even for users unfamiliar with Elasticsearch query syntax. Whether exploring relationships between fields or interpreting data trends and other insights, the AI Assistant meets diverse needs with efficiency and ease of use.
Free Trial: Alibaba Cloud Elasticsearch official website
Learn More: Alibaba Cloud Technology Solutions