The year 2024 marked a pivotal moment for information retrieval (IR), fueled by remarkable advancements in AI—particularly in deep learning. Improvements in data scale, computational power, and model size catalyzed a paradigm shift, moving IR from traditional keyword-based matching to deep learning-driven approaches. The growing adoption of large language models (LLMs) further transformed search, information extraction, and knowledge synthesis, bringing greater intelligence and innovation.
From the rise of retrieval-augmented generation (RAG) to the more advanced Graph RAG, which layers knowledge engineering techniques on top of RAG, information retrieval underwent a profound transformation. These advancements have democratized AI and expanded its applications across enterprise search, content discovery, knowledge management, and data synthesis, driving widespread adoption and setting new benchmarks for the industry.
This blog will summarize the monumental changes AI brought to Information Retrieval (IR) in 2024, exploring how deep learning, LLMs, and vector databases redefined search, data analysis, and knowledge synthesis. We'll also look ahead to the innovations expected in 2025, including advancements in RAG, multimodal embeddings, and AI infrastructure, setting the stage for the next wave of AI-driven applications.
Scaling Law: The Driving Force Behind AI Advances
The scaling law is the key driver of AI advancements in 2024. Larger model sizes, datasets, and computational resources have given rise to increasingly powerful LLMs such as GPT-4o and Claude 3.5, alongside more capable embedding models like OpenAI’s text-embedding-3-large and the open-source BGE-M3. These advancements have significantly improved generalization across domains, setting new benchmarks for understanding and retrieval tasks.
Information retrieval (IR) systems and LLMs have become deeply integrated, leveraging external data sources by combining semantic search, full-text retrieval, and tools such as knowledge graphs (KGs) into unified systems. Additionally, advanced LLMs with reasoning and self-reflection capabilities can act as agents, autonomously deciding when to use retrieval tools. This integration has enabled more nuanced reasoning, precise retrieval, and human-like answer generation, transforming search engines, enterprise knowledge bases, and conversational AI platforms.
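To make the agentic side of this concrete, here is a minimal sketch of an LLM deciding on its own whether to invoke a retrieval tool, using the OpenAI chat-completions tool-calling interface. The `search_knowledge_base` function, its wiring to a vector store, and the model name are assumptions for illustration, not a prescribed implementation.

```python
import json
from openai import OpenAI  # assumes the openai Python SDK (>=1.0) is installed

client = OpenAI()

def search_knowledge_base(query: str) -> str:
    """Hypothetical retrieval tool: semantic + full-text search over an external store."""
    # In a real system this would query a vector database such as Milvus.
    return "...retrieved passages..."

tools = [{
    "type": "function",
    "function": {
        "name": "search_knowledge_base",
        "description": "Search external documents when the answer is not in the model's own knowledge.",
        "parameters": {
            "type": "object",
            "properties": {"query": {"type": "string"}},
            "required": ["query"],
        },
    },
}]

messages = [{"role": "user", "content": "What index types does Milvus 2.5 support?"}]
response = client.chat.completions.create(model="gpt-4o", messages=messages, tools=tools)
choice = response.choices[0].message

# The model autonomously decides whether retrieval is needed.
if choice.tool_calls:
    args = json.loads(choice.tool_calls[0].function.arguments)
    context = search_knowledge_base(args["query"])
    # Feed the retrieved context back to the model for grounded answer generation (omitted for brevity).
```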
Figure: Query Intention Understanding with LLM's Reasoning Capability Replaces Complex Algorithms in Traditional Web Search
Evolution of RAG: From Prototype to Production
Retrieval-Augmented Generation (RAG), introduced as a practical approach to enhancing LLMs with external knowledge bases, matured significantly in 2024. It transitioned from Twitter demos to production-ready systems, gaining adoption across industries—from enterprise knowledge bases to consumer-facing chatbots. Let’s see how it matured.
Quality Improvement with Hybrid Search and Rerankers
Cross-encoder-based rerankers enhance retrieval accuracy by directly scoring query-document relevance rather than relying solely on vector similarity. Typically applied after initial retrieval via Approximate Nearest Neighbor (ANN) search, these rerankers conduct deep contextual analysis to prioritize the most relevant results. This nuanced approach can improve the precision and quality of RAG-generated answers.
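A minimal sketch of this two-stage pattern is shown below, assuming a first-stage ANN search has already produced candidate passages and using the open-source sentence-transformers CrossEncoder as the reranker; the model name and candidate list are illustrative.

```python
from sentence_transformers import CrossEncoder

# Candidates returned by a first-stage ANN search over a vector index (illustrative).
query = "How does Milvus handle hybrid search?"
candidates = [
    "Milvus supports hybrid search combining dense and sparse vectors.",
    "Milvus is a cloud-native vector database.",
    "Rerankers score query-document pairs with a cross-encoder.",
]

# The cross-encoder scores each (query, document) pair jointly for deeper contextual matching.
reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")
scores = reranker.predict([(query, doc) for doc in candidates])

# Keep the highest-scoring passages for answer generation.
reranked = [doc for _, doc in sorted(zip(scores, candidates), reverse=True)]
print(reranked[0])
```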
Figure: How Does a Reranker Enhance Your RAG Apps?
Offline Labeling and Metadata Filtering
Offline LLM-powered label extraction has automated the tagging of documents with metadata, such as version numbers or covered features. Metadata filters then ensure that queries like “What index types are supported in Milvus 2.5?” retrieve only relevant information, avoiding irrelevant results from other versions, as sketched below.
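Here is a minimal sketch of the online half of this workflow with the pymilvus `MilvusClient`, assuming chunks were tagged with a `version` field during offline labeling; the collection name, field names, and placeholder embedding are illustrative.

```python
from pymilvus import MilvusClient  # assumes pymilvus is installed and a Milvus server is running

client = MilvusClient(uri="http://localhost:19530")

# Placeholder query embedding; in practice, embed the user's question with your embedding model.
query_embedding = [0.1] * 768

# Restrict the vector search to chunks whose metadata matches the question's version.
results = client.search(
    collection_name="docs",            # illustrative collection name
    data=[query_embedding],            # embedding of "What index types are supported in Milvus 2.5?"
    filter='version == "2.5"',         # metadata filter derived from the question
    limit=5,
    output_fields=["text", "version"],
)
```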
These innovations have enhanced the adaptability of RAG in complex scenarios requiring higher answer quality or finer control over responses. As a result, RAG’s applications expanded to diverse use cases, including customer support, technical documentation, and enterprise knowledge management.
Enhancing Document Parsing and Preprocessing with LLMs
Integrating large language models (LLMs) into document preprocessing has revolutionized the handling of unstructured data, such as PDF files and scanned images. Tools like LlamaIndex’s LlamaParse and Unstructured.io have enabled the extraction of structured data from complex documents. Many document processing tools now include OCR capabilities, with some even leveraging vision-language models to extract tabular data and raw text. This functionality is particularly useful for industries such as legal, healthcare, and finance, which often rely heavily on tabular data.
Additionally, more sophisticated data processing techniques use LLMs as preprocessors, offering significant advancements. One example is contextual retrieval, which addresses the loss of context during document chunking. By enriching each chunk with contextual details derived from the broader document, an LLM makes the chunk more self-contained, easier to retrieve, and better able to answer the user's question directly. For instance, raw chunks from a financial report may omit important context, such as the company being discussed and the relevant time period; summarizing that context from the entire report and attaching it to each chunk proves beneficial. Combined with hybrid retrieval and reranking, this approach improves retrieval relevance and makes RAG more practical. Contextual retrieval can also be cost-effective when combined with prompt caching, which avoids repeatedly processing the same document content.
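A minimal sketch of this preprocessing step is shown below, using the OpenAI chat-completions API to generate a situating sentence for each chunk before embedding; the prompt wording, model name, and function are illustrative assumptions rather than a fixed recipe.

```python
from openai import OpenAI  # assumes the openai Python SDK (>=1.0) is installed

client = OpenAI()

def contextualize_chunk(document: str, chunk: str) -> str:
    """Ask an LLM to situate a chunk within its source document before indexing.

    The prompt and model name are illustrative; prompt caching on the repeated
    `document` prefix can keep this offline preprocessing step inexpensive.
    """
    prompt = (
        f"<document>\n{document}\n</document>\n"
        f"Here is a chunk from the document:\n<chunk>\n{chunk}\n</chunk>\n"
        "Write one short sentence situating this chunk within the document "
        "(e.g., which company, report period, and section it covers) to improve retrieval."
    )
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
    )
    context = response.choices[0].message.content
    # Prepend the generated context so the enriched chunk is what gets embedded and indexed.
    return f"{context}\n\n{chunk}"
```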
Figure: An example of using an LLM to Enhance Document Parsing and Preprocessing
ColBERT and ColPali: Thinking Outside the Box
Conventional retrieval models typically depend on single-vector embeddings to represent entire documents, which limits their ability to capture fine-grained relationships between queries and documents. ColBERT (Contextualized Late Interaction over BERT) introduced a transformative late interaction mechanism that leverages multi-vector or token-level representations, enabling more detailed and context-aware retrieval. Instead of collapsing a document into a single vector, ColBERT encodes documents and queries into sets of contextual embeddings. The MaxSim operation then matches each query token to the most similar document token, producing a holistic and fine-grained relevance score. This approach enhances retrieval precision while maintaining computational efficiency, supporting the pre-computation of document embeddings.
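The late-interaction score itself is simple to express. Below is a small sketch of MaxSim over toy token-embedding matrices; the random embeddings are placeholders rather than real ColBERT outputs.

```python
import numpy as np

def maxsim_score(query_embs: np.ndarray, doc_embs: np.ndarray) -> float:
    """ColBERT-style late interaction: match each query token to its most
    similar document token, then sum those maxima into a relevance score."""
    # Normalize rows so the dot product equals cosine similarity.
    q = query_embs / np.linalg.norm(query_embs, axis=1, keepdims=True)
    d = doc_embs / np.linalg.norm(doc_embs, axis=1, keepdims=True)
    sim = q @ d.T                          # (num_query_tokens, num_doc_tokens)
    return float(sim.max(axis=1).sum())    # MaxSim per query token, then sum

# Toy example: 4 query tokens and 12 document tokens with 128-dim embeddings.
rng = np.random.default_rng(0)
score = maxsim_score(rng.normal(size=(4, 128)), rng.normal(size=(12, 128)))
print(score)
```

Because document token embeddings can be precomputed and indexed, only this cheap interaction step runs at query time.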
Figure: How ColBERT works
Building on ColBERT, ColPali extends late interaction to multimodal retrieval by integrating Vision Language Models (VLMs), representing diverse content types such as text, images, and diagrams as unified embeddings. This preserves documents' visual and structural integrity, sidesteps the pitfalls of traditional OCR and segmentation pipelines, and significantly improves RAG performance on multimodal datasets, making it an ideal tool for applications that must comprehend both textual and visual information.
Figure: How ColPali works
Knowledge Engineering in the Age of LLMs
In 2024, structured knowledge tools like ontologies and knowledge graphs (KGs) experienced a resurgence, complementing large language models (LLMs) by grounding responses in factual data. This approach reduced hallucinations and enabled more accurate, domain-specific retrieval systems. One notable innovation was Graph RAG, which extended traditional RAG systems by integrating KGs into the retrieval process. Unlike baseline RAGs, which focus solely on semantic similarity, Graph RAG supports multi-hop reasoning and links disparate data points, enhancing its ability to answer complex queries, such as tracing historical relationships or navigating intricate datasets.
LLMs can now seamlessly transform unstructured text into structured knowledge graphs representing entities and their interrelationships. When combined with KGs, these systems boost semantic reasoning and provide deeper insights, overcoming the limitations of traditional RAG pipelines. These advancements underscore the transformative impact of LLMs on knowledge engineering, especially in data-intensive and high-stakes industries.
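As a rough illustration of that extraction step, the sketch below prompts an LLM for (subject, relation, object) triples and loads them into a graph that a Graph RAG pipeline could traverse for multi-hop questions; the prompt, model name, example text, and JSON output format are assumptions for illustration.

```python
import json
import networkx as nx
from openai import OpenAI  # assumes the openai Python SDK (>=1.0) is installed

client = OpenAI()

def extract_triples(text: str) -> list[list[str]]:
    """Ask an LLM to turn unstructured text into (subject, relation, object) triples.
    The prompt, model name, and expected JSON format are illustrative."""
    prompt = (
        "Extract knowledge-graph triples from the text below. "
        "Return only a JSON array of [subject, relation, object] lists.\n\n" + text
    )
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
    )
    return json.loads(response.choices[0].message.content)

# Build a directed graph whose edges a Graph RAG pipeline can follow across multiple hops.
graph = nx.DiGraph()
text = "Milvus was created by Zilliz. Zilliz builds AI infrastructure."  # illustrative input
for subject, relation, obj in extract_triples(text):
    graph.add_edge(subject, obj, relation=relation)
```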
Figure: KEPLER: A Unified Model for Knowledge Embedding and Pre-trained Language Representation | Source
Text2SQL: Democratizing Data Access
The complexity of SQL and database schemas often limits data access to skilled analysts. In 2024, text-to-SQL technologies empowered non-technical users to query databases using plain language. Such technologies use LLMs to translate natural language into accurate SQL queries, transforming analytics workflows and democratizing data-driven decision-making across organizations.
When integrated with RAG pipelines, Text2SQL bridged the gap between structured databases and unstructured retrieval systems, making AI-driven insights more accessible. Vector databases also play a critical supporting role, storing high-cardinality values and similar SQL examples that help the LLM compose accurate queries.
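The core translation step can be sketched as follows, assuming the table schema and any few-shot question/SQL examples (which in practice might be retrieved from a vector database as described above) are passed to the LLM in the prompt; the schema, prompt wording, and model name are illustrative.

```python
from openai import OpenAI  # assumes the openai Python SDK (>=1.0) is installed

client = OpenAI()

SCHEMA = """
CREATE TABLE orders (id INT, customer_id INT, amount DECIMAL, created_at DATE);
CREATE TABLE customers (id INT, name TEXT, region TEXT);
"""  # illustrative schema

def text_to_sql(question: str, examples: list[str]) -> str:
    """Translate a natural-language question into SQL.
    `examples` would typically be similar question/SQL pairs retrieved from a vector database."""
    prompt = (
        f"Schema:\n{SCHEMA}\n"
        f"Similar examples:\n{chr(10).join(examples)}\n"
        f"Question: {question}\n"
        "Return only the SQL query."
    )
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

print(text_to_sql("What is the total order amount per region in 2024?", examples=[]))
```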
Figure: How Zilliz Cloud and Waii Work Together to Implement Text2SQL
A Recap of the Transformative Year of 2024
The year 2024 has been a turning point for Information Retrieval (IR). Advances in deep learning and Large Language Models (LLMs) have redefined how information is searched, processed, and analyzed.
Text embedding models now supplement—or even replace—traditional full-text search systems, delivering more accurate and context-aware results. Image retrieval has seen remarkable progress. Tasks once requiring hundreds of specialized classification models have been streamlined by multimodal embedding models, which unify text, image, and other data formats into a single, efficient framework. Similarly, LLMs have made knowledge entity labeling faster and more cost-effective, enabling the generation of fully automated knowledge graphs. Meanwhile, autonomous agents now generate SQL queries to retrieve insights from relational databases, simplifying analytics and enhancing data accessibility.
As someone deeply immersed in information retrieval for years and the creator of the Milvus vector database, I find it exciting to witness these transformative changes. The innovations of 2024 have laid a solid foundation, and 2025 promises to build upon this momentum with a surge in innovative applications leveraging Retrieval-Augmented Generation (RAG), multimodal embeddings, and agentic workflows.
A Vision for 2025: Milvus and the Future of AI Infrastructure
As AI continues to mature, the need for robust and scalable data infrastructure is becoming increasingly critical. Vector databases like Milvus and Zilliz Cloud, a cornerstone of deep learning-based IR, are rising to this challenge. Our vision at Zilliz for 2025 is ambitious: to deliver faster search speeds, lower storage costs, and seamless integration with existing data ecosystems and various emerging AI technologies.
The upcoming release of Milvus 3.0 will mark the beginning of a new era for vector databases. It will introduce a cloud-native vector lake capable of handling hundreds of billions of data points with unparalleled speed and efficiency. With query response times under 10 milliseconds and near-real-time interactive data exploration, Milvus 3.0 will redefine what’s possible for AI-powered applications. As vector databases solidify their role as a cornerstone of modern AI infrastructure, they will unlock opportunities for the next wave of AI-driven applications.
If you’re excited about what 2025 holds for AI, there’s no better time to start building. Explore our comprehensive guides and tutorials and take the first step toward creating your own cutting-edge AI applications.