- Use cases: Gen AI
- Industries: Finance, Healthcare, Retail
- Products: Atlas, Atlas Vector Search, Atlas Stream Processing
- Partners: Confluent, AWS
Whether organizations leverage AI to optimize business processes or enhance customer-facing applications, providing AI models with up-to-date data is essential to delivering a differentiated experience. While retrieval-augmented generation (RAG) systems let organizations easily ground large language models (LLMs) and foundation models in the truth of their proprietary data, keeping that data fresh adds another level of complexity.
Vector embeddings are the core of RAG systems, and continuously updating them gives AI models the up-to-date data they need to provide pertinent, accurate answers. Additionally, different embedding models may offer higher accuracy depending on their primary purpose. Take, for example, an embedding model trained primarily on a specific language, such as Japanese or Simplified Chinese, rather than a more popular model with general knowledge of several languages. The specialized model will likely create embeddings that enable the foundation model or LLM to output more accurate content.
This solution addresses the issue of continuously updating and routing the creation of vector embeddings in a RAG system. By leveraging MongoDB Atlas Stream Processing and MongoDB Atlas Vector Search, both native capabilities in MongoDB Atlas, this solution walks developers through continuously updating, storing, and searching embeddings with a single interface.
While this solution demonstrates creating vector embeddings of song lyrics in different languages, the scenario is relevant to many industries and use cases, including finance, healthcare, and retail.
With MongoDB, the data we currently have about each song consists of the following fields:
The benefit of using the document data model is that it allows you to store all the related information of a song in a single document for easy and fast retrieval.
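As an illustration, a song document might look like the sketch below. The field names here are assumptions for this example, not the exact schema used by the solution:

```python
# A hypothetical song document. All field names are illustrative;
# the solution repository defines the exact schema.
song = {
    "title": "La Bamba",
    "artist": "Ritchie Valens",
    "language": "es",             # drives routing to the language-specific topic
    "lyrics": "Para bailar la bamba...",
    # Populated asynchronously by the metadata service:
    "tags": [],
    "lyrics_embedding": None,     # becomes a vector of floats once processed
}

print(sorted(song.keys()))
```

Storing the lyrics, metadata, and eventually the embedding side by side in one document is what makes the single-query retrieval described above possible.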
In the GitHub repository, you will find detailed instructions on how to build the solution to update your embeddings asynchronously and at scale, leveraging MongoDB Atlas.
The first step is to create a MongoDB cluster. If you don’t have an Atlas account, create one by following the steps at this link: https://www.mongodb.com/docs/guides/atlas/account/
We will create a cluster in Atlas using AWS as our cloud provider and us-east-1 as our region. Additionally, create an Atlas Stream Processing Instance (SPI) following the instructions in the documentation: https://www.mongodb.com/docs/atlas/atlas-sp/manage-processing-instance/
To create a Kafka cluster in Confluent Cloud, follow the instructions in their documentation: https://docs.confluent.io/cloud/current/clusters/create-cluster.html#create-ak-clusters
Once you have created the cluster, go to cluster settings and copy the bootstrap server URL.
The next step is to configure the topics for use in this solution: SpanishInputTopic, EnglishInputTopic, and OutputTopic.
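In this solution the stream processors decide which topic an event goes to, but the routing logic itself is simple. A minimal sketch in Python, assuming each document carries a `language` field:

```python
# Topic names from this solution; the `language` field is an assumption.
SPANISH_TOPIC = "SpanishInputTopic"
ENGLISH_TOPIC = "EnglishInputTopic"

def route_topic(doc: dict) -> str:
    """Pick the Kafka input topic based on the document's language."""
    language = doc.get("language", "en")
    return SPANISH_TOPIC if language == "es" else ENGLISH_TOPIC

print(route_topic({"language": "es"}))  # SpanishInputTopic
print(route_topic({"language": "en"}))  # EnglishInputTopic
```

Events landing on each input topic are then embedded with the model best suited to that language, while `OutputTopic` carries the enriched results back toward MongoDB.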
Once you have created your Kafka cluster, Confluent will provide you with the bootstrap server URL, username, and password you need for the Connection Registry. To configure a new connection, click the Configure button in the Stream Processing Instance, then click Connection Registry and add a new connection. You will use this connection to link the Atlas Stream Processing Instance with the Kafka cluster.
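A Connection Registry entry bundles those credentials together. The sketch below shows the general shape as a Python dict; the exact field layout is an assumption here, so treat the Atlas UI or documentation as the authoritative schema:

```python
# Sketch of a Connection Registry entry for the Confluent Cloud
# cluster. Field names follow the Atlas Stream Processing connection
# schema as an assumption; substitute your own credentials.
kafka_connection = {
    "name": "ConfluentConnection",
    "type": "Kafka",
    "bootstrapServers": "<your-bootstrap-server-url>",
    "security": {"protocol": "SASL_SSL"},
    "authentication": {
        "mechanism": "PLAIN",
        "username": "<confluent-api-key>",
        "password": "<confluent-api-secret>",
    },
}

print(kafka_connection["name"])
```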
To configure the pipelines and connections in the Stream Processing Instance, connect to it using the MongoDB Shell (mongosh). When you click the Connect button on the Stream Processing Instance, the Atlas UI provides instructions for connecting.
You can follow the steps to configure Atlas Stream Processing in the README file in the GitHub repo. There you will learn how to create the pipelines that subscribe to changes in MongoDB, emit events to each language-specific topic, and merge the processed events, with the embeddings received from the Kafka cluster, back into MongoDB using a MongoDB aggregation stage.
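As a rough sketch of what those pipelines express, the two halves of the flow can be written as aggregation pipelines, shown here as Python dicts for readability (in practice they are created with the stream processing commands in mongosh). Connection, database, collection, and topic names below are assumptions:

```python
# Pipeline 1 (sketch): watch the songs collection and emit change
# events to a language-specific Kafka topic.
emit_pipeline = [
    {"$source": {"connectionName": "AtlasCluster",
                 "db": "music", "coll": "songs"}},
    {"$emit": {"connectionName": "ConfluentConnection",
               "topic": "SpanishInputTopic"}},
]

# Pipeline 2 (sketch): read enriched events (tags + embeddings) from
# the output topic and merge them back into the MongoDB collection.
merge_pipeline = [
    {"$source": {"connectionName": "ConfluentConnection",
                 "topic": "OutputTopic"}},
    {"$merge": {"into": {"connectionName": "AtlasCluster",
                         "db": "music", "coll": "songs"}}},
]

print(len(emit_pipeline), len(merge_pipeline))
```

The README in the repository contains the exact, runnable pipeline definitions.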
Next, you will create language-specific Atlas Vector Search indexes.
Create one index definition for the Spanish embeddings and another for the English embeddings.
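As a sketch, each definition follows the standard Atlas Vector Search index shape. The embedding field path and number of dimensions below are assumptions; match them to your schema and to the embedding model for that language:

```python
# Representative Atlas Vector Search index definition (sketch).
# `path` and `numDimensions` are assumptions, not the solution's
# exact values; set numDimensions to your embedding model's output size.
spanish_index = {
    "fields": [
        {
            "type": "vector",
            "path": "lyrics_embedding_es",
            "numDimensions": 768,
            "similarity": "cosine",
        }
    ]
}

print(spanish_index["fields"][0]["path"])
```

The English index would mirror this definition with its own field path and the dimensions of the English embedding model.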
The metadata service is a Python script that subscribes to the input topics, creates the tags and embeddings for the corresponding language based on the information in each event, and writes the enriched event to the output topic.
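The core of that service can be sketched as a pure function, with the embedding model and tagging step passed in as callables. `embed` and `tag` are placeholders for the language-specific model calls, not the exact services used by the solution:

```python
def process_event(event: dict, embed, tag) -> dict:
    """Enrich a change event with tags and an embedding of its lyrics.

    `embed` maps text to a vector of floats and `tag` maps text to a
    list of tags; in the real service these would call the
    language-specific models chosen for the event's language.
    """
    doc = dict(event)  # don't mutate the consumed event
    doc["tags"] = tag(doc["lyrics"])
    doc["lyrics_embedding"] = embed(doc["lyrics"])
    return doc

# Demo with stub models standing in for real embedding/tagging calls:
enriched = process_event(
    {"lyrics": "Para bailar la bamba"},
    embed=lambda text: [0.0, 0.1, 0.2],
    tag=lambda text: ["dance"],
)
print(enriched["tags"])
```

The real service wraps this step in a Kafka consume/produce loop, reading from `SpanishInputTopic` or `EnglishInputTopic` and writing the enriched event to `OutputTopic`.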
We created a Python script to help you run semantic queries interactively. You can find it in the repository under the client folder.
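Under the hood, a semantic query like the ones in that script boils down to a `$vectorSearch` aggregation stage run against the language-specific index. A minimal sketch of building such a pipeline, with the index name and field path as assumptions:

```python
def semantic_search_pipeline(query_vector, index_name, path, limit=5):
    """Build a MongoDB aggregation pipeline for a vector search.

    `index_name` and `path` must match one of the language-specific
    Atlas Vector Search indexes (names here are illustrative).
    """
    return [
        {"$vectorSearch": {
            "index": index_name,
            "path": path,
            "queryVector": query_vector,
            "numCandidates": limit * 20,  # candidates for the ANN search
            "limit": limit,
        }},
        {"$project": {
            "title": 1,
            "artist": 1,
            "score": {"$meta": "vectorSearchScore"},
        }},
    ]

pipeline = semantic_search_pipeline([0.1] * 4, "spanish_index",
                                    "lyrics_embedding_es")
print(pipeline[0]["$vectorSearch"]["limit"])
```

In the client script, the query text would first be embedded with the same model used for that language, and the resulting pipeline run with `collection.aggregate(pipeline)`.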
Create this demo for yourself by following the instructions and associated models in this solution’s repository.
Learn about performing an approximate nearest neighbor search.
Set up Atlas Stream Processing and run your first stream processor.