Anti-Money Laundering and Fraud Prevention With MongoDB Vector Search and OpenAI

Ainhoa Múgica, Shiv Pullepu, Jack Yallop, and Paul Claret
July 17, 2024
#Vector Search

Fraud prevention and anti-money laundering (AML) are major concerns for both businesses and consumers, affecting sectors like financial services and e-commerce. Traditional methods of tackling these issues, including static, rule-based systems and predictive artificial intelligence (AI) methods, work but have limitations: they lack context, and the feature engineering needed to keep the models relevant is time-consuming and costly.

Vector search can significantly improve fraud detection and AML efforts by addressing these limitations, representing the next step in the evolution of machine learning for combating fraud. Any organization that is already benefiting from real-time analytics will find that this breakthrough in anomaly detection takes fraud and AML detection accuracy to the next level.

In this post, we examine how real-time analytics powered by Atlas Vector Search enables organizations to uncover deeply hidden insights before fraud occurs.

The evolution of fraud and risk technology

Over the past few decades, fraud and risk technology have evolved in stages, with each stage building upon the strengths of previous approaches while also addressing their weaknesses:

Risk 1.0: In the early stages (the late 1990s to 2010), risk management relied heavily on manual processes and human judgment, with decision-making based on intuition, past experiences, and limited data analysis. Rule-based systems emerged during this time, using predefined rules to flag suspicious activities. These rules were often static and lacked adaptability to changing fraud patterns.
Risk 2.0: With the evolution of machine learning and advanced analytics (from 2010 onwards), risk management entered a new era. Predictive modeling techniques were employed to forecast future risks and detect fraudulent behavior. Systems were trained on historical data and became more integrated, allowing for real-time data processing and the automation of decision-making processes. However, these systems faced limitations such as:

Feature engineering overhead: Risk 2.0 systems often require manual feature engineering.
Lack of context: Risk 1.0 and Risk 2.0 may not incorporate a wide range of variables and contextual information.

Risk 2.0 solutions are often used in combination with rule-based approaches because rules cannot be avoided. Companies have their business- and domain-specific heuristics and other rules that must be applied.

Here is an example fraud detection solution based on Risk 1.0 and Risk 2.0 with a rules-based and traditional AI/ML approach.

Risk 3.0: The latest stage (2023 and beyond) in fraud and risk technology evolution is driven by vector search. This advancement leverages real-time data feeds and continuous monitoring to detect emerging threats and adapt to changing risk landscapes, addressing the limitations of data imbalance, manual feature engineering, and the need for extensive human oversight while incorporating a wider range of variables and contextual information.

Depending on the particular use case, organizations can use these solutions individually or in combination to effectively manage and mitigate risks associated with fraud and AML.

Now, let us look into how MongoDB Atlas Vector Search (Risk 3.0) can help enhance existing fraud detection methods.

How Atlas Vector Search can help

A vector database is an organized collection of information that makes it easier to find similarities and relationships between different pieces of data. By this definition, MongoDB is particularly well suited to the task compared with a standalone or bolt-on vector database. The versatility of MongoDB’s developer data platform empowers users to store their operational data, metadata, and vector embeddings on MongoDB Atlas and seamlessly use Atlas Vector Search to index, retrieve, and build performant gen AI applications.

Watch how you can revolutionize fraud detection with MongoDB Atlas Vector Search.

The combination of real-time analytics and vector search enables organizations to discover insights that are otherwise elusive with traditional methods. MongoDB facilitates this through Atlas Vector Search integrated with OpenAI embeddings, as illustrated in Figure 1 below.

Figure 1: Atlas Vector Search in action for fraud detection and AML

Framework displaying Atlas Vector Search in action. The App and other sources of data are connected to MongoDB via Kafka. That data is then connected to Atlas triggers through vector embeddings in collaboration with OpenAI. Data also flows into change streams through aggregate text. Finally, classified transactions flow into Atlas Vector Search.

Business perspective: Fraud detection vs. AML

Understanding the distinct business objectives and operational processes driving fraud detection and AML is crucial before diving into the use of vector embeddings.

Fraud detection is centered on identifying unauthorized activities aimed at immediate financial gain through deceptive practices. The detection models, therefore, look for specific patterns in transactional data that indicate such activities. For instance, they might focus on high-frequency, low-value transactions, which are common indicators of fraudulent behavior.

AML, on the other hand, targets the complex process of disguising the origins of illicitly gained funds. The models here analyze broader and more intricate transaction networks and behaviors to identify potential laundering activities. For instance, AML could look at the relationships between transactions and entities over a longer period.

Creation of Vector Embeddings for Fraud and AML

Fraud and AML models require different approaches because they target distinct types of criminal activities. To accurately identify these activities, machine learning models use vector embeddings tailored to the features of each type of detection.

In the solution shown in Figure 1, vector embeddings for fraud detection are created using a combination of text, transaction, and counterparty data. Conversely, the embeddings for AML are generated from data on transactions, relationships between counterparties, and their risk profiles. The selection of data sources, including the use of unstructured data and the creation of one or more vector embeddings, can be customized to meet specific needs. This particular solution uses OpenAI to generate vector embeddings, though other embedding providers can also be employed.
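As a rough illustration of combining text, transaction, and counterparty data into one embedding input, here is a minimal Python sketch. The field names (`amount`, `currency`, `channel`, `counterparty`, `description`) and the helper names are hypothetical and not taken from the demo repository. The `embed` argument is any function that turns a string into a vector; in practice it could wrap a call to an OpenAI embedding model.

```python
from typing import Any, Callable, Dict, List


def build_fraud_text(txn: Dict[str, Any]) -> str:
    """Flatten transaction and counterparty fields into a single string.

    The field names used here are illustrative assumptions only.
    """
    parts = [
        f"amount: {txn['amount']} {txn['currency']}",
        f"channel: {txn['channel']}",
        f"counterparty: {txn['counterparty']['name']}",
        f"counterparty country: {txn['counterparty']['country']}",
        f"description: {txn.get('description', '')}",
    ]
    return "; ".join(parts)


def embed_transaction(
    txn: Dict[str, Any], embed: Callable[[str], List[float]]
) -> Dict[str, Any]:
    """Return a copy of the transaction with its aggregated text and embedding.

    `embed` is injected so the sketch stays provider-agnostic. With the
    OpenAI Python client it might wrap something like
    client.embeddings.create(model=..., input=text).data[0].embedding,
    where the model choice is left open because the article does not name one.
    """
    text = build_fraud_text(txn)
    return {**txn, "aggregated_text": text, "embedding": embed(text)}
```

Passing the embedding function in as a parameter keeps the text-aggregation step testable without network access and makes it easy to swap providers.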

Historical vector embeddings are representations of past transaction data and customer profiles encoded into a vector format. The demo database is prepopulated with synthetically generated test data for both fraud and AML embeddings. In real-world scenarios, you can create embeddings by encoding historical transaction data and customer profiles as vectors.

In the fraud and AML detection workflow shown in Figure 1, the aggregated fraud and AML text for each incoming transaction is used to generate embeddings with OpenAI. Atlas Vector Search then retrieves previous transactions with similar characteristics, and the incoming transaction is assessed based on the percentage of those similar transactions that were flagged for suspicious activity.
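The lookup step can be sketched as an aggregation pipeline built around the `$vectorSearch` stage, followed by a small helper that computes the share of flagged neighbors. This is a sketch under stated assumptions, not the demo's actual code: the index name `fraud_vector_index`, the `embedding` and `is_flagged` field names, and the choice of ten candidates per returned result are all hypothetical.

```python
from typing import Any, Dict, List


def vector_search_pipeline(
    query_vector: List[float],
    index: str = "fraud_vector_index",  # hypothetical index name
    path: str = "embedding",            # hypothetical embedding field
    limit: int = 10,
) -> List[Dict[str, Any]]:
    """Build a pipeline that returns the nearest historical transactions."""
    return [
        {
            "$vectorSearch": {
                "index": index,
                "path": path,
                "queryVector": query_vector,
                # Consider more candidates than results; 10x is an assumption.
                "numCandidates": limit * 10,
                "limit": limit,
            }
        },
        {
            "$project": {
                "_id": 0,
                "is_flagged": 1,  # hypothetical label on historical data
                "score": {"$meta": "vectorSearchScore"},
            }
        },
    ]


def flagged_percentage(neighbors: List[Dict[str, Any]]) -> float:
    """Percentage of similar past transactions that were flagged."""
    if not neighbors:
        return 0.0
    flagged = sum(1 for n in neighbors if n.get("is_flagged"))
    return 100.0 * flagged / len(neighbors)
```

With PyMongo, the pipeline would typically be passed to `collection.aggregate(...)` and the resulting documents handed to `flagged_percentage`.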

In Figure 1, the term "Classified Transaction" indicates a transaction that has been processed and categorized by the detection system. This classification helps determine whether the transaction is considered normal, potentially fraudulent, or indicative of money laundering, thus guiding further actions.

If flagged for fraud: The transaction request is declined.
If not flagged: The transaction is completed successfully, and a confirmation message is shown.

For rejected transactions, users can contact case management services with the transaction reference number for details. No action is needed for successful transactions.
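The two outcomes above can be expressed as a small decision function. The threshold value and the message wording are illustrative assumptions; the article does not state what cutoff the demo applies.

```python
from dataclasses import dataclass


@dataclass
class Decision:
    approved: bool
    message: str


def decide(reference: str, flagged_pct: float, threshold_pct: float = 50.0) -> Decision:
    """Decline when enough similar past transactions were flagged.

    `threshold_pct` is a hypothetical cutoff chosen for illustration.
    """
    if flagged_pct >= threshold_pct:
        return Decision(
            approved=False,
            message=(
                f"Transaction {reference} was declined. "
                "Contact case management with this reference number."
            ),
        )
    return Decision(
        approved=True,
        message=f"Transaction {reference} completed successfully.",
    )
```

In a real deployment the cutoff would likely be tuned per use case, since fraud and AML tolerate different false-positive rates.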

Combining Atlas Vector Search for fraud detection

With the use of Atlas Vector Search with OpenAI embeddings, organizations can:

Eliminate the need for batch and manual feature engineering required by predictive (Risk 2.0) methods.
Dynamically incorporate new data sources to perform more accurate semantic searches, addressing emerging fraud trends.
Adopt this method for mobile solutions, as traditional methods are often costly and performance-intensive.

Why MongoDB for AML and fraud prevention

Fraud and AML detection require a holistic platform approach as they involve diverse data sets that are constantly evolving. Customers choose MongoDB because it is a unified data platform (as shown in Figure 2 below) that eliminates the need for niche technologies, such as a dedicated vector database.

What’s more, MongoDB’s document data model incorporates any kind of data—any structure (structured, semi-structured, and unstructured), any format, any source—no matter how often it changes, allowing you to create a holistic picture of customers to better predict transaction anomalies in real time.

By incorporating Atlas Vector Search, institutions can:

Build intelligent applications powered by semantic search and generative AI over any type of data.
Store vector embeddings right next to your source data and metadata. Vectors inserted or updated in the database are automatically synchronized to the vector index.
Optimize resource consumption, improve performance, and enhance availability with Search Nodes.
Remove operational heavy lifting with the battle-tested, fully managed MongoDB Atlas developer data platform.
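To make the idea of storing embeddings next to source data concrete, the sketch below builds an Atlas Vector Search index definition for an embedding field that lives in the same document as the transaction data. The field names, the optional filter paths, and the 1536-dimension default are assumptions for illustration; the dimension must match whichever embedding model is actually used.

```python
from typing import Any, Dict, Sequence

# Assumption: 1536 matches several OpenAI embedding models; adjust as needed.
EMBEDDING_DIMENSIONS = 1536


def vector_index_definition(
    path: str = "embedding",  # hypothetical field holding the vector
    dimensions: int = EMBEDDING_DIMENSIONS,
    filter_paths: Sequence[str] = (),
) -> Dict[str, Any]:
    """Build a vector search index definition for one embedding field.

    Optional filter fields let a query pre-filter on regular document
    fields stored alongside the vector.
    """
    fields = [
        {
            "type": "vector",
            "path": path,
            "numDimensions": dimensions,
            "similarity": "cosine",
        }
    ]
    fields.extend({"type": "filter", "path": p} for p in filter_paths)
    return {"fields": fields}
```

Because the vector is an ordinary field on the transaction document, inserting or updating that document is enough for the index to pick up the change, as described in the list above.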

Figure 2: Unified risk management and fraud detection data platform

Diagram depicting the unified risk management and fraud detection platform. The diagram is broken down into four categories. The first, data sources, includes CRM, third-party, transaction, and sanctions list data. The second, ingest, includes on-premises and cloud. The third, store and process, includes store, powered by MongoDB document store; generate embeddings, powered by MongoDB index store; and search, powered by MongoDB Vector Search. The fourth, activate, includes case management, fraud alerts, charts, and suspicious activity. Finally, the store and process category is connected to LLM and embeddings.

Given the broad and evolving nature of fraud detection and AML, these areas typically require multiple methods and a multimodal approach. Therefore, a unified risk data platform offers several advantages for organizations that are aiming to build effective solutions. Using MongoDB, you can develop solutions for Risk 1.0, Risk 2.0, and Risk 3.0, either separately or in combination, tailored to meet your specific business needs.

The concepts are demonstrated with two examples: a card fraud solution accelerator for Risk 1.0 and Risk 2.0 and a new Vector Search solution for Risk 3.0, as discussed in this blog. It's important to note that the vector search-based Risk 3.0 solution can be implemented on top of Risk 1.0 and Risk 2.0 to enhance detection accuracy and reduce false positives.

If you would like to discover more about how MongoDB can help you supercharge your fraud detection systems, take a look at the following resources:

Revolutionizing Fraud Detection with Atlas Vector Search

Card Fraud solution accelerator (Risk 1.0 and Risk 2.0)

Risk 3.0 fraud detection solution GitHub repository
