Novo Nordisk & MongoDB Atlas: Groundbreaking Time To Value Acceleration With A Clinical Study Report In Minutes

INDUSTRY

Pharmaceutical

PRODUCTS

MongoDB Atlas

USE CASE

Gen AI

CUSTOMER SINCE

2021

Founded in 1923 in Denmark, Novo Nordisk is today one of the world’s leading healthcare companies. Building upon its heritage in diabetes treatments, the company’s mission is to drive change to defeat serious chronic diseases. It does this by pioneering scientific breakthroughs, expanding access to its medicines, and working to prevent — and ultimately cure — disease.

Novo Nordisk employs more than 64,000 people in 80 countries. Its products are marketed in 170 countries, generating revenue of 232 billion Danish kroner ($33.5bn) in fiscal year 2023.

Louise Lind Skov, Head of Content Digitalisation at Novo Nordisk, explains, “Our treatments today are benefiting millions of people living with diabetes, obesity, and rare blood and endocrine diseases. We produce 50% of the world’s insulin, have manufactured over 600 million insulin pens, and more than 36 million people are using our diabetes care products. From our labs to our factory floors, we are discovering and developing innovative biological medicines and making them accessible to patients throughout the world.”

By harnessing generative AI (gen AI) with Amazon Bedrock and MongoDB Atlas, Novo Nordisk is dramatically accelerating how quickly it can get new medicines approved and delivered to patients.

“With NovoScribe, we are the first in the industry to generate a complete Clinical Study Report in minutes rather than weeks. We are doing it at scale, and with just a fraction of the resources we needed in the past. It is a game changer for the industry around the world.”

Louise Lind Skov, Novo Nordisk

Reimagining the path to regulatory approval

A Clinical Study Report (CSR) plays a pivotal role in the development process for any new medication. It serves as a comprehensive document that captures the methodology, execution, results, and analyses of a clinical trial. The report’s primary purpose is to provide a detailed account of the medical trial, ensuring that regulatory authorities, healthcare professionals, and other stakeholders, such as researchers and legal teams, can assess the efficacy and safety of a new pharmaceutical product.

Figure 1: Example of a Clinical Study Report

Explaining the time and effort required to produce a clinical study report, Skov says, “A CSR usually takes around 12 weeks to compile, involving a multidisciplinary team of statisticians, scientists, and technical authors. Each day of delay means patients don’t get the treatments they need and the company cannot start to recover its R&D costs.”

The process starts with the statistical analysis of clinical trial data collected in the field, creating outputs such as tables and figures. Technical authors then extract and merge this data with report templates that are used in the regulatory submission. Extensive quality assurance (QA) processes are needed to ensure that all the data in the 100+ page report is consistent, comprehensive, and compliant with regulatory standards.

With the arrival of gen AI, Skov’s team at Novo Nordisk saw the opportunity to drive significant efficiencies in the production of CSRs. And so NovoScribe was born.

NovoScribe: Built on a solid foundation of Amazon Bedrock, LangChain, and MongoDB Atlas Vector Search

Initiating the project in mid-2023, Skov’s team reimagined their workflow with NovoScribe. They experimented with dynamically compiling the CSR by using retrieval-augmented generation (RAG) to prompt state-of-the-art large language models (LLMs) with both statistical outputs from the clinical trials and vector embeddings of report templates.

Within a few weeks, the experiments proved successful. NovoScribe produced CSRs faster and more accurately, and required fewer resources than the previous manual methods. NovoScribe was ready for prime time.

Tobias Kröpelin, NovoScribe Tech Lead and Statistical Programming Specialist at Novo Nordisk, explains the gen AI stack powering NovoScribe. “Each foundation model has its own strengths and weaknesses, so we typically experiment with a variety of different embedding and generation models for each report we compile.”

NovoScribe uses the Claude 3 and Titan foundation models hosted by Amazon Bedrock, alongside the company’s own private instance of ChatGPT. With the LangChain development and orchestration framework, the team can switch between models quickly and easily, without having to change any application code. Using RAG, the models are served with report data and vector embeddings managed by MongoDB Atlas Vector Search.
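The idea of switching models without touching application code can be illustrated with a small sketch. It deliberately does not use LangChain's API or call any real model: `fake_claude`, `fake_titan`, `MODEL_REGISTRY`, and `generate_section` are hypothetical stand-ins showing only the general pattern of selecting a model by name behind a shared interface. The source does not describe NovoScribe's actual code.

```python
from typing import Callable, Dict


# Hypothetical stand-ins for real model clients. Each takes a prompt and
# returns generated text; a real stack would wrap hosted models instead.
def fake_claude(prompt: str) -> str:
    return f"[claude] {prompt}"


def fake_titan(prompt: str) -> str:
    return f"[titan] {prompt}"


# Registry keyed by a name that could come from configuration, so the
# calling code never references a specific model directly.
MODEL_REGISTRY: Dict[str, Callable[[str], str]] = {
    "claude": fake_claude,
    "titan": fake_titan,
}


def generate_section(prompt: str, model_name: str) -> str:
    """Look up the configured model and call it through the shared interface."""
    try:
        model = MODEL_REGISTRY[model_name]
    except KeyError:
        raise ValueError(f"Unknown model: {model_name!r}") from None
    return model(prompt)


if __name__ == "__main__":
    # The same application call works for every registered model.
    for name in ("claude", "titan"):
        print(generate_section("Summarise the primary endpoint.", name))
```

Because the caller depends only on the registry key, trying a different model for a given report becomes a configuration change rather than a code change.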

NovoScribe generates validated text based on defined content rules and statistical output, while Atlas Vector Search calculates the similarity of each text snippet to the relevant statistics. These results are combined with the LLM output to draft the CSR. Atlas Vector Search selects the relevant text with a high degree of precision and accuracy. The full lineage of all sources is presented, enabling the authors to verify accuracy, which eliminates weeks of writing and reviews.
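To make the similarity step concrete, here is a minimal sketch of ranking text snippets against a query vector by cosine similarity. It rests on several assumptions: the source does not say which similarity function NovoScribe uses (Atlas Vector Search supports cosine, Euclidean, and dot product), the toy two-dimensional vectors are invented for illustration, and the brute-force loop below stands in for the approximate nearest-neighbor index a vector database would use. The names `cosine_similarity` and `rank_snippets` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention for this sketch: a zero vector matches nothing
    return dot / (norm_a * norm_b)


def rank_snippets(
    query_vector: Sequence[float],
    snippets: List[Tuple[str, Sequence[float]]],
    top_k: int = 2,
) -> List[Tuple[str, float]]:
    """Score every (text, embedding) pair and return the top_k best matches."""
    scored = [(text, cosine_similarity(query_vector, vec)) for text, vec in snippets]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


if __name__ == "__main__":
    # Toy embeddings, invented for illustration only.
    candidates = [
        ("snippet A", [1.0, 0.0]),
        ("snippet B", [0.0, 1.0]),
        ("snippet C", [1.0, 1.0]),
    ]
    for text, score in rank_snippets([1.0, 0.0], candidates):
        print(f"{score:.3f}  {text}")
```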

“What’s great about MongoDB Atlas is that we can store native vector embeddings of the report right alongside all of their associated text snippets and metadata,” says Kröpelin. “This means we can run really powerful and complex queries quickly. For each vector embedding we can filter on which source document it's coming from, who wrote it, and when.”

“This matters because report quality is critical — we have to get this right because patient safety demands that we don't get it wrong.”

Tobias Kröpelin, PhD, Novo Nordisk
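Filtering a vector query by source document, author, and date, as Kröpelin describes, could look roughly like the aggregation pipeline below. This is a sketch under stated assumptions, not NovoScribe's actual query: the index name `snippet_vector_index` and the field names `embedding`, `text`, `source_document`, `author`, and `written_at` are hypothetical, and the filter fields would need to be declared as filter fields in the Atlas Vector Search index. The function only builds the pipeline as plain Python data; with PyMongo it would be passed to `collection.aggregate(pipeline)`.

```python
from datetime import datetime
from typing import Any, Dict, List, Sequence


def build_vector_search_pipeline(
    query_vector: Sequence[float],
    source_document: str,
    author: str,
    written_after: datetime,
    limit: int = 5,
) -> List[Dict[str, Any]]:
    """Build a $vectorSearch pipeline that pre-filters on snippet metadata."""
    return [
        {
            "$vectorSearch": {
                "index": "snippet_vector_index",  # hypothetical index name
                "path": "embedding",              # hypothetical vector field
                "queryVector": list(query_vector),
                # Candidates considered before the top `limit` are returned;
                # the multiplier of 20 is an arbitrary choice for this sketch.
                "numCandidates": limit * 20,
                "limit": limit,
                # Metadata pre-filter: which document, who wrote it, and when.
                "filter": {
                    "$and": [
                        {"source_document": {"$eq": source_document}},
                        {"author": {"$eq": author}},
                        {"written_at": {"$gte": written_after}},
                    ]
                },
            }
        },
        {
            # Return the snippet text, its lineage fields, and the match score.
            "$project": {
                "_id": 0,
                "text": 1,
                "source_document": 1,
                "author": 1,
                "written_at": 1,
                "score": {"$meta": "vectorSearchScore"},
            }
        },
    ]


if __name__ == "__main__":
    pipeline = build_vector_search_pipeline(
        query_vector=[0.1, 0.2, 0.3],
        source_document="template-efficacy",  # invented example value
        author="example.author",              # invented example value
        written_after=datetime(2024, 1, 1),
        limit=3,
    )
    print(pipeline[0]["$vectorSearch"]["filter"])
```

Keeping the filter inside the `$vectorSearch` stage means the metadata constraints narrow the candidate set during the vector search itself rather than in a later stage.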

Figure 2: NovoScribe cloud-native architecture

Gen AI + MongoDB Atlas: start fast and scale securely in the cloud

At the outset of the NovoScribe project, Kröpelin and the Novo Nordisk Statistics team started with the relational databases they typically used in their day-to-day work. But it quickly became obvious that the data model needed to feed both statistical outputs and report text into the LLMs was hugely complex and nowhere near flexible enough to cope with the pace of NovoScribe’s rapid feature development.

Kröpelin says, “Working with the tabular model of our traditional relational database, we would have ended up with dozens of separate tables, each with just a couple of columns. These looked nothing like the Python dictionaries my team were working with in code, which slowed down our development velocity. What also slowed us down was that we couldn’t make any changes to our application without complex schema migrations in the database. And then joining all of these tables at query time to prompt the LLMs crippled application performance and user experience.”

Kröpelin’s team was also familiar with MongoDB and quickly recognized that its document data model would provide the ease of use, flexibility, and speed demanded by NovoScribe. A single call from the MongoDB Python driver can retrieve the entire object (the source text snippet, its vector embedding, and its metadata) without the overhead of joining data.
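As an illustration of why the document model maps so directly onto Python code, the sketch below builds one snippet as a single dictionary holding the text, its embedding, and its metadata together. The field names and the helper `build_snippet_document` are hypothetical; the source does not publish NovoScribe's schema. With PyMongo, a dictionary of this shape is what `collection.insert_one(doc)` accepts and what `collection.find_one(...)` returns, so no table joins are needed to reassemble the object.

```python
from datetime import datetime
from typing import Any, Dict, Sequence


def build_snippet_document(
    text: str,
    embedding: Sequence[float],
    source_document: str,
    author: str,
    written_at: datetime,
) -> Dict[str, Any]:
    """Keep a snippet's text, vector, and metadata together in one document."""
    return {
        "text": text,                        # the source text snippet
        "embedding": list(embedding),        # its vector embedding
        "source_document": source_document,  # lineage: where it came from
        "author": author,                    # lineage: who wrote it
        "written_at": written_at,            # lineage: when it was written
    }


if __name__ == "__main__":
    doc = build_snippet_document(
        text="Example sentence from a report template.",  # invented example
        embedding=[0.12, -0.03, 0.88],                    # toy vector
        source_document="template-efficacy",
        author="example.author",
        written_at=datetime(2024, 1, 1),
    )
    # The same dict the application works with is what gets stored.
    print(sorted(doc))
```

Adding a new metadata field later means adding a key to this dictionary; it does not require a schema migration across multiple tables.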

In addition to programmatic access, MongoDB Compass is available for non-developer team members to view and filter data stored in MongoDB via a GUI, enabling them to review the data set’s completeness before serving it to the LLMs.

By using the fully managed MongoDB Atlas service, Novo Nordisk gets the mission-critical assurances it needs to run highly regulated applications. As Waheed Jowiya, Digitalisation Strategy Lead at Novo Nordisk, says, “Security and disaster recovery are non-negotiable. We have VPC access via Atlas’ support for AWS PrivateLink. In addition, fine-grained access controls, auditing, end-to-end data encryption, and backups are all standard Atlas features, configured with simple API calls.”

Jowiya goes on to say, “We have a small team, so the operational automation provided by MongoDB Atlas is invaluable. It also gives us optionality. NovoScribe runs on AWS today, but as a company, we also have a relationship with Azure. Through its multi-cloud support, we can run Atlas between both hyperscale platforms with complete freedom and no lock-in.”

Gen AI: Unlocking incredible productivity gains and looking to the future

Today NovoScribe compiles around 30% of all CSRs for Novo Nordisk, and this work is expected to increase to over 90% by the end of the year. The results achieved demonstrate why NovoScribe is gaining adoption momentum so quickly across the company.

“We’ve reduced the time taken to create Clinical Study Reports from 12 weeks to 10 minutes, with higher quality outputs and a fraction of the team. In terms of value, each day sooner a medicine gets to market can add around $15 million in revenue to the company.”

Waheed Jowiya, Digitalisation Strategy Lead at Novo Nordisk

Jowiya adds that the LLMs take just minutes to generate the CSR from the data retrieved from MongoDB Atlas; the rest of the time is spent in QA. Highly skilled team members no longer have to pull the data together or double-check that they are cutting and pasting the right statistics into the appropriate section of the report. The gen AI models now automate that process, freeing the team to focus on driving more breakthrough research and development.

For Novo Nordisk, NovoScribe is just the start. Beyond CSRs, the company is exploring many new opportunities to apply gen AI in every part of its business, with MongoDB Atlas at the core of its efforts.

“Everything in gen AI is new — you can’t just go to GitHub and repurpose code others have written. Only MongoDB Atlas gives us the flexibility and scale at the data platform layer to experiment in how to harness one of the biggest technical advancements the industry has ever seen.”

Louise Lind Skov, Head of Content Digitalisation at Novo Nordisk

Next Steps

To learn more about how others are innovating with AI, check out the Building AI with MongoDB case study series. Then register for MongoDB Atlas and get started with gen AI in your next project.