How the NFSA is Using MongoDB Atlas and AI to Make Aussie Culture Accessible

Elliott Gluck

#Atlas Search #genAI #Vector Search

Where can you find everything from facts about Kylie Minogue, to more than 6,000 Australian home movies, to a 1960s pop group playing a song with a drum-playing kangaroo? The NFSA!

Founded in 1935, the National Film and Sound Archive of Australia (NFSA) is one of the oldest archives of its kind in the world. It is tasked with collecting, preserving, and sharing Australia’s audiovisual culture. According to its website, the NFSA “represents not only [Australia’s] technical and artistic achievements, but also our stories, obsessions and myths; our triumphs and sorrows; who we were, are, and want to be.”

The NFSA’s collection spans petabytes of audiovisual data, including broadcast-quality news footage, TV shows and movies, high-resolution photographs, radio shows, and video games. It also holds millions of physical and contextual items, such as costumes, scripts, props, photographs, and promotional materials, all tucked away in a warehouse.

“Today, we have eight petabytes of data, and our data is growing from one to two petabytes each year,” said Shahab Qamar, software engineering manager at NFSA.

Making this wealth of data easily accessible to users across Australia, not to mention all over the world, has posed a number of challenges. That is where MongoDB Atlas, which helps developers simplify and accelerate building with data, comes in.

Don’t change (but apply a few updates)

Because of its broad appeal, the NFSA's collection website alone receives an average of 100,000 visitors each month.

When Qamar joined the NFSA in 2020, he saw an opportunity to improve the organization’s web platform. Visitor numbers had begun to plateau, and he wanted to give the site’s many daily visitors the best possible experience. That meant refreshing the website and fixing technical problems with handling site traffic, which stemmed from the site being hosted on on-premises servers.

The site also wasn’t “optimized for Google Analytics,” said Qamar. In fact, the NFSA website was invisible to Google and other search engines. He knew it was time for a significant update, one that would also lay strong data foundations for building deeper capabilities down the line.

But first, Qamar and team needed to find a setup that could serve the needs of the NFSA and Australia’s 26 million residents more robustly than their previous solution.

Specifically, Qamar said, the NFSA was looking for a fully managed database that could also implement search at scale, as well as a system that his small team of five could easily manage. It also needed to ensure high levels of resiliency and the ability to work with more than one cloud provider. The previous NFSA site also didn’t support content delivery networks, he added.

MongoDB Atlas supported all of the use cases the NFSA was looking for, Qamar said, including multi-cloud hosting. And because Atlas is fully managed, it met the NFSA’s need for a database that a small team could easily run. In July 2023, after months of development, the new and greatly improved NFSA website was launched.

The redesign was immediately impactful: Since the NFSA’s redesigned site was launched, the number of users visiting the collection search website has gone up 200%, and content requests—which the NFSA access team responds to on a case-by-case basis—have gone up 16%.

(Getting search) back in black

While the previous version of the NFSA site included search, the prior functionality was prone to crashing, and the quality of the results was often poor, Qamar said.

For example, search results were sorted alphabetically rather than by relevance, and the previous search could not fine-tune relevance based on matches in specific fields.
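To illustrate the difference between alphabetical ordering and field-weighted relevance, here is a minimal sketch (not the NFSA’s actual configuration) of an Atlas Search aggregation pipeline built in Python. The field names `title` and `summary`, the boost weights, and the index name `default` are assumptions for illustration. Each field gets its own `text` clause inside a `compound` operator, and a `boost` on the `title` clause makes title matches count for more; `$search` returns documents in descending relevance-score order.

```python
from typing import Any

# Hypothetical field weights: a match in the title counts three times as much
# as a match in the summary. Real weights would be tuned against real queries.
FIELD_BOOSTS: dict[str, float] = {"title": 3.0, "summary": 1.0}


def relevance_pipeline(query: str, limit: int = 20, index: str = "default") -> list[dict[str, Any]]:
    """Build an aggregation pipeline that ranks documents by Atlas Search relevance.

    One `text` clause per field sits inside `compound.should`, so a document's score
    grows with each matching field, weighted by that field's boost.
    """
    should = [
        {"text": {"query": query, "path": field, "score": {"boost": {"value": boost}}}}
        for field, boost in FIELD_BOOSTS.items()
    ]
    return [
        # At least one field must match for a document to be returned.
        {"$search": {"index": index, "compound": {"should": should, "minimumShouldMatch": 1}}},
        {"$limit": limit},
        # Expose the relevance score alongside the title for inspection.
        {"$project": {"title": 1, "score": {"$meta": "searchScore"}}},
    ]
```

With PyMongo, the pipeline would be passed to `collection.aggregate(...)` on a collection that has an Atlas Search index. Keeping the weights in one data structure makes it easy to adjust relevance without touching the rest of the query code.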

So, as part of its site redesign, the NFSA was looking to add full text search, relevance-based search results, faceting, and pagination. MongoDB Atlas Search—which integrates the database, search engine, and sync mechanism into a single, unified, fully managed platform—ticked all of those boxes.
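A minimal sketch of how faceting and pagination might look with Atlas Search follows, again assuming hypothetical `title`, `summary`, `genre`, and `year` fields. One pipeline returns a page of relevance-ranked results using `$skip` and `$limit`; a second uses the `$searchMeta` stage with the `facet` collector to count matches per genre and per decade. The facet fields would need to be indexed with field types that support faceting in the Atlas Search index definition.

```python
from typing import Any

SEARCH_FIELDS = ["title", "summary"]  # hypothetical text fields


def page_pipeline(query: str, page: int, page_size: int = 20, index: str = "default") -> list[dict[str, Any]]:
    """Return one page (1-based) of full-text results, ordered by relevance score."""
    if page < 1:
        raise ValueError("page numbers start at 1")
    return [
        {"$search": {"index": index, "text": {"query": query, "path": SEARCH_FIELDS}}},
        {"$skip": (page - 1) * page_size},
        {"$limit": page_size},
    ]


def facet_pipeline(query: str, index: str = "default") -> list[dict[str, Any]]:
    """Return match counts per genre and per production decade for the same query."""
    return [
        {"$searchMeta": {
            "index": index,
            "facet": {
                "operator": {"text": {"query": query, "path": SEARCH_FIELDS}},
                "facets": {
                    # Top 10 genres by number of matching items.
                    "genres": {"type": "string", "path": "genre", "numBuckets": 10},
                    # Decade buckets 1900-1909, ..., 2020-2029; anything else goes to "other".
                    "decades": {"type": "number", "path": "year",
                                "boundaries": list(range(1900, 2031, 10)), "default": "other"},
                },
            },
        }}
    ]
```

Skip-based paging is the simplest option but gets slower for deep pages; Atlas Search also offers token-based pagination with `searchAfter` for that case.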

A search results page on the NFSA website

When the NFSA compared search results from its old site with those from its new MongoDB Atlas site, it “found that MongoDB Atlas-based searches were more relevant and targeted,” Qamar said.

Previously, configuring site search required manual coding and meant downtime for the site, he noted. “The whole setup wasn’t very developer friendly and, therefore, a barrier to working efficiently with search configuration and fine-tuning,” Qamar said.

In comparison, MongoDB Atlas allowed for simple configuration and fine-tuning of the NFSA's search requirements.

The NFSA has also been using MongoDB Atlas Charts, which help it easily visualize its collection by custom groupings (such as production year or genre) and see which items are most popular with users.

“Charts have helped us understand how our collection is growing and evolving over time,” Qamar said.

NFSA’s use of MongoDB Charts

Can’t get you (AI) out of my head

Now, the NFSA—inspired by Qamar’s own training in machine learning and the broad interest in all things AI—is exploring how it can use Atlas Vector Search and generative AI tools to allow users to explore content buried in the NFSA collection.

One example is putting transcriptions of audiovisual files in the NFSA’s collection into a vector database for retrieval-augmented generation (RAG). The NFSA has approximately 27 years’ worth of material to transcribe (that is, it would take 27 years to play it all back), and it is currently developing a model that accurately captures the Australian dialect so the work is transcribed correctly.

Ultimately, the NFSA is interested in building a RAG-powered AI bot to provide historically and contextually accurate information about work in the NFSA’s archive.
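The retrieval half of a RAG bot like this could look roughly like the following sketch. It is not the NFSA’s implementation: the document fields (`text`, `item_title`, `embedding`), the index name `transcript_vectors`, and the chunk sizes are hypothetical, and the calls to an embedding model and a language model are deliberately left out. Transcripts are split into overlapping word windows for embedding, an Atlas Vector Search `$vectorSearch` stage retrieves the chunks closest to a question’s embedding, and the retrieved excerpts are assembled into a prompt that asks the model to answer only from them.

```python
from typing import Any


def chunk_transcript(text: str, max_words: int = 200, overlap: int = 40) -> list[str]:
    """Split a transcript into overlapping word windows small enough to embed."""
    if not 0 <= overlap < max_words:
        raise ValueError("overlap must be non-negative and smaller than max_words")
    words = text.split()
    step = max_words - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):  # last window reached the end
            break
    return chunks


def vector_search_pipeline(query_vector: list[float], k: int = 5,
                           index: str = "transcript_vectors") -> list[dict[str, Any]]:
    """Retrieve the k transcript chunks whose embeddings are nearest the query vector."""
    return [
        {"$vectorSearch": {
            "index": index,
            "path": "embedding",          # hypothetical field holding each chunk's vector
            "queryVector": query_vector,  # embedding of the user's question
            "numCandidates": k * 20,      # wider candidate pool improves approximate recall
            "limit": k,
        }},
        {"$project": {"_id": 0, "item_title": 1, "text": 1,
                      "score": {"$meta": "vectorSearchScore"}}},
    ]


def build_prompt(question: str, passages: list[dict[str, Any]]) -> str:
    """Ground the model's answer in retrieved archive excerpts."""
    context = "\n\n".join(f"[{p['item_title']}] {p['text']}" for p in passages)
    return ("Answer using only the archive excerpts below. "
            "If they do not contain the answer, say so.\n\n"
            f"{context}\n\nQuestion: {question}")
```

Labelling each excerpt with its source item is one way to let the bot cite which archive item an answer came from, which supports the goal of historically and contextually accurate responses.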

The NFSA is also exploring how it can use RAG to deliver accurate, conversation-like search results without training large language models itself, and whether it can use AI to help restore some of the older videos in its collection. Qamar and team are also interested in vectorizing audiovisual material for semantic analysis and genre-based classification of collection material at scale, he said.

“Historically, we’ve been very metadata-driven and keyword-driven, and I think that’s a missed opportunity. Because when we talk about what an archive does, we archive stories,” Qamar said of the possibilities offered by vectors.

“An example I use is, what if the world ended tomorrow? And what if aliens came to Earth and only saw our metadata, what image of Australia would they see? Is that a true image of what Australia is really like?” Qamar said.

“How content is described is important, but content’s imagery, the people in it, and the audio and words being spoken are really important. Full-text search can take you somewhere along the way, but vector search allows you to look things up in a semantic manner. So it’s more about ideas and concepts than very specific keywords,” he said.
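Qamar’s contrast between keyword and semantic lookup can be shown with a toy example. The three-dimensional “concept” vectors below are hand-made for illustration (real embeddings come from a model and have hundreds or thousands of dimensions), and the two descriptions are invented. The query shares the word “outback” only with the description of a pub band, so keyword overlap ranks that first, while cosine similarity between vectors ranks the desert drive first because it is closer in meaning.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity: 1.0 means the vectors point in the same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def keyword_hits(query: str, doc: str) -> int:
    """Count query words that appear verbatim (case-insensitive) in the document."""
    return len(set(query.lower().split()) & set(doc.lower().split()))


# Toy vectors over three made-up concepts: [travel, landscape, music].
DOCS = {
    "Driving across the desert interior": [0.9, 0.8, 0.0],
    "Outback pub band plays covers": [0.1, 0.3, 0.9],
}
QUERY, QUERY_VEC = "outback road trip", [0.8, 0.9, 0.1]

best_by_keyword = max(DOCS, key=lambda d: keyword_hits(QUERY, d))
best_by_meaning = max(DOCS, key=lambda d: cosine(QUERY_VEC, DOCS[d]))
print("keyword match:", best_by_keyword)   # keyword match: Outback pub band plays covers
print("semantic match:", best_by_meaning)  # semantic match: Driving across the desert interior
```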

If you’re interested in learning how MongoDB helps accelerate and simplify time-to-mission for federal, state, and local governments, defense agencies, education, and across the public sector, check out MongoDB for Public Sector.

Check out MongoDB Atlas Vector Search to learn more about how Vector Search helps organizations like the NFSA build applications powered by semantic search and gen AI.

*Note that this story’s subheads come from Australian song titles!