Atlas Search Overview

MongoDB's Atlas Search allows fine-grained text indexing and querying of data on your Atlas cluster. It enables advanced search functionality for your applications without any additional management or separate search system alongside your database. Atlas Search provides options for several kinds of text analyzers, a rich query language that uses Atlas Search aggregation pipeline stages like $search and $searchMeta in conjunction with other MongoDB aggregation pipeline stages, and score-based results ranking.

Tip

Quickly try Atlas Search without needing an Atlas account, cluster, or collection, with the Atlas Search Playground. To learn more, see the Atlas Search Playground documentation.

Atlas Search Fundamentals

The following concepts form the basis of Atlas Search and are essential to optimize your application.

Indexing

In the context of search, an index is a data structure that categorizes data in an easily searchable format. Search indexes enable faster retrieval of documents that contain a given term without having to scan the entire collection. While both Atlas Search indexes and MongoDB Indexes make data retrieval faster, note that they are not the same. Like the index in the back of a book, a search index is a mapping between terms and the documents that contain those terms. Search indexes also contain other relevant metadata, such as the positions of terms in documents.
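The book-index analogy above can be sketched as a toy inverted index. This is only an illustration in plain Python, not how Atlas Search or Lucene store data; the function name build_inverted_index and the sample documents are made up for the example.

```python
from collections import defaultdict

def build_inverted_index(docs):
    """Map each term to {doc_id: [positions]} for a dict of doc_id -> text."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        # Naive tokenization: lowercase and split on whitespace.
        for position, term in enumerate(text.lower().split()):
            index[term].setdefault(doc_id, []).append(position)
    return dict(index)

# Hypothetical sample documents.
docs = {
    1: "the quick brown fox",
    2: "the lazy dog",
    3: "quick quick dog",
}
index = build_inverted_index(docs)

# Look up a term directly instead of scanning every document.
print(index["quick"])        # {1: [1], 3: [0, 1]}
print(sorted(index["dog"]))  # [2, 3]
```

The position lists are the kind of metadata that makes phrase and proximity matching possible.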

Creating at least one search index is usually required in any search application. For more information, see Atlas Search Indexes.

Tokenization

When creating a search index, data must first be transformed into a sequence of tokens or terms. An analyzer facilitates this process through steps including:

Tokenization: Breaking up words in a string into indexable tokens. For example, dividing a sentence by whitespace and punctuation.
Normalization: Organizing data so that it is consistently represented and easier to analyze. For example, transforming text to lower case or removing unwanted words called stop words.
Stemming: Reducing words to their root form. For example, ignoring suffixes, prefixes, and plural word forms.
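The three steps above can be sketched as a toy analyzer. This is a deliberately naive illustration, not one of the Atlas Search analyzers: the stop-word list, the suffix rules, and all function names are assumptions made for the example.

```python
import re

# Illustrative stop-word list; real analyzers ship their own.
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in"}

def tokenize(text):
    # Split on anything that is not a letter or digit.
    return [t for t in re.split(r"[^A-Za-z0-9]+", text) if t]

def normalize(tokens):
    # Lowercase every token, then drop stop words.
    lowered = [t.lower() for t in tokens]
    return [t for t in lowered if t not in STOP_WORDS]

def stem(token):
    # Very naive suffix stripping; keeps at least three characters.
    for suffix in ("ing", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token

def analyze(text):
    return [stem(t) for t in normalize(tokenize(text))]

print(analyze("The Foxes are jumping over the lazy dogs!"))
# ['fox', 'are', 'jump', 'over', 'lazy', 'dog']
```

A query for "fox" and a document containing "Foxes" now produce the same term, which is why the same analysis is usually applied at both index time and query time.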

The specifics of tokenization are language-specific and can require making additional choices. Which analyzer to use depends on your data and application. For more information, see Process Data with Analyzers.

Querying

Search queries consult the index to return a set of results. Search queries differ from traditional database queries in that they are intended to meet more general information needs. Where a database query must follow a strict syntax, a search query can perform simple text matching, but it can also look for similar phrases, number or date ranges, or matches for regular expressions or wildcards.

For more information, see Atlas Search Queries.

Scoring

Each document receives a relevancy score that enables query results to be returned in order from the highest relevance to the lowest. In the simplest form of scoring, documents score higher if the query term appears frequently in a document and lower if the query term appears across many documents in the collection. Scoring can also be customized. Tailoring search to a specific domain often means customizing the relevance-based default score by boosting, decaying, or modifying it in other ways.
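The "simplest form of scoring" described above resembles classic TF-IDF. The sketch below is a textbook TF-IDF in plain Python, not the formula Atlas Search itself applies; the corpus and the function names tf_idf and rank are hypothetical.

```python
import math

def tf_idf(term, doc_tokens, corpus):
    """Score one document for one term with a basic TF-IDF formula."""
    # Term frequency: rises when the term is frequent in this document.
    tf = doc_tokens.count(term) / len(doc_tokens)
    docs_with_term = sum(1 for tokens in corpus if term in tokens)
    if docs_with_term == 0:
        return 0.0
    # Inverse document frequency: falls when many documents contain the term.
    idf = math.log(len(corpus) / docs_with_term)
    return tf * idf

# Hypothetical, already-analyzed documents.
corpus = [
    ["mongodb", "search", "search", "index"],
    ["mongodb", "cluster", "node"],
    ["mongodb", "search", "query"],
]

def rank(term):
    """Return document positions ordered from highest to lowest score."""
    scored = [(tf_idf(term, tokens, corpus), i) for i, tokens in enumerate(corpus)]
    ranked = sorted(scored, key=lambda pair: -pair[0])
    return [i for score, i in ranked if score > 0]

print(rank("search"))   # [0, 2]
print(rank("mongodb"))  # []
```

The term "mongodb" appears in every document, so it carries no discriminating weight in this toy formula and no document scores above zero for it.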

For more information, see Score Documents.

Atlas Search Availability

Atlas Search is available only on Atlas instances running MongoDB 4.2 or higher. Some Atlas Search features require a later MongoDB version. The following table lists those features and the versions that they require.

Atlas Search Feature	MongoDB Version for Feature
Facets	5.0.4+, 6.0+, 7.0+
Facets on Sharded Clusters	6.0+, 7.0+
Stored Source Fields	5.0.6+, 6.0+, 7.0+
Query Analytics	5.0+
$lookup with $search	6.0+, 7.0+
$unionWith with $search	6.0+, 7.0+
Sort	5.0+, 6.0+, 7.0+
Sort on Sharded Clusters	6.0+, 7.0+
Dedicated Search Nodes	6.0+, 7.0+
Manage search indexes programmatically with mongosh and MongoDB Drivers	6.0+, 7.0+
Atlas Search local deployment with the Atlas CLI	6.0+, 7.0+
$search searchAfter and searchBefore options	6.0.13+, 7.0.5+

Atlas Search is not supported for time series collections.

Atlas Search Architecture

The Atlas Search mongot process uses Apache Lucene and runs alongside mongod on each node in the Atlas cluster. The mongot process:

Creates Atlas Search indexes based on the rules in the index definition for the collection.
Monitors change streams for the current state of the documents and index changes for the collections for which you defined Atlas Search indexes.
Processes Atlas Search queries and returns matching documents.

Stored Source Architecture

If you define stored source fields in your Atlas Search index, the mongot process stores the specified fields. If you also specify the returnStoredSource option in your query, mongot returns the stored fields for matching documents directly instead of doing a full document lookup on the database.
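As a rough sketch, the two halves of this feature might look like the following, written as plain Python dicts that mirror the JSON you would supply. The field names title, year, and the query text are hypothetical, and the dicts are only printed here, not sent to a cluster.

```python
import json

# Index definition that asks mongot to store two (hypothetical) fields.
index_definition = {
    "mappings": {"dynamic": True},
    "storedSource": {"include": ["title", "year"]},
}

# Query that asks for the stored fields instead of a full document lookup.
pipeline = [
    {
        "$search": {
            "text": {"query": "baseball", "path": "title"},
            "returnStoredSource": True,
        }
    },
    {"$limit": 5},
]

print(json.dumps(index_definition, indent=2))
print(json.dumps(pipeline, indent=2))
```

With a driver such as PyMongo, a pipeline like this would be passed to the collection's aggregate method.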

Search Nodes Architecture

For dedicated (M10 or higher) sharded and unsharded Atlas clusters on any cloud provider, you can deploy separate Search Nodes that run only the mongot process for workload isolation. Atlas deploys Search Nodes with each cluster or with each shard on the cluster. For example, if you deploy 2 Search Nodes for a cluster with 3 shards, Atlas deploys 6 Search Nodes, 2 per shard.
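The per-shard arithmetic in the example above can be written as a small helper. The function name total_search_nodes is made up for this sketch.

```python
def total_search_nodes(nodes_per_shard: int, shard_count: int = 1) -> int:
    """Return how many Search Nodes exist when each shard gets its own set."""
    if nodes_per_shard < 1 or shard_count < 1:
        raise ValueError("counts must be positive")
    return nodes_per_shard * shard_count

print(total_search_nodes(2, 3))  # 6, matching the example above
print(total_search_nodes(2))     # 2 for an unsharded cluster
```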

Deploying separate Search Nodes provides the following benefits:

Scales storage independently of the MongoDB cluster.
Scales query load independently of the MongoDB cluster.

When you deploy separate Search Nodes, the mongot processes run on separate Search Nodes that you can configure independently.

Figure: Atlas Search separate Search Nodes architecture

You can configure Search Nodes to run the mongot process separately from the database nodes that run the mongod process on your Atlas cluster. You can also configure the number of Search Nodes and the amount of resources provisioned for each search node.

You can't deploy Search Nodes separately for serverless clusters. To learn more about deploying Search Nodes separately, see Search Nodes for Workload Isolation. To deploy the Search Nodes from the UI or API, see Create a Cluster.

When you deploy separate Search Nodes, Atlas automatically assigns a mongod for each mongot. The mongot communicates with the mongod to listen for and sync index changes for the indexes that it stores.

If you delete all the Search Nodes on your cluster, there will be a brief interruption in processing your search query results. To learn more, see Modify a Cluster. If you delete your Atlas cluster, Atlas pauses and then deletes all associated Atlas Search deployments (mongot processes).

Note

The local SSDs used for Search Nodes require a 20% storage overhead to support index operations.

Atlas Search Indexes

An Atlas Search index is a data structure that categorizes data in an easily searchable format. It is a mapping between terms and the documents that contain those terms. Atlas Search indexes enable faster retrieval of documents using certain identifiers. You must configure an Atlas Search index to query data in your Atlas cluster using Atlas Search.

You can create an Atlas Search index on a single field or on multiple fields. We recommend that you index the fields that you regularly use to sort or filter your data in order to quickly retrieve the documents that contain the relevant data at query-time.

You can specify the fields to index using the following methods:

Dynamic mappings, which enable Atlas Search to automatically index all the fields of supported types in each document. This takes disk space and might negatively impact cluster performance.
Static mappings, which allow you to selectively identify the fields to index. If a field contains polymorphic data, Atlas Search indexes only the documents whose values for that field match the mapping in the index definition, and ignores documents whose values don't match it.
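The two mapping styles might look like the following index definitions, shown as Python dicts that mirror the JSON form. The field names title and year are hypothetical; adapt them to your own collection.

```python
import json

# Dynamic mappings: index every field of a supported type.
dynamic_index = {"mappings": {"dynamic": True}}

# Static mappings: index only the fields listed under "fields".
static_index = {
    "mappings": {
        "dynamic": False,
        "fields": {
            "title": {"type": "string"},
            "year": {"type": "number"},
        },
    }
}

print(json.dumps(dynamic_index))
print(json.dumps(static_index, indent=2))
```

With the static definition, a document whose year field holds a string instead of a number would not be indexed on that field, per the polymorphic-data rule described above.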

Although the data stored on Atlas Search isn't an identical copy of data from the collection on your Atlas cluster, Atlas Search indexes still take some disk space and memory. If you enable the store option for fields that contain string values or if you configure the stored source fields in your index, Atlas Search stores an identical copy of the specified fields on disk, which can take disk space.

Atlas Search provides built-in analyzers for creating indexable terms that correct for differences in punctuation, capitalization, stop words, and more. Analyzers apply parsing and language rules to the query. You can also create a custom analyzer using available built-in character filters, tokenizers, and token filters. To learn more about the built-in and custom analyzers, see Process Data with Analyzers.

To learn more about Atlas Search support for other data types, see Data Types. The mongot process stores the indexed fields and the _id field on disk per index for the collections on the cluster.

If you change an existing index, Atlas Search rebuilds the index without downtime. This allows you to continue using the old index for existing and new queries until the index rebuild is complete. If you deployed separate Search Nodes, Atlas Search also rebuilds indexes for the following events:

Add Search Nodes
Scale Search Nodes
Internal mongot changes that require an index resync (such as some Atlas Search features that require an index update)

If you deployed separate Search Nodes, Atlas automatically deploys additional Search Nodes for the duration of the index rebuild to keep your old index up to date and available for queries while the new index builds.

If you make changes to the collection for which you defined Atlas Search indexes, the latest data might not be available immediately for queries. However, mongot monitors the change streams, which allows it to update stored copies of data, and Atlas Search indexes are eventually consistent.

Note

For Dedicated Search Nodes

Adding and adjusting shards triggers a rebuild of the Atlas Search index. During this index rebuild, the index might not have the most current data. Therefore, queries against data on those shards might fail or return incorrect results.

If you reshard a collection with Atlas Search indexes, Atlas Search indexes on the collection become unavailable when the resharding operation completes. You must rebuild your Atlas Search indexes once the resharding operation completes.

Note

Atlas Search doesn't support encrypting Atlas Search indexes with encryption keys using Customer Key Management in the Atlas UI.

Atlas Search Queries

Atlas Search queries take the form of an aggregation pipeline stage. Atlas Search provides $search and $searchMeta stages, both of which must be the first stage in the query pipeline. These stages can be used in conjunction with other aggregation pipeline stages in your query pipeline. To learn more about these pipeline stages, see Choose the Aggregation Pipeline Stage.
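A minimal sketch of such a pipeline, as Python dicts, with a small check for the first-stage rule. The index name default, the fields plot and title, and the helper search_stage_is_first are assumptions for illustration. The helper inspects only top-level stages, so it does not examine $search inside $lookup or $unionWith sub-pipelines.

```python
SEARCH_STAGES = ("$search", "$searchMeta")

def search_stage_is_first(pipeline):
    """Return True unless $search or $searchMeta appears after the first stage."""
    for position, stage in enumerate(pipeline):
        if position != 0 and any(name in stage for name in SEARCH_STAGES):
            return False
    return True

# $search first, followed by ordinary aggregation stages.
pipeline = [
    {"$search": {"index": "default", "text": {"query": "baseball", "path": "plot"}}},
    {"$limit": 5},
    {"$project": {"_id": 0, "title": 1, "score": {"$meta": "searchScore"}}},
]

# Invalid ordering: $search placed after $match.
bad_pipeline = [
    {"$match": {"year": 2000}},
    {"$search": {"text": {"query": "baseball", "path": "plot"}}},
]

print(search_stage_is_first(pipeline))      # True
print(search_stage_is_first(bad_pipeline))  # False
```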

Atlas Search also provides query operators and collectors that you can use inside the $search and $searchMeta aggregation pipeline stages. The Atlas Search operators allow you to locate and retrieve matching data from the collection on your Atlas cluster. The collector returns a document representing the search metadata results.

You can use Atlas Search operators to query terms, phrases, geographic shapes and points, numeric values, similar documents, synonymous terms, and more. You can also search using regex and wildcard expressions. The Atlas Search compound operator allows you to combine multiple operators inside your $search stage to perform a complex search and filter of data based on what must, must not, or should be present in the documents returned by Atlas Search. You can also use the compound operator to match or filter documents within the $search stage itself, which performs better than running $match after $search.
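To make the comparison concrete, here is a sketch of the same intent expressed two ways, as Python dicts. The field names plot, genres, title, and year and the query strings are hypothetical.

```python
# Filtering inside $search with the compound operator.
compound_pipeline = [
    {
        "$search": {
            "compound": {
                # Documents must match this clause.
                "must": [{"text": {"query": "baseball", "path": "plot"}}],
                # Documents must not match this clause.
                "mustNot": [{"text": {"query": "comedy", "path": "genres"}}],
                # Matching this clause is optional but raises the score.
                "should": [{"text": {"query": "world series", "path": "title"}}],
                # Must match, but does not contribute to the score.
                "filter": [{"range": {"path": "year", "gte": 2000}}],
            }
        }
    }
]

# The less performant alternative: filter afterwards with $match.
match_after_search = [
    {"$search": {"text": {"query": "baseball", "path": "plot"}}},
    {"$match": {"year": {"$gte": 2000}}},
]

print(sorted(compound_pipeline[0]["$search"]["compound"]))
# ['filter', 'must', 'mustNot', 'should']
```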

To learn more about the syntax, options, and usage of the Atlas Search operators, see Use Operators and Collectors in Atlas Search Queries.

When you run a query, Atlas Search uses the configured read preference to identify the node on which to run the query. The query first goes to the MongoDB process, which is mongod for a replica set cluster or mongos for a sharded cluster. For sharded clusters, your cluster data is partitioned across mongod instances, and each mongot knows only about the data on the mongod on the same node. Therefore, you can't run queries that target a particular shard. mongos directs the queries to all shards, making these scatter-gather queries. If you use zones to distribute a sharded collection over a subset of the shards in the cluster, Atlas Search routes the query to the zone that contains the shards for the collection that you are querying and runs your $search queries on just the shards where the collection is located.
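The scatter-gather idea can be illustrated with a conceptual toy: every shard returns its own hits ordered by score, and the results are merged into one ranked list. This is not how mongos is implemented; the function scatter_gather, the shard lists, and the scores are all invented for the sketch.

```python
import heapq
from itertools import islice

def scatter_gather(shard_results, limit):
    """Merge per-shard hit lists, each sorted by descending score, into the top `limit` hits."""
    # heapq.merge expects ascending input, so negate the score as the sort key.
    merged = heapq.merge(*shard_results, key=lambda hit: -hit[0])
    return list(islice(merged, limit))

# Hypothetical (score, document id) pairs returned by three shards.
shard_a = [(0.9, "a1"), (0.4, "a2")]
shard_b = [(0.7, "b1"), (0.6, "b2")]
shard_c = [(0.8, "c1")]

print(scatter_gather([shard_a, shard_b, shard_c], limit=3))
# [(0.9, 'a1'), (0.8, 'c1'), (0.7, 'b1')]
```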

The MongoDB process routes the query to the mongot on the same node. Atlas Search performs the search and scoring and returns the document IDs and other search metadata for the matching results to mongod. The mongod then performs a full document lookup implicitly for the matching results and returns the results to the client.

Note

For Dedicated Search Nodes

When you run a query, the query first goes to the MongoDB process based on the configured read preference. The mongod process routes the search query through a load balancer on the same node, which distributes the requests across all of the mongot processes. The Atlas Search mongot process performs the search and scoring and returns the matching results to mongod, which mongod then returns to the client. If you use the $search concurrent option in your query, Atlas Search enables intra-query parallelism. To learn more, see Parallelize Query Execution Across Segments.

Atlas Search associates a relevance-based score with every document in the result set. The relevance-based scoring allows Atlas Search to return documents in order from the highest score to the lowest. Atlas Search scores documents higher if the query term appears frequently in a document and lower if the query term appears across many documents in the collection. Atlas Search also supports customizing the relevance-based default score by boosting, decaying, or modifying it in other ways. To learn more about customizing the resulting scores, see Score the Documents in the Results.
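As one example of score customization, the sketch below boosts matches in one field over another by using the score option on an operator. The field names title and plot, the query text, and the boost value of 3 are arbitrary choices for illustration.

```python
boosted_pipeline = [
    {
        "$search": {
            "compound": {
                "should": [
                    # Matches in the title count three times as much.
                    {
                        "text": {
                            "query": "baseball",
                            "path": "title",
                            "score": {"boost": {"value": 3}},
                        }
                    },
                    # Matches in the plot keep the default score.
                    {"text": {"query": "baseball", "path": "plot"}},
                ]
            }
        }
    },
    # Surface the computed score alongside each result.
    {"$project": {"title": 1, "score": {"$meta": "searchScore"}}},
]

title_clause = boosted_pipeline[0]["$search"]["compound"]["should"][0]
print(title_clause["text"]["score"])  # {'boost': {'value': 3}}
```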

Search Nodes Cost

MongoDB supports separate Search Nodes on dedicated (M10 or higher) clusters. Search Nodes are deployed on compute-intensive NVMe instances. You must deploy a minimum of two nodes. You will be billed daily for hourly resource usage per node. To learn more, see Search Node Costs.
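The billing rule above amounts to simple arithmetic, sketched below. The hourly rate is a placeholder that you must supply from the actual pricing page; no real price is implied, and the function name search_node_daily_cost is made up for the example.

```python
def search_node_daily_cost(node_count, hourly_rate, hours=24):
    """Estimate one day's charge: hourly usage summed across every Search Node."""
    if node_count < 2:
        raise ValueError("Search Nodes require a minimum of two nodes")
    return node_count * hourly_rate * hours

# Placeholder rate of 0.5 currency units per node-hour, for illustration only.
print(search_node_daily_cost(2, 0.5))  # 24.0
```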

Next Steps

For hands-on experience creating Atlas Search indexes and running Atlas Search queries against the sample datasets, try the Atlas Search Course on MongoDB University and the Atlas Search tutorials.

Prefer to learn by watching?

Watch an overview of Atlas and Atlas Search and get started setting up Atlas Search for your data. The video demonstrates how to load sample data on your cluster, create an Atlas Search index, and run a sample query using Search Tester and Data Explorer.

Duration: 10 Minutes
