How to Define a Custom Analyzer and Run an Atlas Search Diacritic-Insensitive Query

This tutorial describes how to create an index that uses a custom analyzer and run a diacritic-insensitive query against the sample_mflix.movies collection. It takes you through the following steps:

Set up an Atlas Search index on the title and genres fields in the sample_mflix.movies collection.
Run an Atlas Search compound query against the title and genres fields in the sample_mflix.movies collection using the wildcard and text operators.

Before you begin, ensure that your Atlas cluster meets the requirements described in the Prerequisites.

To create an Atlas Search index, you must have Project Data Access Admin or higher access to the project.

Create the Atlas Search Index

In this section, you will create an Atlas Search index on the title and genres fields in the sample_mflix.movies collection.

In Atlas, go to the Clusters page for your project.

If it is not already displayed, select the organization that contains your desired project from the Organizations menu in the navigation bar.
If it is not already displayed, select your desired project from the Projects menu in the navigation bar.
If the Clusters page is not already displayed, click Database in the sidebar.

Go to the Atlas Search page for your cluster.

You can go to the Atlas Search page from the sidebar, the Data Explorer, or your cluster details page.

In the sidebar, click Atlas Search under the Services heading.
From the Select data source dropdown, select your cluster and click Go to Atlas Search.

Click the Browse Collections button for your cluster.
Expand the database and select the collection.
Click the Search Indexes tab for the collection.

Click the cluster's name.
Click the Atlas Search tab.

Click Create Search Index.

Select the Atlas Search JSON editor for the Configuration Method and click Next.

Enter the Index Name, and set the Database and Collection.

In the Index Name field, enter diacritic-insensitive-tutorial.
Note
If you name your index default, you don't need to specify an index parameter when using the $search pipeline stage. Otherwise, you must specify the index name using the index parameter.
In the Database and Collection section, find the sample_mflix database, and select the movies collection.
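
As a rough illustration of the note above, the following Python sketch builds a $search stage as a plain dictionary and adds the index parameter only when the index is not named default. The helper name search_stage is hypothetical and is not part of any MongoDB driver.

```python
def search_stage(operator: dict, index_name: str = "default") -> dict:
    """Build a $search stage; include "index" only for non-default index names."""
    body = dict(operator)  # copy so the caller's dictionary is not modified
    if index_name != "default":
        body = {"index": index_name, **body}
    return {"$search": body}


wildcard = {
    "wildcard": {"query": "alle*", "path": "title", "allowAnalyzedField": True}
}

# Index named "default": no index parameter is needed.
print(search_stage(wildcard))

# Any other index name must be passed explicitly.
print(search_stage(wildcard, "diacritic-insensitive-tutorial"))
```

Because this tutorial names the index diacritic-insensitive-tutorial, every query in it specifies the index parameter.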

Specify an index definition.

This index definition for the genres and title fields specifies a custom analyzer, diacriticFolder, using the following:

keyword tokenizer that tokenizes the entire input as a single token.
icuFolding token filter that applies character foldings such as accent removal and case folding.

The index definition specifies a string type for the genres and title fields. It also applies the custom analyzer named diacriticFolder on the title field.
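
To make the effect of this analyzer concrete, here is a minimal Python sketch that approximates it with the standard library. It is a stand-in under stated assumptions, not the ICU implementation that Atlas Search uses: the names fold and keyword_analyze are hypothetical, and Unicode decomposition plus casefold covers only common accents and case differences.

```python
import unicodedata


def fold(text: str) -> str:
    """Approximate icuFolding: drop combining accent marks, then fold case."""
    decomposed = unicodedata.normalize("NFKD", text)
    without_marks = "".join(
        ch for ch in decomposed if not unicodedata.combining(ch)
    )
    return without_marks.casefold()


def keyword_analyze(text: str) -> list:
    """Approximate the keyword tokenizer: the whole input is one token."""
    return [fold(text)]


print(keyword_analyze("Allegro non troppo"))  # ['allegro non troppo']
print(keyword_analyze("Desde allè"))          # ['desde alle']
```

Each title produces exactly one folded token, which is why a wildcard query on this field matches against the whole title rather than against individual words.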

Use the Atlas Search Visual Editor or Atlas Search JSON Editor in the Atlas user interface to create the index.

Click Next.
Click Refine Your Index.
In the Custom Analyzers section, click Add Custom Analyzer.
Select the Create Your Own radio button and click Next.
Type diacriticFolder in the Analyzer Name field.
Expand Tokenizer if it's collapsed and select keyword from the dropdown.
Expand Token Filters and click Add token filter.
Select icuFolding from the dropdown and click Add token filter to add the token filter to your custom analyzer.
Click Add to add the custom analyzer to your index.
In the Field Mappings section, click Add Field Mapping to apply the custom analyzer on the title field.
Select title from the Field Name dropdown and String from the Data Type dropdown.
In the properties section for the data type, select diacriticFolder from the Index Analyzer and Search Analyzer dropdowns.
Click Add.
Click Add Field Mapping again to index the genres field.
Select genres from the Field Name dropdown and String from the Data Type dropdown.
Click Add, then Save Changes.

Replace the default definition with the following:

1 {
2   "mappings": {
3     "fields": {
4       "genres": {
5         "type": "string"
6       },
7       "title": {
8         "analyzer": "diacriticFolder",
9         "type": "string"
10       }
11     }
12   },
13   "analyzers": [{
14     "charFilters": [],
15     "name": "diacriticFolder",
16     "tokenizer": {
17       "type": "keyword"
18     },
19     "tokenFilters": [{
20       "type": "icuFolding"
21     }]
22   }]
23 }
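
Before you paste a definition like this one, it can help to check that every custom analyzer referenced in the field mappings is declared in the analyzers array. The following Python sketch does that with the standard library only. The function name undefined_analyzers is hypothetical, and the check is deliberately narrow: it inspects only top-level fields whose mapping is a single object, and it skips built-in analyzer names that start with lucene.

```python
import json

INDEX_DEFINITION = json.loads("""
{
  "mappings": {
    "fields": {
      "genres": {"type": "string"},
      "title": {"analyzer": "diacriticFolder", "type": "string"}
    }
  },
  "analyzers": [{
    "charFilters": [],
    "name": "diacriticFolder",
    "tokenizer": {"type": "keyword"},
    "tokenFilters": [{"type": "icuFolding"}]
  }]
}
""")


def undefined_analyzers(definition: dict) -> set:
    """Return custom analyzer names that are used in mappings but never declared."""
    declared = {analyzer["name"] for analyzer in definition.get("analyzers", [])}
    used = set()
    for field in definition.get("mappings", {}).get("fields", {}).values():
        for key in ("analyzer", "searchAnalyzer"):
            name = field.get(key)
            if name and not name.startswith("lucene."):
                used.add(name)
    return used - declared


print(undefined_analyzers(INDEX_DEFINITION))  # set()
```

An empty set means that the mappings and the analyzers array agree.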

Click Next.

Click Create Search Index.

Close the You're All Set! Modal Window.

A modal window appears to let you know your index is building. Click the Close button.

Wait for the index to finish building.

The index should take about one minute to build. While it is building, the Status column reads Build in Progress. When it is finished building, the Status column reads Active.

Search the Collection


You can use the compound operator to combine two or more operators into a single query. The sample query in this section uses the compound operator to query the title and genres fields in the movies collection using multiple operators.

In this section, connect to your Atlas cluster and run the sample query against the sample_mflix.movies collection using the compound operator.

In Atlas, go to the Clusters page for your project.

If it is not already displayed, select the organization that contains your desired project from the Organizations menu in the navigation bar.
If it is not already displayed, select your desired project from the Projects menu in the navigation bar.
If the Clusters page is not already displayed, click Database in the sidebar.

Go to the Atlas Search page for your cluster.

You can go to the Atlas Search page from the sidebar, the Data Explorer, or your cluster details page.

In the sidebar, click Atlas Search under the Services heading.
From the Select data source dropdown, select your cluster and click Go to Atlas Search.

Click the Browse Collections button for your cluster.
Expand the database and select the collection.
Click the Search Indexes tab for the collection.

Click the cluster's name.
Click the Atlas Search tab.

Go to the Search Tester.

Click the Query button to the right of the index to query.

View and edit the query syntax.

Click Edit Query to view a default query syntax sample in JSON format.

Run an Atlas Search diacritic-insensitive query.

This query uses the $search stage to query the collection using the compound operator. The compound operator uses the following clauses:

must clause to search for movie titles that begin with the term alle using the wildcard operator
should clause to specify preference for the Drama genre using the text operator
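
The difference between the two clauses can be pictured with a toy model in plain Python: a document has to satisfy the must condition to be returned at all, while the should condition only raises its rank. This is an illustrative sketch, not how Atlas Search computes relevance. The score values (1.0 and 0.2) and the function names are arbitrary assumptions, and the last sample document is made up to show a non-matching title.

```python
MOVIES = [
    {"title": "Allez, Eddy!", "genres": ["Comedy"]},
    {"title": "Alley Cats Strike", "genres": ["Drama", "Family", "Sport"]},
    {"title": "Some Other Film", "genres": ["Drama"]},  # made-up document
]


def must_match(doc: dict) -> bool:
    """Stand-in for the wildcard clause: title begins with "alle"."""
    return doc["title"].casefold().startswith("alle")


def should_match(doc: dict) -> bool:
    """Stand-in for the text clause: genres contains "Drama"."""
    return "Drama" in doc["genres"]


def toy_compound(docs: list) -> list:
    """Keep documents that pass must; boost those that also pass should."""
    hits = []
    for doc in docs:
        if not must_match(doc):
            continue  # failing a must clause excludes the document
        score = 1.0 + (0.2 if should_match(doc) else 0.0)
        hits.append((score, doc["title"]))
    return sorted(hits, key=lambda hit: hit[0], reverse=True)


for score, title in toy_compound(MOVIES):
    print(score, title)
```

In this model the made-up Drama film is excluded because it fails the must condition, and "Alley Cats Strike" outranks "Allez, Eddy!" because it also satisfies the should condition.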

Copy and paste the following query into the Query Editor, and then click the Search button in the Query Editor.

[
  {
    "$search": {
      "index": "diacritic-insensitive-tutorial",
      "compound": {
        "must": [{
          "wildcard": {
            "query": "alle*",
            "path": "title",
            "allowAnalyzedField": true
          }
        }],
        "should": [{
          "text": {
            "query": "Drama",
            "path": "genres"
          }
        }]
      }
    }
  }
]

SCORE: 1.2084882259368896  _id:  "573a13a1f29313caabd07bb6"
  plot: "A group of hip retro teenage outsiders become involved in an interscho…"
  genres:
    0: "Drama"
    1: "Family"
    2: "Sport"
  runtime: 103
  title: "Alley Cats Strike"
SCORE: 1.179288625717163  _id:  "573a13b1f29313caabd382a2"
  plot: "Famous pianist Zetterstrèm returns home to his native Denmark, to give…"
  genres:
    0: "Drama"
    1: "Romance"
    2: "Sci-Fi"
  runtime: 88
  title: "Allegro"
SCORE: 1  _id:  "573a1397f29313caabce5f15"
  plot: "An enthusiastic filmmaker thinks he's come up with a totally original …"
  genres:
    0: "Animation"
    1: "Comedy"
    2: "Fantasy"
  runtime: 75
  title: "Allegro non troppo"
SCORE: 1  _id:  "573a13d1f29313caabd8f84b"
  plot: "The eleven year old cycling talent Freddy is the son of a butcher in a…"
  genres:
    0: "Comedy"
  runtime: 100
  title: "Allez, Eddy!"

Expand your query results.

The Search Tester might not display all the fields in the documents it returns. To view all the fields, including the field that you specify in the query path, expand the document in the results.

The query term alle is lowercase and unaccented, yet Atlas Search matches titles such as "Alley Cats Strike" because the diacriticFolder custom analyzer applied character folding, which removes accents and folds case, to the values of the title field. Titles that contain diacritics are matched in the same way. Atlas Search returns only documents with titles that begin with the query term alle because we used the keyword tokenizer, which tokenizes an entire string (or phrase) as a single token.

Alternatively, you can specify the standard tokenizer instead of the keyword tokenizer in the custom analyzer used on the title field. With the standard tokenizer, the results would also contain documents in which any word of the title, not only the first, begins with the query term alle, such as "Desde allè". To test this, edit your index definition to replace keyword on line 17 with standard, save the index definition, and run the sample query again.
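
The contrast between the two tokenizers can be sketched in plain Python. The code below is an approximation under assumptions, not the Atlas Search implementation: folding is imitated with Unicode decomposition and casefold, the standard tokenizer is imitated by splitting on non-word characters, wildcard matching is imitated with fnmatch, and all function names are hypothetical.

```python
import fnmatch
import re
import unicodedata


def fold(text: str) -> str:
    """Approximate icuFolding: drop combining accent marks, then fold case."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(
        ch for ch in decomposed if not unicodedata.combining(ch)
    ).casefold()


def keyword_tokens(title: str) -> list:
    """Keyword tokenizer stand-in: the whole title is a single token."""
    return [fold(title)]


def standard_tokens(title: str) -> list:
    """Standard tokenizer stand-in: one token per word."""
    return [fold(word) for word in re.findall(r"\w+", title)]


def wildcard_matches(tokens: list, pattern: str) -> bool:
    """True if any token matches the wildcard pattern in full."""
    return any(fnmatch.fnmatchcase(token, pattern) for token in tokens)


for title in ["Allegro non troppo", "Desde allè"]:
    print(
        title,
        "| keyword:", wildcard_matches(keyword_tokens(title), "alle*"),
        "| standard:", wildcard_matches(standard_tokens(title), "alle*"),
    )
```

Under these assumptions, "Allegro non troppo" matches with either tokenizer, while "Desde allè" matches only with the word-level tokens, because its single keyword token begins with "desde".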

Connect to your cluster in `mongosh`.

Open mongosh in a terminal window and connect to your cluster. For detailed instructions on connecting, see Connect via mongosh.

Use the `sample_mflix` database.

Run the following command at the mongosh prompt:

use sample_mflix

Run an Atlas Search diacritic-insensitive query.

This query uses the $search stage to query the collection using the compound operator. The compound operator uses the following clauses:

must clause to search for movie titles that begin with the term alle using the wildcard operator
should clause to specify preference for the Drama genre using the text operator

The query uses the $project stage to:

Exclude all fields except title and genres
Add a field named score

db.movies.aggregate([
  {
    "$search": {
      "index": "diacritic-insensitive-tutorial",
      "compound": {
        "must": [{
          "wildcard": {
            "query": "alle*",
            "path": "title",
            "allowAnalyzedField": true
          }
        }],
        "should": [{
          "text": {
            "query": "Drama",
            "path": "genres"
          }
        }]
      }
    }
  },
  {
    "$project": {
      "_id": 0,
      "title": 1,
      "genres": 1,
      "score": { "$meta": "searchScore" }
    }
  }
])

{
  genres: [ 'Drama', 'Family', 'Sport' ],
  title: 'Alley Cats Strike',
  score: 1.2084882259368896
},
{
  genres: [ 'Drama', 'Romance', 'Sci-Fi' ],
  title: 'Allegro',
  score: 1.179288625717163
},
{
  genres: [ 'Animation', 'Comedy', 'Fantasy' ],
  title: 'Allegro non troppo',
  score: 1
},
{
  genres: [ 'Comedy' ],
  title: 'Allez, Eddy!',
  score: 1
}
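
The effect of the $project stage on a single result can be mimicked in plain Python. This is only a sketch under assumptions: the function name project_result is hypothetical, the input document is abbreviated from the "Alley Cats Strike" result shown earlier, and in a real pipeline the score comes from the searchScore metadata rather than from an argument.

```python
def project_result(doc: dict, search_score: float) -> dict:
    """Mimic {_id: 0, title: 1, genres: 1, score: {$meta: "searchScore"}}."""
    # Keep only the included fields; _id and everything else is dropped.
    kept = {key: value for key, value in doc.items() if key in ("title", "genres")}
    # Add the computed score field.
    kept["score"] = search_score
    return kept


movie = {
    "_id": "573a13a1f29313caabd07bb6",
    "genres": ["Drama", "Family", "Sport"],
    "runtime": 103,
    "title": "Alley Cats Strike",
}

print(project_result(movie, 1.2084882259368896))
```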

Connect to your cluster in MongoDB Compass.

Open MongoDB Compass and connect to your cluster. For detailed instructions on connecting, see Connect via Compass.

Use the `movies` collection in the `sample_mflix` database.

On the Database screen, click the sample_mflix database, then click the movies collection.

Run an Atlas Search diacritic-insensitive query.

This query uses the following compound operator clauses to query the collection:

must clause to search for movie titles that begin with the term alle using the wildcard operator
should clause to specify preference for the Drama genre using the text operator

The query uses the $project stage to:

Exclude all fields except title and genres
Add a field named score

To run this query in MongoDB Compass:

Click the Aggregations tab.
Click Select..., then configure each of the following pipeline stages by selecting the stage from the dropdown and adding the query for that stage. Click Add Stage to add additional stages.

Pipeline Stage

Query

$search

{
  "index": "diacritic-insensitive-tutorial",
  "compound": {
    "must": [{
      "wildcard": {
        "path": "title",
        "query": "alle*",
        "allowAnalyzedField": true
      }
    }],
    "should": [{
      "text": {
        "query": "Drama",
        "path": "genres"
      }
    }]
  }
}

$project

{
  "_id": 0,
  "title": 1,
  "genres": 1,
  "score": {
    "$meta": "searchScore"
  }
}

If you enabled Auto Preview, MongoDB Compass displays the following documents next to the $project pipeline stage:

{
  genres: [ 'Drama', 'Family', 'Sport' ],
  title: 'Alley Cats Strike',
  score: 1.2084882259368896
},
{
  genres: [ 'Drama', 'Romance', 'Sci-Fi' ],
  title: 'Allegro',
  score: 1.179288625717163
},
{
  genres: [ 'Animation', 'Comedy', 'Fantasy' ],
  title: 'Allegro non troppo',
  score: 1
},
{
  genres: [ 'Comedy' ],
  title: 'Allez, Eddy!',
  score: 1
}

Set up and initialize the .NET/C# project for the query.

Create a new directory called diacritic-insensitive-example and initialize your project with the dotnet new command.
```
mkdir diacritic-insensitive-example
cd diacritic-insensitive-example
dotnet new console
```
Add the .NET/C# Driver to your project as a dependency.
```
dotnet add package MongoDB.Driver
```

Create the query in the `Program.cs` file.

Replace the contents of the Program.cs file with the following code.

The code example performs the following tasks:

Imports MongoDB packages and dependencies.
Establishes a connection to your Atlas cluster.
Uses the following compound operator clauses to query the collection:
- must clause to search for movie titles that begin with the term alle using the wildcard operator
- should clause to specify preference for the Drama genre using the text operator
The query uses the $project stage to:
- Exclude all fields except title and genres
- Add a field named score
Iterates over the cursor to print the documents that match the query.

using MongoDB.Bson;
using MongoDB.Bson.Serialization.Attributes;
using MongoDB.Bson.Serialization.Conventions;
using MongoDB.Driver;
using MongoDB.Driver.Search;

public class DiacriticInsensitiveExample
{
    private const string MongoConnectionString = "<connection-string>";

    public static void Main(string[] args)
    {
        // allow automapping of the camelCase database fields to our MovieDocument
        var camelCaseConvention = new ConventionPack { new CamelCaseElementNameConvention() };
        ConventionRegistry.Register("CamelCase", camelCaseConvention, type => true);

        // connect to your Atlas cluster
        var mongoClient = new MongoClient(MongoConnectionString);
        var mflixDatabase = mongoClient.GetDatabase("sample_mflix");
        var moviesCollection = mflixDatabase.GetCollection<MovieDocument>("movies");

        // define and run pipeline
        var results = moviesCollection.Aggregate()
            .Search(Builders<MovieDocument>.Search.Compound()
                .Must(Builders<MovieDocument>.Search.Wildcard(movie => movie.Title, "alle*", true))
                .Should(Builders<MovieDocument>.Search.Text(movie => movie.Genres, "Drama")),
                indexName: "diacritic-insensitive-tutorial")
            .Project<MovieDocument>(Builders<MovieDocument>.Projection
                .Include(movie => movie.Title)
                .Include(movie => movie.Genres)
                .Exclude(movie => movie.Id)
                .MetaSearchScore(movie => movie.Score))
            .ToList();

        // print results
        foreach (var movie in results)
        {
            Console.WriteLine(movie.ToJson());
        }
    }
}

[BsonIgnoreExtraElements]
public class MovieDocument
{
    [BsonIgnoreIfDefault]
    public ObjectId Id { get; set; }
    public string[] Genres { get; set; }
    public string Title { get; set; }
    public double Score { get; set; }
}

Before you run the sample, replace <connection-string> with your Atlas connection string. Ensure that your connection string includes your database user's credentials. To learn more, see Connect via Drivers.

Compile and run the `Program.cs` file.

dotnet run --project diacritic-insensitive-example.csproj

{ "genres" : ["Drama", "Family", "Sport"], "title" : "Alley Cats Strike", "score" : 1.2084882259368896 }
{ "genres" : ["Drama", "Romance", "Sci-Fi"], "title" : "Allegro", "score" : 1.1792886257171631 }
{ "genres" : ["Animation", "Comedy", "Fantasy"], "title" : "Allegro non troppo", "score" : 1.0 }
{ "genres" : ["Comedy"], "title" : "Allez, Eddy!", "score" : 1.0 }

Run an Atlas Search diacritic-insensitive query.

Create a file named diacritic-insensitive.go.

Copy and paste the following code into the diacritic-insensitive.go file.

The code example performs the following tasks:

Imports MongoDB packages and dependencies.
Establishes a connection to your Atlas cluster.
Uses the following compound operator clauses to query the collection:
- must clause to search for movie titles that begin with the term alle using the wildcard operator
- should clause to specify preference for the Drama genre using the text operator
The query uses the $project stage to:
- Exclude all fields except title and genres
- Add a field named score
Iterates over the cursor to print the documents that match the query.

package main

import (
	"context"
	"fmt"

	"go.mongodb.org/mongo-driver/bson"
	"go.mongodb.org/mongo-driver/mongo"
	"go.mongodb.org/mongo-driver/mongo/options"
)

func main() {
	// connect to your Atlas cluster
	client, err := mongo.Connect(context.TODO(), options.Client().ApplyURI("<connection-string>"))
	if err != nil {
		panic(err)
	}
	defer client.Disconnect(context.TODO())

	// set namespace
	collection := client.Database("sample_mflix").Collection("movies")

	// define pipeline stages
	searchStage := bson.D{{"$search", bson.M{
		"index": "diacritic-insensitive-tutorial",
		"compound": bson.M{
			"must": bson.M{
				"wildcard": bson.M{
					"path":               "title",
					"query":              "alle*",
					"allowAnalyzedField": true,
				},
			},
			"should": bson.D{
				{"text", bson.M{
					"path":  "genres",
					"query": "Drama"}}},
		},
	}}}
	projectStage := bson.D{{"$project", bson.D{{"title", 1}, {"genres", 1}, {"_id", 0}, {"score", bson.D{{"$meta", "searchScore"}}}}}}

	// run pipeline
	cursor, err := collection.Aggregate(context.TODO(), mongo.Pipeline{searchStage, projectStage})
	if err != nil {
		panic(err)
	}

	// print results
	var results []bson.D
	if err = cursor.All(context.TODO(), &results); err != nil {
		panic(err)
	}
	for _, result := range results {
		fmt.Println(result)
	}
}

Before you run the sample, replace <connection-string> with your Atlas connection string. Ensure that your connection string includes your database user's credentials. To learn more, see Connect via Drivers.

Run the following command to query your collection:

go run diacritic-insensitive.go

[{genres [Drama Family Sport]} {title Alley Cats Strike} {score 1.2084882259368896}]
[{genres [Drama Romance Sci-Fi]} {title Allegro} {score 1.179288625717163}]
[{genres [Animation Comedy Fantasy]} {title Allegro non troppo} {score 1}]
[{genres [Comedy]} {title Allez, Eddy!} {score 1}]

Ensure that your `CLASSPATH` contains the following libraries.

`junit`	4.11 or higher version
`mongodb-driver-sync`	4.3.0 or higher version
`slf4j-log4j12`	1.7.30 or higher version

Run an Atlas Search diacritic-insensitive query.

Create a file named DiacriticInsensitive.java.

Copy and paste the following code into the DiacriticInsensitive.java file.

The code example performs the following tasks:

Imports MongoDB packages and dependencies.
Establishes a connection to your Atlas cluster.
Uses the following compound operator clauses to query the collection:
- must clause to search for movie titles that begin with the term alle using the wildcard operator
- should clause to specify preference for the Drama genre using the text operator
The query uses the $project stage to:
- Exclude all fields except title and genres
- Add a field named score
Iterates over the cursor to print the documents that match the query.

import static com.mongodb.client.model.Aggregates.project;
import static com.mongodb.client.model.Projections.*;
import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoCollection;
import com.mongodb.client.MongoDatabase;
import org.bson.Document;
import java.util.Arrays;
import java.util.List;

public class DiacriticInsensitive {
    public static void main(String[] args) {
        // define clauses
        List<Document> mustClauses =
            List.of(new Document("wildcard",
                new Document("path", "title")
                .append("query", "alle*")
                .append("allowAnalyzedField", true)));
        List<Document> shouldClauses =
            List.of(new Document("text",
                new Document("query", "Drama")
                .append("path", "genres")));
        // define pipeline
        Document agg = new Document("$search",
            new Document("index", "diacritic-insensitive-tutorial")
            .append("compound",
                new Document("must", mustClauses)
                .append("should", shouldClauses)));

        // connect to your Atlas cluster
        String uri = "<connection-string>";

        try (MongoClient mongoClient = MongoClients.create(uri)) {
            // set namespace
            MongoDatabase database = mongoClient.getDatabase("sample_mflix");
            MongoCollection<Document> collection = database.getCollection("movies");

            // run pipeline and print results
            collection.aggregate(Arrays.asList(agg,
                project(fields(
                    excludeId(),
                    include("title"),
                    include("genres"),
                    computed("score", new Document("$meta", "searchScore"))))))
                .forEach(doc -> System.out.println(doc.toJson()));
        }
    }
}

Note

To run the sample code in your Maven environment, add the following code above the import statements in your file.

package com.mongodb.drivers;

Before you run the sample, replace <connection-string> with your Atlas connection string. Ensure that your connection string includes your database user's credentials. To learn more, see Connect via Drivers.

Compile and run the DiacriticInsensitive.java file.

javac DiacriticInsensitive.java
java DiacriticInsensitive

{"genres": ["Drama", "Family", "Sport"], "title": "Alley Cats Strike", "score": 1.2084882259368896}
{"genres": ["Drama", "Romance", "Sci-Fi"], "title": "Allegro", "score": 1.179288625717163}
{"genres": ["Animation", "Comedy", "Fantasy"], "title": "Allegro non troppo", "score": 1.0}
{"genres": ["Comedy"], "title": "Allez, Eddy!", "score": 1.0}

Ensure that you add the following dependency to your project.

`mongodb-driver-kotlin-coroutine`	4.10.0 or higher version

Run an Atlas Search diacritic-insensitive query.

Create a file named DiacriticInsensitive.kt.

Copy and paste the following code into the DiacriticInsensitive.kt file.

The code example performs the following tasks:

Imports MongoDB packages and dependencies.
Establishes a connection to your Atlas cluster.
Uses the following compound operator clauses to query the collection:
- must clause to search for movie titles that begin with the term alle using the wildcard operator
- should clause to specify preference for the Drama genre using the text operator
The query uses the $project stage to:
- Exclude all fields except title and genres
- Add a field named score
Prints the documents that match the query from the AggregateFlow instance.

import com.mongodb.client.model.Aggregates.project
import com.mongodb.client.model.Projections.*
import com.mongodb.kotlin.client.coroutine.MongoClient
import kotlinx.coroutines.runBlocking
import org.bson.Document

fun main() {
    // connect to your Atlas cluster
    val uri = "<connection-string>"
    val mongoClient = MongoClient.create(uri)

    // set namespace
    val database = mongoClient.getDatabase("sample_mflix")
    val collection = database.getCollection<Document>("movies")

    runBlocking {
        // define clauses
        val mustClauses = listOf(
            Document(
                "wildcard",
                Document("path", "title")
                    .append("query", "alle*")
                    .append("allowAnalyzedField", true)
            )
        )

        val shouldClauses = listOf(
            Document(
                "text",
                Document("query", "Drama")
                    .append("path", "genres")
            )
        )

        // define pipeline
        val agg = Document(
            "\$search",
            Document("index", "diacritic-insensitive-tutorial")
                .append(
                    "compound",
                    Document("must", mustClauses)
                        .append("should", shouldClauses)
                )
        )

        // run pipeline and print results
        val resultsFlow = collection.aggregate<Document>(
            listOf(
                agg,
                project(fields(
                    excludeId(),
                    include("title", "genres"),
                    computed("score", Document("\$meta", "searchScore"))))
            )
        )
        resultsFlow.collect { println(it) }
    }

    mongoClient.close()
}

Before you run the sample, replace <connection-string> with your Atlas connection string. Ensure that your connection string includes your database user's credentials. To learn more, see Connect via Drivers.

Run the DiacriticInsensitive.kt file.

When you run the DiacriticInsensitive.kt program in your IDE, it prints the following documents:

Document{{genres=[Drama, Family, Sport], title=Alley Cats Strike, score=1.2084882259368896}}
Document{{genres=[Drama, Romance, Sci-Fi], title=Allegro, score=1.179288625717163}}
Document{{genres=[Animation, Comedy, Fantasy], title=Allegro non troppo, score=1.0}}
Document{{genres=[Comedy], title=Allez, Eddy!, score=1.0}}

Run an Atlas Search diacritic-insensitive query.

Create a file named diacritic-insensitive.js.

Copy and paste the following code into the diacritic-insensitive.js file.

The code example performs the following tasks:

Imports mongodb, MongoDB's Node.js driver.
Creates an instance of the MongoClient class to establish a connection to your Atlas cluster.
Uses the following compound operator clauses to query the collection:
- must clause to search for movie titles that begin with the term alle using the wildcard operator
- should clause to specify preference for the Drama genre using the text operator
The query uses the $project stage to:
- Exclude all fields except title and genres
- Add a field named score
Iterates over the cursor to print the documents that match the query.

const { MongoClient } = require("mongodb");

// Replace the uri string with your MongoDB deployment's connection string.
const uri = "<connection-string>";

const client = new MongoClient(uri);

async function run() {
  try {
    await client.connect();

    // set namespace
    const database = client.db("sample_mflix");
    const coll = database.collection("movies");

    // define pipeline
    const agg = [
      {
        '$search': {
          'index': 'diacritic-insensitive-tutorial',
          'compound': {
            'must': [{
              'wildcard': {
                'query': 'alle*',
                'path': 'title',
                'allowAnalyzedField': true
              }
            }],
            'should': [{ 'text': { 'query': 'Drama', 'path': 'genres' } }]
          }
        }
      },
      { '$project': { '_id': 0, 'title': 1, 'genres': 1, 'score': { '$meta': 'searchScore' } } }
    ];

    // run pipeline
    const result = await coll.aggregate(agg);

    // print results
    await result.forEach((doc) => console.log(doc));
  } finally {
    await client.close();
  }
}
run().catch(console.dir);

Before you run the sample, replace <connection-string> with your Atlas connection string. Ensure that your connection string includes your database user's credentials. To learn more, see Connect via Drivers.

Run the following command to query your collection:

node diacritic-insensitive.js

{
  genres: [ 'Drama', 'Family', 'Sport' ],
  title: 'Alley Cats Strike',
  score: 1.2084882259368896
}
{
  genres: [ 'Drama', 'Romance', 'Sci-Fi' ],
  title: 'Allegro',
  score: 1.179288625717163
}
{
  genres: [ 'Animation', 'Comedy', 'Fantasy' ],
  title: 'Allegro non troppo',
  score: 1
}
{
  genres: [ 'Comedy' ],
  title: 'Allez, Eddy!',
  score: 1
}

Run an Atlas Search diacritic-insensitive query.

Create a file named diacritic-insensitive.py.

Copy and paste the following code into the diacritic-insensitive.py file.

The following code example:

Imports pymongo, MongoDB's Python driver. To connect to Atlas using a DNS seed list connection string, pymongo also requires the dns module, which the code does not import directly.
Creates an instance of the MongoClient class to establish a connection to your Atlas cluster.
Uses the following compound operator clauses to query the collection:
- must clause to search for movie titles that begin with the term alle using the wildcard operator
- should clause to specify preference for the Drama genre using the text operator
The query uses the $project stage to:
- Exclude all fields except title and genres
- Add a field named score
Iterates over the cursor to print the documents that match the query.

import pymongo

# connect to your Atlas cluster
client = pymongo.MongoClient('<connection-string>')

# define pipeline
pipeline = [
  {'$search': {
      'index': 'diacritic-insensitive-tutorial',
      'compound': {
        'must': [{'wildcard': {'path': 'title', 'query': 'alle*', 'allowAnalyzedField': True}}],
        'should': [{'text': {'query': 'Drama', 'path': 'genres'}}]}}},
  {'$project': {'_id': 0, 'title': 1, 'genres': 1, 'score': {'$meta': 'searchScore'}}}
]

# run pipeline
result = client['sample_mflix']['movies'].aggregate(pipeline)

# print results
for i in result:
    print(i)

Before you run the sample, replace <connection-string> with your Atlas connection string. Ensure that your connection string includes your database user's credentials. To learn more, see Connect via Drivers.
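
One way to catch a connection string that lacks credentials before you run the sample is a quick check with the standard library. The sketch below rests on assumptions: the environment variable name MONGODB_URI and the function name has_credentials are hypothetical, and the check only confirms that a user name and password are present, not that they are valid.

```python
import os
from urllib.parse import urlsplit


def has_credentials(connection_string: str) -> bool:
    """Return True if the connection string carries a user name and a password."""
    parts = urlsplit(connection_string)
    return bool(parts.username) and bool(parts.password)


# Read the connection string from a (hypothetical) environment variable,
# falling back to the placeholder used in the sample.
uri = os.environ.get("MONGODB_URI", "<connection-string>")

if not has_credentials(uri):
    print("The connection string is missing a database user name or password.")
```

Reading the connection string from the environment also keeps the credentials out of the source file.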

Run the following command to query your collection:

python diacritic-insensitive.py

{'genres': ['Drama', 'Family', 'Sport'], 'title': 'Alley Cats Strike', 'score': 1.2084882259368896}
{'genres': ['Drama', 'Romance', 'Sci-Fi'], 'title': 'Allegro', 'score': 1.179288625717163}
{'genres': ['Animation', 'Comedy', 'Fantasy'], 'title': 'Allegro non troppo', 'score': 1.0}
{'genres': ['Comedy'], 'title': 'Allez, Eddy!', 'score': 1.0}

Back

All Results

How to Run an
