How to Define a Custom Analyzer and Run an Atlas Search Diacritic-Insensitive Query

On this page

  • Create the Atlas Search Index
  • Search the Collection

This tutorial describes how to create an index that uses a custom analyzer and run a diacritic-insensitive query against the sample_mflix.movies collection. It takes you through the following steps:

  1. Set up an Atlas Search index on the title and genres fields in the sample_mflix.movies collection.

  2. Run an Atlas Search compound query against the title and genres fields in the sample_mflix.movies collection using the wildcard and text operators.

Before you begin, ensure that your Atlas cluster meets the requirements described in the Prerequisites.

To create an Atlas Search index, you must have Project Data Access Admin or higher access to the project.

In this section, you will create an Atlas Search index on the title and genres fields in the sample_mflix.movies collection.

1
  1. If it is not already displayed, select the organization that contains your desired project from the Organizations menu in the navigation bar.

  2. If it is not already displayed, select your desired project from the Projects menu in the navigation bar.

  3. If the Clusters page is not already displayed, click Database in the sidebar.

2

You can go to the Atlas Search page from the sidebar, the Data Explorer, or your cluster details page.

From the sidebar:

  1. In the sidebar, click Atlas Search under the Services heading.

  2. From the Select data source dropdown, select your cluster and click Go to Atlas Search.

From the Data Explorer:

  1. Click the Browse Collections button for your cluster.

  2. Expand the database and select the collection.

  3. Click the Search Indexes tab for the collection.

From your cluster details page:

  1. Click the cluster's name.

  2. Click the Atlas Search tab.

3
4
5
  1. In the Index Name field, enter diacritic-insensitive-tutorial.

    Note

    If you name your index default, you don't need to specify an index parameter when using the $search pipeline stage. Otherwise, you must specify the index name using the index parameter.

  2. In the Database and Collection section, find the sample_mflix database, and select the movies collection.
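
The note about the index parameter can be illustrated with a small sketch. The helper below, build_search_stage, is a hypothetical name and not part of any MongoDB driver; it only builds a $search stage as a plain Python dictionary and adds the index parameter when the index is not named default.

```python
def build_search_stage(operator: dict, index_name: str = "default") -> dict:
    """Build a $search stage from a single operator document.

    `operator` is a dictionary such as {"text": {...}}. The helper name and
    signature are assumptions made for this illustration only.
    """
    stage = dict(operator)  # shallow copy so the caller's dict is not modified
    if index_name != "default":
        # A non-default index must be named explicitly in the stage.
        stage["index"] = index_name
    return {"$search": stage}


text_operator = {"text": {"query": "Drama", "path": "genres"}}

# Index named "default": no index parameter in the stage.
print(build_search_stage(text_operator))

# Index from this tutorial: the index parameter is included.
print(build_search_stage(text_operator, "diacritic-insensitive-tutorial"))
```

Because this tutorial names the index diacritic-insensitive-tutorial, every query in the following section specifies the index parameter.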

6

This index definition for the genres and title fields specifies a custom analyzer, diacriticFolder, using the following:

  • keyword tokenizer that tokenizes the entire input as a single token.

  • icuFolding token filter that applies character foldings such as accent removal and case folding.

The index definition specifies a string type for the genres and title fields. It also applies the custom analyzer named diacriticFolder on the title field.
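
To get a feel for what this analyzer produces, the following Python sketch approximates it with the standard library. It is not how Atlas Search implements icuFolding: the fold function below only removes combining marks after Unicode decomposition and applies case folding, while the real filter covers more cases. The function names are chosen for this illustration.

```python
import unicodedata


def fold(text: str) -> str:
    """Rough stand-in for icuFolding: strip accents and fold case."""
    decomposed = unicodedata.normalize("NFKD", text)
    without_marks = "".join(
        ch for ch in decomposed if not unicodedata.combining(ch)
    )
    return without_marks.casefold()


def diacritic_folder(text: str) -> list[str]:
    """Keyword tokenizer (one token for the whole input) plus folding."""
    return [fold(text)]


for title in ["Allegro", "Desde allè", "Allez, Eddy!"]:
    print(title, "->", diacritic_folder(title))
```

Each title yields exactly one token, which is why a query against this field must match from the start of the whole title rather than from the start of any word.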

Use the Atlas Search Visual Editor or Atlas Search JSON Editor in the Atlas user interface to create the index.

Visual Editor:

  1. Click Next.

  2. Click Refine Your Index.

  3. In the Custom Analyzers section, click Add Custom Analyzer.

  4. Select the Create Your Own radio button and click Next.

  5. Type diacriticFolder in the Analyzer Name field.

  6. Expand Tokenizer if it's collapsed and select keyword from the dropdown.

  7. Expand Token Filters and click Add token filter.

  8. Select icuFolding from the dropdown and click Add token filter to add the token filter to your custom analyzer.

  9. Click Add to add the custom analyzer to your index.

  10. In the Field Mappings section, click Add Field Mapping to apply the custom analyzer on the title field.

  11. Select title from the Field Name dropdown and String from the Data Type dropdown.

  12. In the properties section for the data type, select diacriticFolder from the Index Analyzer and Search Analyzer dropdowns.

  13. Click Add.

  14. Click Add Field Mapping again to index the genres field.

  15. Select genres from the Field Name dropdown and String from the Data Type dropdown.

  16. Click Add, then Save Changes.

JSON Editor:

  1. Replace the default definition with the following:

    {
      "mappings": {
        "fields": {
          "genres": {
            "type": "string"
          },
          "title": {
            "analyzer": "diacriticFolder",
            "type": "string"
          }
        }
      },
      "analyzers": [{
        "charFilters": [],
        "name": "diacriticFolder",
        "tokenizer": {
          "type": "keyword"
        },
        "tokenFilters": [{
          "type": "icuFolding"
        }]
      }]
    }

  2. Click Next.
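
Before saving a definition like this one, it can help to confirm that every custom analyzer referenced in the field mappings is actually defined. The Python sketch below is one way to do that check offline. It assumes flat field mappings (one dictionary per field, as in this tutorial) and treats any analyzer name starting with lucene. as built in; the function name undefined_analyzers is hypothetical.

```python
import json

# The index definition from this tutorial, as JSON text.
INDEX_DEFINITION = json.loads("""
{
  "mappings": {
    "fields": {
      "genres": { "type": "string" },
      "title": { "analyzer": "diacriticFolder", "type": "string" }
    }
  },
  "analyzers": [{
    "charFilters": [],
    "name": "diacriticFolder",
    "tokenizer": { "type": "keyword" },
    "tokenFilters": [{ "type": "icuFolding" }]
  }]
}
""")


def undefined_analyzers(definition: dict) -> set:
    """Return custom analyzer names used in mappings but never defined."""
    defined = {a["name"] for a in definition.get("analyzers", [])}
    fields = definition.get("mappings", {}).get("fields", {})
    used = set()
    for spec in fields.values():
        for key in ("analyzer", "searchAnalyzer"):
            name = spec.get(key)
            # Assumption: names with the "lucene." prefix are built-in analyzers.
            if name and not name.startswith("lucene."):
                used.add(name)
    return used - defined


missing = undefined_analyzers(INDEX_DEFINITION)
print("Undefined analyzers:", sorted(missing))
```

An empty result means the title field's analyzer name matches an entry in the analyzers array.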

7
8

A modal window appears to let you know your index is building. Click the Close button.

9

The index should take about one minute to build. While it is building, the Status column reads Build in Progress. When it is finished building, the Status column reads Active.


Use the Select your language drop-down menu to set the language of the example in this section.


You can use the compound operator to combine two or more operators into a single query. The sample query in this section uses the compound operator to query the title and genres fields in the movies collection using multiple operators.

In this section, connect to your Atlas cluster and run the sample query against the sample_mflix.movies collection using the compound operator.

1
  1. If it is not already displayed, select the organization that contains your desired project from the Organizations menu in the navigation bar.

  2. If it is not already displayed, select your desired project from the Projects menu in the navigation bar.

  3. If the Clusters page is not already displayed, click Database in the sidebar.

2

You can go to the Atlas Search page from the sidebar, the Data Explorer, or your cluster details page.

From the sidebar:

  1. In the sidebar, click Atlas Search under the Services heading.

  2. From the Select data source dropdown, select your cluster and click Go to Atlas Search.

From the Data Explorer:

  1. Click the Browse Collections button for your cluster.

  2. Expand the database and select the collection.

  3. Click the Search Indexes tab for the collection.

From your cluster details page:

  1. Click the cluster's name.

  2. Click the Atlas Search tab.

3

Click the Query button to the right of the index to query.

4

Click Edit Query to view a default query syntax sample in JSON format.

5

This query uses the $search stage to query the collection using the compound operator. The compound operator uses the following clauses:

  • must clause to search for movie titles that begin with the term alle using the wildcard operator

  • should clause to specify preference for the Drama genre using the text operator

Copy and paste the following query into the Query Editor, and then click the Search button in the Query Editor.

[
  {
    "$search" : {
      "index": "diacritic-insensitive-tutorial",
      "compound" : {
        "must": [{
          "wildcard" : {
            "query" : "alle*",
            "path": "title",
            "allowAnalyzedField": true
          }
        }],
        "should": [{
          "text": {
            "query" : "Drama",
            "path" : "genres"
          }
        }]
      }
    }
  }
]
SCORE: 1.2084882259368896 _id: "573a13a1f29313caabd07bb6"
plot: "A group of hip retro teenage outsiders become involved in an interscho…"
genres:
0: "Drama"
1: "Family"
2: "Sport"
runtime: 103
title: "Alley Cats Strike"
SCORE: 1.179288625717163 _id: "573a13b1f29313caabd382a2"
plot: "Famous pianist Zetterstrèm returns home to his native Denmark, to give…"
genres:
0: "Drama"
1: "Romance"
2: "Sci-Fi"
runtime: 88
title: "Allegro"
SCORE: 1 _id: "573a1397f29313caabce5f15"
plot: "An enthusiastic filmmaker thinks he's come up with a totally original …"
genres:
0: "Animation"
1: "Comedy"
2: "Fantasy"
runtime: 75
title: "Allegro non troppo"
SCORE: 1 _id: "573a13d1f29313caabd8f84b"
plot: "The eleven year old cycling talent Freddy is the son of a butcher in a…"
genres:
0: "Comedy"
runtime: 100
title: "Allez, Eddy!"
6

The Search Tester might not display all the fields in the documents it returns. To view all the fields, including the field that you specify in the query path, expand the document in the results.

Atlas Search matches titles regardless of diacritics and letter case because the diacriticFolder custom analyzer applied character folding to the values of the title field. Atlas Search returns only documents with titles that begin with the query term alle because the analyzer uses the keyword tokenizer, which tokenizes an entire string (or phrase) as a single token.

Alternatively, you can specify the standard tokenizer instead of the keyword tokenizer in the custom analyzer used on the title field. With the standard tokenizer, the Atlas Search results would also contain documents in which any word of the title begins with the query term alle, such as "Desde allè". To test this, edit your index definition to replace the keyword tokenizer type on line 17 of the index definition with standard, save the index definition, and run the sample query again.
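
The difference between the two tokenizers can be sketched in Python. This is only an approximation under stated assumptions: fold stands in for icuFolding using the standard library, a regular expression on word characters stands in for the standard tokenizer, and fnmatchcase stands in for the wildcard operator, with the query pattern itself left unanalyzed. None of these are Atlas Search APIs.

```python
import re
import unicodedata
from fnmatch import fnmatchcase


def fold(text: str) -> str:
    """Approximate icuFolding: remove combining marks and fold case."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(
        ch for ch in decomposed if not unicodedata.combining(ch)
    ).casefold()


def keyword_tokens(text: str) -> list[str]:
    """Keyword tokenizer: the whole title becomes a single token."""
    return [fold(text)]


def standard_tokens(text: str) -> list[str]:
    """Rough stand-in for the standard tokenizer: one token per word."""
    return [fold(word) for word in re.findall(r"\w+", text)]


def wildcard_matches(pattern: str, tokens: list[str]) -> bool:
    """True if any indexed token matches the wildcard pattern in full."""
    return any(fnmatchcase(token, pattern) for token in tokens)


for title in ["Alley Cats Strike", "Allez, Eddy!", "Desde allè"]:
    print(
        title,
        "| keyword:", wildcard_matches("alle*", keyword_tokens(title)),
        "| standard:", wildcard_matches("alle*", standard_tokens(title)),
    )
```

In this sketch, "Desde allè" matches only under the word-level tokenization, because its single keyword token starts with "desde" rather than "alle".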

1

Open mongosh in a terminal window and connect to your cluster. For detailed instructions on connecting, see Connect via mongosh.

2

Run the following command at the mongosh prompt:

use sample_mflix
3

This query uses the $search stage to query the collection using the compound operator. The compound operator uses the following clauses:

  • must clause to search for movie titles that begin with the term alle using the wildcard operator

  • should clause to specify preference for the Drama genre using the text operator

The query uses the $project stage to:

  • Exclude all fields except title and genres

  • Add a field named score

db.movies.aggregate([
  {
    "$search" : {
      "index": "diacritic-insensitive-tutorial",
      "compound" : {
        "must": [{
          "wildcard" : {
            "query" : "alle*",
            "path": "title",
            "allowAnalyzedField": true
          }
        }],
        "should": [{
          "text": {
            "query" : "Drama",
            "path" : "genres"
          }
        }]
      }
    }
  },
  {
    "$project" : {
      "_id" : 0,
      "title" : 1,
      "genres" : 1,
      "score" : { "$meta": "searchScore" }
    }
  }
])
{
genres: [ 'Drama', 'Family', 'Sport' ],
title: 'Alley Cats Strike',
score: 1.2084882259368896
},
{
genres: [ 'Drama', 'Romance', 'Sci-Fi' ],
title: 'Allegro',
score: 1.179288625717163
},
{
genres: [ 'Animation', 'Comedy', 'Fantasy' ],
title: 'Allegro non troppo',
score: 1
},
{
genres: [ 'Comedy' ],
title: 'Allez, Eddy!',
score: 1
}

Atlas Search matches titles regardless of diacritics and letter case because the diacriticFolder custom analyzer applied character folding to the values of the title field. Atlas Search returns only documents with titles that begin with the query term alle because the analyzer uses the keyword tokenizer, which tokenizes an entire string (or phrase) as a single token.

Alternatively, you can specify the standard tokenizer instead of the keyword tokenizer in the custom analyzer used on the title field. With the standard tokenizer, the Atlas Search results would also contain documents in which any word of the title begins with the query term alle, such as "Desde allè". To test this, edit your index definition to replace the keyword tokenizer type on line 17 of the index definition with standard, save the index definition, and run the sample query again.

1

Open MongoDB Compass and connect to your cluster. For detailed instructions on connecting, see Connect via Compass.

2

On the Database screen, click the sample_mflix database, then click the movies collection.

3

This query uses the following compound operator clauses to query the collection:

  • must clause to search for movie titles that begin with the term alle using the wildcard operator

  • should clause to specify preference for the Drama genre using the text operator

The query uses the $project stage to:

  • Exclude all fields except title and genres

  • Add a field named score

To run this query in MongoDB Compass:

  1. Click the Aggregations tab.

  2. Click Select..., then configure each of the following pipeline stages by selecting the stage from the dropdown and adding the query for that stage. Click Add Stage to add additional stages.

Pipeline Stage: $search
Query:

    {
      "index": "diacritic-insensitive-tutorial",
      "compound": {
        "must": [{
          "wildcard": {
            "path": "title",
            "query": "alle*",
            "allowAnalyzedField": true
          }
        }],
        "should": [{
          "text": {
            "query": "Drama",
            "path": "genres"
          }
        }]
      }
    }

Pipeline Stage: $project
Query:

    {
      "_id": 0,
      "title": 1,
      "genres": 1,
      "score": {
        "$meta": "searchScore"
      }
    }

If you enabled Auto Preview, MongoDB Compass displays the following documents next to the $project pipeline stage:

{
genres: [ 'Drama', 'Family', 'Sport' ],
title: 'Alley Cats Strike',
score: 1.2084882259368896
},
{
genres: [ 'Drama', 'Romance', 'Sci-Fi' ],
title: 'Allegro',
score: 1.179288625717163
},
{
genres: [ 'Animation', 'Comedy', 'Fantasy' ],
title: 'Allegro non troppo',
score: 1
},
{
genres: [ 'Comedy' ],
title: 'Allez, Eddy!',
score: 1
}

Atlas Search matches titles regardless of diacritics and letter case because the diacriticFolder custom analyzer applied character folding to the values of the title field. Atlas Search returns only documents with titles that begin with the query term alle because the analyzer uses the keyword tokenizer, which tokenizes an entire string (or phrase) as a single token.

Alternatively, you can specify the standard tokenizer instead of the keyword tokenizer in the custom analyzer used on the title field. With the standard tokenizer, the Atlas Search results would also contain documents in which any word of the title begins with the query term alle, such as "Desde allè". To test this, edit your index definition to replace the keyword tokenizer type on line 17 of the index definition with standard, save the index definition, and run the sample query again.

1
  1. Create a new directory called diacritic-insensitive-example and initialize your project with the dotnet new command.

    mkdir diacritic-insensitive-example
    cd diacritic-insensitive-example
    dotnet new console
  2. Add the .NET/C# Driver to your project as a dependency.

    dotnet add package MongoDB.Driver
2
  1. Replace the contents of the Program.cs file with the following code.

    The code example performs the following tasks:

    • Imports mongodb packages and dependencies.

    • Establishes a connection to your Atlas cluster.

    • Uses the following compound operator clauses to query the collection:

      • must clause to search for movie titles that begin with the term alle using the wildcard operator

      • should clause to specify preference for the Drama genre using the text operator

      The query uses the $project stage to:

      • Exclude all fields except title and genres

      • Add a field named score

    • Iterates over the cursor to print the documents that match the query.

    using MongoDB.Bson;
    using MongoDB.Bson.Serialization.Attributes;
    using MongoDB.Bson.Serialization.Conventions;
    using MongoDB.Driver;
    using MongoDB.Driver.Search;

    public class DiacriticInsensitiveExample
    {
        private const string MongoConnectionString = "<connection-string>";

        public static void Main(string[] args)
        {
            // allow automapping of the camelCase database fields to our MovieDocument
            var camelCaseConvention = new ConventionPack { new CamelCaseElementNameConvention() };
            ConventionRegistry.Register("CamelCase", camelCaseConvention, type => true);

            // connect to your Atlas cluster
            var mongoClient = new MongoClient(MongoConnectionString);
            var mflixDatabase = mongoClient.GetDatabase("sample_mflix");
            var moviesCollection = mflixDatabase.GetCollection<MovieDocument>("movies");

            // define and run pipeline
            var results = moviesCollection.Aggregate()
                .Search(Builders<MovieDocument>.Search.Compound()
                    .Must(Builders<MovieDocument>.Search.Wildcard(movie => movie.Title, "alle*", true))
                    .Should(Builders<MovieDocument>.Search.Text(movie => movie.Genres, "Drama")),
                    indexName: "diacritic-insensitive-tutorial")
                .Project<MovieDocument>(Builders<MovieDocument>.Projection
                    .Include(movie => movie.Title)
                    .Include(movie => movie.Genres)
                    .Exclude(movie => movie.Id)
                    .MetaSearchScore(movie => movie.Score))
                .ToList();

            // print results
            foreach (var movie in results)
            {
                Console.WriteLine(movie.ToJson());
            }
        }
    }

    [BsonIgnoreExtraElements]
    public class MovieDocument
    {
        [BsonIgnoreIfDefault]
        public ObjectId Id { get; set; }
        public string[] Genres { get; set; }
        public string Title { get; set; }
        public double Score { get; set; }
    }
  2. Before you run the sample, replace <connection-string> with your Atlas connection string. Ensure that your connection string includes your database user's credentials. To learn more, see Connect via Drivers.

3
dotnet run diacritic-insensitive-example.csproj
{ "genres" : ["Drama", "Family", "Sport"], "title" : "Alley Cats Strike", "score" : 1.2084882259368896 }
{ "genres" : ["Drama", "Romance", "Sci-Fi"], "title" : "Allegro", "score" : 1.1792886257171631 }
{ "genres" : ["Animation", "Comedy", "Fantasy"], "title" : "Allegro non troppo", "score" : 1.0 }
{ "genres" : ["Comedy"], "title" : "Allez, Eddy!", "score" : 1.0 }

Atlas Search matches titles regardless of diacritics and letter case because the diacriticFolder custom analyzer applied character folding to the values of the title field. Atlas Search returns only documents with titles that begin with the query term alle because the analyzer uses the keyword tokenizer, which tokenizes an entire string (or phrase) as a single token.

Alternatively, you can specify the standard tokenizer instead of the keyword tokenizer in the custom analyzer used on the title field. With the standard tokenizer, the Atlas Search results would also contain documents in which any word of the title begins with the query term alle, such as "Desde allè". To test this, edit your index definition to replace the keyword tokenizer type on line 17 of the index definition with standard, save the index definition, and run the sample query again.

1
  1. Create a file named diacritic-insensitive.go.

  2. Copy and paste the following code into the diacritic-insensitive.go file.

    The code example performs the following tasks:

    • Imports mongodb packages and dependencies.

    • Establishes a connection to your Atlas cluster.

    • Uses the following compound operator clauses to query the collection:

      • must clause to search for movie titles that begin with the term alle using the wildcard operator

      • should clause to specify preference for the Drama genre using the text operator

      The query uses the $project stage to:

      • Exclude all fields except title and genres

      • Add a field named score

    • Iterates over the cursor to print the documents that match the query.

    package main

    import (
        "context"
        "fmt"

        "go.mongodb.org/mongo-driver/bson"
        "go.mongodb.org/mongo-driver/mongo"
        "go.mongodb.org/mongo-driver/mongo/options"
    )

    func main() {
        // connect to your Atlas cluster
        client, err := mongo.Connect(context.TODO(), options.Client().ApplyURI("<connection-string>"))
        if err != nil {
            panic(err)
        }
        defer client.Disconnect(context.TODO())

        // set namespace
        collection := client.Database("sample_mflix").Collection("movies")

        // define pipeline stages
        searchStage := bson.D{{"$search", bson.M{
            "index": "diacritic-insensitive-tutorial",
            "compound": bson.M{
                "must": bson.M{
                    "wildcard": bson.M{
                        "path":               "title",
                        "query":              "alle*",
                        "allowAnalyzedField": true,
                    },
                },
                "should": bson.D{
                    {"text", bson.M{
                        "path":  "genres",
                        "query": "Drama"}}},
            },
        }}}
        projectStage := bson.D{{"$project", bson.D{{"title", 1}, {"genres", 1}, {"_id", 0}, {"score", bson.D{{"$meta", "searchScore"}}}}}}

        // run pipeline
        cursor, err := collection.Aggregate(context.TODO(), mongo.Pipeline{searchStage, projectStage})
        if err != nil {
            panic(err)
        }

        // print results
        var results []bson.D
        if err = cursor.All(context.TODO(), &results); err != nil {
            panic(err)
        }
        for _, result := range results {
            fmt.Println(result)
        }
    }
  3. Before you run the sample, replace <connection-string> with your Atlas connection string. Ensure that your connection string includes your database user's credentials. To learn more, see Connect via Drivers.

  4. Run the following command to query your collection:

    go run diacritic-insensitive.go
    [{genres [Drama Family Sport]} {title Alley Cats Strike} {score 1.2084882259368896}]
    [{genres [Drama Romance Sci-Fi]} {title Allegro} {score 1.179288625717163}]
    [{genres [Animation Comedy Fantasy]} {title Allegro non troppo} {score 1}]
    [{genres [Comedy]} {title Allez, Eddy!} {score 1}]

Atlas Search matches titles regardless of diacritics and letter case because the diacriticFolder custom analyzer applied character folding to the values of the title field. Atlas Search returns only documents with titles that begin with the query term alle because the analyzer uses the keyword tokenizer, which tokenizes an entire string (or phrase) as a single token.

Alternatively, you can specify the standard tokenizer instead of the keyword tokenizer in the custom analyzer used on the title field. With the standard tokenizer, the Atlas Search results would also contain documents in which any word of the title begins with the query term alle, such as "Desde allè". To test this, edit your index definition to replace the keyword tokenizer type on line 17 of the index definition with standard, save the index definition, and run the sample query again.

1
  • junit: 4.11 or higher version

  • mongodb-driver-sync: 4.3.0 or higher version

  • slf4j-log4j12: 1.7.30 or higher version
2
  1. Create a file named DiacriticInsensitive.java.

  2. Copy and paste the following code into the DiacriticInsensitive.java file.

    The code example performs the following tasks:

    • Imports mongodb packages and dependencies.

    • Establishes a connection to your Atlas cluster.

    • Uses the following compound operator clauses to query the collection:

      • must clause to search for movie titles that begin with the term alle using the wildcard operator

      • should clause to specify preference for the Drama genre using the text operator

      The query uses the $project stage to:

      • Exclude all fields except title and genres

      • Add a field named score

    • Iterates over the cursor to print the documents that match the query.

    import static com.mongodb.client.model.Aggregates.project;
    import static com.mongodb.client.model.Projections.*;
    import com.mongodb.client.MongoClient;
    import com.mongodb.client.MongoClients;
    import com.mongodb.client.MongoCollection;
    import com.mongodb.client.MongoDatabase;
    import org.bson.Document;
    import java.util.Arrays;
    import java.util.List;

    public class DiacriticInsensitive {
        public static void main(String[] args) {
            // define clauses
            List<Document> mustClauses =
                List.of(new Document("wildcard",
                    new Document("path", "title")
                        .append("query", "alle*")
                        .append("allowAnalyzedField", true)));
            List<Document> shouldClauses =
                List.of(new Document("text",
                    new Document("query", "Drama")
                        .append("path", "genres")));
            // define pipeline
            Document agg = new Document("$search",
                new Document("index", "diacritic-insensitive-tutorial")
                    .append("compound",
                        new Document("must", mustClauses)
                            .append("should", shouldClauses)));

            // connect to your Atlas cluster
            String uri = "<connection-string>";

            try (MongoClient mongoClient = MongoClients.create(uri)) {
                // set namespace
                MongoDatabase database = mongoClient.getDatabase("sample_mflix");
                MongoCollection<Document> collection = database.getCollection("movies");

                // run pipeline and print results
                collection.aggregate(Arrays.asList(agg,
                    project(fields(
                        excludeId(),
                        include("title"),
                        include("genres"),
                        computed("score", new Document("$meta", "searchScore"))))))
                    .forEach(doc -> System.out.println(doc.toJson()));
            }
        }
    }

    Note

    To run the sample code in your Maven environment, add the following code above the import statements in your file.

    package com.mongodb.drivers;
  3. Before you run the sample, replace <connection-string> with your Atlas connection string. Ensure that your connection string includes your database user's credentials. To learn more, see Connect via Drivers.

  4. Compile and run the DiacriticInsensitive.java file.

    javac DiacriticInsensitive.java
    java DiacriticInsensitive
    {"genres": ["Drama", "Family", "Sport"], "title": "Alley Cats Strike", "score": 1.2084882259368896}
    {"genres": ["Drama", "Romance", "Sci-Fi"], "title": "Allegro", "score": 1.179288625717163}
    {"genres": ["Animation", "Comedy", "Fantasy"], "title": "Allegro non troppo", "score": 1.0}
    {"genres": ["Comedy"], "title": "Allez, Eddy!", "score": 1.0}

Atlas Search matches titles regardless of diacritics and letter case because the diacriticFolder custom analyzer applied character folding to the values of the title field. Atlas Search returns only documents with titles that begin with the query term alle because the analyzer uses the keyword tokenizer, which tokenizes an entire string (or phrase) as a single token.

Alternatively, you can specify the standard tokenizer instead of the keyword tokenizer in the custom analyzer used on the title field. With the standard tokenizer, the Atlas Search results would also contain documents in which any word of the title begins with the query term alle, such as "Desde allè". To test this, edit your index definition to replace the keyword tokenizer type on line 17 of the index definition with standard, save the index definition, and run the sample query again.

1
  • mongodb-driver-kotlin-coroutine: 4.10.0 or higher version
2
  1. Create a file named DiacriticInsensitive.kt.

  2. Copy and paste the following code into the DiacriticInsensitive.kt file.

    The code example performs the following tasks:

    • Imports mongodb packages and dependencies.

    • Establishes a connection to your Atlas cluster.

    • Uses the following compound operator clauses to query the collection:

      • must clause to search for movie titles that begin with the term alle using the wildcard operator

      • should clause to specify preference for the Drama genre using the text operator

      The query uses the $project stage to:

      • Exclude all fields except title and genres

      • Add a field named score

    • Prints the documents that match the query from the AggregateFlow instance.

    import com.mongodb.client.model.Aggregates.project
    import com.mongodb.client.model.Projections.*
    import com.mongodb.kotlin.client.coroutine.MongoClient
    import kotlinx.coroutines.runBlocking
    import org.bson.Document

    fun main() {
        // connect to your Atlas cluster
        val uri = "<connection-string>"
        val mongoClient = MongoClient.create(uri)

        // set namespace
        val database = mongoClient.getDatabase("sample_mflix")
        val collection = database.getCollection<Document>("movies")

        runBlocking {
            // define clauses
            val mustClauses = listOf(
                Document(
                    "wildcard",
                    Document("path", "title")
                        .append("query", "alle*")
                        .append("allowAnalyzedField", true)
                )
            )

            val shouldClauses = listOf(
                Document(
                    "text",
                    Document("query", "Drama")
                        .append("path", "genres")
                )
            )

            // define pipeline
            val agg = Document("\$search",
                Document("index", "diacritic-insensitive-tutorial")
                    .append("compound", Document("must", mustClauses)
                        .append("should", shouldClauses)
                    )
            )

            // run pipeline and print results
            val resultsFlow = collection.aggregate<Document>(
                listOf(
                    agg,
                    project(fields(
                        excludeId(),
                        include("title", "genres"),
                        computed("score", Document("\$meta", "searchScore"))))
                )
            )
            resultsFlow.collect { println(it) }
        }

        mongoClient.close()
    }
  3. Before you run the sample, replace <connection-string> with your Atlas connection string. Ensure that your connection string includes your database user's credentials. To learn more, see Connect via Drivers.

  4. Run the DiacriticInsensitive.kt file.

    When you run the DiacriticInsensitive.kt program in your IDE, it prints the following documents:

    Document{{genres=[Drama, Family, Sport], title=Alley Cats Strike, score=1.2084882259368896}}
    Document{{genres=[Drama, Romance, Sci-Fi], title=Allegro, score=1.179288625717163}}
    Document{{genres=[Animation, Comedy, Fantasy], title=Allegro non troppo, score=1.0}}
    Document{{genres=[Comedy], title=Allez, Eddy!, score=1.0}}

Atlas Search matches titles regardless of diacritics and letter case because the diacriticFolder custom analyzer applied character folding to the values of the title field. Atlas Search returns only documents with titles that begin with the query term alle because the analyzer uses the keyword tokenizer, which tokenizes an entire string (or phrase) as a single token.

Alternatively, you can specify the standard tokenizer instead of the keyword tokenizer in the custom analyzer used on the title field. With the standard tokenizer, the Atlas Search results would also contain documents in which any word of the title begins with the query term alle, such as "Desde allè". To test this, edit your index definition to replace the keyword tokenizer type on line 17 of the index definition with standard, save the index definition, and run the sample query again.

1
  1. Create a file named diacritic-insensitive.js.

  2. Copy and paste the following code into the diacritic-insensitive.js file.

    The code example performs the following tasks:

    • Imports mongodb, MongoDB's Node.js driver.

    • Creates an instance of the MongoClient class to establish a connection to your Atlas cluster.

    • Uses the following compound operator clauses to query the collection:

      • must clause to search for movie titles that begin with the term alle using the wildcard operator

      • should clause to specify preference for the Drama genre using the text operator

      The query uses the $project stage to:

      • Exclude all fields except title and genres

      • Add a field named score

    • Iterates over the cursor to print the documents that match the query.

    const { MongoClient } = require("mongodb");

    // Replace the uri string with your MongoDB deployment's connection string.
    const uri =
      "<connection-string>";

    const client = new MongoClient(uri);

    async function run() {
      try {
        await client.connect();

        // set namespace
        const database = client.db("sample_mflix");
        const coll = database.collection("movies");

        // define pipeline
        const agg = [{
          '$search': {
            'index': 'diacritic-insensitive-tutorial',
            'compound': {
              'must': [{
                'wildcard': {
                  'query': "alle*",
                  'path': "title",
                  'allowAnalyzedField': true
                }
              }],
              'should': [{'text': {'query': 'Drama', 'path': 'genres'}}]
            }}},
          { '$project': { '_id': 0, 'title': 1 , 'genres': 1, 'score': {'$meta': 'searchScore'}}}];

        // run pipeline
        const result = await coll.aggregate(agg);

        // print results
        await result.forEach((doc) => console.log(doc));

      } finally {
        await client.close();
      }
    }
    run().catch(console.dir);
  3. Before you run the sample, replace <connection-string> with your Atlas connection string. Ensure that your connection string includes your database user's credentials. To learn more, see Connect via Drivers.

  4. Run the following command to query your collection:

    node diacritic-insensitive.js

    {
      genres: [ 'Drama', 'Family', 'Sport' ],
      title: 'Alley Cats Strike',
      score: 1.2084882259368896
    }
    {
      genres: [ 'Drama', 'Romance', 'Sci-Fi' ],
      title: 'Allegro',
      score: 1.179288625717163
    }
    {
      genres: [ 'Animation', 'Comedy', 'Fantasy' ],
      title: 'Allegro non troppo',
      score: 1
    }
    {
      genres: [ 'Comedy' ],
      title: 'Allez, Eddy!',
      score: 1
    }

Atlas Search matches titles regardless of diacritics because the diacriticFolder custom analyzer that we used on the title field applies character folding to its values. Atlas Search returns only documents with titles that begin with the query term alle because we used the keyword tokenizer, which tokenizes an entire string (or phrase) as a single token.

Alternatively, you can specify the standard tokenizer instead of the keyword tokenizer in the custom analyzer used on the title field. With the standard tokenizer, the Atlas Search results would contain documents in which the query term alle appears at the beginning of any word in the title, such as "Desde allè". To test this, edit your index definition to replace the keyword tokenizer on line 17 with the standard tokenizer, save the index definition, and run the sample query again.

1
  1. Create a file named diacritic-insensitive.py.

  2. Copy and paste the following code into the diacritic-insensitive.py file.

    The following code example:

    • Imports pymongo, MongoDB's Python driver. PyMongo relies on the dnspython package to connect to Atlas using a DNS seed list connection string.

    • Creates an instance of the MongoClient class to establish a connection to your Atlas cluster.

    • Uses the following compound operator clauses to query the collection:

      • must clause to search for movie titles that begin with the term alle using the wildcard operator

      • should clause to specify preference for the Drama genre using the text operator

      The query uses the $project stage to:

      • Exclude all fields except title and genres

      • Add a field named score

    • Iterates over the cursor to print the documents that match the query.

    import pymongo

    # Connect to your Atlas cluster.
    client = pymongo.MongoClient('<connection-string>')

    # Define the pipeline.
    pipeline = [
        {'$search': {
            'index': 'diacritic-insensitive-tutorial',
            'compound': {
                'must': [{'wildcard': {'path': 'title', 'query': 'alle*', 'allowAnalyzedField': True}}],
                'should': [{'text': {'query': 'Drama', 'path': 'genres'}}]}}},
        {'$project': {'_id': 0, 'title': 1, 'genres': 1, 'score': {'$meta': 'searchScore'}}}
    ]

    # Run the pipeline.
    result = client['sample_mflix']['movies'].aggregate(pipeline)

    # Print the results.
    for doc in result:
        print(doc)

  3. Before you run the sample, replace <connection-string> with your Atlas connection string. Ensure that your connection string includes your database user's credentials. To learn more, see Connect via Drivers.

  4. Run the following command to query your collection:

    python diacritic-insensitive.py

    {'genres': ['Drama', 'Family', 'Sport'], 'title': 'Alley Cats Strike', 'score': 1.2084882259368896}
    {'genres': ['Drama', 'Romance', 'Sci-Fi'], 'title': 'Allegro', 'score': 1.179288625717163}
    {'genres': ['Animation', 'Comedy', 'Fantasy'], 'title': 'Allegro non troppo', 'score': 1.0}
    {'genres': ['Comedy'], 'title': 'Allez, Eddy!', 'score': 1.0}

Atlas Search matches titles regardless of diacritics because the diacriticFolder custom analyzer that we used on the title field applies character folding to its values. Atlas Search returns only documents with titles that begin with the query term alle because we used the keyword tokenizer, which tokenizes an entire string (or phrase) as a single token.
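
To build intuition for how character folding and keyword tokenization interact, the following local sketch approximates both steps in plain Python. It does not call Atlas Search: the fold_diacritics and keyword_tokens helpers are hypothetical names, the folding is a rough stand-in built on Python's unicodedata module rather than the analyzer's actual implementation, and the sketch assumes the wildcard prefix is compared as written.

```python
import unicodedata


def fold_diacritics(text: str) -> str:
    """Rough stand-in for character folding: strip combining marks, fold case."""
    decomposed = unicodedata.normalize("NFKD", text)
    without_marks = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return without_marks.casefold()


def keyword_tokens(text: str) -> list:
    """Keyword-style tokenization: the whole string becomes a single token."""
    return [fold_diacritics(text)]


def matches_prefix(tokens: list, prefix: str) -> bool:
    """Mimic a wildcard query such as alle* by testing each token's prefix."""
    return any(token.startswith(prefix) for token in tokens)


if __name__ == "__main__":
    for title in ("Allegro", "Allez, Eddy!", "Desde allè"):
        print(title, matches_prefix(keyword_tokens(title), "alle"))
    # Allegro True
    # Allez, Eddy! True
    # Desde allè False
```

Because the whole title is one token, only titles whose first characters fold to alle match, which is why "Desde allè" is excluded in this sketch.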

Alternatively, you can specify the standard tokenizer instead of the keyword tokenizer in the custom analyzer used on the title field. With the standard tokenizer, the Atlas Search results would contain documents in which the query term alle appears at the beginning of any word in the title, such as "Desde allè". To test this, edit your index definition to replace the keyword tokenizer on line 17 with the standard tokenizer, save the index definition, and run the sample query again.
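
The difference between the two tokenizers can be illustrated locally with a minimal sketch. It is only an approximation under stated assumptions: the standard tokenizer is imitated by splitting on non-word characters with a regular expression, the folding uses Python's unicodedata module, and all function names are hypothetical. None of this is the Atlas Search implementation.

```python
import re
import unicodedata


def fold(text: str) -> str:
    """Rough stand-in for character folding: strip combining marks, fold case."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch)).casefold()


def keyword_tokens(text: str) -> list:
    """Keyword-style tokenization: one token for the entire string."""
    return [fold(text)]


def standard_tokens(text: str) -> list:
    """Approximate word-based tokenization: one token per run of word characters."""
    return re.findall(r"\w+", fold(text))


def prefix_match(tokens: list, prefix: str) -> bool:
    """Mimic a wildcard query such as alle* against each token."""
    return any(token.startswith(prefix) for token in tokens)


if __name__ == "__main__":
    for title in ("Allegro non troppo", "Desde allè"):
        print(
            title,
            "| keyword:", prefix_match(keyword_tokens(title), "alle"),
            "| standard:", prefix_match(standard_tokens(title), "alle"),
        )
    # Allegro non troppo | keyword: True | standard: True
    # Desde allè | keyword: False | standard: True
```

With word-level tokens, the prefix can match any word in the title, so "Desde allè" matches in the standard-tokenizer case but not in the keyword case.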
