Configure Online Archive
On this page
Important
Feature unavailable in Serverless Instances
Serverless instances don't support this feature at this time. To learn more, see Serverless Instance Limitations.
Overview
You can configure data in a collection to be archived by specifying an archiving rule. The archiving rule for a:
Time series collection is a combination of a time that is used to determine when to archive data and a numeric value representing the number of days that the Atlas cluster stores the data.
Standard collection can be one of the following:
A combination of a date that is used to determine when to archive data and a numeric value representing the number of days that the Atlas cluster stores the data.
A custom query that is used to select the documents to archive.
To configure your Atlas cluster for online archive:
Create an archiving rule by providing the collection namespace and the criteria for selecting data to archive in the collection.
(Optional) Specify commonly queried fields to partition archived data.
When you configure an Online Archive on your cluster, Atlas creates two federated database instances: one for your archive only, and one for both your cluster and your archive.
Required Access
To create an Online Archive, you must have Project Data Access Admin access or higher to the project.
To watch for an archive to be available, you must have Project Read Only access or higher to the project.
Configure Online Archive Through the Atlas CLI
Note
Online Archive doesn't archive less than 5 MiB of data after the first 7 days. For the 7 days immediately after Atlas creates an archive, Atlas archives all data. After 7 days, Atlas archives data only when your data size reaches 5 MiB.
To create an online archive for a cluster using the Atlas CLI, run the following command:
atlas clusters onlineArchives create [options]
To watch for a specific online archive to become available using the Atlas CLI, run the following command:
atlas clusters onlineArchives watch <archiveId> [options]
To learn more about the syntax and parameters for the previous commands, see the Atlas CLI documentation for atlas clusters onlineArchives create and atlas clusters onlineArchives watch.
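For example, the following is a minimal sketch of creating and then watching a date-based online archive with the Atlas CLI. The cluster name, project ID, namespace, field names, and option values are hypothetical placeholders; confirm the available options against the Atlas CLI reference for your version.

# Create a date-based online archive on a hypothetical sample_mflix.movies collection.
atlas clusters onlineArchives create \
  --clusterName myCluster \
  --projectId <projectId> \
  --db sample_mflix \
  --collection movies \
  --dateField released \
  --dateFormat ISODATE \
  --archiveAfter 30 \
  --partition title,plot

# Watch the new archive until it becomes available, using the ID returned by the create command.
atlas clusters onlineArchives watch <archiveId> --clusterName myCluster --projectId <projectId>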
Tip
See: Related Links
Configure Online Archive Through the API
Note
Online Archive doesn't archive less than 5 MiB of data after the first 7 days. For the 7 days immediately after Atlas creates an archive, Atlas archives all data. After 7 days, Atlas archives data only when your data size reaches 5 MiB.
To configure an online archive from the API, send a POST request to the onlineArchives endpoint.
Note
If you use the DATE criteria, you must specify the date field as part of the partition keys.
If the cluster already has an Active online archive with the same archiving rule for the same database and collection, the operation will fail. However, if the existing online archive is in the Paused or Deleted state, the new online archive is created and its status is set to Active. To learn more about the syntax and options, see the API documentation.
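For instance, the following is a hedged curl sketch of such a request against the v1.0 Admin API. The credentials, project ID, cluster name, namespace, and field values are placeholders, and the request body fields shown are drawn from the onlineArchives endpoint; consult the API reference for the authoritative schema. Because this example uses the DATE criteria, the date field (released) is also listed in the partition fields.

curl --user "{PUBLIC-KEY}:{PRIVATE-KEY}" --digest \
  --header "Content-Type: application/json" \
  --request POST "https://cloud.mongodb.com/api/atlas/v1.0/groups/{PROJECT-ID}/clusters/myCluster/onlineArchives" \
  --data '{
    "dbName": "sample_mflix",
    "collName": "movies",
    "criteria": {
      "type": "DATE",
      "dateField": "released",
      "dateFormat": "ISODATE",
      "expireAfterDays": 30
    },
    "partitionFields": [
      { "fieldName": "released", "order": 0 },
      { "fieldName": "title", "order": 1 },
      { "fieldName": "plot", "order": 2 }
    ]
  }'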
Configure Online Archive Through the User Interface
Note
Online Archive doesn't archive less than 5 MiB of data after the first 7 days. For the 7 days immediately after Atlas creates an archive, Atlas archives all data. After 7 days, Atlas archives data only when your data size reaches 5 MiB.
To configure an online archive, in your Atlas UI:
In Atlas, go to the Clusters page for your project.
If it is not already displayed, select the organization that contains your desired project from the Organizations menu in the navigation bar.
If it is not already displayed, select your desired project from the Projects menu in the navigation bar.
If the Clusters page is not already displayed, click Database in the sidebar.
Create an Archiving Rule by providing the following information.
Specify the collection namespace, which includes the database name, the dot (.) separator, and the collection name (that is, <database>.<collection>), in the Namespace field. You can't modify the namespace once the online archive is created.
Select the cloud provider region where you want to store your archived data.
Tip
We recommend that you select the same region as your cluster if possible because you might incur higher data transfer cost if you choose a different region.
Atlas displays the cloud provider regions based on the cloud provider where your cluster is deployed. For multi-cloud clusters, Atlas displays the cloud provider regions of the highest priority provider. Atlas marks the region that closely or exactly matches the region where your cluster is deployed.
For Atlas clusters deployed on AWS, you can select one of the following regions:
Data Federation Regions    AWS Regions
Virginia, USA              us-east-1
Oregon, USA                us-west-2
Sao Paulo, Brazil          sa-east-1
Ireland                    eu-west-1
London, England            eu-west-2
Frankfurt, Germany         eu-central-1
Tokyo, Japan               ap-northeast-1
Mumbai, India              ap-south-1
Singapore                  ap-southeast-1
Sydney, Australia          ap-southeast-2
Montreal, Canada           ca-central-1
For Atlas clusters deployed on Azure, you can select an Azure region only if there are no other Online Archives on the cluster that are using AWS. If an existing Online Archive on the cluster uses AWS for storing archived data, you can only select AWS regions for any new Online Archives on that cluster.
Note
For a cluster deployed on Azure, if you have existing Online Archives that use AWS and you delete them, you must wait five days before you can create a new Online Archive that uses Azure. Within this five-day period, any attempts to create a new Online Archive still default to AWS.
For Atlas clusters deployed on Azure, you can select one of the following regions:
Data Federation Regions    Azure Regions
Virginia, USA              US_EAST_2
Netherlands                EUROPE_WEST
For Atlas clusters deployed on Google Cloud, you can select any supported AWS region.
Data Federation Regions    AWS Regions
Virginia, USA              us-east-1
Oregon, USA                us-west-2
Sao Paulo, Brazil          sa-east-1
Ireland                    eu-west-1
London, England            eu-west-2
Frankfurt, Germany         eu-central-1
Tokyo, Japan               ap-northeast-1
Mumbai, India              ap-south-1
Singapore                  ap-southeast-1
Sydney, Australia          ap-southeast-2
Montreal, Canada           ca-central-1
Note
Once Atlas creates the online archive, you can't modify the storage region.
Specify the criteria for selecting documents to archive for the type of collection you want to archive.
For a standard collection, specify the criteria for selecting documents to archive under the Date Match or Custom Criteria tab in the Atlas User Interface.
To select documents from the collection using a combination of a date field and number of days:
Specify an already indexed date field from the documents in the collection. To specify a nested field, use the dot notation.
Specify the number of days to keep the data in the Atlas cluster.
Choose the date format of the specified date field.
If you choose any of the following formats, the value of the specified date field must be the BSON type long:
EPOCH_SECONDS
EPOCH_MILLIS
EPOCH_NANOSECONDS
Important
You can't modify the date field once the online archive is created.
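Because the date field must already be indexed, you might create the index before configuring the archive. The following is a minimal mongosh sketch, assuming a hypothetical connection string and a released date field in the sample_mflix.movies collection:

# Create an ascending index on the date field used by the archiving rule (hypothetical namespace).
mongosh "mongodb+srv://<your-cluster-connection-string>/sample_mflix" --eval 'db.movies.createIndex({ released: 1 })'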
To select documents from the collection using a custom filter, specify a valid JSON filter to run. Atlas uses the specified custom filter with the db.collection.find(filter) command. You can't use the empty document argument ({}) to return all documents. You can use MongoDB operators such as $expr to take advantage of all of the aggregation operators as shown in the following examples.
Note
The following examples assume that all documents include bucket_end_date fields with datetime values. With these filters, Atlas also archives all documents that don't include a bucket_end_date field and all documents where the bucket_end_date is not a datetime value.
Example
This custom filter subtracts thirty days (specified in milliseconds) from the current date and archives every document whose bucket_end_date field is on or before that date:
{ "$expr": { "$lte": [ "$bucket_end_date", { "$subtract": [ "$$NOW", 2592000000 ] } ] } }
This custom filter subtracts thirty days (specified in milliseconds) from the current date and archives every document whose ObjectId timestamp (converted to a date with $toDate) is on or before that date:
{ "$expr": { "$lte": [ {"$toDate": "$_id"}, { "$subtract": [ "$$NOW", 2592000000 ] } ] } }
If you use $expr in the custom filter, sometimes the Atlas cluster might be unable to use an index for archiving data.
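To sanity-check a custom filter before saving the rule, you can run the same expression with db.collection.find() in mongosh and inspect a few matching documents. A minimal sketch, assuming a hypothetical connection string, database, and collection name:

# Preview documents that the first custom filter above would select for archiving (hypothetical namespace).
mongosh "mongodb+srv://<your-cluster-connection-string>/mydb" --eval 'db.mycollection.find({ "$expr": { "$lte": [ "$bucket_end_date", { "$subtract": [ "$$NOW", 2592000000 ] } ] } }).limit(5)'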
Note
$$NOW is only supported on Atlas clusters running MongoDB 4.2 or later.
Important
Online Archive for time series collections is available as a Preview. The feature and corresponding documentation might change at any time in the Preview stage.
Online Archive for time series collections isn't available for Atlas clusters deployed on Azure and Google Cloud.
To archive documents in a time series collection, select the This is a Time-Series Collection checkbox and specify the following:
Name of the field which contains the date in each time series document. This must correspond to the timeField in the time series collection. To specify a nested field, use the dot notation. You can't modify the time field once the online archive is created.
Number of days to keep the data in the Atlas cluster.
Date format of the specified date field. The date field value must be in ISODate format.
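The time field you specify here must match the timeField declared when the time series collection was created. As a reminder, the following is a minimal mongosh sketch of creating such a collection, using hypothetical collection and field names:

# Create a time series collection whose timeField ("timestamp") is the field you would specify for the archive.
mongosh "mongodb+srv://<your-cluster-connection-string>/mydb" --eval 'db.createCollection("sensor_readings", { timeseries: { timeField: "timestamp", metaField: "sensorId", granularity: "hours" } })'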
Note
Atlas runs an index sufficiency query to determine the efficiency of the archival process. If the ratio of the number of documents scanned to the number of documents returned is 10 or more, the query result triggers an Index Sufficiency Warning. This warning indicates that you have insufficient indexes for an efficient archival process. For date-based archives, you must index the date field. For custom criteria that use an expression, Atlas might first convert a value before it evaluates it against the query.
Specify how many days you want to store data in the online archive and a time window when you want Atlas to run the archiving job.
(Optional) Specify a Deletion Age Limit.
By default, Atlas doesn't delete archived data. However, if you specify the Deletion Age Limit, you can specify between 7 and 9125 days (25 years) to keep archived data. Atlas deletes archived data after the number of days you specify here. This data expiration rule takes effect 24 hours after you set the Deletion Age Limit.
Warning
Once Atlas deletes the data, you can't recover the data.
(Optional) Specify a Schedule Archiving Window.
By default, Atlas periodically runs a query to archive data. However, you can toggle the Schedule Archiving Window to explicitly schedule the time window during which you want Atlas to archive data. You can specify the following:
Frequency. You can choose to run the job every day, on a specific day of the week, or on a specific date every month. If you wish to schedule the data archiving job on the 29th, 30th, or 31st of every month, Atlas doesn't run the archiving job for months without these dates (for example, February).
Time window, in hours. Select the period of time during which you want Atlas to run the data archiving job. You must specify a minimum of two hours. If a running job doesn't complete during the specified time window, Atlas continues to run the job until it completes.
Specify the two most frequently queried fields in your collection to create partitions in your online archive.
Note
Archive must have at least one partition field.
Enter up to two most commonly queried fields from the collection in the Second most commonly queried field and Third most commonly queried field fields respectively. To specify nested fields, use the dot notation. Do not include quotes ("") around nested fields that you specify using dot notation.
Warning
You can't specify field names that contain periods (.) for partitioning.
The specified fields are used to partition your archived data. Partitions are similar to folders. The date field is in the first position of the partition by default for the Date Match criteria. You can move another field to the first position of the partition if you frequently query by that field.
The order of fields listed in the path is important in the same way as it is in Compound Indexes. Data in the specified path is partitioned first by the value of the first field, and then by the value of the next field, and so on. Atlas supports queries on the specified fields using the partitions.
For example, suppose you are configuring the online archive for the movies collection in the sample_mflix database. If your archived field is the released date field, which you moved to the third position, your first queried field is title, and your second queried field is plot, your partition will look similar to the following:
/title/plot/released
Atlas creates partitions first for the title field, followed by the plot field, and then the released field. Atlas uses the partitions for queries on the following fields:
the title field
the title field and the plot field
the title field, the plot field, and the released field
Atlas can also use the partitions to support a query on the title and released fields. However, in this case, Atlas would not be as efficient in supporting the query as it would be if the query were on the title and plot fields only. Partitions are parsed in order; if a query omits a particular partition, Atlas is less efficient in making use of any partitions that follow it. Since a query on title and released omits plot, Atlas uses the title partition more efficiently than the released partition to support this query.
Atlas can't use the partitioning strategy to efficiently support queries on fields not specified here. Also, Atlas can't use the partitions to support queries that include the following fields without the title field:
the plot field
the released field, or
the plot and released fields
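To illustrate, the following hedged mongosh sketches (with a placeholder connection string for the federated database instance of your archive) show queries that can and can't take full advantage of the /title/plot/released partitions:

# Filters on the leading partition fields, so Atlas can prune partitions effectively.
mongosh "<federated-database-connection-string>/sample_mflix" --eval 'db.movies.find({ title: "The Great Train Robbery", plot: /robbery/ })'

# Omits title, so Atlas can't use the leading partition and may scan far more archived data.
mongosh "<federated-database-connection-string>/sample_mflix" --eval 'db.movies.find({ released: { $lt: ISODate("1950-01-01") } })'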
Enter up to two most commonly queried fields in the documents in the Most commonly queried field and Second most commonly queried field fields respectively. To specify nested fields, use the dot notation. Do not include quotes ("") around nested fields that you specify using dot notation.
The specified fields are used to partition your archived data. Partitions are similar to folders. The order of fields listed in the path is important in the same way as it is in Compound Indexes. Data in the specified path is partitioned first by the value of the first field, and then by the value of the next field. Atlas supports queries on the specified fields using the partitions.
For example, suppose you are configuring the online archive for the movies collection in the sample_mflix database. If your most queried field is the genres field and your second queried field is title, your partition will look similar to the following:
/genres/title
Atlas creates partitions first for the genres field, followed by the title field. Atlas uses the partitions for queries on the following fields:
the genres field
the genres field and the title field
Atlas can also use the partitions to support a query on the title field only. However, in this case, Atlas wouldn't be as efficient in supporting the query as it would be if the query were on the genres field only or the genres and title fields. Partitions are parsed in order; if a query omits a particular partition, Atlas is less efficient in making use of any partitions that follow it. Since a query on title omits genres, Atlas doesn't use the genres partition to support this query. Also, Atlas is less efficient in using the partitions to support a query on the title field followed by the genres field.
Atlas can't use the partitions to support queries on fields not specified here.
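Similarly, for the /genres/title partitions, a brief hypothetical sketch (placeholder connection string and sample_mflix values):

# Filters on genres and title, so the partitions help.
mongosh "<federated-database-connection-string>/sample_mflix" --eval 'db.movies.find({ genres: "Western", title: "The Great Train Robbery" })'

# Filters on title alone, omitting genres, so Atlas can't use the genres partition for this query.
mongosh "<federated-database-connection-string>/sample_mflix" --eval 'db.movies.find({ title: "The Great Train Robbery" })'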
Choose fields that contain only characters supported on AWS. To learn more about the characters to avoid, see Creating object key names. Atlas skips and doesn't archive documents that contain unsupported characters.
Choose fields that do not contain polymorphic data. Atlas determines the data type of a partition field by sampling 10 documents from the collection. Atlas doesn't archive a document if the data type of the specified field value doesn't match the data type of that field in other documents in the same collection.
Choose fields that you query frequently and order them from the most frequently queried in the first position to the least queried field in the last position. For example, if you frequently query on the date field, then leave the date field in the first position. But if you frequently query on another field, then that field should be in the first position.
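One way to check a candidate partition field for polymorphic data is to group on its BSON type with the $type aggregation operator. A minimal mongosh sketch, assuming a hypothetical connection string and the sample_mflix.movies collection:

# Lists the BSON types present in the "title" field; more than one type indicates polymorphic data.
mongosh "mongodb+srv://<your-cluster-connection-string>/sample_mflix" --eval 'db.movies.aggregate([ { $group: { _id: { $type: "$title" }, count: { $sum: 1 } } } ])'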
Note
For Online Archives created before June 2023, MongoDB doesn't recommend string type fields with high cardinality as a query field for Online Archives. For fields of type string with high cardinality, Atlas creates a large number of partitions. This doesn't apply to Online Archives created after June 2023. To learn more, read the MongoDB blog post.
Atlas supports the following partition attribute types:
date
double
int
long
objectId
string
boolean
To learn more about the supported partition attribute types, see Partition Attribute Types.
Note
While partitions improve query performance, queries that don't contain these fields require a full collection scan of all archived documents, which will take longer and increase your costs. To learn more about how partitions improve your query performance in Atlas Data Federation, see Data Structure in S3.
Click Next to review and confirm the online archive settings. You can review the following archiving rule settings:
The name of the database and collection
The name of the cloud provider and the cloud provider region
The name of the date field (for Date Match only)
The number of days to keep data on the Atlas cluster (for Date Match only)
The number of days after which to delete archived data
The frequency and time window for archiving data
The custom query to use to identify data to archive (for Custom Criteria only)
The partition fields
Click Back to edit these settings if needed.
Copy and run the displayed query in your mongosh shell to see the documents that match the criteria in the rule you defined in step 5.
You can run explain on the query to check whether it uses an index. Proceed to the next step to create the index if the fields are not indexed. If the fields are already indexed, skip to step 11.
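For instance, the following is a minimal explain sketch in mongosh, assuming a hypothetical connection string and a date-based rule on the released field of sample_mflix.movies; compare totalDocsExamined with nReturned against the 10-to-1 index sufficiency threshold described earlier:

# Inspect the scanned-to-returned ratio for the archiving query (hypothetical namespace and rule).
mongosh "mongodb+srv://<your-cluster-connection-string>/sample_mflix" --eval 'const stats = db.movies.find({ released: { $lte: new Date(Date.now() - 30*24*60*60*1000) } }).explain("executionStats").executionStats; print("docsExamined:", stats.totalDocsExamined, "nReturned:", stats.nReturned)'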
(Optional) Copy and run the displayed query in your mongosh shell to create the required index. This ensures that your data is indexed for optimal performance.
Verify and confirm your archiving rule.
Click Begin Archiving in the Confirm an online archive tab.
Click Confirm in the Begin Archiving window.
Note
Once your document is queued for archiving, you can no longer edit the document. See Restore Archived Data to move archived data back into the live Atlas cluster.
Limitations
You can create up to 50 online archives per cluster and up to 20 can be active per cluster. The following limitations apply:
You can configure multiple online archives in the same namespace, but only one can be active at any given time.
You can't create multiple online archives on the same fields in the same collection.
You can't access your online archive during the following scenarios:
A full outage of the primary region of your cluster.
An outage of AWS S3 where your archived data is stored.
You can't use an archiving rule for more than one collection.
Note
If your goal is to archive data from several collections, you must create an archiving rule for each collection.
You can't archive less than 5 MiB of data after the first 7 days. For the 7 days immediately after Atlas creates an archive, Atlas archives all data. After 7 days, Atlas archives data only when your data size reaches 5 MiB.