Atlas Data Lake
About Atlas Data Lake
MongoDB Atlas Data Lake is an analytics-optimized object storage service for extracted data. It stores flat or nested data in a format designed for low-latency query performance.
Prerequisites
Atlas Data Lake requires an M10 or higher backup-enabled Atlas cluster with cloud backup jobs running on a specified cadence. To learn more about cloud backups, see Back Up Your Cluster.
Supported Types of Data Source
Atlas Data Lake supports collection snapshots from Atlas clusters as a data source for extracted data. Atlas Data Lake automatically ingests data from the snapshots, then partitions and stores it in an analytics-optimized format. It doesn't support creating pipelines for views.
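Pipelines that ingest collection snapshots are typically created through the Atlas UI or the Atlas Administration API. The following is a minimal sketch, assuming the Data Lake Pipelines endpoint (`/api/atlas/v2/groups/{groupId}/pipelines`), a periodic cloud-backup snapshot source, and hypothetical project, cluster, and namespace names; verify the exact payload fields against the Atlas Administration API reference before use.

```python
# Hedged sketch: create a Data Lake pipeline that ingests collection snapshots.
# Endpoint path, payload fields, API version header, and all names below are
# assumptions for illustration; consult the Atlas Administration API reference.
import requests
from requests.auth import HTTPDigestAuth

PUBLIC_KEY = "your-public-key"          # hypothetical Atlas API key pair
PRIVATE_KEY = "your-private-key"
GROUP_ID = "5f1f2e3d4c5b6a7988776655"   # hypothetical project (group) ID

pipeline = {
    "name": "sales-orders-pipeline",         # hypothetical pipeline name
    "source": {
        "type": "PERIODIC_CPS",              # assumed: periodic cloud backup snapshots
        "clusterName": "AnalyticsCluster",   # hypothetical M10+ cluster with cloud backup
        "databaseName": "sales",
        "collectionName": "orders",
    },
    "sink": {
        "type": "DLS",                       # assumed: Data Lake storage sink
        "partitionFields": [{"fieldName": "region", "order": 0}],
    },
}

resp = requests.post(
    f"https://cloud.mongodb.com/api/atlas/v2/groups/{GROUP_ID}/pipelines",
    json=pipeline,
    auth=HTTPDigestAuth(PUBLIC_KEY, PRIVATE_KEY),
    headers={"Accept": "application/vnd.atlas.2023-01-01+json"},  # assumed API version
)
resp.raise_for_status()
print(resp.json())
```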
Data Storage Format and Query Support
Atlas Data Lake stores data in an analytics-oriented format that is based on open-source standards and supports polymorphic data. Data is fully managed, indexed at the partition level, and rebalanced as it grows. Atlas Data Lake optimizes extracted data for analytic queries. When Atlas Data Lake extracts new data, it rebalances existing files to ensure consistent performance and minimize the amount of data scanned.
Atlas Data Lake stores data in the format that best fits its structure to allow fast point queries and aggregate queries. For point queries, the storage format improves performance by locating the relevant partitions faster. Aggregate queries scan only the columns required to return results. Additionally, Atlas Data Lake partition indexes improve performance for aggregate queries by returning results directly from the partition index without scanning the underlying files.
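Once a Data Lake dataset is mapped into a federated database instance, you query it with standard MongoDB drivers. The sketch below uses PyMongo with a hypothetical federated connection string and namespace names; the point query benefits from partition pruning, while the aggregation touches only the fields it references.

```python
# Hedged sketch: querying a Data Lake dataset through Atlas Data Federation.
# The connection string, database, collection, and field names are hypothetical.
from pymongo import MongoClient

client = MongoClient(
    "mongodb+srv://user:password@federateddatabaseinstance0.a1b2c.mongodb.net/"
)
orders = client["sales"]["orders_snapshot"]

# Point query: the partition index narrows the scan to matching partitions.
doc = orders.find_one({"region": "EMEA", "order_id": 1042})

# Aggregate query: only the referenced columns ("region", "total") are scanned.
pipeline = [
    {"$match": {"region": "EMEA"}},
    {"$group": {"_id": "$region", "revenue": {"$sum": "$total"}}},
]
for row in orders.aggregate(pipeline):
    print(row)
```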
Sample Uses
You can use Atlas Data Lake to:
Isolate analytical workloads from your operational cluster.
Provide a consistent view of cluster data from a snapshot for long-running aggregations using $out (see the sketch after this list).
Query and compare across versions of your cluster data at different points in time.
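The long-running-aggregation use case typically materializes its results with $out. Below is a minimal sketch, assuming a federated database instance that maps the Data Lake dataset, hypothetical project, cluster, and namespace names, and the Data Federation form of $out for writing to an Atlas cluster; treat the stage syntax as an assumption and confirm it in the Data Federation documentation.

```python
# Hedged sketch: run a long-running aggregation over a Data Lake dataset and
# write the result to an Atlas cluster with $out via Data Federation.
# All identifiers (connection string, project ID, cluster, namespaces) are hypothetical.
from pymongo import MongoClient

client = MongoClient(
    "mongodb+srv://user:password@federateddatabaseinstance0.a1b2c.mongodb.net/"
)
snapshot = client["sales"]["orders_snapshot"]

pipeline = [
    {"$group": {"_id": "$customer_id", "lifetime_value": {"$sum": "$total"}}},
    # Assumed Data Federation form of $out targeting an Atlas cluster.
    {
        "$out": {
            "atlas": {
                "projectId": "5f1f2e3d4c5b6a7988776655",
                "clusterName": "OperationalCluster",
                "db": "analytics",
                "coll": "customer_lifetime_value",
            }
        }
    },
]
snapshot.aggregate(pipeline)
```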
Atlas Data Lake Regions
Atlas Data Lake provides optimized storage in the following AWS regions:
| Data Lake Regions | AWS Regions |
|---|---|
| Virginia, USA | us-east-1 |
| Oregon, USA | us-west-2 |
| Sao Paulo, Brazil | sa-east-1 |
| Ireland | eu-west-1 |
| London, England | eu-west-2 |
| Frankfurt, Germany | eu-central-1 |
| Mumbai, India | ap-south-1 |
| Singapore | ap-southeast-1 |
| Sydney, Australia | ap-southeast-2 |
Atlas Data Lake automatically selects the region closest to your Atlas cluster for storing ingested data.
Billing
Atlas Data Lake charges accrue per GB per month based on the AWS region where the ingested data is stored. You incur costs for the following items:
Ingestion of data from your data source
Storage on the cloud object storage
Extraction Costs
Atlas Data Lake charges you for the resources used to extract, upload, and transfer data. Atlas Data Lake charges for snapshot export operations are based on the following:
Cost per GB for snapshot extraction
Cost per hour on the AWS server for snapshot export download
Cost per GB per hour for snapshot export restore storage
Cost per IOPS per hour for snapshot export storage IOPS
Storage Costs
Atlas Data Lake charges for storing and accessing stored data are based on the following:
Cost per GB per day
Cost for every one thousand storage access requests when querying Data Lake datasets using Atlas Data Federation. Each access request corresponds to a partition of data from a Data Lake dataset that Atlas Data Federation fetches to process for a query.
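As a back-of-the-envelope illustration of how these storage charges compose, the sketch below uses placeholder rates; the actual per-GB and per-request prices vary by region and are listed on the Atlas pricing page.

```python
# Hedged sketch of how Data Lake storage charges compose.
# The rates below are placeholders, not published prices; see the Atlas pricing page.
STORAGE_RATE_PER_GB_DAY = 0.002       # hypothetical $ per GB per day
ACCESS_RATE_PER_1K_REQUESTS = 0.01    # hypothetical $ per 1,000 partition access requests

stored_gb = 500            # average dataset size held during the month
days_stored = 30
access_requests = 250_000  # partitions fetched by Atlas Data Federation queries

storage_cost = stored_gb * days_stored * STORAGE_RATE_PER_GB_DAY
access_cost = (access_requests / 1_000) * ACCESS_RATE_PER_1K_REQUESTS
print(f"storage: ${storage_cost:.2f}, access: ${access_cost:.2f}")
```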
Note
You can now set limits on the amount of data that Atlas Data Federation processes for your queries to control costs. To learn more, see Manage Atlas Data Federation Query Limits.
To learn more, see the Atlas pricing page.