
Atlas Data Lake

On this page

  • About Atlas Data Lake
  • Sample Uses
  • Atlas Data Lake Regions
  • Billing

About Atlas Data Lake

MongoDB Atlas Data Lake is an analytics-optimized object storage service for extracted data. It is optimized for flat or nested data and provides low-latency query performance.

Atlas Data Lake requires an M10 or higher backup-enabled Atlas cluster with cloud backup jobs running on a specified cadence. To learn more about cloud backups, see Back Up Your Cluster.

Atlas Data Lake supports collection snapshots from Atlas clusters as a data source for extracted data. Atlas Data Lake automatically ingests data from the snapshots, then partitions and stores it in an analytics-optimized format. It doesn't support creating pipelines for views.

Atlas Data Lake stores data in an analytics-oriented format that is based on open source standards and supports polymorphic data. Data is fully managed, indexed at the partition level, and rebalanced as it grows. Atlas Data Lake optimizes data extraction for analytical queries. When Atlas Data Lake extracts new data, it rebalances existing files to ensure consistent performance and minimize the amount of data scanned.

Atlas Data Lake stores data in a format that best fits its structure to allow for fast point queries and aggregate queries. For point queries, Atlas Data Lake's storage format improves performance by finding partitions faster. Aggregate queries scan only the columns required to provide results. Additionally, Atlas Data Lake partition indexes improve performance for aggregate queries by returning results directly from the partition index without needing to scan the underlying files.

Sample Uses

You can use Atlas Data Lake to:

  • Isolate analytical workloads from your operational cluster.

  • Provide a consistent view of cluster data from a snapshot for long-running aggregations using $out.

  • Query and compare across versions of your cluster data at different points in time.
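To make the $out use case above concrete, here is a minimal sketch of an aggregation that reads from a Data Lake dataset and writes its results back to an Atlas cluster through Atlas Data Federation. All names ("sales", "analyticsCluster", "analytics", "salesByRegion") are hypothetical, and the exact options accepted by the $out atlas document depend on your Data Federation configuration. In mongosh you would run db.sales.aggregate(pipeline); here the stages are built as plain data so their shape is easy to inspect.

```javascript
// Hypothetical long-running aggregation over a snapshot-consistent
// Data Lake dataset exposed as the collection "sales".
const pipeline = [
  // Summarize sales per region; the snapshot guarantees a stable view
  // of the data even for long-running jobs.
  { $group: { _id: "$region", total: { $sum: "$amount" } } },
  // $out writes the results back to an Atlas cluster via Data Federation.
  // clusterName, db, and coll are placeholders for your own deployment.
  {
    $out: {
      atlas: {
        clusterName: "analyticsCluster",
        db: "analytics",
        coll: "salesByRegion",
      },
    },
  },
];

console.log(pipeline.length); // 2
```

Because the aggregation runs against the isolated Data Lake storage rather than the operational cluster, a long-running job like this does not compete with production workloads for resources.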

Atlas Data Lake Regions

Atlas Data Lake provides optimized storage in the following AWS regions:

Data Lake Region        AWS Region
Virginia, USA           us-east-1
Oregon, USA             us-west-2
Sao Paulo, Brazil       sa-east-1
Ireland                 eu-west-1
London, England         eu-west-2
Frankfurt, Germany      eu-central-1
Mumbai, India           ap-south-1
Singapore               ap-southeast-1
Sydney, Australia       ap-southeast-2

Atlas Data Lake automatically selects the region closest to your Atlas cluster for storing ingested data.

Billing

You incur Atlas Data Lake charges per GB per month based on the AWS region where the ingested data is stored. You incur charges for the following items:

  • Ingestion of data from your data source

  • Storage on the cloud object storage

Atlas Data Lake charges you for the resources utilized to extract, upload, and transfer data. Atlas Data Lake charges for snapshot export operations are based on the following:

  • Cost per GB for snapshot extraction

  • Cost per hour on the AWS server for snapshot export download

  • Cost per GB per hour for snapshot export restore storage

  • Cost per IOPS per hour for snapshot export storage IOPS

Atlas Data Lake charges for storing and accessing stored data are based on the following:

  • Cost per GB per day

  • Cost for every one thousand storage access requests when querying Data Lake datasets using Atlas Data Federation. Each access request corresponds to a partition of data from a Data Lake dataset that Atlas Data Federation fetches to process for a query.

    Note

    You can now set limits on the amount of data that Atlas Data Federation processes for your queries to control costs. To learn more, see Manage Atlas Data Federation Query Limits.

To learn more, see the Atlas pricing page.
