NoSQL databases are designed to break away from the rows and columns of the relational database model. But it’s a common mistake to think that NoSQL databases don’t have any sort of data model. A useful description of how the data will be organized is the beginning of a schema.
Relational databases have had generations of users and developers to work out standard design methods. Various formal tools exist for describing the relationships between the main objects in a business domain, and these formal descriptions can then be used to dictate how the data will be stored.
The same types of standard data modeling tools are not available for NoSQL data modeling. One recommendation is to begin with a business domain model expressed in a form that can be incorporated in an application, such as a JSON document. Another important design driver is the types of data access that need to be supported. Some use cases require access via a query language and others require access by one or more applications.
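As a minimal sketch of that recommendation, a domain entity can be written as a plain Python dictionary that maps directly to a JSON document. The `order` entity, its field names, and the `order_total` helper are hypothetical and chosen only for illustration.

```python
import json

# Hypothetical "order" entity from a retail domain; every field name here is
# an assumption made for illustration, not a prescribed schema.
order = {
    "order_id": "ord-1001",
    "customer": {"name": "Ada Example", "email": "ada@example.com"},
    "items": [
        {"sku": "A-100", "quantity": 2, "unit_price": 9.50},
        {"sku": "B-200", "quantity": 1, "unit_price": 20.00},
    ],
}


def order_total(doc):
    """Sum quantity * unit_price over the embedded line items."""
    return sum(item["quantity"] * item["unit_price"] for item in doc["items"])


# The same structure the application works with serializes to JSON and back.
as_json = json.dumps(order)
round_tripped = json.loads(as_json)
```

Because the application object and the stored document share one shape, the domain model can be reviewed and revised in the same form the code uses.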
Because no business or application domain is static, change over time must also be taken into account. When it comes to flexibility, NoSQL database schemas are much less costly to revise.
Generally speaking, because NoSQL databases are designed to store data that does not have a fixed structure that is specified prior to developing the physical model, developers focus on the physical data model. They are typically developing applications for massive, horizontally distributed environments. This puts an emphasis on figuring out how the scalability and performance of the system will work. But they still need to think about the data model they will use to organize the data.
NoSQL schema design centers on optimizing data access, which puts the focus on query patterns and business workflows. The first step is to establish business requirements and work out the specific needs of the people who will use the application. The goal of schema design is to plan keys and indexes that are fast and effective for application queries and that complement workflow patterns.
There are multiple ways to go about selecting a primary key or deciding which fields should be indexed. Reviewing the needs of the users may lead to some likely candidates for indexes.
Consider how users will query the data and how often. One way to do this is to review actual queries after the fact, which may reveal patterns that can guide which fields to index.
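One way to review queries after the fact is to count how often each field appears in query filters. The sketch below assumes a hypothetical `query_log` in which each entry is the filter of one executed query; the log format, the field names, and the threshold of two are illustrative assumptions, not the output of any particular database.

```python
from collections import Counter

# Hypothetical log: each entry is the filter of one query the application ran.
query_log = [
    {"customer_id": 17},
    {"customer_id": 42, "status": "shipped"},
    {"status": "pending"},
    {"customer_id": 17, "created_at": "2024-01-01"},
]


def field_frequency(filters):
    """Count how many logged queries filtered on each field."""
    counts = Counter()
    for query_filter in filters:
        counts.update(query_filter.keys())
    return counts


freq = field_frequency(query_log)

# Fields used by at least two queries are treated as index candidates here;
# the threshold is arbitrary and would be tuned against a real workload.
candidates = [field for field, n in freq.most_common() if n >= 2]
```

Frequency is only a starting point: selectivity, write volume, and index size also affect whether a candidate field is worth indexing.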
One of the advantages of NoSQL databases is that they are relatively easy to modify in response to changes in business requirements, query patterns, or the data itself. This means that schema design will be an iterative process throughout the lifetime of the application.
Some decisions may be suggested by the structure of the data itself. In a database of automobiles, many of the component systems are the same, so the schema might reflect inheritance, with optional add-ons. In a logistics application, adjacency might be used to group delivery addresses, suggesting a tree structure for the schema.
How often will data and documents be updated? This varies greatly depending on the nature of the application. A feed of real-time sensor data may update thousands of times per day, depending on how it is being stored. There are multiple ways to model time-series data in a document database such as MongoDB, for example storing one document per data point or bucketing data points into one document per minute.
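The two time-series layouts mentioned above can be compared with a small sketch that uses plain Python dictionaries in place of stored documents. The sensor readings, the field names (`sensor_id`, `ts`, `value`, `samples`), and the `bucket_by_minute` helper are hypothetical.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical sensor readings; values and field names are illustrative.
readings = [
    {"sensor_id": "s1", "ts": datetime(2024, 1, 1, 12, 0, 5), "value": 20.1},
    {"sensor_id": "s1", "ts": datetime(2024, 1, 1, 12, 0, 35), "value": 20.3},
    {"sensor_id": "s1", "ts": datetime(2024, 1, 1, 12, 1, 10), "value": 20.2},
]

# Option 1: one document per data point.
per_point_docs = [dict(r) for r in readings]


# Option 2: one document per sensor per minute, with the samples embedded.
def bucket_by_minute(points):
    buckets = defaultdict(list)
    for p in points:
        minute = p["ts"].replace(second=0, microsecond=0)
        buckets[(p["sensor_id"], minute)].append(
            {"second": p["ts"].second, "value": p["value"]}
        )
    return [
        {"sensor_id": sensor_id, "minute": minute, "samples": samples}
        for (sensor_id, minute), samples in buckets.items()
    ]


bucketed_docs = bucket_by_minute(readings)
```

Bucketing produces fewer, larger documents, so an index on the bucket key has fewer entries to hold, at the cost of updating an existing document each time a new sample arrives.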
Sometimes the predicted size of the index also has a bearing on how data is stored. For example, one MongoDB customer in financial services considered having one document per transaction, which would have created a predicted index size of 280GB. Choosing to bucket transactions by month reduced that size to 30GB.
What is a NoSQL data model?
Since NoSQL databases vary considerably in how they are designed, data models for each of the four main types will naturally reflect these differences. Put another way, the type of NoSQL database used by an application will be chosen for a specific use case that takes advantage of these specific differences. Evaluating the best data model for the use case is a key consideration in deciding which NoSQL database is the best fit for your needs.
Each of the four main types of NoSQL databases is based on a specific way of storing data. This provides the logic for a data model in each case:
Document databases store data in the document data type, which is similar to a JSON document or object. Each document stores pairs of fields and values, with a wide variety of data types and data structures being used as values. Developers typically embed the structure of the document’s fields and values in the code objects in their applications. Queries are used to retrieve field values, and a number of powerful query languages have been developed to exploit the variety of field value types.
Each item in a key-value database consists of a key and a value, making this the simplest type of database. The data model consists of two parts: the key, which is a string with some relationship to the data, such as a filename or URL, and the data itself, which is treated as a single unit. Data is retrieved using the direct request method (provide the key and get the data) rather than through the use of a query language.
Wide-column stores use a table form, but in a flexible and scalable way. Each row consists of a key and one or more groups of related columns, which are called column families. Within a column family, each row can have a different number of columns, and the columns can hold different kinds of data. Data is retrieved using a query language. The column structure lends itself to fast aggregation queries.
Graph databases consist of nodes connected by edges. Data items like names, prices, and products are stored in the nodes, while the edges store information about how the nodes are related. In a basic sense, nodes and edges are the data model. But at the scale of the whole database, the data model changes as edges are stored with new kinds of relationship data, and nodes are stored with new kinds of data items. Node and relationship information is typically retrieved using specialized query languages, but some graph databases use SQL.
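The four data models described above can be sketched side by side using only built-in Python structures. This is an illustration of the shape of the data in each model, not the storage format or API of any particular database; all identifiers, keys, and values are hypothetical.

```python
# Document: fields and values, with nested structures and arrays as values.
product_doc = {
    "_id": "p1",
    "name": "Widget",
    "price": 9.99,
    "tags": ["tools", "small"],
    "dimensions": {"width_cm": 4, "height_cm": 2},
}

# Key-value: a string key maps to a value the store does not interpret.
kv_store = {"/images/logo.png": b"...opaque bytes..."}
logo = kv_store["/images/logo.png"]  # direct request: give the key, get the data

# Wide-column: row key -> column family -> columns. Rows in the same table
# may carry different column families and different columns.
wide_rows = {
    "user:1": {
        "profile": {"name": "Ada"},
        "activity": {"last_login": "2024-01-01", "logins": 12},
    },
    "user:2": {"profile": {"name": "Lin", "city": "Oslo"}},
}

# Graph: nodes hold data items; edges hold how two nodes are related.
nodes = {
    "alice": {"kind": "customer"},
    "widget": {"kind": "product", "price": 9.99},
}
edges = [("alice", "PURCHASED", "widget", {"quantity": 2})]


def neighbors(node_id, relation):
    """Return the nodes reached from node_id along edges of the given relation."""
    return [dst for src, rel, dst, _props in edges
            if src == node_id and rel == relation]
```

For example, `neighbors("alice", "PURCHASED")` follows the single edge in this sketch to the `widget` node, while the key-value lookup involves no query at all.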
The organization of data in a NoSQL database must lend itself to increased scalability, better system performance, and optimized data access. By considering the schema design and NoSQL data model that would be the best fit for your data, you can more easily decide which NoSQL database will be the right choice for your needs.
NoSQL covers four main types of database design, so the details of each type will be different. What they have in common is that the physical storage will be distributed and horizontally partitioned.
NoSQL databases do not have a schema in the same rigid way that relational databases have a schema. Each of the four main types of NoSQL database has an underlying structure that is used to store the data. But the details of how the data is organized are very flexible, sometimes even to the point of being called “schema-less,” which is actually misleading.
Of the four main types of NoSQL databases, document databases, wide-column, and graph databases generally have specific query languages that take advantage of their respective strengths.
NoSQL databases fall into four main categories or types. One thing they have in common is that they do not use the rigid tabular row-and-column data model that traditional relational databases (sometimes called SQL databases) use.
Instead, NoSQL databases have a data model that reflects their particular category. Document databases can store a great deal of information in a single document and can nest documents. Key-value stores have a simple data model, just as their name implies. Wide-column stores allow more variation in data types and in the number of columns in use than row-oriented relational databases. Graph databases have data models based on graph theory, made up of nodes and the edges that relate those nodes.