Monday, February 1, 2016

Temporal Database Design with NoSQL

Managing data as a function of time in a database is a common requirement for many applications and data warehousing systems. Knowing when a data element or group of elements have changed and over what period of time the data is valid over, is often a required feature in many applications and analytical systems.

While not easy compared to traditional CRUD database development, supporting this type of bitemporal management functionality using a traditional RDBMS such as MySQL or Oracle is a fairly well understood by data modelers and database designers. Such temporal data modeling can be done in a variety of ways in a relational database for both OLTP and OLAP style applications. For example, Oracle and IBM DB2 have built-in extensions for managing bitemporal dimensionality at the table and schema level. It is also possible to roll your own solution with any of the major RDBMS engines by applying time dimension columns (very carefully) to your schema and then with the appropriate DML and transactions manage the updating and insertion of new change records. To do this precisely and 100% consistently the database is required to support durable ACID transactions, something all RDBMS have in spades. See wikipedia links for a background on temporal database models.

Now this is all great, temporal and bitemporal table/schema design is an understood concept by data architects in the RDBMS world. Now how do you do this if you are on the Big Data and NoSql bandwagon? To begin with most NoSQL databases lack support for ACID transactions, which is a prerequisite for handling temporal operations on slow changing dimensions (temporal data) and bitemporal dimensions (valid time dimension and transaction time dimension). ACID transactions are required in order to properly mark expired records as new records are being appended. Records must never overlap and must properly and precisely be expired as new valid time and transaction time record slices are added to the database.

NoSQL databases such as Cassandra and Couchbase are powerful database engines that can be leveraged for a wide segment of data processing and storage needs. NoSQL databases offer many benefits including built in distributed storage/processing, flexible schema modeling and efficient sparse data management. Many of these benefits come at a price although that limit NoSQL database applicability in cases where durable ACID transactions are required for scenarios such as managing multi-row, multi-table transactions for both OLTP and OLAP data processing.

To address this limitation in NoSQL databases, a NoSQL such as Couchbase or Cassandra, for example, can be paired with an ACID database in such a way (the pairing is both operationally and at a schema design level) as to allow using the NoSQL database for what is best at while supporting bitemporal operations via pairing with a RDBMS. Under the hood this is done seamlessly by having a data serialization and deserialization API that synchronizes and coordinates DML operations between the RDBMS and the NoSQL database. The schema design structure provides a polyglot database framework that supports temporal and bitemporal data modeling and provides a data access and query API that supports durable bitemporal operations while supporting the flexibility and advantages of a NoSQL database modeling (document, key/value...etc).

This approach can be applied to NoSQL databases in both OLTP, data warehousing and Big Data environments. So leverage your favorite NoSQL database with best of both worlds! Get your polyglot engines going, your favorite NoSQL database just got bitemporal! Contact Grand Logic to learn how we can help you build your next bitemporal database environment.

No comments: