Saturday, March 9, 2013

Putting NoSQL in Perspective

Deciding between a NoSQL database or a relational database system is about understanding the trade-offs that led to the creation of NoSQL to begin with. NoSQL systems have advantages over traditional SQL databases because they give up certain RDBMS features in order to gain other performance, scalability and developer usability capabilities.

What NoSQL gives up (this varies by NoSQL engine):
  • Relationships between entities (like tables) are limited to non-existent. For example, you usually can't join tables or models together in a query. Traditional concepts like data normalization don't really apply. But you still must do proper modeling based on the capabilities of the particular NoSQL system. NoSQL data modeling varies by product and whether you are using a document vs column based NoSQL engine. For example, how you might model your data in MongoDB vs HBase varies because each solution offers significantly different capabilities.
  • Limited ACID transactions. The level of read consistency and atomic write/commit capabilities across one or more tables/entities varies by NoSQL engine.
  • No standard domain language like SQL for expressing ad-hoc queries. Each NoSQL has its own API and some of the NoSQL vendors have limited ad-hoc query capability.
  • Less structured and rigid data model. NoSQL typically forces/gives more responsibility at the application layer for the developer to "do the right thing" when it comes to data relationships and consistency. Think of NoSQL as a schema on read instead of the traditional schema on write.

What NoSQL offers:
  • Easier to shard and distribute the data across a cluster of servers. Partly because of lack of data relationships it is easier to shard and distribute data across a cluster and capacity more incrementally and horizontally. This can give much higher read/write scalability and fail-over capabilities, for example.
  • Can more easily deploy on cheaper commodity hardware (and in the cloud) and expand scalability more incrementally and economically.
  • Don't need as much up-front DBA type of support. But if your NoSQL gets big you will spend a lot of time doing admin work regardless.
  • NoSQL has a looser data model, so you can have sparser data sets and variable data sets organized in documents or name/value column sets. Data models are not as hard wired.
  • Schema migrations can be easier but puts burden on application layer (developer) to adjust to changes in the data model.
  • Depending on what type of application you are building, NoSQL can make getting started a little easier since you need less time planning for your data model. So for collecting high velocity and variable data, NoSQL can be great. But for modeling a complex ERP application it may not be such a great fit.

Like with most things, there is no sliver bullet here with NoSQL. There are many different products available these days and each has its own particular specialties and pros/cons. In general, you should think of NoSQL as a complementary data storage solution and not as a complete replacement of your relational/SQL systems, but this will depend on your applications and product functionality requirements. For example, how you use NoSQL in a analtyics environment vs a OLTP setting can greatly effect how you use NoSQL and which specific NoSQL engine you choose. Keep in mind you may end up using more than oen NoSQL product within your environment based the specific capabilities of each - this is not uncommon.

Relational databases are also evolving, for example, new hybrid NoSQL oriented storage engines are coming out based around MySQL. Also products like NuoDB and VoltDB (what some are calling NewSQL) are trying to evolve relational databases beyond the vertical scaling and legacy storage computing restrictions of the past, by using a fundamentally different architecture from the ground up. Keep you seat belts fastened, the database landscape has not been this innovative and fast moving in decades.

No comments: