What are “Big Data” and “NoSQL”?

Modern applications are growing larger and more complex, and consumers expect near-instant access to information on demand. The combination of these factors creates daunting technical challenges. For example, pulling data from large tables can take anywhere from several seconds to several hours, depending on factors such as the complexity of the query or the distribution of the data across tables. A problem like this can be approached in a number of ways. Indexing the tables can bring some relief, but it comes at a cost when inserting and updating records. Buying larger servers also helps, but those servers can become extremely costly for any organization. Another strategy is to run batch jobs that aggregate data ahead of time, but planning and maintaining those jobs can easily turn into a monotonous operational routine.

“NoSQL” is a term that describes a class of databases designed to solve specific types of problems. These databases are commonly associated with the term “Big Data”. They do not follow the traditional relational (RDBMS) model; instead, each usually implements specific principles suited to a particular problem. MongoDB is an example of a NoSQL database known as a “document store”. This means that, instead of storing each data point in a column of a table, the entire object is stored in a collection (MongoDB's counterpart to a table) as a JSON-like document. MongoDB can auto-generate a unique identifier for the document, or the identifier can be set manually. The fields within the document can still be queried individually; however, some indexing work is needed to make such queries efficient.
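To make the document-store idea concrete, here is a minimal sketch in plain Python. It is not MongoDB's API: the class TinyDocumentStore and its methods are hypothetical names invented for illustration. It only shows the three ideas described above: the whole object is stored as one JSON document, the identifier is either generated or supplied, and a query on a field is a full scan unless an index has been built for that field.

```python
import json
import uuid


class TinyDocumentStore:
    """Toy in-memory document store (illustration only, not MongoDB's API)."""

    def __init__(self):
        self._docs = {}     # identifier -> serialized JSON document
        self._indexes = {}  # field name -> {field value -> set of identifiers}

    def insert(self, document, doc_id=None):
        # Use the caller's identifier if one is given, otherwise generate one.
        # Toy limitation: re-inserting an existing id does not clean up indexes.
        doc_id = doc_id if doc_id is not None else uuid.uuid4().hex
        self._docs[doc_id] = json.dumps(document)
        for field, index in self._indexes.items():
            if field in document:
                index.setdefault(document[field], set()).add(doc_id)
        return doc_id

    def get(self, doc_id):
        return json.loads(self._docs[doc_id])

    def create_index(self, field):
        # Build a lookup table once so later queries avoid scanning everything.
        # Indexed values must be hashable (strings, numbers, and so on).
        index = {}
        for doc_id, raw in self._docs.items():
            document = json.loads(raw)
            if field in document:
                index.setdefault(document[field], set()).add(doc_id)
        self._indexes[field] = index

    def find(self, field, value):
        if field in self._indexes:
            # Indexed query: jump straight to the matching identifiers.
            return [self.get(i) for i in self._indexes[field].get(value, set())]
        # Unindexed query: every stored document has to be decoded and checked.
        documents = (json.loads(raw) for raw in self._docs.values())
        return [doc for doc in documents if doc.get(field) == value]


store = TinyDocumentStore()
auto_id = store.insert({"name": "Ada", "city": "London"})           # generated id
store.insert({"name": "Grace", "city": "New York"}, doc_id="user-2")  # manual id
store.create_index("city")
in_london = store.find("city", "London")   # answered from the index
named_grace = store.find("name", "Grace")  # answered by a full scan
```

The index makes reads on "city" cheap, but every insert now has to update it as well, which is the same trade-off indexes carry in a relational database.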

MongoDB suits systems that do not have a strictly defined data schema. It is also a great tool for rapid prototyping, because no schema has to be defined before the database can be used. MongoDB accepts the data however you want it stored and allows it to be queried. Each record can also have different fields, if desired. Gone are the days of needing NULL values to populate fields there is no intention of using.
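As a rough illustration of records with different fields, the sketch below uses plain Python dictionaries as stand-ins for stored documents. The sample data and the find helper are made up for this example and are not part of any MongoDB driver.

```python
# Illustrative records only; plain dicts stand in for stored documents.
# Each record carries just the fields it needs, with no NULL placeholders.
people = [
    {"name": "Ada", "email": "ada@example.com"},
    {"name": "Grace", "phone": "555-0100", "languages": ["COBOL"]},
]

_MISSING = object()  # sentinel so that an absent field never matches a criterion


def find(documents, **criteria):
    """Return the documents whose fields equal every given criterion."""
    return [
        doc
        for doc in documents
        if all(doc.get(field, _MISSING) == value for field, value in criteria.items())
    ]


ada = find(people, name="Ada")
by_phone = find(people, phone="555-0100")
names_with_phone = [person["name"] for person in people if "phone" in person]
```

A query on a field that some records lack simply skips those records, so nothing has to be declared up front or padded with empty values.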

MongoDB is one example of a NoSQL database that helps solve a specific problem. There are many others, such as Redis, Cassandra, Riak, Neo4j, Elasticsearch, and HBase, to name a few. Redis, for example, is an in-memory database that allows very fast retrieval of data. Where RDBMS databases join tables, a graph database like Neo4j stores relationships directly, which allows richer, multi-dimensional relationships between records. Specifically, if you had a set of restaurants and another of people, any person and any restaurant could have multiple direct relationships established between them. This relationship structure could also be built with an RDBMS; however, the Neo4j relationship model reduces the overhead of joining multiple tables to create these relationships, which makes searching connected data more efficient.
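The people-and-restaurants example can be sketched with a small adjacency list in plain Python. This is not Neo4j or its query language; TinyGraph, the relationship names, and the sample data are all hypothetical. The point is only that relationships are stored directly on each node and followed one hop at a time, rather than reconstructed by joining tables.

```python
from collections import defaultdict


class TinyGraph:
    """Toy graph: each node keeps a list of its typed outgoing relationships."""

    def __init__(self):
        # node -> list of (relationship type, target node)
        self._edges = defaultdict(list)

    def relate(self, source, rel_type, target):
        self._edges[source].append((rel_type, target))

    def neighbors(self, source, rel_type=None):
        # Follow the relationships stored on the node; no join is involved.
        return [
            target
            for rel, target in self._edges.get(source, [])
            if rel_type is None or rel == rel_type
        ]


graph = TinyGraph()
# The same person and restaurant can be linked by more than one relationship.
graph.relate("alice", "LIKES", "Luigi's")
graph.relate("alice", "WORKS_AT", "Luigi's")
graph.relate("alice", "FRIEND_OF", "bob")
graph.relate("bob", "LIKES", "Taco Stand")


def friend_recommendations(graph, person):
    """Restaurants liked by a person's friends that the person does not already like."""
    already_liked = set(graph.neighbors(person, "LIKES"))
    recommendations = []
    for friend in graph.neighbors(person, "FRIEND_OF"):
        for place in graph.neighbors(friend, "LIKES"):
            if place not in already_liked and place not in recommendations:
                recommendations.append(place)
    return recommendations


suggested = friend_recommendations(graph, "alice")
```

In a relational schema the same question would typically need a people table, a restaurants table, and at least two link tables joined together.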

Many databases, like Cassandra and HBase, are built to scale horizontally. That is, if more processing power is needed, it can be provided by adding low-cost machines. Unlike a traditional RDBMS, which typically runs on a single server, these databases distribute their data across many machines. When data is requested, the work is spread across those machines and the partial results are combined into a single stream of results; in the Hadoop ecosystem that HBase belongs to, large aggregations are often expressed with a programming model called MapReduce. The power of this structure is that it reduces the constraints imposed by any single piece of hardware. Companies like Facebook are able to handle enormous quantities of data using a similar database structure.
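The MapReduce idea can be sketched on a single machine. In the example below, each inner list is a hypothetical shard standing in for the rows held by one machine, and the field names are invented. A real cluster would run the map step on each machine in parallel; here it is an ordinary loop.

```python
from collections import Counter
from functools import reduce

# Hypothetical shards: each list stands in for the rows held by one machine.
shards = [
    [{"country": "US", "clicks": 3}, {"country": "DE", "clicks": 1}],
    [{"country": "US", "clicks": 2}],
    [{"country": "DE", "clicks": 4}, {"country": "FR", "clicks": 5}],
]


def map_shard(rows):
    """Map step (with local combining): total clicks per country for one shard."""
    partial = Counter()
    for row in rows:
        partial[row["country"]] += row["clicks"]
    return partial


def reduce_partials(left, right):
    """Reduce step: merge two partial totals into one."""
    return left + right


# Each shard is summarized independently, then the summaries are merged.
partials = [map_shard(rows) for rows in shards]
totals = reduce(reduce_partials, partials, Counter())
```

Because each shard is summarized on its own, adding machines adds capacity for the map step, and only the small partial totals have to travel over the network to be merged.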

NoSQL databases can be used in combination with RDBMS databases and with other NoSQL databases. For example, Redis could sit in front of Oracle to provide near-instant access to the most commonly queried data in the Oracle database. These databases carry a relatively light footprint due to their specialized intent; therefore, it is possible to run multiple NoSQL databases on a single server.
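Putting a fast store in front of a slower database is often called the cache-aside pattern. The sketch below uses a plain dictionary in place of Redis and a stub function in place of the Oracle query; CachedReader, slow_lookup, and the sample record are hypothetical names chosen for illustration, not part of either product.

```python
class CachedReader:
    """Cache-aside reads: check the fast store first, fall back to the database."""

    def __init__(self, load_from_database):
        self._load = load_from_database  # slow lookup, e.g. a SQL query
        self._cache = {}                 # stand-in for an in-memory store
        self.hits = 0
        self.misses = 0

    def get(self, key):
        if key in self._cache:
            self.hits += 1
            return self._cache[key]
        # Cache miss: read from the database and remember the answer.
        self.misses += 1
        value = self._load(key)
        self._cache[key] = value
        return value

    def invalidate(self, key):
        # Call this after the underlying row changes so stale data is not served.
        self._cache.pop(key, None)


database_calls = []  # records every time the "database" is actually queried


def slow_lookup(key):
    """Stub for an expensive relational query (assumption for this sketch)."""
    database_calls.append(key)
    return {"id": key, "name": f"customer-{key}"}


reader = CachedReader(slow_lookup)
first = reader.get(42)   # miss: goes to the database
second = reader.get(42)  # hit: served from the cache
```

The relational database stays the system of record; the cache only holds copies, which is why invalidation after writes matters.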

Each NoSQL database has a purpose. It is critical to understand that purpose before deciding which NoSQL database to use. Implementing these databases is relatively simple once the concepts are understood; however, the level of support available varies from one to the next. When deciding whether to use a NoSQL database for a business, the safe bet will always be to fall back on RDBMS databases like SQL Server or Oracle. NoSQL databases, however, provide a strong competitive advantage when used correctly.