Progress is a good thing. Through progress, new technology is created that can make various actions or tasks easier, quicker, more cost-effective, or even just viable. As progress marches forward in the IT sector, there are two key technological traps that we can often find ourselves falling into. The most common trap is the argument that new technology is always an improvement over old technology. In the rush to adopt exciting, new, but ultimately unproven technology, we can open our businesses to potentially risky and costly situations. Being an early adopter of new technology can give you a head start, but if you bet on the wrong horse, you can end up with a hefty bill, wasted time and worthless, unsupported technology to show for it. Think of the various format wars over the years: VHS vs Betamax and HD-DVD vs Blu-ray, to give two examples. Those who put their money behind Betamax and HD-DVD ended up spending a lot for a minimal long-term return.
You may be reading this thinking that I am encouraging you to become a technological hermit, steering clear of new technology until you can be one hundred percent sure that it is right for your business. However, the danger of this is that you can get left behind while everyone else is reaping the rewards of progress. This brings us to our second technological trap: ‘this is how we’ve always done it, why make the change?’ This is a perfectly fair argument. Why introduce a new technology, at added expense, time and training, when the current technology serves your business perfectly well? It is true that there will be short-term downsides such as these; however, the long-term benefits could be transformational for your business.
Below we will explore a particular recent technology that you may not yet be familiar with but has significant potential benefits for your business. The name of this technology is NoSQL. It covers quite a broad area and it isn’t necessarily for everyone, but we will investigate its advantages and disadvantages, allowing you to be in full possession of the facts before deciding whether it is applicable to your business.
The Emergence of NoSQL
Twenty years ago, when Amazon was in its infancy, it is probably safe to say that no one could have foreseen the extent to which data would be generated, transmitted, stored and manipulated on the internet in 2014. As time rolled by, the number of businesses operating online increased, with the emergence of further household names such as Google, Netflix, eBay, YouTube and Facebook serving a wide spectrum of purposes. While the communities they served were different, one thing they all had in common was that they would eventually generate colossal amounts of unstructured or semi-structured data.
The social media revolution, which exploded when Facebook became available to everyone aged 13 or over in September 2006, highlighted a ticking time bomb that needed attention. What needed to be addressed was how to store the vast amounts of data being generated by these websites. This wasn’t the only problem. The unstructured or semi-structured data generated on these websites was not naturally suited to the standard relational database, as the form of this data could change dramatically from one entry to the next. In addition to this, these companies wanted the flexibility to quickly add new features or new types of data to their websites.
Using a standard relational database in such a quickly evolving environment could be like trying to hammer a nail with a screwdriver: entirely possible but often slow, painful and probably best avoided. Realising this, these enterprising companies started developing software to store and access this data more easily, more efficiently and, ultimately, more quickly. Today this software falls under the label of NoSQL. While the term NoSQL implies turning one’s back on SQL, it actually covers an even broader rejection of the relational database model as a whole. While relational databases have a well-defined mathematical foundation, NoSQL data stores cover such a wide variety of specialised solutions to different problems that there is no single well-defined description of the software falling under this term. We will, however, cover the broad categories of NoSQL data stores in the following section, highlighting the problems they can solve. Perhaps one of these solutions will be particularly relevant to your business.
The Flavours of NoSQL
Trying to place NoSQL data stores into convenient, well-defined categories can be akin to herding cats, as they often don’t want to conform to just one definition. Below I will attempt to categorise them as best I can, but be aware that there is some simplification here and that what they support is likely to evolve quite quickly over the coming months and years. The key categories are:
Key-value stores: This is a very simple method of storing data. You have a unique key and some data associated with that key, and the key is used to find the stored data. While simple, key-value stores can be extremely powerful, with many implementations holding data in memory and pushing it to disk when required. Another powerful aspect is that they are designed with the expectation of being distributed over many machines, allowing for highly available data stores and significantly reducing downtime. Amazon’s Dynamo and Apache Cassandra are two key examples of this technology, although Cassandra also borrows from the column-oriented model described below. Twitter and LinkedIn use key-value stores.
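To make the key-value idea concrete, here is a minimal sketch in Python. It is not how Dynamo or any real product is implemented; the class name `TinyKeyValueStore`, the three in-memory "nodes" and the hashing scheme are all assumptions made purely for illustration. It shows the two ideas described above: data is looked up only by its key, and the key decides which machine holds the data.

```python
import hashlib


class TinyKeyValueStore:
    """Toy key-value store spread over several in-memory 'nodes' (hypothetical)."""

    def __init__(self, node_count=3):
        # Each node is just a dict here; a real store would use separate machines.
        self.nodes = [{} for _ in range(node_count)]

    def _node_for(self, key):
        # Hash the key so that the same key always maps to the same node.
        digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
        return self.nodes[int(digest, 16) % len(self.nodes)]

    def put(self, key, value):
        self._node_for(key)[key] = value

    def get(self, key, default=None):
        return self._node_for(key).get(key, default)


store = TinyKeyValueStore()
store.put("user:42", {"name": "Ada", "plan": "premium"})
store.put("session:abc", "logged-in")
print(store.get("user:42"))
```

Note that there is no way to ask "which users are on the premium plan?" without knowing the keys. That restriction is what keeps this style of store so simple to distribute.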
Column stores: This is where data is stored by column, rather than by row as in a relational database. This is ideal for sparsely populated databases, where one would typically expect to find many nulls. If no data is present for a column, that column is not stored or reserved. This can result in large reductions in the storage requirements for sparsely populated databases. Two examples of NoSQL implementations of this architecture are Google’s Bigtable, a proprietary system, and HBase, a similar but open-source offering, with Facebook and Yahoo! being two big names making use of the technology.
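The storage saving for sparse data can be illustrated with a short Python sketch. This is a simplification under stated assumptions, not a model of Bigtable or HBase internals; `SparseColumnTable` and its methods are hypothetical names invented for this example. Each column keeps only the cells that actually hold a value, so missing values cost nothing.

```python
from collections import defaultdict


class SparseColumnTable:
    """Toy column-oriented table: only cells that hold a value are stored."""

    def __init__(self):
        # column name -> {row key -> value}
        self.columns = defaultdict(dict)

    def insert(self, row_key, **cells):
        for column, value in cells.items():
            if value is not None:  # nulls are simply never written
                self.columns[column][row_key] = value

    def read_column(self, column):
        # Reading one column touches only that column's data.
        return dict(self.columns.get(column, {}))

    def stored_cells(self):
        return sum(len(cells) for cells in self.columns.values())


table = SparseColumnTable()
table.insert("alice", email="alice@example.com", twitter="@alice")
table.insert("bob", email="bob@example.com")
table.insert("carol", phone="555-0100")

# A row-oriented table with 3 rows and 3 columns would reserve 9 cells.
print(table.stored_cells())
print(table.read_column("email"))
```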
Document stores: This solution takes a more holistic approach to storing data, favouring keeping it together in a bundle or ‘document’. This allows many documents with varying data and structures to be kept together in a collection or database. One huge advantage of this is that you are not penalised as your business grows and your data model changes. Perhaps some of your documents need a new field, or you no longer need an existing one? Simple: add the field to the documents as the data appears, or remove it when appropriate. No messy nulls and no painful head-scratching moments as you resolve issues with constraints. All the data relating to a document is within the document itself; you have one place to look. CouchDB and MongoDB are two of the more popular variants of the document store, with two proponents of this technology being FourSquare and the BBC.
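A rough Python sketch of the document idea follows. It uses plain dictionaries in a list rather than the actual CouchDB or MongoDB APIs, and the `find` helper and the sample product data are hypothetical, invented for this example. The point it illustrates is that documents in the same collection need not share the same fields, and that fields can be added or dropped one document at a time.

```python
# A 'collection' of documents; each document carries only the fields it needs.
collection = [
    {"_id": 1, "name": "Widget", "price": 9.99},
    {"_id": 2, "name": "Gadget", "price": 24.50, "colour": "red"},
    {"_id": 3, "name": "Service plan", "duration_months": 12},
]


def find(documents, **criteria):
    """Return the documents whose fields match every given criterion."""
    return [
        doc for doc in documents
        if all(doc.get(field) == value for field, value in criteria.items())
    ]


# The data model changes: one document gains a field, another loses one.
# No schema migration is needed and no other document is touched.
collection[0]["discontinued"] = True
collection[1].pop("colour", None)

print(find(collection, name="Gadget"))
print(find(collection, discontinued=True))
```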
Graph databases: Where the links between data (edges) become as important as the data itself (nodes), graph databases come into their own. These are quite specialised data stores, particularly suited to social networks, where it is important to distinguish between those who should have access to your profile (friends) and those who shouldn’t. If it is important to know exactly how one piece of data relates to another, the graph database may be the solution to your problem. Neo4j and AllegroGraph are two of the more common examples of graph databases, with Adobe, HP and NASA being some of their users.
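The friends example can be sketched in a few lines of Python. This is only an illustration of the node-and-edge model, not the API of Neo4j or AllegroGraph; the `FriendGraph` class and its method names are assumptions made for this example.

```python
from collections import defaultdict


class FriendGraph:
    """Toy graph: people are nodes, friendships are undirected edges."""

    def __init__(self):
        self.edges = defaultdict(set)

    def add_friendship(self, a, b):
        # Store the edge in both directions so either person can be the start.
        self.edges[a].add(b)
        self.edges[b].add(a)

    def can_view_profile(self, viewer, owner):
        # Only the owner and their direct friends may see the profile.
        return viewer == owner or viewer in self.edges.get(owner, set())

    def friends_of_friends(self, person):
        direct = self.edges.get(person, set())
        indirect = set()
        for friend in direct:
            indirect |= self.edges[friend]
        # Exclude the person themselves and anyone who is already a friend.
        return indirect - direct - {person}


graph = FriendGraph()
graph.add_friendship("alice", "bob")
graph.add_friendship("bob", "carol")

print(graph.can_view_profile("alice", "bob"))
print(graph.can_view_profile("alice", "carol"))
print(graph.friends_of_friends("alice"))
```

Questions such as "friends of friends" follow the edges directly, whereas a relational database would typically answer them by joining a table to itself.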
Is NoSQL for Everyone?
NoSQL is not the grand solution to all data storage problems for businesses. Firstly, there is the requirement for the business to train or employ someone with a good knowledge of storage using NoSQL technology. While there are plenty of highly skilled DBAs who are very familiar with the realm of relational databases, software classed as NoSQL is so new that those who may call themselves ‘experts’ in the field are still essentially taking their first baby steps into uncharted territory.
Secondly, plenty of businesses that are not global technological giants like Facebook or Amazon are functioning effectively using only well-structured relational databases. For a significant portion of our clients, relational databases are an excellent solution to their problems, particularly when the specifications of the data to be stored don’t change regularly, and also considering the support available when using software such as Microsoft SQL Server, which is provided by a large, global, well-established company. Support for NoSQL technologies is extremely varied. With the exception of a few NoSQL solutions (e.g. Amazon SimpleDB and Google Datastore), most are open source and rely on communities of varied sizes who can help if something goes wrong. As this technology becomes more established, firms offering NoSQL solutions and support are starting to appear. Waterstons' consultants are keen to stay ahead of the game, so we have explored some of the possibilities of NoSQL technology during our regular ‘Hack Days’. We believe that it is showing significant promise, even for certain aspects of the ‘ordinary’ business.
NoSQL Use for The ‘Ordinary’ Business
While NoSQL has definitely found a home in businesses that have to manage huge amounts of data not suited to a traditional relational database, are there any advantages to businesses outside of this particular sphere? NoSQL should not be ruled out if your business is far from the bleeding edge of technology and I’ll give a number of reasons why:
- NoSQL can offer you the flexibility of easily adjusting and adapting the data that you store as your business grows and your priorities change. You are not limited to a rigid structure like those employed by relational databases.
- There is the potential for significant financial savings, while maintaining or exceeding the performance of popular commercial relational databases. NoSQL data stores are perfectly suited to be run across many low-cost computers, whereas relational databases typically have had to be run on individual high-cost, high-performance servers.
- If you are prepared to distribute your database over a number of lower-cost machines, NoSQL data stores can provide an ‘always on’ solution. If you want to upgrade one of your machines, or simply replace one, your users can still access the database due to its distributed nature. The same principle applies to software updates that are made on your machines. In principle, a well-configured distributed NoSQL data store can mean little or no downtime.
- NoSQL data stores can be an effective tool for taking a holistic approach to analysing the growth and status of your business. You can pull in extremely varied data from all over your business into a single data store and analyse as much or as little of this data as you desire.
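The ‘always on’ point in the list above rests on replication: each piece of data is kept on more than one machine. The Python sketch below illustrates this under simplifying assumptions. The `ReplicatedStore` class, the node names and the placement rule are hypothetical and far cruder than what real products do; in particular, it makes no attempt to bring a machine back up to date after it has been offline.

```python
class ReplicatedStore:
    """Toy store that keeps every value on more than one node (hypothetical)."""

    def __init__(self, node_names, replicas=2):
        self.nodes = {name: {} for name in node_names}
        self.online = set(node_names)
        self.replicas = replicas

    def _owners(self, key):
        # Pick a starting node from the key, then take its neighbours as copies.
        names = sorted(self.nodes)
        start = sum(key.encode("utf-8")) % len(names)
        return [names[(start + i) % len(names)] for i in range(self.replicas)]

    def put(self, key, value):
        for name in self._owners(key):
            if name in self.online:
                self.nodes[name][key] = value

    def get(self, key):
        # Try each copy in turn, skipping machines that are switched off.
        for name in self._owners(key):
            if name in self.online and key in self.nodes[name]:
                return self.nodes[name][key]
        raise KeyError(key)

    def take_offline(self, name):
        self.online.discard(name)


cluster = ReplicatedStore(["node-a", "node-b", "node-c"])
cluster.put("order:1001", "shipped")

# Simulate taking the first machine that holds this order down for maintenance.
first_owner = cluster._owners("order:1001")[0]
cluster.take_offline(first_owner)

print(cluster.get("order:1001"))  # still answered by the second copy
```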
Data storage is no different to any other problem to be solved. Make sure you use the right tool for the job. I know I would be very concerned if the plumber I had hired opened his only tool box and all that it contained was an array of spanners. Relational databases are not the only solution to your data storage problems. Look at the list above and ask yourself whether any of the reasons are applicable to your business. Denying the existence of NoSQL could deny your business the opportunity of a significant growth spurt.
NoSQL is a term which covers a broad spectrum of solutions to a variety of data storage problems. Implementing it in your business can provide vast increases in performance in certain circumstances. However, always apply the appropriate solution to a given problem, whether that be NoSQL or a relational database. If relational databases are working well for your business at the moment and you are not adversely affected by performance or costs, you don’t need to immediately look for the first NoSQL data store solution that you can find; stick with your current model. However, do not underestimate the potential that NoSQL has for helping you to grow your business. Keep your eyes peeled over the next few years. This is a technology for the connected generation.