I’m not sure if kids these days still do this, but back when I was a gamer and had just beaten someone soundly, I simply typed “gg”. It’s short for “good game”, but it really means “I just dominated you” or “game over, loser”. Using Mongo with Hadoop is kinda like that. With Mongo, you get a flexible, scalable database that excels at real-time processing. More and more startups are using it today, and it’s our primary database (here’s why). With Hadoop, you get a distributed processing framework that handles everything you can’t or don’t want to process in real time. It’s even easier to scale than Mongo, Amazon has productized it (Elastic MapReduce), and it’s the Swiss Army knife of Big Data. Here are some reasons why they are a match made in data store heaven:
You Complement Me
Mongo is fast; it’s optimized for speed. Say goodbye to transactions and joins and other features you may not need. You can use it as your primary database to support your application in real time. It’s not so good with large datasets or complex queries. You lost joins, remember? And you want to shard my what? This is where Hadoop comes in. If you’re thinking about doing Big Data analytics, take those Nginx logs and crunch the numbers in your Hadoop cluster. Or if you’re tinkering with the latest machine learning algorithms to predict your users’ preferences (Taste Graph, anyone?), it comes in handy.
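To make the log-crunching idea a bit more concrete, here is a minimal sketch in plain Python of the kind of map and reduce steps such a job might use to count responses per HTTP status code. It runs in a single process and only imitates what a cluster does; the function names (`map_line`, `reduce_counts`, `run_job`) and the sample lines are made up for illustration, and the parsing assumes Nginx’s default “combined” log format.

```python
from itertools import groupby

# Hypothetical sample lines, assuming Nginx's default "combined" log format.
SAMPLE_LOG = [
    '127.0.0.1 - - [10/Oct/2012:13:55:36 +0000] "GET / HTTP/1.1" 200 612 "-" "curl/7.0"',
    '127.0.0.1 - - [10/Oct/2012:13:55:37 +0000] "GET /missing HTTP/1.1" 404 169 "-" "curl/7.0"',
    '127.0.0.1 - - [10/Oct/2012:13:55:38 +0000] "GET /about HTTP/1.1" 200 845 "-" "curl/7.0"',
]


def map_line(line):
    """Map step: emit (status_code, 1) for one access-log line."""
    parts = line.split('"')
    # The status code is the first field after the quoted request string.
    if len(parts) < 3:
        return []  # skip lines that do not look like access-log entries
    fields = parts[2].split()
    if not fields:
        return []
    return [(fields[0], 1)]


def reduce_counts(status, counts):
    """Reduce step: sum all the counts emitted for one status code."""
    return status, sum(counts)


def run_job(lines):
    """Single-process stand-in for the map, shuffle/sort, and reduce phases."""
    pairs = [pair for line in lines for pair in map_line(line)]
    pairs.sort(key=lambda kv: kv[0])  # the "shuffle": bring equal keys together
    results = {}
    for status, group in groupby(pairs, key=lambda kv: kv[0]):
        key, total = reduce_counts(status, (count for _, count in group))
        results[key] = total
    return results


if __name__ == "__main__":
    print(run_job(SAMPLE_LOG))
```

On a real cluster the map and reduce functions would run on many machines over far more data, but the shape of the computation stays the same.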
Two Words: Map and Reduce
Have Your Cake and Eat It Too
If you don’t know what to choose for your task, you can always use both at the same time. With a plugin, you can use Mongo as an input or output for Hadoop. It even has some optimizations for splitting the input on every chunk in a sharded environment. We’ve tried this for one of our features and it works very nicely. Eventually, your data requirements may grow such that you’ll have to go fully into Hadoop, but you can get away with this hybrid approach for a long time. If you’re looking to speed up processing time, you could farm out some data to Hadoop, have your cluster crunch the data in bite-size chunks, and do some more processing in your application, all within a Resque job.
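The farm-out pattern described above might look roughly like the following Python sketch. It does not use the Mongo plugin or a real Hadoop cluster: a thread pool stands in for the cluster, a list of dicts stands in for documents pulled from Mongo, and every name here (`chunked`, `crunch_chunk`, `run_hybrid_job`, the `user_id` field) is hypothetical.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def chunked(records, size):
    """Split the records into bite-size chunks of at most `size` items."""
    for start in range(0, len(records), size):
        yield records[start:start + size]


def crunch_chunk(chunk):
    """Stand-in for the work a cluster task would do on one chunk:
    count how many records each (hypothetical) user_id has."""
    return Counter(record["user_id"] for record in chunk)


def run_hybrid_job(records, chunk_size=2, top_n=1):
    """Farm the chunks out, then finish the processing in the application."""
    # The thread pool plays the role of the cluster in this sketch.
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(crunch_chunk, chunked(records, chunk_size)))

    # Application-side step: merge the partial counts and pick the top users.
    totals = Counter()
    for partial in partials:
        totals.update(partial)
    return totals.most_common(top_n)


if __name__ == "__main__":
    # Made-up records standing in for documents read from Mongo.
    events = [
        {"user_id": "a"}, {"user_id": "b"}, {"user_id": "a"},
        {"user_id": "a"}, {"user_id": "c"},
    ]
    print(run_hybrid_job(events))
```

In a background job, a function like `run_hybrid_job` would be the body of the worker: kick off the heavy crunching, wait for the partial results, and do the last bit of processing in application code.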