Why Google is Dying to be More Social

It’s tiring to hear that Google doesn’t have “social” in its DNA. I left a comment on the article, but I must elaborate on this battle of epic proportions happening in the tech world.

Arrogance is a Ruse

Google is or can be perceived as being arrogant. It’s unlikely that a company that has transformed the world as Google has can avoid a tinge of arrogance. Hence, it is naive to think that Google’s attempts at “social”, namely Google+, are merely whimsical dalliances of an aging giant. I don’t think anyone at Shoreline realistically thinks Google+ will ever overtake Facebook. The arrogance is a ruse to throw us off the scent–the smell of deepening fear.

Beyond PageRank and Wide Open Spaces

Google’s stated mission is to “organize the world’s information and make it universally accessible and useful”. When all the world knew was Yahoo’s labyrinthine category hierarchy, Google’s algorithm for indexing and searching the growing sea of information was quite a paradigm shift. This approach is still dominant today, but the sea itself is changing. The Internet used to give Google free rein to crawl and index. Today, with digital fortresses like social networks and paywalls, it’s become more and more difficult for Google to complete its mission. For a while, Google had a deal to incorporate tweets into their search results. As of late, the relationship is still strained. Then there was the whole Google-Facebook address book debacle. More and more, the world’s information has evolved from merely flat web pages to intricate graphs containing not only people but brands, topics, and even pets. Companies like Badgeville, and lately Facebook as well, go further, building “behavior graphs” where the connections between the content are rich verbs like “watched”, “purchased”, and “performed”.

Social or Bust

Google’s frustration is apparent and with good reason. Think back to Google’s ordeal with the Chinese government last year. The situation is different, but the problem is the same: the world is not as open as Google likes. The government writes the rules in China and Google must operate by them. It took Google a while to realize that. The same battle rages on between Android and iPhone. For better or worse, Apple keeps tight control over its iWorld. Some people like that approach, but for the rest of the market, Android is bringing order to the chaos. Google+ is an attempt to break down yet another set of barriers in another arena. It’s the counterweight to Facebook and its unique way of looking at the world. Whether or not you believe what Google believes, you should respect them for sticking to the mission, even if they don’t always clearly articulate it. However, you really have to wonder how Google will organize the world’s information when it has no access to large portions of it. Or perhaps what would happen if Google stopped fighting these fights.

How to Use MongoDB with Engineyard

We have scaled our primary database, MongoDB, on Engineyard over the last year. We started with a single mongod process running version 1.6.4, then went to 1.8.1, and are now on the latest stable version. We’ve used master-slave, replica set, and also replica set + sharding. We’ve learned a lot about how to deploy MongoDB on Engineyard and wanted to share that with the startup community. I’ve even heard that Engineyard is working on productizing some of the configuration so you can launch MongoDB just as easily as you use MySQL. While that’s still in the works, here are some specific tips from our experience that may save you time.

Getting Started

A good place to start is Engineyard’s best practices for MongoDB and custom Chef recipes. You should also familiarize yourself with the MongoDB documentation and perhaps even check out the Jira project that is viewable to the public. We’ve found the documentation for Mongo thorough and up to date. We deployed our first MongoDB environment using the example recipe. I suggest playing around with that for a while to get comfortable–e.g. mongo console, tutorials and such. Once you’re comfortable with a simple replica set, you can move on to more advanced recipes for sharding, but only if you need it. The transition isn’t too bad because a sharded environment is composed of individual replica sets.

Take Advantage of the Flexibility

We run multiple configurations in different environments because it’s expensive to shard every environment or even have replica sets. In your staging or development environments, you probably only need a single MongoDB instance. The good news is that your application configuration can easily accommodate these different configurations. For a sharded environment, you’re connecting to the cluster through “mongos”–this means your cluster looks like a single mongod process to your app. Having good recipes is the key to having everything work as it should. Develop some recipes for your app, drawing on the example, Engineyard, and other people (like us!) to customize the configuration to your needs. For example, we specify the version of MongoDB we use on every environment to have absolute control.
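To make the per-environment idea concrete, here is a minimal sketch in Python (not our actual Rails/Mongoid setup) that builds a standard MongoDB connection string for each environment. The hostnames, the database name "app", the replica set name "rs0", and the helper name mongo_uri are all placeholders I made up for illustration.

```python
# Hypothetical per-environment settings. Hostnames, the database name and
# the replica set name are placeholders, not values from a real deployment.
MONGO_ENVIRONMENTS = {
    # A single mongod is enough for development and staging.
    "development": {"hosts": ["localhost:27017"]},
    "staging": {"hosts": ["staging-db1:27017"]},
    # A replica set lists its members and names the set.
    "replica": {
        "hosts": ["db1:27017", "db2:27017", "db3:27017"],
        "replica_set": "rs0",
    },
    # A sharded cluster is reached through mongos, so to the app it
    # looks like a single process.
    "production": {"hosts": ["mongos1:27017"]},
}


def mongo_uri(environment, database="app"):
    """Build a mongodb:// connection string for the given environment."""
    settings = MONGO_ENVIRONMENTS[environment]
    uri = "mongodb://" + ",".join(settings["hosts"]) + "/" + database
    if "replica_set" in settings:
        uri += "?replicaSet=" + settings["replica_set"]
    return uri
```

The application code only ever asks for mongo_uri(current_environment), so moving an environment from a single instance to a replica set or to mongos is a configuration change rather than a code change.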

Know Your ORM and Drivers

When your recipes are good to go, you don’t need to change them much, if at all, over time. Getting your application working well with MongoDB can be a challenge. Recently, there have been big transitions in this world–Rails 2.x vs. 3.x, Mongoid 2.3.x vs. 2.2.x, BSON, BSON-ext, mongo-ruby-driver 1.3.x vs. 1.4.x vs. 1.5.x. There can be conflicts with Rails and Mongoid if you’re not careful about where you are on the Rails 3 divide. Drivers had issues going from 1.3.x to 1.5.x since 1.4.x was released and then quickly yanked from rubygems. On top of that, MongoDB was transitioning from 1.8.x to 2.0.x, which was a big change. The chaos will subside, but knowing where all the pieces lie for you is crucial to avoiding problems. A configuration that worked for us was:

  • Rails 3.1.x
  • Mongoid 2.4.x
  • mongo-ruby-driver, bson, bson-ext 1.5.x
  • MongoDB 2.0.x
I recommend Engineyard and 10gen support to help with big deployments. We pay for premium support from both, but there are plenty of documentation resources and free support available if you’re strapped for cash. Of course, you can always find help on Twitter along with @MongoQuestion as well as Quora.

Special thanks to Ines Sombra, MongoDB expert on the Engineyard Data Team, who has helped us tremendously over the last year!

MongoDB + Hadoop = gg

I’m not sure if kids these days still do this, but when I was a gamer and I had just beaten someone soundly, I simply typed “gg”. It’s short for “good game” but it really means “I just dominated you” or “game over, loser”. When you use Mongo with Hadoop, it’s kinda like that. With Mongo, you get a flexible, scalable database that excels at real-time processing. More and more startups are using it today and it’s our primary database (here’s why). With Hadoop, you get a distributed processing framework that handles everything you can’t or don’t want to process in real-time. It’s even easier to scale than Mongo, Amazon’s productized it (Elastic Map/Reduce), and it’s the Swiss Army knife of Big Data. Here are some reasons why they are a match made in data store heaven:

You Complement Me

Mongo is fast; it’s optimized for speed. Say goodbye to transactions and joins and other features you may not need. You can use it as your primary database to support your application in real-time. It’s not so good on large datasets or complex querying. You lost joins, remember? And you want to shard my what? This is where Hadoop comes in. If you’re thinking about doing Big Data analytics, take those Nginx logs and crunch those numbers in your Hadoop Cluster. Or if you’re tinkering with the latest machine learning algorithms to predict your users’ preferences–Taste Graph anyone?–it comes in handy.

Two Words: Map and Reduce

Map/Reduce (M/R) is at the core of Hadoop. It allows you to break down complex tasks into manageable chunks of data and processing. Mongo took a page out of Hadoop’s book when it included an implementation of M/R. It makes it even easier, in my view, because writing the functions in Hadoop’s native Java is usually more confusing than writing Javascript for Mongo. Add some Ruby and you’ve got dynamic M/R in your Rails app! You can write mappers and reducers in Mongo to validate your Hadoop Java code on a smaller data set. And when you’re ready for the big show, you can fire up your 1000-server cluster to find the question to 42.
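If the M/R pattern itself is new to you, here is a minimal in-memory sketch in plain Python. It uses neither Mongo’s nor Hadoop’s API; the function names and the simplified log format (status code as the last field) are assumptions for illustration only. It counts requests per HTTP status code.

```python
from collections import defaultdict


def map_log_line(line):
    """Map step: emit a (status_code, 1) pair for one log line.

    Assumes a simplified, made-up log format where the status code
    is the last whitespace-separated field.
    """
    fields = line.split()
    yield fields[-1], 1


def reduce_counts(key, values):
    """Reduce step: collapse all values emitted for one key into a total."""
    return sum(values)


def map_reduce(records, mapper, reducer):
    """Run mapper over every record, group by key, then reduce each group."""
    grouped = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            grouped[key].append(value)
    return {key: reducer(key, values) for key, values in grouped.items()}


logs = ["GET /index 200", "GET /missing 404", "POST /login 200"]
counts = map_reduce(logs, map_log_line, reduce_counts)
```

In a real cluster the grouping step is what gets distributed: each mapper sees only its slice of the records, and each reducer sees only the values for the keys it owns.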

Have Your Cake and Eat It Too

If you don’t know what to choose for your task, you can always use both at the same time. With a plugin, you can use Mongo as an input or output for Hadoop. It even has some optimizations for splitting the input on every chunk in a sharded environment. We’ve tried this for one of our features and it works very nicely. Eventually, your data requirements may grow such that you’ll have to go fully into Hadoop, but you can get away with this hybrid approach for a long time. If you’re looking to speed up processing time, you could farm out some data to Hadoop, have your cluster crunch the data in bite-size chunks, and do some more processing in your application–all within a Resque job.

Why I Joined a Startup

There I was, working for Andrew Ng, known for his research in machine learning and computer vision. I was lead on Stanford AI Robot (STAIR), getting the project off the ground and teaching the robot to wheel around Gates and open doors with its robotic arm. Needless to say, I was very fortunate to have an interesting project. And yet something was missing. As Larry Smith said, “What you want is passion–it is beyond interest”. There I was, having a quarter-life crisis.

I don’t have an accent and though from time to time I talk like a fob or a banana for fun, my parents did a pretty good job raising me in the Chinese tradition balanced by America’s progressive values. In high school, my weekends were mostly spent studying for the SATs and, when my parents weren’t looking, playing the original Quake on a 13-inch CRT. When I wasn’t in China visiting family during the summers at Yale, I worked at research labs on things like Molecular Electronics and Autonomous Vehicles. I didn’t do internships in industry because that wasn’t in the plan. The plan was to get that Ph.D. because in Chinese culture, the degree is king. In a culture whose recent history awarded political office to the best scholars and with it a life of fame and fortune, this mindset is hard to shake.

There I was, fulfilling the plan but not the passion. So I took baby steps. I worked on a few startup projects with friends and acquaintances and learned a bit of Ruby on Rails, which was in its toddlerhood (1.0, baby!). I wanted to build things that had impact right then–I knew I had at least a few decades to wait until robots would advance beyond glorified vacuums. I tried to read about the startup life but barely scratched the surface. To be honest, I didn’t really prepare myself properly. I had the comfort of graduate school to cushion me while I did some part-time work for a company called Howcast. One day I was working from home part-time and the next day I was “WFH” full-time. I still played basketball with my grad school buddies after work. There I was, a newly minted startup guy.

There are many reasons not to join a startup, so back then I listed many of them to myself. People with young children or other family responsibilities. Nope, I left that box unchecked. People who can’t handle high-stress or high-intensity situations. Nope, my personality craves adrenaline–like playing FPS for 16 hours straight w/o a bathroom break or popping a few Lactaid pills before doing the Milk Challenge. Ultimately, this was a lame game to play and since “everyone was doing it”, I definitely got sucked in by the hype as well.

Here I am, a startup guy. I can tell you that you’ll learn so much more and have so much more responsibility at a startup, but that may not always be true. I can tell you that the potential rewards are greater than the risks, but I won’t go on the record with that. I can tell you that you’ll have more impact than teaching or working at a big company but that’s not a truism either. When you’re in a quarter-life or mid-life or three-quarters-life crisis, it’s a “solution” you’re looking for. It’s the sign in the road that says “Happiness 42 miles”. Well, there were budget cuts and those signs were never installed. No path to follow, no one end result to optimize for. And guess what, startup life may or may not be for you. It’s like saying, “Surprise! You’re on the Truman Show!” My reasons have changed over time but one thing hasn’t really changed. Here I am, a guy who isn’t ever bored for long.

Lessons from Codecademy


My wife wants me to teach her Ruby so I did something very DRY: I tried out Codecademy. This new EdTech site teaches you to program with interactive lessons. I know they’re onto something because I have many things to say and suggestions that could apply to any website.


The UI is very clean and the flow is better than average, especially for the level of complexity in the lesson creator. The progression is very clear and the gamification adds some color (more on this later). I love the live interaction using the console and the conversational style of the lessons. I haven’t done any of the advanced Javascript lessons, but I did spend a few hours and managed to create my own lesson for Ruby.

Things to Improve

Make Your Model Structure Crystal Clear

It’s pretty hard to grok the way things are structured on the site. There’s a diagram in the documentation, but one of the main things holding me back was understanding the connections between Topic, Section, Exercise, Lesson, etc. I often tried to reference the examples from the core Javascript lessons but there seemed to be a disconnect between those and the lesson creator. This is understandably new ground and complex, but whether I’m a teacher or a student on the site, I won’t get far if I don’t get the model structure.

Poor Linkage of Documentation to UI

Creating a lesson could be simpler. There seems to be an entire documentation section, but it would have saved me a lot of headache if there were links in the lesson creator directly to the pertinent sections of the documentation. For example, when you’re trying to specify the teacher code that checks the student submission, there’s a pop-out with three examples. What is critically missing is the fact that you have access to three variables called “code”, “result”, and “error”. Without this nugget, the teacher will be scratching their head for a bit. In fact, I would suggest making the configuration for the core lessons from Javascript viewable by lesson creators (I was trying to mimic it anyway) or even creating a lesson to teach you how to use the lesson creator.

Get Serious About Gamification

Even though the badges are cute, it’s obvious that the game mechanics are not well thought out. I’m not the game designer in my company, but I’ve been around them long enough to know that you probably want to have leaderboards, levels, progress, more messaging on the “how” and “why” of the system, etc. As a student and a teacher in the system, it wasn’t clear to me where I was in relation to other players and where I should be going. This part’s not as big of a deal, since they seem to have their core competency down, but engagement is a big part of any crowd-sourced content site.

Overall, Codecademy has captured the core experience, which is why I would use it and, from what I hear, 500,000+ other people would too. One additional philosophical argument I would echo from this piece by Audrey Watters is that the site is lacking the conceptual component. Sure, there’s a “glossary” that you can link to from what I can glean in the Markdown examples (yeah, I made the connection!), but it’s almost an afterthought. It’s equally important to weave conceptual learning into this experience as well. But it’s likely they’ll add that soon since they only launched last August.

Engineyard or Heroku?

Web-based software is becoming more and more service-oriented. You probably know about software as a service (SaaS – e.g. Salesforce) and infrastructure as a service (IaaS – e.g. Amazon) but probably not platform as a service (PaaS). Companies like Engineyard and Heroku make it easier to launch and scale a web application. For the most part, you can set up an account, run a few commands for basic configuration and you’ve launched in the Cloud. This is an order of magnitude easier than it was just 5 years ago. If you just felt your heart skip a beat, please take a few minutes to catch your breath. When you’re ready, let me help you decide what platform to launch your next application on.

Heroku is great for quick applications. The best Rails candidates I interview deploy their code challenges on there and send me a link. The smallest configuration on Heroku gives you a 5MB shared database and 1 “dyno”, their unit of measure for CPU. It’s free and you can even get add-ons like New Relic performance monitoring thrown in. To scale up, you can pay more for dynos and a more powerful database. Setup is fairly painless, though I had to google for help when their “happy-path” documentation didn’t work exactly the way it was intended. The most annoying thing is that for the smallest configuration, the first request you make to the site takes 30-60 seconds to complete because their servers are dynamically provisioning you resources. Heroku hides much of the Operations side of things from you and that’s an intentional choice. It’s clear that they’d like to make the deployment process idiot-proof. I think that’s great for interview code challenges and school projects but if you’re serious about your application, you need to use Engineyard.


You need to be in control. Over the years I’ve seen EBS volumes become read-only all of a sudden, MongoDB refuse to resolve internal hostnames for a subset of instances, and even instances mysteriously become unreachable or “disappear”. Putting your servers in the Cloud has a huge drawback: you don’t have physical access to the hardware. When things go wrong–and they inevitably do–you need to have access. Since you’re saving money by not buying your own hardware up front and using IaaS, the next best thing is having as much control of your instance as possible. With Engineyard, you can ssh into every single machine and customize your own recipes. Essentially, they set up the default configuration but at the end of the day you can do whatever you want. For everything from Cron to Redis, I customize the configuration and understand exactly how processes are running. I analyze the CPU and memory usage and even the IO performance because I need to tune things like how many workers should be on each application instance. Control is critical when debugging issues. You’ll need to install custom monitoring and benchmarking to find out why your Memcached servers are maxing out at 80% memory utilization, for example.
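As a rough illustration of the kind of tuning I mean, here is a back-of-the-envelope sketch in Python for picking a worker count per application instance. The heuristic (take the smaller of a memory-based limit and a CPU-based limit), the function name, and the default of two workers per core are all my own assumptions for illustration, not an Engineyard formula; measure your own app before trusting any number.

```python
def workers_per_instance(total_memory_mb, reserved_mb, worker_memory_mb,
                         cpu_cores, workers_per_core=2):
    """Estimate how many app workers fit on one instance.

    Hypothetical heuristic: the count is capped both by the memory left
    after reserving some for the OS and other processes, and by a
    per-core limit. Always returns at least one worker.
    """
    by_memory = (total_memory_mb - reserved_mb) // worker_memory_mb
    by_cpu = cpu_cores * workers_per_core
    return max(1, min(by_memory, by_cpu))
```

For example, with 7680 MB of memory, 1024 MB reserved, 250 MB per worker and 2 cores, memory would allow 26 workers but the CPU limit caps it at 4. The measured memory per worker is the number to watch, since it tends to grow as the app does.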

You need expertise. PaaS is your world-class Operations team. Even if you hired one or two Operations people, it is unlikely they will be experts in all the technologies you’re using. Given that I’ve never used paid Heroku Support, I can’t speak to their experts, but Engineyard definitely knows their stuff. They try to cover as many areas as possible in-house and cover others through partnerships. I know that they have a great in-house MongoDB presence because their team helped me with my Sharding migration. I’ve gotten DBAs from Percona to look over my slow MySQL queries, advice from Durran Jordan of Mongoid over IRC, and just as I was looking into Neo4j for some graph-based projects, I heard that they were actively talking with Neo Technology about a partnership. When your application has a problem and Google doesn’t help, you can either call a really smart friend like a contestant on “Who Wants to Be a Millionaire” or you can leverage the collective expertise of Engineyard and their network. They know your application and infrastructure really well and you may be seeing the same problem as another one of their clients. This expertise model works and just makes sense.

You need support. When you’re in the Cloud, you need someone to have your back. You can install a Heroku add-on like “Redis To Go”, but what happens when things break? As an engineer I’m pretty paranoid and rightfully so. I’m wary of things labeled “X to go” or “Y in a box”. Furthermore, I believe that if you’re going to use a technology in production, you should install it yourself. It’s critical that you develop a long-term relationship with a team that knows your specific application and infrastructure. When you read an email from Engineyard at 9am telling you how at 5am they detected a problem and brought your site back up without waking you from your precious 6 hours of sleep, you’ll know what I’m talking about. When it’s 3am and you’ve been wrestling with an issue that’s keeping your site down and your friends at Engineyard are still up and walking you through hell, you’ll know what I’m talking about. Heroku support may be just as good, I don’t know. But I’ve been in the trenches with these guys and I can tell you that whoever you choose, you better be able to count on them when the CEO is calling you to ask when the site will be back up.

If you’re serious about your application, you’ll take these points into consideration and weigh them heavily against cost and hype. Engineyard’s worth the money.
Disclosure: Badgeville is an Engineyard customer and so was Howcast when I was there.