Open Source and The Evolution of Full-Text Search


A few months back, I was asked to estimate how long it would take to implement scalable full-text search. I instinctively cringed and started to give my standard expectation-setting reply, in which references are made to the complexity of Google and keywords like “stemming” are floated like mines in the ocean. But I caught myself. Instead, I replied, “I have a few options in mind, but let me do some research and get back to you.”

With Open Source software today, the pace of innovation is so fast that in many situations it makes sense to spend time researching the latest and greatest, even if it’s only been months since you last checked. This was my third run-in with search. The first was with Ferret and its infamous fling with the Rails community a few years back. For performance reasons, Engine Yard added it to a naughty list (as seen above), and there it has remained. My second involved Sphinx. It took me months to write integration tests, tweak performance, and configure the options. With elasticsearch, it was like using an iPhone for the first time, minus the $299 + 2-year contract.

What can Open Source do for me (an engineer)? Full-text search is one of many tools that have matured after years of iteration within the community. Instead of continuously reinventing the wheel, we help each other build ever more powerful components. In fact, joining the effort could mean as little as using the software and reporting bugs. Open Source is also a great way to learn new techniques and shed bad habits. Because everyone is watching, transparency encourages accountability.

What can Open Source do for me (an entrepreneur)? In the past, Open Source software meant unreliable software to entrepreneurs. Today, it’s our ticket to a quick launch and fast iteration. Along with its commercial cousin, Software as a Service (SaaS), Open Source can take care of many non-core parts of your application, like talking to the database and rendering pages (Rails), sending emails (SendGrid), and now search (elasticsearch). No matter how impressive it would be for your engineers to build you a custom email solution, their time would likely be better spent figuring out how to get those pins to stick to the board.
