The CB Insights Search APIs are by far the most utilized and computationally intensive calls made within our platform. These calls originate in many places, including our web app, our Chrome plugin, and direct API calls from customers who’ve integrated us into their own systems (often CRM and dealflow management systems). Among other things, these APIs power the feeds available on MyCBI as well as Deal Search, Company Search and People Search.
In other words, they’re critical to the intelligence layer that we’ve built on top of CB Insights data.
CB Insights hit an inflection point in customer growth in 2014. But more customers brought new problems. The canary in the coal mine was steadily climbing CPU utilization, and as a result we started to hear things like “searches are slower than they used to be.” These one-off complaints grew in number, and despite our best efforts, things were not improving.
So we needed to consider a more dramatic change.
CB Insights Search Circa 2014
In the end, this boiled down to a fundamental architectural problem. The core of our Search API engine was making inefficient calls directly to our database and therefore wasn’t scaling as our data set kept expanding. It looked something like this:
CB Insights Search with Amazon CloudSearch
We migrated our entire infrastructure to Amazon Web Services in 2014. One of the many benefits of being within that ecosystem is the wide range of tools and services readily available to us. One of them is Amazon CloudSearch, which lets our search tools process text-heavy incoming data streams in real time without disrupting the user experience. The result is an extremely fast and flexible search solution that indexes the data our crawlers parse via the Cruncher as it reaches our database.
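To make the ingestion side a little more concrete, here is a minimal sketch of how database rows might be turned into CloudSearch document batches. CloudSearch accepts uploads as JSON arrays of `add` and `delete` operations, and a single batch is capped at 5 MB. The `deal_id`, `company_name`, `investors` and `amount_usd` fields and the helper names are hypothetical stand-ins for illustration, not our actual schema or the Cruncher's real pipeline.

```python
import json
from typing import Iterable, Iterator

# CloudSearch caps a single document batch upload at 5 MB.
MAX_BATCH_BYTES = 5 * 1024 * 1024


def to_sdf_add(row: dict) -> dict:
    """Convert a hypothetical deal row into a CloudSearch 'add' operation."""
    return {
        "type": "add",
        "id": str(row["deal_id"]),  # document IDs are strings
        "fields": {
            "company_name": row["company_name"],
            "investors": row["investors"],  # e.g. a literal-array field
            "amount_usd": row["amount_usd"],
        },
    }


def batch_documents(ops: Iterable[dict], max_bytes: int = MAX_BATCH_BYTES) -> Iterator[str]:
    """Group operations into JSON arrays that each stay within max_bytes.

    A single operation larger than max_bytes is still emitted on its own;
    real code would reject it (CloudSearch also limits each document to 1 MB).
    """
    batch: list[str] = []
    size = 2  # bytes for the surrounding "[" and "]"
    for op in ops:
        encoded = json.dumps(op, separators=(",", ":"))
        n = len(encoded.encode("utf-8"))
        added = n + (1 if batch else 0)  # +1 for the comma separator
        if batch and size + added > max_bytes:
            # Flush the current batch and start a new one with this document.
            yield "[" + ",".join(batch) + "]"
            batch, size = [], 2
            added = n
        batch.append(encoded)
        size += added
    if batch:
        yield "[" + ",".join(batch) + "]"


if __name__ == "__main__":
    rows = [{"deal_id": 1, "company_name": "Acme", "investors": ["Fund A"], "amount_usd": 5000000}]
    for payload in batch_documents(to_sdf_add(r) for r in rows):
        print(payload)
```

Each yielded string could then be posted to the domain's document endpoint, for example with boto3's `cloudsearchdomain` client and its `upload_documents(documents=..., contentType="application/json")` call. Grouping many documents per upload, rather than sending them one at a time, keeps the number of upload requests down as data arrives.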
We deployed this change last week, on the evening of February 25th, 2015, to be exact. Here’s a snapshot of our Deal Search response time before and after the migration (courtesy of our New Relic dashboard), with the switchover marked by the green line. The improvement is nothing short of remarkable:
More precisely, the average response time for a Deal Search call on CB Insights was approximately 2.91 seconds before the migration to CloudSearch and now averages 0.58 seconds. That is an 80% reduction in response time; put another way, CB Insights search is 5 times faster now that Amazon CloudSearch is in place.
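For anyone who wants to check the arithmetic, the two figures follow directly from the averages above: the reduction is (before − after) / before, and the speedup is before / after. The helper name `speedup` is just an illustrative choice.

```python
def speedup(before_s: float, after_s: float) -> tuple[float, float]:
    """Return (percent reduction in average latency, speedup factor)."""
    reduction_pct = (before_s - after_s) / before_s * 100
    factor = before_s / after_s
    return reduction_pct, factor


# Average Deal Search response times before and after the migration, in seconds.
reduction_pct, factor = speedup(2.91, 0.58)
print(f"{reduction_pct:.0f}% lower latency, {factor:.1f}x faster")
# prints "80% lower latency, 5.0x faster"
```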
If you’ve used CloudSearch successfully or have found any tricks to improve its performance further, we’d love to discuss them in the comments. If you have questions about what we’ve done with Amazon CloudSearch, feel free to ask as well.
And, of course, if you want to be part of the team that makes the magic happen, CB Insights is growing like a weed and hiring.


