Starting in 2010 and for the following four years, CB Insights failed to get any sort of data network effect going. But then, things changed.
We have been in love with the idea of the data flywheel since we started CB Insights.
The data flywheel is the idea that more users get you more data which lets you build better algorithms and ultimately a better product to get more users. Rinse & repeat.
It is more commonly known as data network effects.
Andreessen Horowitz’s Alex Rampell refers to the two sides of this loop as writing to the database (submitting data) and reading from the database (consuming it), and states, “The reads become disproportionately more valuable as more people are using a central repository of data.”
We always knew that data network effects were good. Unfortunately, as is often the case, knowing is easier than doing.
We have no idea what we’re doing
Starting in 2010 and for the next 4 years, we failed to get any sort of data network effect going.
We contacted hundreds of VCs, private equity firms, and corporate development teams and asked them to submit their data, all while talking about our great platform and all the benefits it’d have for them.
We reached out to thousands of startups asking them to do the same.
And the reaction was an overwhelming, deafening silence.
We spent countless hours making the process of submitting data easier, faster, and less painful.
Given our sheer inability to get any sort of network effect going, we turned our attention almost exclusively to building machine learning software to algorithmically extract structured data from unstructured sources. We dubbed it The Cruncher, and with a high degree of confidence, we can say it’s the industry’s best product for algorithmic extraction of data from unstructured sources. This technology helps us extract data more quickly and more efficiently than anyone else in our space. (For proof, just look at the large offshore operations and bloated headcount others in the data space have.)
So given the success of The Cruncher and our utter lack of success getting data network effects going, our data was 95% algorithmically extracted from 2010 to 2014.
All that outreach and effort and a measly 5% came from data submissions.
And then the flywheel started turning. What?!
To be candid, we sort of gave up on the data flywheel. We tried for a while to get it going and none of our strategies worked, so it became less of a topic of conversation and less of a focus.
The Cruncher was doing just fine.
Maybe we didn’t need data network effects?
And then things changed.
We started getting emails from people asking how they could update or submit their data to us.
We had been asking for four years and folks couldn’t be bothered. And now, they were actually writing to us asking how they could get into our platform.
Enlightened self-interest happened
In 2015, several things occurred which individually and collectively changed things:
- Our newsletter started to take off
- We started getting a lot of media citations and attention on our data & insights
- We started doing market maps. Our first ever Periodic Table was issued in late 2014 (The Periodic Table of Tech) and our visual treatment of industries got a lot of attention
- Limited Partners (LPs) started becoming customers and the firms/people they invest in (GPs or General Partners) started to hear about this
- We did rankings of investors (VCs, angels, angel groups) and they reported seeing increased dealflow and media attention as a result. And those who weren’t on the rankings wanted to know how to get on.
In short, our ask went from an unknown, potentially annoying request for data that probably entailed extra work to something that could benefit their business.
We finally were appealing to their enlightened self-interest.
What is enlightened self-interest?
Wikipedia has a good, succinct definition of enlightened self-interest:
Enlightened self-interest is a philosophy in ethics which states that persons who act to further the interests of others (or the interests of the group or groups to which they belong), ultimately serve their own self-interest.
And this is what was missing before.
We were asking for data which was great for CB Insights, but had no benefit to those submitting the information. In the words of A16Z’s Alex Rampell, there was no incentive to write to the database.
But now there was.
- Potential exposure in the newsletter: 280,000+ subscribers; 1,500+ added per week (subscriber count as of June 8, 2017)
- Inclusion of your firm or your companies into our market maps
- Media exposure
- Visibility amongst companies aka dealflow
- Visibility among LPs
In short, and unlike before, CB Insights now brought something to the table.
Some of the best investors in the world started submitting data to us after we inquired about doing a teardown of their firms. Fred Wilson of Union Square Ventures analyzed our USV teardown, which drew more attention to the data.
So with the flywheel starting to turn, we had to figure out ways to accelerate it.
Rankings & Market Maps are crack
If there is anything that elicits a Pavlovian response from investors and startups, it is rankings and/or market maps.
They are candy.
They are CB Insights’ crack.
Being mentioned in these is something folks now refer to in their press releases, websites, pitch decks, etc.
Our rankings of VCs with The New York Times for the last two years have generated 5,000+ data submissions on the deadline date alone. We regularly get inquiries asking when the rankings are coming out, as folks want to submit their data.
Marketing and communications teams at firms and/or their PR teams now want to submit data to us.
This resulted in us getting non-public data on deals, valuations, etc. as well.
And because the newsletter reaches so many influential folks, the rankings and market maps, which are data-driven, accurate, and generally kick ass, get a lot of attention.
And so the combination of reach via our newsletter + rankings crack = more data.
Our market maps, which highlight emerging companies and models in a variety of industries, cover a lot of ground, ranging from the warehouse of the future to attacking the grocery store to consumer robotics startups to the market for bitcoin & blockchain startups. An example market map for artificial intelligence startups attacking retail is shown below.
Every time we do one of these, companies and investors reach out and submit data as they want to be included in the next market map.
Another big driver of the flywheel – no briefings
It’s also worth mentioning that we get lots of inbound from investors and startups to brief our analysts.
We don’t do these.
Besides the inherently unscalable nature of live briefings, we tried a few and found they didn’t do a better job of getting us the data we wanted. And for CB Insights, data is oxygen.
And so we simply tell folks that if they want to get in front of our research team, the best way to do so is by giving us their data via The CB Insights Editor.
We don’t do briefings. We don’t want PR firms to reach out.
Just give us your data.
They typically oblige.
Wait. But that’s not really a network effect.
Yes — you’re right.
So far, we’ve only shown you how having more clients and a bigger, more influential newsletter gets us more data.
But the beauty of submitted data (which comes to us via The CB Insights Editor) is that it helps us improve our machine learning algorithms.
They provide us with:
- New news sources – They cite sources as proof of their investments, partnerships, and customers. These sources feed into The Cruncher, giving us a larger index of sites to search.
- Competitor data – They list competitors, which helps us improve our Spotify-inspired similar company recommendations
- Financing data & valuations – They give us data on their earliest rounds of financing (including angels and accelerators) plus valuations at levels of granularity otherwise not available which lets us develop new analytics
- Taxonomy – They provide tags when they submit data, which helps us with tagging and classification of companies
- New training sets – They give us partnership or market sizing data, thus providing us a training set of data upon which to build new classifiers so we can add new types of data to the CB Insights platform
All of this data improves the machine learning algorithms we’ve built and lets us develop new ones.
Ultimately, this improves our product and lets us deliver more high quality research, which then attracts more newsletter subscribers and clients, and thus keeps the flywheel spinning.
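For readers who think in code, the loop described above can be sketched as a toy simulation. This is only an illustration under made-up assumptions, not how CB Insights models anything: the `FlywheelState` class, the `step` and `simulate` functions, and every rate and constant below are hypothetical and chosen purely to show the shape of the feedback loop.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FlywheelState:
    users: float        # newsletter subscribers and clients (hypothetical units)
    data_points: float  # records in the database
    quality: float      # product quality score between 0 and 1


def step(state: FlywheelState,
         crawl_rate: float = 1000.0,   # assumed records extracted algorithmically per period
         submit_rate: float = 0.05,    # assumed records submitted per user per period
         growth_rate: float = 0.10,    # assumed maximum user growth per period
         saturation: float = 50_000.0  # assumed data volume at which quality reaches 0.5
         ) -> FlywheelState:
    """Advance the toy flywheel by one period."""
    # More users -> more submitted data, on top of algorithmic extraction.
    data_points = state.data_points + crawl_rate + submit_rate * state.users
    # More data -> better product, with diminishing returns (approaches 1).
    quality = data_points / (data_points + saturation)
    # Better product -> more users.
    users = state.users * (1.0 + growth_rate * quality)
    return FlywheelState(users, data_points, quality)


def simulate(periods: int, start: FlywheelState) -> List[FlywheelState]:
    """Run the flywheel for a number of periods and return every state."""
    history = [start]
    for _ in range(periods):
        history.append(step(history[-1]))
    return history


if __name__ == "__main__":
    start = FlywheelState(users=1000.0, data_points=0.0, quality=0.0)
    for period, s in enumerate(simulate(12, start)):
        print(f"{period:2d}  users={s.users:9.1f}  data={s.data_points:9.1f}  quality={s.quality:.3f}")
```

The only point of the sketch is that each quantity feeds the next: users add data, data raises quality, and quality brings in more users, so all three climb together once the loop is closed.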
The data network effect for CB Insights looks a bit like this:
We’re happy to finally see the flywheel turning after so long.
If we keep doing what we’re doing (high quality product coupled with high quality research), we expect this will continue.
But we’re still trying to figure out new ways to get more data.
Here are some things we’ve tried or are contemplating trying:
- Request for startups – We’ve used the newsletter to let people know about upcoming rankings or market maps. We did this for our India fintech map as well as for an upcoming Latin American fintech market map. Every time we do a request for startups in the newsletter, we get hundreds of submissions from investors and startups.
- Anonymized benchmark data – We’re also looking at whether investors and companies will give us data that we can aggregate across multiple investors/companies and give back to them as valuable information on industry trends (valuations, multiples, etc.).
- Featuring submitters in our newsletter – To provide further incentive beyond being featured in rankings and market maps, we’ve also begun using the newsletter to highlight investors & companies that submit their data. For these folks, there isn’t a better way to get in front of investors, acquirers, or buyers of your company’s or portfolio company’s products.
For those interested in more on data network effects and data acquisition, we recommend:
- This A16Z podcast featuring partners Alex Rampell and Vijay Pande
- This post by Firstmark Capital Partner Matt Turck
- This post on data acquisition strategies by Moritz Mueller-Freitag (credit also for first graphic used in this post)