Yes — an algorithm to identify unicorns is here.
In 2015, we worked with The New York Times to predict 50 future unicorns (companies that would eventually be valued at $1 billion or more).
To date, 24 of them have hit that mark (48%).
- 4 companies were acquired for $1B+, with the largest being Ele.me, acquired for $9.5B by Alibaba.
- 8 companies have gone public, all valued at more than $1B. Chinese used car marketplace Uxin had the highest valuation at IPO ($2.8B). Most recently, Elastic went public in October at a $2.5B valuation.
- 12 companies remain private unicorns. Together, they have raised over $8.0B in total funding.
- Of the non-unicorns (26), 14 went on to raise additional equity funding totaling $1.4B.
At the risk of sounding immodest, that is pretty good.
And if we were a venture firm, this kind of hit rate would make us legendary.
More recently, in December 2017, we identified the Artificial Intelligence 100 — a ranking of the top 100 private AI companies. In the 11+ months since the AI 100 was unveiled, we’ve seen:
- 32 of the AI 100 raised additional equity financing totaling $3.9B. SenseTime raised the most equity financing — $1.2B since making the AI 100 list.
- 3 were acquired. The largest deal was Flatiron Health, which sold to Roche Holdings for $1.9B in Feb’18.
Algorithms to understand startup health
In 2010, we approached the National Science Foundation with the idea that we could use publicly available information and non-traditional signals to assess the health of private companies.
Having worked at American Express before, we had seen the challenges of assessing the health of smaller private companies, a.k.a. “thin file” companies. We believed we could use the vast amounts of unstructured and semi-structured information being created to shine a light on these opaque companies and understand their health.
The NSF agreed, and in 2010 they gave us $150,000.
Here is the CB Insights financing history from CB Insights (yes, very meta).
The initial traction from that first NSF grant led to 2 additional grants totaling $1M in 2011 and 2013, as you can see above.
With NSF support, we worked towards a model dubbed Mosaic that would aggregate and synthesize information about these companies from disparate sources and programmatically assess the health of startups.
We believed we could make understanding and identifying the best tech startups less of a crapshoot. Think of Mosaic as a FICO score for startups.
If we could do this with Mosaic, we believed that capital, partnerships, talent, time and attention could flow to the right companies and misallocations of these resources would be minimized.
So how does Mosaic work?
What signals feed the Mosaic algorithm?
The Mosaic score is composed of 3 individual models – what we call the 3 M’s.
- Market – how healthy is the industry the company is in?
- Money – what is the financial health of the company?
- Momentum – how much traction does the company have?
Each of the M’s relies on different signals.
Most attempts we’ve seen to quantify tech company health have almost exclusively focused on momentum. This is necessary but not sufficient.
Below is a bit on each model (although, for obvious reasons, not all of the signals used are revealed).
The quality of the market or industry a company competes in is critical. If you are part of an industry that is in favor, that serves as a tailwind to push you along. Conversely, being in an out-of-favor space means fewer investors, fewer partners, less media attention and so on.
Said another way – you don’t want to be a daily deals company today.
The market model looks at the number of companies in an industry, the financing and exit momentum in the space as well as the overall quality and quantity of investors participating in that area.
The money model is all about assessing the financial health of a company, i.e. do we think they’re going to run out of money? Our model here looks at things including burn rate and the quality of the investors and syndicate that may be part of the company as well as their financing position relative to industry peers & competitors.
The final model is momentum where we look at a variety of volume and frequency signals including social media, news/media, sentiment, and partnership & customer momentum among other signals.
We look at these on an absolute basis and relative to peers and industry comparables. The relative piece is critical because it ensures that, for example, enterprise software companies that get less media attention or spend less time on social media are not penalized versus consumer-focused tech companies.
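To illustrate the relative piece, here is a minimal sketch of peer-relative scoring. It is not the Mosaic implementation: the function name `peer_relative_scores`, the percentile-rank approach, and the sample companies and mention counts are all assumptions made up for illustration.

```python
from collections import defaultdict

def peer_relative_scores(raw_signal, industry):
    """Rank each company's raw signal against peers in its own industry.

    Returns a value in (0, 1]: the fraction of same-industry companies
    whose signal is at or below this company's signal.
    (Hypothetical helper, not part of Mosaic.)
    """
    # Group raw signal values by industry.
    groups = defaultdict(list)
    for company, ind in industry.items():
        groups[ind].append(raw_signal[company])

    relative = {}
    for company, ind in industry.items():
        peers = groups[ind]
        value = raw_signal[company]
        relative[company] = sum(p <= value for p in peers) / len(peers)
    return relative

# Hypothetical media-mention counts, purely illustrative.
mentions = {"EntA": 12, "EntB": 30, "ConA": 400, "ConB": 900}
industry = {
    "EntA": "enterprise", "EntB": "enterprise",
    "ConA": "consumer", "ConB": "consumer",
}
scores = peer_relative_scores(mentions, industry)
```

In this toy data, the leading enterprise company ends up with the same relative score as the leading consumer company, even though its absolute mention count is far lower.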
Each of the 3 M models is scored on a scale of 0-1000 and drives the overall Mosaic score (also on a 1000-point scale, with 1000 being the best score).
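How the three model scores combine into the overall score is not disclosed here. Purely as a sketch, one simple possibility is a weighted average that stays on the same 0-1000 scale. The function name `mosaic_sketch` and the equal default weights are assumptions, not the actual Mosaic formula.

```python
def mosaic_sketch(market, money, momentum, weights=(1, 1, 1)):
    """Combine three 0-1000 model scores into one 0-1000 score.

    Assumption: a plain weighted average. The real Mosaic weighting
    is not public; equal weights are used here only as a placeholder.
    """
    for score in (market, money, momentum):
        if not 0 <= score <= 1000:
            raise ValueError("each model score must be between 0 and 1000")

    w_market, w_money, w_momentum = weights
    total_weight = w_market + w_money + w_momentum
    combined = (
        w_market * market + w_money * money + w_momentum * momentum
    ) / total_weight
    return round(combined)

# Hypothetical model scores for one company.
overall = mosaic_sketch(market=600, money=750, momentum=900)
```

Because a weighted average of values in 0-1000 can never leave that range, the overall score stays comparable to the individual M scores.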
So what would an ideal distribution of Mosaic scores look like and how are we doing against it?
The ideal and real distributions of Mosaic scores
If Mosaic were perfect, the distribution would look like this.
In this distribution, the Mosaic scores of healthy companies (green) would be weighted towards higher scores and the Mosaic scores of the unhealthy companies (red) would be at the lower end of the range.
The challenges in getting to this theoretical perfect distribution should be quite obvious.
In short, there is immense fog around private tech startups, which are inherently opaque organisms. In some instances, they actually use media and other channels to actively obfuscate their true performance.
So how does Mosaic do?
Here is the overall distribution of Mosaic scores. The median is 420.
Here is the Mosaic score distribution for unhealthy vs healthy companies.
So how did we backtest Mosaic?
We looked at companies with positive and negative outcomes (described below) and then looked at their Mosaic score 12 months prior.
A positive outcome includes:
- Successful IPO
- Acquisition w/ valuation that is greater than last private valuation
- A $1B+ private market valuation
A negative outcome includes:
- Bankruptcy / death
- Asset Sale
- Acquisition (Talent)
The mean score for companies with positive outcomes was 740. The mean for negative outcomes was 470.
When we analyze companies with a Mosaic score >= 740, the outcomes are as follows:
- 83% had a positive outcome
- 17% had a negative outcome
On the flip side, when we analyze companies with a Mosaic score <= 470, the outcomes are as follows:
- 97% had a negative outcome
- 3% had a positive outcome
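The threshold analysis above can be sketched roughly as follows. This is a minimal illustration, not the actual backtest code: the helper `outcome_rates` and the sample records are hypothetical, and the records are invented values that do not reproduce the percentages reported above.

```python
def outcome_rates(records, threshold, above=True):
    """Share of positive/negative outcomes among companies whose score
    12 months before the outcome was at/above (or at/below) a threshold.

    records: iterable of (score, outcome) pairs, where outcome is
    "positive" or "negative". (Hypothetical data layout.)
    """
    selected = [
        outcome
        for score, outcome in records
        if (score >= threshold if above else score <= threshold)
    ]
    if not selected:
        return None  # no companies fall on this side of the threshold

    positive_share = sum(1 for o in selected if o == "positive") / len(selected)
    return {
        "n": len(selected),
        "positive": positive_share,
        "negative": 1 - positive_share,
    }

# Made-up (score 12 months prior, outcome) pairs for illustration only.
records = [
    (810, "positive"), (760, "positive"), (745, "negative"), (900, "positive"),
    (600, "positive"), (520, "negative"), (450, "negative"), (300, "negative"),
    (410, "positive"), (470, "negative"),
]
high = outcome_rates(records, 740, above=True)   # scores >= 740
low = outcome_rates(records, 470, above=False)   # scores <= 470
```

Using the score from 12 months before the outcome, rather than the score at the time of the outcome, is what keeps the test from simply reflecting news of the outcome itself.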
What’s next for Mosaic?
We have 4 primary areas of focus for Mosaic going forward.
- More signals – We’ve introduced new data to CBI such as patents, earnings transcripts, market sizings, etc., which we believe can be integrated into the 3 M models. There are also a host of other data sets that we have on our radar to capture.
- Refinement of signals – Today, we consider business relationships (customers and partnerships) in our momentum model. With the entity extraction work we’ve done to extract the names of these partners/customers and track their progress, we believe a model that weighs these business relationships in a more nuanced way (based on the size of the customer, for example) will offer additional precision to Mosaic scores.
- Gathering more data from companies themselves – To battle the opacity challenge, we have developed and will continue to develop tools that allow companies to paint the most complete picture of themselves for Mosaic. Tens of thousands of companies and investors already update their data via The Editor, and we believe we can continue to improve these tools to ensure that companies are putting the most accurate view of their performance in front of investors, customers, partners and more. (Note: We understand the challenges of company-submitted data as well, i.e. companies don’t report their bad news, and so we have created ways to mitigate the selection bias inherent in it.)
- Extending to other industries – We believe there are other sectors such as consumer goods & services that Mosaic can relatively seamlessly be extended to. In the medium-term, we believe a Mosaic score for biotech/pharma would be very compelling.
We’re excited to extend the work we’ve done with Mosaic to demystify the health of emerging tech companies even further.
To see our NY Times Future 50 unicorn picks and the 48% we got right, you can download the full report here.