Competitive relationships often violate the rules of transitivity and symmetry. This has made identifying competitors one of the thornier data science / machine learning problems that we've faced at CB Insights.
One of the hardest data challenges we’ve had at CB Insights is building an algorithm to identify competitors.
Today, we augment algorithmic recommendations and deliver them to an internal team that curates competitor relationships before clients see them. Our aspiration is to build an algorithm with high enough (near-perfect) fidelity that it doesn’t require human assistance.
This is challenging for many reasons. Competitor relationships:
- Often violate transitivity
- Are often asymmetric
Violating transitivity – If A is a competitor of B, and B is a competitor of C, A is not necessarily a competitor of C. For example, Lyft is a competitor of Uber and Uber is a competitor of Grubhub, but Lyft and Grubhub are not competitors.
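To make the transitivity point concrete, here is a minimal Python sketch that treats competitor relationships as undirected pairs and lists every chain A, B, C where A competes with B and B competes with C, but A does not compete with C. The function names (`competes`, `transitivity_violations`) and the pair-based representation are illustrative assumptions, not a description of our production system.

```python
from itertools import permutations

# Undirected competitor pairs taken from the Lyft / Uber / Grubhub example.
competitors = {
    frozenset({"Lyft", "Uber"}),
    frozenset({"Uber", "Grubhub"}),
}

def competes(pairs, a, b):
    """True if a and b are recorded as competitors (order does not matter)."""
    return frozenset({a, b}) in pairs

def transitivity_violations(pairs):
    """Return (a, b, c) triples where a-b and b-c compete but a-c do not."""
    companies = sorted(set().union(*pairs))
    violations = []
    for a, b, c in permutations(companies, 3):
        # a < c keeps only one of the two mirror-image triples.
        if a < c and competes(pairs, a, b) and competes(pairs, b, c) \
                and not competes(pairs, a, c):
            violations.append((a, b, c))
    return violations

violations = transitivity_violations(competitors)
print(violations)
```

Any approach that simply clusters companies and calls everything in a cluster a competitor would get this triple wrong, because it would place Lyft and Grubhub together.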
But violating transitivity is not the only problem.
Asymmetric competitor relationships – Sometimes, company A considers company B a competitor, but company B doesn’t consider A a competitor. Or company A’s entire product is just one feature of company B’s offering. In this case, if you ask who A competes with, folks might say B, but if you ask who B’s competitors are, A would not come up.
Consider SaaS vendor Mindbody, which makes fitness, wellness, & gym management software. One of its products is a point-of-sale device for its clients. Square is a competitor. But given its vertical focus, you might not agree that Mindbody is a competitor of Square.
Wherever you come out on this, the point is arguable.
As the Mindbody / Square example highlights, asymmetry can come from product or customer focus, but that is not its only source.
Geography can also influence asymmetry. Does an online shoe retailer in China compete with one in the United States or with one in India?
Sometimes, even when the core business is the same, a company’s stage creates asymmetric competitor relationships.
Take Alibaba and Yamibuy: Both are engaged in e-commerce and target the Asian market. But Yamibuy is an early-stage startup that is targeting Alibaba. So while Yamibuy likely perceives Alibaba as a competitor, it’s unclear whether Yamibuy should be listed as a competitor of Alibaba until it reaches a scale that suggests the two truly compete.
These types of asymmetric competitive relationships make this an even more interesting (and difficult) problem.
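One way to picture asymmetry is to store competitor relationships as directed edges rather than undirected pairs. The sketch below is only an illustration: the `considers` set and the function names are hypothetical, the Mindbody and Yamibuy edges reflect one (arguable) reading of the examples above, and treating Lyft and Uber as mutual competitors is an assumption added for contrast.

```python
# Directed edges: (a, b) means "a regards b as a competitor".
considers = {
    ("Mindbody", "Square"),   # one reading of the example; arguable
    ("Yamibuy", "Alibaba"),   # early-stage company targeting an incumbent
    ("Lyft", "Uber"),         # assumed mutual, for contrast
    ("Uber", "Lyft"),
}

def one_way_pairs(edges):
    """Edges (a, b) with no matching (b, a): the asymmetric relationships."""
    return sorted((a, b) for a, b in edges if (b, a) not in edges)

def mutual_pairs(edges):
    """Pairs where each company regards the other as a competitor."""
    return sorted({tuple(sorted((a, b))) for a, b in edges if (b, a) in edges})

asymmetric = one_way_pairs(considers)
symmetric = mutual_pairs(considers)
print(asymmetric)
print(symmetric)
```

With a directed representation, "who competes with Mindbody?" and "who does Mindbody compete with?" become two different queries, which is exactly the distinction the examples above turn on.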
Of course, one can try to throw humans at this problem, but that doesn’t work for several reasons:
- It’s expensive and difficult to scale when trying to do this for hundreds of thousands of companies
- It requires domain knowledge, especially in areas like biotech or enterprise software
- Businesses change product and focus areas over time, often subtly, which results in changes to who competitors are. The volume, velocity, and variety of these changes are impossible for human curators to keep on top of
We believe our model to identify competitors is the most sophisticated of anything available. It uses a variety of signals, including search data, co-mentions in the media, keyword description overlap, and a ton of other factors. But we’re still not at the point of providing our clients with algorithmic recommendations that have not gone through some level of human review, i.e., they can’t just be pushed to production.
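As a rough illustration of how signals like these could be combined, here is a toy Python sketch that scores one candidate pair. It is not our model: the weights, the signal values, and the keyword sets are all made up for the example, and Jaccard similarity is just one simple stand-in for keyword description overlap.

```python
def jaccard(a, b):
    """Jaccard similarity between two keyword collections (0.0 to 1.0)."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

# Hypothetical weights; a real system would learn or tune these.
WEIGHTS = {"search": 0.3, "co_mentions": 0.3, "keywords": 0.4}

def competitor_score(signals):
    """Weighted sum of signals, each assumed to be normalized to [0, 1]."""
    return sum(WEIGHTS[name] * signals[name] for name in WEIGHTS)

# Made-up inputs for one candidate pair.
keywords_a = {"ride", "hailing", "mobile", "app"}
keywords_b = {"ride", "hailing", "delivery", "app"}

signals = {
    "search": 0.8,        # illustrative value, not real data
    "co_mentions": 0.9,   # illustrative value, not real data
    "keywords": jaccard(keywords_a, keywords_b),
}

score = competitor_score(signals)
print(round(score, 2))
```

Note that a single symmetric score like this cannot express the asymmetric cases described earlier, which is part of why recommendations still go through human review before reaching clients.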
If you find this problem interesting or have solved similar challenges, we’re aggressively hiring on our engineering and data science / machine learning teams.
BTW, learn more about our algorithm that had a 48% hit rate on predicting future unicorns. (FYI, no VC has that kind of hit rate)