Inside the alt-data world's 'unhedgeable risk': What happens when data streams like Yodlee and JumpShot go dark and hedge funds have to pick up the pieces
Mar 4, 2020
Tammer Kamel built, and sold, a business around finding unique datasets financial services clients can't get anywhere else. His company, Quandl, which was bought by Nasdaq at the end of 2018, relies on third-party data streams of unique information, such as satellite images or credit card transactions, to create products hedge funds and asset managers can use to either generate or reinforce their investment strategies.
But collecting rare data sets is only half the battle. Data aggregators like Quandl run the risk of having the originators they pull from shut off with little-to-no heads up. That was the case in 2018 when a data stream Quandl was using essentially went dark overnight after it was acquired by a company that no longer wanted to license data.
The issue sits at the core of every data aggregator in the alternative data space, Kamel told Business Insider.
"With uniqueness comes this risk that if the source evaporates, there is no substitute for it. That is why it was valuable in the first place," Kamel said. "It is a problem, and it is largely an unhedgeable risk."
While the shutdown was frustrating, Kamel acknowledged that the data product being wound down wasn't the end of the world. Quandl's customers, who range from quant funds to corporates and number more than 400,000 users on its platform, understood the situation and were willing to continue to work with the company.
But clients aren't always so understanding, especially if the reason a data source is shut down stems from regulatory issues.
"That's the nightmare scenario," Kamel said. "The data set disappearing. That's annoying. If the dataset proves to be ill-gotten, that's the disaster."
The risk is becoming ever more real as Envestnet's Yodlee and Avast's Jumpshot have been either temporarily or permanently shut down due to privacy concerns and Congressional inquiries.
Yodlee, which has data on credit-card transactions, and Jumpshot, which provided metrics on clickstream data, are far from the only ones affected though. Hedge funds and data aggregators build investment models and research projects off the raw data provided by companies like Yodlee and Jumpshot. There has already been collateral damage, as Connexity, a Los Angeles-based online retailer conglomerate, shut down its website traffic analytics arm Hitwise within a week of Jumpshot closing because it was extremely reliant on the now-shuttered data stream.
"Any model, any strategy we build cannot be dependent on one or two data providers," said Yin Lou, the head of Wolfe Research's quant division.
How hedge funds protect themselves
Lou said the firm uses more than 100 outside data providers, with 80% of providers having a clear back-up in case the primary option goes dark. For example, the firm uses Ravenpack data for social media tracking, and backs it up with Refinitiv's social-media data if needed.
While there's not a risk to the entire business of the largest hedge funds, losing a data feed as widely used as Yodlee could disproportionately affect some teams, Lou said.
"I do know many consumer PMs were using Yodlee data extensively," he added.
Smaller hedge funds too might run into problems of overreliance on a single dataset, not because they are not careful, but because they can't afford to pay so many different data providers, said Steve Iannini, who runs alt-data company P Street Advisors. Those that can afford both the data streams and the personnel who can turn that information into investment ideas make up the top of the hedge-fund industry: quant funds like Two Sigma and Renaissance Technologies and multi-strategy funds like Point72, Citadel, and ExodusPoint.
"There's a relative few number of hedge funds that are really in the data game," Iannini said.
At Campbell & Co., a Baltimore-based hedge fund that runs billions in its systematic strategies, the firm has built an expectation for errors in data feeds, said Kevin Cole, the firm's chief investment officer. Cole said the firm has had practice dealing with data streams going dark during government shutdowns, which temporarily halt data like economic output and unemployment figures.
When a data feed is cut off, Campbell has to decide how long a model can run without certain data. If the outage goes on long enough, the manager will allocate away from the model or even shut it down for a period, until it can replace the data or the feed comes back.
"It should be assumed it will happen, not that is an exception to the fact," he said.
Some investors are choosing to buy less alternative data altogether, instead taking the matter into their own hands. That's the tactic for Mike Chen, director of portfolio management at $43 billion PanAgora Asset Management. Roughly half the alternative data the firm currently ingests is collected by PanAgora itself, Chen told Business Insider.
With the explosion of the space in recent years, Chen said it's harder to find data sets that aren't being pitched across the Street, thus losing their value. Chen also said PanAgora's investing strategy of finding data to reinforce ideas as opposed to finding ideas in data lends itself to sourcing its own data as opposed to going to vendors.
"We are becoming ever more cynical," Chen said. "Compared to three to five years ago, PanAgora now has a much higher onboard threshold for alternative data sets. As a result, our adoption rate of external alternative data sets have been lower."
An industry in need of rules
While the Yodlee scrutiny from Congress has been interpreted as a shot across the bow for the entire alternative data industry, many providers are hoping it gives more clarity to an industry that many say is still in its Wild West days.
Emmett Kilduff, founder and CEO of data aggregator Eagle Alpha, told Business Insider the industry would welcome some help from rule-makers. Kilduff, whose firm has over 1,200 datasets on its platform, said vendors, intermediaries, and even buyers would be happy to adapt their practices to meet standards that might be set.
"It's not helpful, frankly, that there is not enough regulation or guidance from regulators or governments," he said. "It's a gap that needs to be corrected."
Iannini considers himself pretty lucky on the regulation front: His company, satellite-imagery-focused P Street Advisors, is one of the few alt-data providers that have a clear legal and regulatory framework to work with. The government has already laid out the rules for how high the resolution of a picture from a satellite can be and the biggest buyer of satellite images is the US government, Iannini said, so surprises like the ones that ensnared Yodlee are unlikely.
Others feel the regulatory spotlight would be better suited on another industry: advertisers.
Quandl's Kamel said while it's easy to pitch Wall Street as the boogeyman who is making money off of people's personal information, that's far from the truth. Investors are happy to have anonymized data, as they are looking for overall market trends. It's ad companies that are in search of specific data about individuals to better understand how to sell them products.
"Hedge funds or the finance industry doesn't give a damn about your personal information," Kamel said. "The stuff that ad tech is doing with your data is far creepier and far more pernicious and far more threatening than anything Wall Street is ever going to do with consumer data because it doesn't matter. The aggregate is all that matters to Wall Street."
However, some feel the outlook for data aggregators is much more grim. Marta Lopata, chief growth officer at Thinknum, told Business Insider firms can only consider themselves truly protected if they go about sourcing their own data.
Prior to launching Thinknum, which sells companies data it scrapes from the web, Lopata said a lot of thought was put into where the startup would be best suited to source data. Something like the internet, which is a constant source of never-ending data, seemed like a better bet than places that might have been out of their control, she added.
"I think the way you can really protect yourself in a space is choosing to bet on data sources that you originate. I think that's really the future of the alt-data space," Lopata said. "If you cannot be the originator of the data source, you are going to be at risk because you can't control whether the data originators will cut you off or the regulations will change and you're not completely in control. It's a very volatile space."