From using synthetic data to train AI, to protecting patient privacy, to new data streams from wearables, we examine how AI is changing how healthcare data is collected and used.
In the hype and dystopian pop culture narrative surrounding AI, the most important step goes unnoticed: the data janitor work.
Here we dig into some of the trends that are reshaping healthcare data — from using synthetic data to train AI, to protecting patient privacy, to new data streams from wearables — with some lessons from Apple, Google, and fake puppies.
We will discuss AI and medical big data at our Future of Health conference on October 2-3, 2019 in NYC.
The trouble with data
Most AI algorithms today are trained with input-output pairs: “Here are 10,000 MRI scans, and this is what a tumor looks like.”
Your algorithm can then do only one thing: identify tumors in scans. But it can do that one thing really well, at scale, and with a high degree of accuracy.
But patient data is sensitive, siloed, and messy, which raises some basic questions about access and accuracy for algorithm training:
- If you’re a small health AI startup, what’s the source of data for the problem you’re trying to solve?
- Is the data (such as medical scans) accurately annotated?
- How do you get training data for rare diseases?
- How do you protect patient privacy and still train AI models?
Here are some ways in which big tech companies and startups are tackling these questions.
What healthcare and the Android keyboard have in common
Federated learning is an emerging approach to training AI with sensitive user data while protecting privacy.
Google first introduced federated learning for Android keyboards (Gboard). Now OWKIN — a Google Ventures-backed company that raised a follow-on Series A round this month — is taking a similar approach with patient data.
In a nutshell, patient data never leaves the hospital premises and is not sent to a central cloud server. The model is updated on-premise at the hospital using local data, and only these updates (and model updates from other participating hospitals) are sent to the cloud.
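To make the idea concrete, here is a minimal sketch of one common way to combine on-premise updates: weighted averaging of model weights. It is an illustration under stated assumptions, not OWKIN's or Google's actual system. The tiny linear model, the function names (`local_update`, `federated_average`, `train_federated`), and the toy hospital datasets are all hypothetical.

```python
from typing import List, Sequence, Tuple

Weights = List[float]              # [slope, intercept] of a toy linear model
Dataset = Sequence[Tuple[float, float]]  # (input, output) pairs held by one hospital


def local_update(weights: Weights, data: Dataset,
                 lr: float = 0.1, epochs: int = 20) -> Weights:
    """Train on one hospital's local data; the raw data never leaves this function."""
    w, b = weights
    n = len(data)
    for _ in range(epochs):
        # Gradient of mean squared error for y ~ w * x + b
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return [w, b]


def federated_average(updates: Sequence[Weights], sizes: Sequence[int]) -> Weights:
    """Server side: average the hospitals' weights, weighted by local dataset size."""
    total = sum(sizes)
    return [
        sum(update[i] * size for update, size in zip(updates, sizes)) / total
        for i in range(len(updates[0]))
    ]


def train_federated(hospital_datasets: Sequence[Dataset], rounds: int = 50) -> Weights:
    """Alternate local training and central averaging for a fixed number of rounds."""
    global_weights: Weights = [0.0, 0.0]
    for _ in range(rounds):
        # Each hospital starts from the shared model and trains on its own data.
        updates = [local_update(global_weights, data) for data in hospital_datasets]
        sizes = [len(data) for data in hospital_datasets]
        # Only the updated weights are shared with the central server.
        global_weights = federated_average(updates, sizes)
    return global_weights


if __name__ == "__main__":
    # Hypothetical data: both hospitals observe the same relationship y = 2x + 1.
    hospital_a = [(0.0, 1.0), (0.5, 2.0), (1.0, 3.0)]
    hospital_b = [(0.2, 1.4), (0.8, 2.6)]
    print(train_federated([hospital_a, hospital_b]))  # should approach [2.0, 1.0]
```

The central server in this sketch only ever sees weight vectors and dataset sizes. Real deployments typically add further protections on top of this, which the sketch does not attempt to model.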
In a recent interview with MIT Technology Review, OWKIN co-founder Thomas Clozel talks about collaborations with cancer centers in the US and UK, and a soon-to-be published paper on using federated learning in real-world healthcare settings.
But the article also highlights some important infrastructural challenges:
If you’re active on Twitter and have been following AI closely, you may have come across people playing around with something called generative adversarial networks (GANs) to create realistic fake images, from puppies to burgers to flowers.
An interesting emerging trend is using AI itself to help generate more “realistic” synthetic images to train AI. Nvidia, for instance, used GANs to create fake MRI images with brain tumors.
Nvidia’s research paper concluded that augmenting real-world data with synthetic MRI images helped improve tumor segmentation.
Bridging the “reality gap” between synthetic data and real-world data is a challenge in applications such as robotics. It’ll be interesting to see how this takes off in diagnostics.
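One practical way to keep that reality gap visible is to add synthetic examples only to the training set and to evaluate on real data alone. The sketch below shows that split. It is an assumption-laden illustration, not the method from Nvidia's paper: the function name `build_training_split`, the 20% holdout, and the cap of one synthetic example per two real ones are all hypothetical choices.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def build_training_split(
    real: Sequence[T],
    synthetic: Sequence[T],
    holdout_fraction: float = 0.2,   # share of real data reserved for evaluation
    synthetic_ratio: float = 0.5,    # max synthetic examples per real training example
    seed: int = 0,
) -> Tuple[List[T], List[T]]:
    """Return (train, holdout): train mixes real and synthetic, holdout is real only."""
    rng = random.Random(seed)

    real_shuffled = list(real)
    rng.shuffle(real_shuffled)

    # Reserve real examples for evaluation before any synthetic data is mixed in.
    n_holdout = int(len(real_shuffled) * holdout_fraction)
    holdout = real_shuffled[:n_holdout]
    train_real = real_shuffled[n_holdout:]

    # Cap the amount of synthetic data relative to the real training data.
    n_synthetic = min(len(synthetic), int(len(train_real) * synthetic_ratio))
    train = train_real + rng.sample(list(synthetic), n_synthetic)
    rng.shuffle(train)
    return train, holdout


if __name__ == "__main__":
    # Placeholder records standing in for real and GAN-generated scans.
    real_scans = [("real", i) for i in range(10)]
    fake_scans = [("synthetic", i) for i in range(20)]
    train, holdout = build_training_split(real_scans, fake_scans)
    print(len(train), len(holdout))
```

Because the holdout set contains no synthetic examples, any gain measured on it reflects performance on real scans rather than on the generator's output.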
AI needs doctors
AI companies need medical experts to annotate images to teach algorithms how to identify anomalies.
The approach here has been largely collaborative. Tech giants and government agencies that are investing heavily in annotation are making the datasets publicly available to other researchers.
DeepMind’s AI for detecting eye disease involved rigorous work to make sure the data was accurate and in the right format. For example, around 1,000 scans were graded by junior ophthalmologists, and any disagreements in labeling were resolved by a certified senior specialist with over 10 years of experience.
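The grading workflow described above, where junior graders label each scan and a senior specialist steps in only on disagreement, can be sketched as a simple adjudication step. This is a hypothetical illustration, not DeepMind's actual pipeline: the function `adjudicate`, the scan IDs, and the label names are assumptions made for the example.

```python
from typing import Callable, Dict, List

# A senior reviewer takes a scan ID and the conflicting junior labels
# and returns the final label.
SeniorReview = Callable[[str, List[str]], str]


def adjudicate(junior_labels: Dict[str, List[str]],
               senior_review: SeniorReview) -> Dict[str, str]:
    """Accept unanimous junior labels; send every disagreement to the senior reviewer."""
    final_labels: Dict[str, str] = {}
    for scan_id, labels in junior_labels.items():
        if len(set(labels)) == 1:
            # All junior graders agree, so their label stands.
            final_labels[scan_id] = labels[0]
        else:
            # Graders disagree (or no labels exist): escalate to the senior specialist.
            final_labels[scan_id] = senior_review(scan_id, labels)
    return final_labels


if __name__ == "__main__":
    # Hypothetical junior grades for two scans.
    grades = {
        "scan-001": ["normal", "normal"],
        "scan-002": ["normal", "urgent referral"],
    }

    def senior(scan_id: str, labels: List[str]) -> str:
        # Stand-in for a human decision; a real system would queue this for review.
        return "urgent referral"

    print(adjudicate(grades, senior))
```

Routing only the disagreements to the senior specialist keeps the most expensive reviewer's time focused on the ambiguous cases.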
In an interview with the South China Morning Post, Chinese unicorn Yitu Technology reports that it has a team of 400 doctors working part time just to label medical data, and adds that higher salary ranges in the US may make this an expensive option for US startups.
Beyond EHR data
Apple is tackling data interoperability issues in healthcare by bringing on hundreds of health institutions and popular EHR vendors like Cerner and Epic as partners. We wrote about what this means for clinical trials here.
Apart from this, Apple is building a healthcare ecosystem around its two most popular products, the Apple Watch and the iPhone (we discuss the results of an Apple Heart Study involving over 400,000 participants below in our Spotlight).
This hints at future possibilities for real-time data collection and analytics in healthcare, beyond reliance on EHR data.
This report was created with data from CB Insights’ emerging technology insights platform.