Predict your next investment

INTERNET | Internet Software & Services / Application & Data Integration
datomize.com


Founded Year

2020

Stage

Seed VC | Alive

Total Raised

$6M

Last Raised

$6M | 1 yr ago

Mosaic Score

+10 points in the past 30 days

What is a Mosaic Score?
The Mosaic Score is an algorithm that measures the overall financial health and market potential of private companies.

About Datomize

Datomize analyzes the use of personal data (PII) for business growth and operational optimization.

Datomize Headquarters Location

28 HaArba'a St, North Tower, Flr 25

6473925

Israel

Latest Datomize News

Data Roadblocks for AI – Most common challenges and how to avoid them

May 13, 2021

In this special guest feature, Dr. Sigal Shaked, Co-founder and CTO at Datomize, discusses the best approach to overcoming data challenges and achieving strong data governance so you're in line with regulations, which is especially important for the banking and healthcare industries. Sigal has more than 15 years of experience working with data in different fields and for various needs, both as a researcher and as an implementer, with a deep understanding of the underlying issues behind working with data. Sigal aspires to lead the best solutions for existing needs, armed with learning machines and their superpowers, and equipped with the human creativity that no machine can compete with.

Artificial intelligence can add $15.7 trillion to the world economy by 2030, equivalent to China and India's combined output. However, without a steady stream of complete and reliable data, machine learning models can't provide valuable and trusted insights. Preparing data is a huge challenge: data scientists often spend 80% of their time cleaning and managing data rather than training models. Here is a drill-down of the most common data challenges facing data scientists.

AI/ML Models Are Starved for Data

According to a McKinsey survey, out of 100 organizations that have piloted AI in at least one of their functions, 24 stated that the largest barrier to AI implementation is the lack of usable and relevant data. Linear algorithms need hundreds of examples per class, while more complex algorithms need tens of thousands to millions. When a model is trained with insufficient data, there is a high risk that it won't work effectively when new data is added. Even in cases where a large quantity of data is available, there is still a chance that the data will not be usable due to personal privacy laws.
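The data-preparation burden described above can be illustrated with a short sketch. The dataset and column names here are hypothetical, chosen only to show the kinds of routine cleaning steps (deduplication, dropping and imputing missing values) that consume so much of a data scientist's time:

```python
import pandas as pd

# Hypothetical raw dataset; the columns are illustrative, not from the article.
raw = pd.DataFrame({
    "customer_id": [1, 1, 2, 3, 4],
    "age": [34, 34, None, 29, 41],
    "income": [52000, 52000, 61000, None, 70000],
})

# Typical cleaning steps before any model training can begin:
clean = (
    raw.drop_duplicates(subset="customer_id")        # remove duplicate records
       .dropna(subset=["age"])                       # drop rows missing a key feature
       .fillna({"income": raw["income"].median()})   # impute remaining gaps
)
```

Even this toy pipeline makes judgment calls (which rows to drop, how to impute) that must be revisited every time new data arrives.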
There are many regulations that prevent the use of sensitive data to feed machine learning models, including the General Data Protection Regulation (GDPR), the Payment Card Industry Data Security Standard (PCI DSS), the Health Insurance Portability and Accountability Act (HIPAA), the Federal Information Security Management Act of 2002 (FISMA), the Family Educational Rights and Privacy Act (FERPA), and the Gramm–Leach–Bliley Act (GLBA). It's especially challenging to find safe data to feed AI/ML models in the medical and finance industries, since the data is so sensitive.

Data can also be biased. A machine learning model will make assumptions based on whatever data it reads; if that data tells a skewed or incomplete story, the rules it creates will be fundamentally unsound. Training data needs to be an accurate representation of the population, including data sets from every category. For example, even though black women are 42 percent more likely to die from breast cancer, machine learning models were fed mammography images overwhelmingly from white women, making them inaccurate when reaching conclusions for all women.

There is also the issue of low data quality. Even if the data is safe and representative of every segment of the population, it can still be unusable because it's incomplete, irrelevant, or out of date. Many enterprises have an inconsistent data vocabulary because data resides in silos across different regions, business units, and geographies.

Steps to Overcoming Data Challenges

To collect the data that's needed, it's possible to create systems to harvest data from different sources. If you know the tasks that a machine learning algorithm will perform, then you can create a data-gathering mechanism in advance to collect the data internally. However, this data may be unusable due to regulations, so it will have to be anonymized.
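As a rough illustration of that anonymization step, here is a minimal pseudonymization sketch that replaces a direct identifier with a salted hash; the salt, field names, and record are all hypothetical. Note that, as the article goes on to say, pseudonymization by itself is one of the weaker techniques and is not secure enough on its own:

```python
import hashlib

# Hypothetical salt; in practice it must be generated randomly and kept secret.
SALT = b"example-secret-salt"

def pseudonymize(value: str) -> str:
    """Replace a direct identifier with a stable salted hash.

    The same input always maps to the same token, so records can still be
    joined across tables, but the original value is no longer exposed.
    """
    return hashlib.sha256(SALT + value.encode("utf-8")).hexdigest()[:16]

record = {"name": "Jane Doe", "balance": 1204.50}
safe_record = {"name": pseudonymize(record["name"]), "balance": record["balance"]}
```

Because the mapping is deterministic, an attacker who learns the salt (or can guess inputs) can re-identify records, which is exactly why stronger techniques are needed.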
At the most basic level, you have techniques like data generalization, pseudo-anonymization, and data masking, but none of these methods are secure enough to deter hackers. Data swapping reassigns data points from one person to another, making the data more secure but less useful for gleaning insights about real-life scenarios. Similarly, perturbation and differential privacy add random noise to obscure details that need to be kept confidential, making the data difficult to analyze.

Enterprises, government agencies, and academic institutions provide open-source data, but intellectual property issues can limit its use to research rather than commercial purposes. This data also tends to be available only for industries that are not highly regulated, so it can be hard to find usable data for the medical and financial industries.

Another option is for enterprises to generate their own synthetic data sets that have the same schema and statistical properties as their "real" counterparts. Enterprises then have more control over data quality and can tweak the data set's characteristics based on their AI/ML objectives. Generating synthetic data can also provide the scale global organizations need to create the quantity, variety, and granularity of data required so that the resulting models are unbiased, accurate, and complete.

Even after all the necessary data is collected, a data governance system needs to be implemented to keep the data pipeline working. Without a firm data strategy and an enterprise-wide supply of data on demand, machine learning models will no longer be relevant. Data governance costs need to be built into projects to ensure machine learning models stay on track.

In our data-driven world, machine learning models are becoming a given for reaching insights that streamline operations, identify new revenue streams, and provide engaging customer experiences.
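The synthetic-data idea can be sketched under a deliberately simplifying assumption: that matching each column's marginal distribution is enough. Production synthetic-data engines also aim to preserve cross-column correlations, which this toy version ignores, and all names and data below are illustrative:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(seed=0)

# Hypothetical "real" dataset whose schema and statistics we want to mimic.
real = pd.DataFrame({
    "age": rng.normal(40, 10, size=1000).round().clip(18, 90),
    "segment": rng.choice(["retail", "business"], size=1000, p=[0.7, 0.3]),
})

def synthesize(df: pd.DataFrame, n: int) -> pd.DataFrame:
    """Naively sample each column independently from a fitted marginal.

    Numeric columns are modeled as normal distributions; categorical
    columns are resampled from their empirical frequencies. No record
    from the real data is copied into the output.
    """
    out = {}
    for col in df.columns:
        if pd.api.types.is_numeric_dtype(df[col]):
            out[col] = rng.normal(df[col].mean(), df[col].std(), size=n)
        else:
            freqs = df[col].value_counts(normalize=True)
            out[col] = rng.choice(freqs.index, size=n, p=freqs.values)
    return pd.DataFrame(out)

synthetic = synthesize(real, n=500)
```

The synthetic table has the same schema and roughly the same per-column statistics as the original, but contains no real individual's record, which is what makes it safe to feed into model training.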
But until enterprises have a healthy pipeline of quality data, the quality and wisdom of the insights gleaned by AI/ML models are at risk.


Expert Collections containing Datomize

Expert Collections are analyst-curated lists that highlight the companies you need to know in the most important technology spaces.

Datomize is included in 1 Expert Collection: Artificial Intelligence.

Artificial Intelligence

8,323 items

This collection includes startups selling AI SaaS, using AI algorithms to develop their core products, and those developing hardware to support AI workloads.

Datomize Web Traffic

Rank
Page Views per User (PVPU)
Page Views per Million (PVPM)
Reach per Million (RPM)
