Gretel.ai, a platform for generating synthetic and privacy-preserving data, today announced that it raised $50 million in a series B led by Anthos Capital with participation from Section 32, Greylock, and Moonshots Capital. The funds bring the company’s total raised to $65.5 million and will be used to support product development, according to CEO Ali Golshan, with a particular focus on expansion into new use cases.
Synthetic data, which is used to develop and test software systems in tandem with real-world data, has come into vogue as companies increasingly embrace digitization during the pandemic. In a recent survey of executives, 89% of respondents said synthetic data will be essential to staying competitive. And according to Gartner, by 2030, synthetic data will overshadow real data in AI models.
Gretel provides a platform that enables developers to experiment, collaborate, and share data with other teams, divisions, and organizations. Customers can synthesize, transform, and classify data using a combination of tools and APIs, which apply AI techniques to generate synthetic stand-ins for production data.
“Gretel’s tools enable developers and data practitioners to remove significant bottlenecks and enable ‘privacy by design,’” Golshan told VentureBeat via email. “[With it, customers can] synthesize data to boost underrepresented data sets for training machine learning and AI models, synthesize data to train machine learning and AI models where the synthesized data produced does not contain sensitive or personally identifiable information data, [and] transform data to power preproduction environments and testing with anonymized data.”
Gretel, which is headquartered in San Diego, was founded in 2020 by Golshan, Alexander Watson, John Myers, and Laszlo Bock. Bock is a former SVP of people at Google, while Watson led security startup Harvest.ai until it was acquired by Amazon for around $20 million in 2017.
According to Golshan, the pandemic has accelerated the trend toward stricter data privacy regulation and compliance and, consequently, the demand for tools that mitigate regulatory and other risks related to users’ privacy.
Fifty-one percent of consumers surveyed aren’t comfortable sharing their personal information, according to a Privitar survey. And in a Veritas report, 53% of respondents say they would spend more money with trusted organizations, with 22% saying they would spend up to 25% more with a business that takes data protection seriously.
The current business environment is also pushing companies to move faster to stay competitive, which creates risks of its own. Across the board, security experts cite the pace of technology adoption as a major contributing factor to the current cybercrime environment. And research published by KPMG suggests that a large number of organizations have increased their investments in AI during the pandemic to the point that executives are now concerned about moving too fast.
While synthetic data closely mirrors real-world data, mathematically or statistically, the jury’s out on its efficacy. A paper published by researchers at Carnegie Mellon outlines the challenges with simulation that impede real-world development, including reproducibility issues and the so-called “reality gap,” where simulated environments don’t adequately represent reality.
Other research, however, suggests that synthetic data can be as good for training a model as data based on actual events or people. For example, Nvidia researchers have demonstrated a way to use data created in a virtual environment to train robots to pick up objects like cans of soup, a mustard bottle, and a box of Cheez-Its in the real world.
“In the privacy space, there are traditional companies more focused on compliance and regulations, and there are startups focusing on synthetic data for niche applications, but Gretel has taken a much more scalable approach by making forward-looking synthetic data and privacy tools available to developers as APIs,” Golshan said. “Synthetic data is one tool in the suite of privacy tools that we offer, which includes classification and transformation using advanced AI capabilities.”
A growing toolset
Gretel claims its platform is tech- and vertical-agnostic, compatible with a range of frameworks, apps, and programming languages. It covers tasks such as data labeling through the aforementioned API, as well as report generation for high-level scores and metrics that help assess the quality of Gretel’s synthetic data.
Heading off rivals including Tonic, Delphix, Mostly AI, and Hazy, Gretel says it’s working with life sciences, financial, gaming, and technology brands on “transformative” applications, like creating synthetic medical records that can be shared between health care organizations. Gretel is in the beta stage of its release and not currently charging users or customers, but Golshan says that it’s reached proof-of-concept with several prospects and expects these companies to transition into paying customers once the platform enters general availability early next year.
“We have almost 75,000 downloads of our open source distribution — Gretel’s ‘open core’ version of its synthesizer,” Golshan said. “We have 20 full-time staff and are expanding rapidly … By year-end 2022, we anticipate hiring 50 to 75 more staff, which will include more engineers and researchers, marketers, product managers, developer advocates, and sales.”