Gretel Announces General Availability of Gretel Navigator, Empowering Enterprises with High-Quality Synthetic Data on Demand

Compound AI system enables developers to generate, edit, and augment tabular data using natural language prompts

Gretel, a leader in synthetic data generation, announced the general availability of Gretel Navigator, the agent-based, compound generative AI system built to automate data creation and curation processes for AI development. With simple natural language or SQL prompts, Gretel Navigator enables users to create, edit, and augment tabular data, and design realistic, high-quality test and training datasets from scratch. Developers can also leverage existing datasets to generate insight-rich synthetic data on demand.

“Whether you’re building a retrieval-augmented generation (RAG) system, training a foundation model, or fine-tuning an LLM for a specific task, high-quality data is the single most important ingredient for success,” said Ali Golshan, Co-founder and CEO at Gretel. “But the status quo for acquiring that data is broken. Scraping the web leads to inconsistent quality, and de-identifying private data does not offer adequate protections. Meanwhile, manual data labeling is time-consuming and costly, translating into weeks and months of data preparation before the real work can even begin.”

Gretel Navigator addresses traditional challenges with data acquisition head-on by enabling developers to generate customizable, realistic synthetic datasets that mimic real-world patterns without compromising individual privacy. Navigator supports a wide range of data formats, modalities, and context-specific optimizations to streamline workflows and expedite AI projects.

“With Navigator, developers can design the data they need 10x faster than manual curation techniques,” said Alex Watson, Co-founder and CPO at Gretel. “Developers can go from zero to model-ready data products in a matter of hours. Navigator puts them in the driver’s seat, enabling them to focus on innovation instead of data janitorial work.”

Gretel Navigator is powered by an ensemble of pre-trained AI models, including Gretel’s custom tabular large language model (LLM), which was trained on a diverse collection of public and proprietary datasets, including electronic health records, financial documents and market data, and other industry-specific formats. This enables the system to generate high-quality, vertical-specific synthetic tabular data, which is crucial for enterprise AI applications.

“Having access to high-quality and safe tabular and text data on-demand has dramatically enhanced how we operate and the speed at which we deliver value to our clients,” said Pablo Cebro, Head of Technology Platforms for Client Technology at Ernst & Young. “Data quality and safety is top priority for EY. The data we generate with Gretel Navigator is frankly better than real data. It’s more complete, accurate, and cost effective. It has significantly expedited our product development and AI roadmap.”

Gretel Navigator incorporates privacy-enhancing technologies, such as differential privacy, and addresses key AI development challenges, such as domain knowledge gaps and historical biases in limited real-world datasets. It also prevents issues like model drift and boosts overall model accuracy for high-value AI applications. By enabling secure, real-time access to, and tailored optimization of, sensitive or proprietary training data, Navigator empowers developers to build state-of-the-art models that continuously learn and adapt to critical real-world feedback.

In addition to Ernst & Young, Gretel Navigator has accelerated AI initiatives at leading companies such as Microsoft, Google, Databricks, and AWS, as well as emerging AI startups like Athena Intelligence and Dataclay. Gretel Navigator is also the AI system behind the world’s largest open source Text-to-SQL dataset, consisting of over 100,000 high-quality synthetic Text-to-SQL samples with metadata spanning 100 domains and industry verticals. Since its April release, this dataset has been downloaded more than 10,000 times and used to train and fine-tune AI models across industries.
