Snorkel AI, the data-centric AI platform company, today introduced Data-centric Foundation Model Development for enterprises to unlock complex, performance-critical use cases with GPT-3, RoBERTa, T5, and other foundation models. With this release, enterprise data science and machine learning teams can overcome adaptation and deployment challenges by creating large, domain-specific data sets to fine-tune foundation models and using them to build smaller, specialized models that can be deployed within governance and cost constraints. New capabilities for Data-centric Foundation Model Development are available in preview in Snorkel Flow, the company’s flagship platform.
Foundation models such as GPT-3, DALL-E 2, and Stable Diffusion show great promise for generative, creative, and exploratory tasks. But enterprises are not yet close to deploying foundation models to production for complex, performance-critical NLP and other automation use cases. They need large volumes of task- and domain-specific labeled training data to adapt foundation models for domain-specific use. Creating these high-quality training data sets with traditional manual data labeling is painfully slow and expensive. In addition, foundation models are very costly to develop and maintain, and they pose governance constraints when deployed to production.
These challenges must be addressed before enterprises can reap the benefits of foundation models. Snorkel Flow’s Data-centric Foundation Model Development is a new paradigm for enterprise AI/ML teams to overcome the adaptation and deployment challenges that currently prevent them from using foundation models to accelerate AI development.
Using early versions of Data-centric Foundation Model Development, AI/ML teams have built and deployed highly accurate NLP applications in days:
- A major US bank improved accuracy from 25.5% to 91.5% when extracting information from complex contracts of several hundred pages.
- A global housewares e-commerce company improved accuracy by 7% to 22% when classifying products from descriptions and reduced development time from four weeks to one day.
- Pixability distilled knowledge from foundation models and built smaller classification models with more than 90% accuracy in days.
- The Snorkel AI research team and partners at Stanford University and Brown University matched the quality of a fine-tuned GPT-3 model with a model more than 1,000 times smaller on LEDGAR, a complex 100-class legal benchmark task.
“With over 3 million videos created daily on YouTube, we need to consistently and accurately categorize millions of videos to help brands properly place their ads and maximize performance,” said Jackie Swansburg Paulino, Pixability’s chief product officer. “With Snorkel Flow, we can apply data-centric workflows to distill knowledge from foundation models and create high-cardinality classification models with more than 90% accuracy in days.”
Enterprise Foundation Model Management Suite features include:
- Foundation Model Fine-tuning to create large, domain-specific training data sets to fine-tune and adapt foundation models for enterprise use cases with production-grade accuracy.
- Foundation Model Warm Start to use foundation models with state-of-the-art zero- and few-shot learning to automatically label training data at the push of a button and train deployable models.
- Foundation Model Prompt Builder to develop, evaluate, and combine prompts to tune and correct the output of foundation models, accurately label data sets, and train deployable models.
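As a rough illustration of the idea of combining several prompts to label data, the sketch below uses a simple majority vote. It is not Snorkel Flow's API or method: the three labeler functions are hypothetical keyword-based stand-ins for prompted foundation model calls, and the names `combine_votes` and `label_dataset` are invented for this example.

```python
from collections import Counter
from typing import Callable, List, Optional, Tuple

ABSTAIN = None  # a labeler returns None when it has no opinion

# Hypothetical stand-ins for prompted foundation model calls. In a real
# system each function would send a prompt to a model and map the
# completion to a label; keyword checks keep this sketch self-contained.
def prompt_kitchen(text: str) -> Optional[str]:
    t = text.lower()
    return "kitchen" if "pan" in t or "knife" in t else ABSTAIN

def prompt_bedding(text: str) -> Optional[str]:
    t = text.lower()
    return "bedding" if "pillow" in t or "duvet" in t else ABSTAIN

def prompt_cookware(text: str) -> Optional[str]:
    t = text.lower()
    return "kitchen" if "skillet" in t or "pan" in t else ABSTAIN

Labeler = Callable[[str], Optional[str]]
LABELERS: List[Labeler] = [prompt_kitchen, prompt_bedding, prompt_cookware]

def combine_votes(text: str, labelers: List[Labeler]) -> Optional[str]:
    """Majority vote over non-abstaining labelers; None on no votes or a tie."""
    votes = Counter(v for v in (lf(text) for lf in labelers) if v is not ABSTAIN)
    if not votes:
        return ABSTAIN
    ranked = votes.most_common(2)
    if len(ranked) == 2 and ranked[0][1] == ranked[1][1]:
        return ABSTAIN  # top two labels are tied, so do not guess
    return ranked[0][0]

def label_dataset(texts: List[str], labelers: List[Labeler]) -> List[Tuple[str, str]]:
    """Return (text, label) pairs, dropping examples left unlabeled."""
    labeled = []
    for text in texts:
        label = combine_votes(text, labelers)
        if label is not ABSTAIN:
            labeled.append((text, label))
    return labeled
```

The resulting (text, label) pairs could then serve as training data for a smaller, deployable classifier. Majority voting is only the simplest way to combine noisy labelers; a production system would likely weight them by estimated accuracy.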
“Enterprises have struggled to harness the power of foundation models like GPT-3 and DALL-E due to fundamental adaptation and deployment challenges. To work in real enterprise use cases, foundation models need to be adapted using task-specific training data and need to overcome key deployment challenges around cost and governance,” said Alex Ratner, CEO and co-founder of Snorkel AI. “Snorkel Flow’s unique data-centric approach provides the necessary bridge between foundation models and enterprise AI, solving adaptation and deployment challenges so enterprises can achieve real value from foundation models.”