Vero AI Evaluates 10 Leading Generative AI Models Using Its Comprehensive VIOLET Framework to Gauge Responsible AI 

Vero AI, an analytical engine and scoreboard that helps enterprises fully harness the potential of advanced technology, including artificial intelligence, while minimizing risk, announced the findings of its inaugural “Generating Responsibility: Assessing AI Using AI” report. The report provides a comprehensive assessment, with measurable scores, of 10 prominent generative AI models to help enterprises understand how these tools align with responsible AI standards as determined by Vero AI’s VIOLET Impact Model™, which was created by I/O psychologists and AI technology veterans.

“As generative AI continues to rapidly evolve, organizations are increasingly challenged to grasp both its benefits and its potential risks,” said Eric Sydell, PhD, CEO and co-founder of Vero AI. “Although there have been attempts to quantify and assess components of popular generative AI models for fairness and compliance, the criteria in these studies have been too narrow in scope to yield valuable recommendations. To fully harness AI in a responsible manner, especially with the emergence of new AI regulations, a broad approach accompanied by a scientific method for measuring AI systems at scale is needed.”

Using its AI-powered analytical engine Iris™, combined with human experts, Vero AI evaluated the publicly available documentation of some of the most popular LLMs and generative models, including Google’s Gemini, OpenAI’s GPT-4, Meta’s LLAMA2, and more. Iris enables automatic processing of vast amounts of unstructured information. Each model was then assigned scores on the key components of the VIOLET Impact Model: Visibility, Integrity, Optimization, Legislative Preparedness, Effectiveness, and Transparency. The VIOLET Impact Model is a holistic, human-centered framework of elements and methodologies that provides a comprehensive and objective view of the impact of algorithms and advanced AI architectures.

The generative AI models analyzed showed varying strengths and weaknesses according to the criteria evaluated

  • The average effectiveness score across the models was 81%. 
  • The lowest average score was on optimization (69%), while visibility (76%) and transparency (77%) were less than 10 points higher. These results underscore the importance of vendors giving equal weight to all components of an algorithm when designing and building their models, and of continuing to monitor them to ensure they meet responsible AI standards. (A hypothetical illustration of how such dimension scores could roll up into an overall rating is sketched below.)
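
To make the arithmetic behind these percentages concrete, here is a minimal Python sketch of how per-dimension scores could be averaged into an overall VIOLET rating. The dimension names come from the report; the equal weighting, the helper function, and the placeholder figures for integrity and legislative preparedness are illustrative assumptions, not Vero AI’s published methodology.

    VIOLET_DIMENSIONS = [
        "Visibility",
        "Integrity",
        "Optimization",
        "Legislative Preparedness",
        "Effectiveness",
        "Transparency",
    ]

    def overall_score(scores: dict[str, float]) -> float:
        """Equal-weight average of the six dimension scores, in percent.
        This aggregation is an assumption; Vero AI's actual formula is not public."""
        missing = [d for d in VIOLET_DIMENSIONS if d not in scores]
        if missing:
            raise ValueError(f"missing dimension scores: {missing}")
        return sum(scores[d] for d in VIOLET_DIMENSIONS) / len(VIOLET_DIMENSIONS)

    # Report-published averages where available; the Integrity and Legislative
    # Preparedness values are placeholders for illustration only.
    example = {
        "Visibility": 76.0,
        "Integrity": 75.0,                 # placeholder
        "Optimization": 69.0,
        "Legislative Preparedness": 75.0,  # placeholder
        "Effectiveness": 81.0,
        "Transparency": 77.0,
    }

    print(f"Overall VIOLET score: {overall_score(example):.1f}%")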

Generative AI vendors are aiming for a responsible approach to AI, but the task at hand is large

  • Most generative AI vendors have posted responses on their websites to the White House’s calls to manage the risks posed by AI. Additionally, many have clear feedback channels through which users can share model experience feedback, questions, or privacy- and data-related concerns.
  • The majority of generative AI vendors would benefit, however, from increased transparency about their model algorithms, training data sources, and data quality, as well as from documentation of how they ensure fairness and prevent biased outputs.
  • Although individual scores ranged from a low of 56% in certain categories to a high of 86%, some strengths stood out for each of the evaluated models. For example:
    • Google’s Gemini, Meta’s LLAMA2, Inflection’s INFLECTION2, and BigScience’s BLOOM all scored high on accountability
    • OpenAI’s GPT-4, Cohere’s COMMAND, Amazon’s TITAN TEXT, and AI21 Labs’ JURASSIC 2 have made noticeable efforts in risk management

There is a clear path forward to achieving responsible AI, prioritizing evaluation and transparency
Although there are many AI frameworks across the globe, even the top generative AI models did not score perfectly on the VIOLET Impact Model and demonstrated room for growth. Responsible AI means that the use and downstream effects of AI are equitable and beneficial for all of humanity. As companies contemplate integrating AI into their operations, Vero AI makes the following recommendations:

  • Have your model independently evaluated for effectiveness, and make the results clearly and easily accessible to end users.
  • Provide clear information on the human annotation rules applied in developing the system, along with information outlining the scale of that annotation effort.
  • Be transparent about data sources: what methods were used to ensure data quality, and how were humans involved?

Derived from a global approach to AI ethics and regulation, incorporating best-practice frameworks and legislation from a variety of countries and cultures along with scientific practices, VIOLET ensures that both business effectiveness and human interests are served.
