What does it take to build AI that works for both business and humanity? by Wendy Gonzalez

Automatic Summary

Building AI for Business and Humanity: Insights from Wendy Gonzalez

In a rapidly evolving digital landscape, integrating Artificial Intelligence (AI) into business operations has become a necessity. Wendy Gonzalez, CEO of Sama, recently addressed what it takes to develop AI that serves both business and societal needs. In this article, we explore the core challenges of AI adoption, recent regulatory developments, and the fundamental principles for building responsible AI systems.

The Current Landscape of AI Adoption

AI is everywhere, and within enterprises it has shifted from experimentation to core deployment. Yet although nearly three-quarters of organizations use generative AI, and two-thirds of CEOs expect it to be a competitive advantage, 80% of companies are failing to deliver the intended ROI from their AI investments. This discrepancy highlights a crucial gap between AI adoption and readiness.

Challenges in AI Implementation

  • Misinformation: Models hallucinate, presenting incorrect answers whose cost can compound in downstream systems, a risk that is especially important in agentic AI.
  • Reputation and Legal Risks: With lawsuits emerging around AI products, from ChatGPT to Google's AI Overviews, companies face reputational and legal exposure.
  • Bias in High-Stakes Areas: A wrong e-commerce recommendation is minor, but mistakes in lending, mortgages, insurance, or healthcare can have dire consequences for individuals.
  • Security and Privacy: AI is now a top attack surface. Gartner predicts that by 2028, 10% of marketing and cybersecurity budgets will go toward combating misinformation, representing more than 30 billion in spend.

Regulatory Landscape Shaping AI

The regulatory environment surrounding AI is evolving, as the "Wild West" era of AI begins to close. Key regulations include:

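  • The EU AI Act: Passed in 2024, the act requires high-risk AI systems, such as those used in financial lending, insurance, and policing, to have rigorous data governance, provenance, and model evaluation procedures, with these requirements taking effect in August 2026.
  • California AI Laws: California has signed more than 17 AI bills, including AB 2013, which requires transparency about how generative AI models were trained and which data was used.
  • ISO Standards: Standards such as ISO/IEC 42001, an AI management system standard, provide a framework for governing AI development. Sama is working toward it, and customers increasingly ask about AI development practices in their RFPs.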

Core Principles of Responsible AI

These regulations and standards share a common set of core principles:

  • Fairness: Avoiding bias in AI decision-making.
  • Transparency: Ensuring clarity in how models operate and make decisions.
  • Accountability: Defining clear responsibilities for AI outcomes.
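  • Privacy: Protecting user information through sound data handling practices.
  • Robustness: Ensuring systems behave reliably in real-world conditions.
  • Human Oversight: Keeping humans involved in high-risk or key decisions, or at least maintaining an audit trail showing that a human evaluated them.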
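The human oversight principle can be made concrete with a routing rule plus an audit trail. The sketch below is a minimal illustration under stated assumptions: the domain list, the 0.90 confidence threshold, and the names `Decision`, `AuditLog`, and `route_decision` are all hypothetical and are not drawn from the talk or from any specific framework.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List, Optional

# Hypothetical policy values; the talk does not prescribe domains or thresholds.
HIGH_STAKES_DOMAINS = {"lending", "mortgage", "insurance", "healthcare"}
REVIEW_THRESHOLD = 0.90  # assumed minimum model confidence for automatic handling


@dataclass
class Decision:
    request_id: str
    domain: str
    confidence: float


@dataclass
class AuditLog:
    entries: List[dict] = field(default_factory=list)

    def record(self, decision: Decision, route: str, reviewer: Optional[str] = None) -> None:
        # Append-only trail of how each decision was routed and, later, who reviewed it.
        self.entries.append({
            "request_id": decision.request_id,
            "domain": decision.domain,
            "route": route,
            "reviewer": reviewer,
            "timestamp": datetime.now(timezone.utc).isoformat(),
        })


def route_decision(decision: Decision, log: AuditLog) -> str:
    """Send high-stakes or low-confidence decisions to a human reviewer."""
    needs_human = (
        decision.domain in HIGH_STAKES_DOMAINS
        or decision.confidence < REVIEW_THRESHOLD
    )
    route = "human_review" if needs_human else "auto"
    log.record(decision, route)
    return route
```

For example, a confident e-commerce recommendation would be routed `"auto"`, while a lending decision goes to `"human_review"` regardless of confidence, and both outcomes are logged. Making high-stakes domains bypass the confidence check is a deliberate choice here: a confident model can still be confidently wrong.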

Foundational Elements for Building Effective AI

Gonzalez emphasized three foundational elements necessary for creating AI that works for both businesses and humanity:

1. Trustworthy Data

Grounding AI models in reliable and diverse data is essential. This includes:

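  • Representative Datasets: Covering demographics, environments (lighting, weather, geography), languages and dialects, modalities, and edge cases, since models trained on unrepresentative data can hallucinate and drift. A self-driving car trained only on spring and summer data, for instance, will not know how to handle snow-covered cars.
  • Data Rights and Quality: Confirming consent and the right to use the data, with quality controls built in.
  • Continuous Evaluation: Regularly evaluating models as they ingest new data, to catch drift as the real world changes.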
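One way to act on the representative-datasets point is to audit coverage before training. The following is a minimal sketch, assuming each example has already been tagged with categories such as weather or language; the function name `coverage_gaps` and the 5% `min_share` cutoff are hypothetical choices for illustration, not a standard.

```python
from collections import Counter
from typing import Dict, List, Set


def coverage_gaps(
    records: List[Dict[str, str]],
    expected: Dict[str, Set[str]],
    min_share: float = 0.05,  # hypothetical minimum fraction per category
) -> Dict[str, Dict[str, float]]:
    """Report expected categories that are missing or under-represented.

    records: one dict per example, mapping a dimension (e.g. "weather")
             to that example's category (e.g. "snow").
    expected: for each dimension, the categories the dataset should cover.
    """
    total = len(records)
    gaps: Dict[str, Dict[str, float]] = {}
    for dimension, categories in expected.items():
        counts = Counter(record.get(dimension) for record in records)
        low: Dict[str, float] = {}
        for category in sorted(categories):
            share = counts[category] / total if total else 0.0
            if share < min_share:
                low[category] = round(share, 3)
        if low:
            gaps[dimension] = low
    return gaps


records = [{"weather": "sunny", "language": "en"}] * 70 + [{"weather": "rain", "language": "sw"}] * 30
expected = {"weather": {"sunny", "rain", "snow"}, "language": {"en", "sw"}}
print(coverage_gaps(records, expected))  # {'weather': {'snow': 0.0}}
```

Here the check surfaces exactly the gap from the self-driving example: no snow data at all. A share threshold only detects gaps in categories someone thought to list, so the `expected` map itself needs review from people with diverse perspectives.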

2. Contextual Awareness

Humans understand context in a way AI is still learning to. A model trained on images of parents pushing strollers may struggle to recognize a stroller on its own, or to generate one that actually contains a baby. Automation is confident, but humans notice what isn't there, which is why even synthetic data needs human evaluation.
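The stroller example suggests a simple first-pass guard for synthetic data: compare the elements a prompt asked for with what actually appears in the output, and route mismatches to a person. This is a minimal sketch, assuming the `detected` labels come from some upstream detector or annotator; the function names are hypothetical. It only catches absences someone enumerated in advance, so it supports human review rather than replacing it.

```python
from typing import Dict, Iterable, List, Set, Tuple


def missing_elements(required: Set[str], detected: Iterable[str]) -> List[str]:
    """Elements the prompt asked for that do not appear in the output."""
    return sorted(set(required) - set(detected))


def triage_synthetic(samples: List[Dict]) -> Tuple[List[Dict], List[Dict]]:
    """Split generated samples into auto-accepted and needs-human-review lists."""
    accepted: List[Dict] = []
    review: List[Dict] = []
    for sample in samples:
        missing = missing_elements(sample["required"], sample["detected"])
        if missing:
            # Keep the original sample fields and note what is absent.
            review.append({**sample, "missing": missing})
        else:
            accepted.append(sample)
    return accepted, review


stroller = {
    "prompt": "a parent pushing a smiling baby in a stroller at the park",
    "required": {"parent", "baby", "stroller"},
    "detected": ["stroller"],
}
accepted, review = triage_synthetic([stroller])
print(review[0]["missing"])  # ['baby', 'parent']
```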

3. Stakeholder Alignment

Responsibility extends across the entire AI supply chain, from the workers who label data to the end-users of the system. Because people see things differently, a diverse group of annotators, labelers, and validators brings a more holistic perspective to AI development, producing outcomes that better meet users' varying needs.

The Value of Responsible AI

Responsible AI is better-performing AI: it earns trust and drives broader adoption. Companies that prioritize responsible AI practices can expect:

  • Enhanced Societal Impact: Preventing harm and discrimination through thoughtful AI applications.
  • Legal Safety: Reduced risk of lawsuits and regulatory issues.
  • Trust and Adoption: Increased user confidence in AI technologies.
  • Business Value: Achieving better ROI through responsible practices.
  • Global Applicability: Creating solutions that can operate in diverse cultural contexts.

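Moving Forward: Steps for Leaders

  • Set Principles, Then Operationalize: Turn common responsible AI principles into guardrails for every project.
  • Make It About the Data: Garbage in, garbage out still applies; check provenance, representation, edge cases, and the legal right to use the data.
  • Design Human-in-the-Loop on Purpose: Build human context and validation into multiple steps of the process.
  • Monitor Like Production Code: Once systems are in the wild, maintain evaluation dashboards, metrics, drift alerts, and quality checkpoints.
  • Test Like an Adversary: Use red teaming and adversarial testing to probe the boundaries no one anticipated.
  • Treat the Supply Chain as Part of the AI Life Cycle: Ensure fair wages and ethical sourcing for the people who train and validate models, with a focus on quality over quantity.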
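The advice to monitor AI systems like production code, with evaluation metrics, drift alerts, and quality checkpoints, can be sketched as a rolling-window quality check. This is an illustrative sketch only: it assumes a stream of labeled evaluation results and a launch-time baseline accuracy, and the class name `EvalMonitor`, the tolerance, and the window size are hypothetical defaults rather than recommended values.

```python
from collections import deque
from typing import Optional


class EvalMonitor:
    """Rolling-window accuracy monitor that returns an alert on degradation.

    baseline: accuracy measured at launch (assumed known).
    tolerance: allowed drop below baseline before alerting (hypothetical).
    window: number of most recent labeled evaluations to average over.
    """

    def __init__(self, baseline: float, tolerance: float = 0.05, window: int = 100):
        self.baseline = baseline
        self.tolerance = tolerance
        self.results = deque(maxlen=window)  # oldest results drop off automatically

    def record(self, correct: bool) -> None:
        self.results.append(1 if correct else 0)

    def accuracy(self) -> float:
        return sum(self.results) / len(self.results) if self.results else float("nan")

    def check(self) -> Optional[str]:
        # Wait for a full window so a handful of early results cannot trigger alerts.
        if len(self.results) < self.results.maxlen:
            return None
        acc = self.accuracy()
        if acc < self.baseline - self.tolerance:
            return f"ALERT: rolling accuracy {acc:.2f} below baseline {self.baseline:.2f}"
        return None
```

In use, each labeled production sample is passed to `record`, and `check` runs at each checkpoint; a non-`None` result would feed an alerting channel. Accuracy on a rolling window is only one possible signal; input-distribution drift metrics are a common complement when labels arrive late.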
Video Transcription

Hi, everybody. I'm Wendy Gonzalez, CEO of Sama. And as Shelley mentioned, I'll be spending time talking about what it takes to build AI that works for both business and humanity. So this is probably self-evident, but AI is everywhere. I think the thing that is most present is that the pressure to ship is absolutely enormous. AI has truly moved from experimentation to the enterprise core, and the gap between adoption and readiness has really never been wider. So nearly three quarters of organizations are using GenAI, two thirds of CEOs say that AI, and generative AI in particular, is going to be a competitive advantage, and there's record spend happening, and yet still 80% of companies are failing to deliver against the intended ROI or the intended business value of their AI investment.

So not only are there still challenges for AI projects to deliver value, but generative AI in particular creates new ways to break and new ways to fail. There are really four categories of challenge. The first is misinformation. With hallucinations, which is basically models presenting incorrect answers, the cost can ultimately compound in downstream systems, and this is particularly important in agentic AI. There are, of course, reputation and legal risks. So there's everything from the recent suits involving ChatGPT to the Google AI Overviews. There's also the risk of bias in high-risk or high-stakes areas. It's one thing to get the wrong recommendation from an ecommerce website. It's a completely different thing to have mistakes when it comes to lending, mortgages, insurance, and health care. And then, of course, there's security and privacy.

AI is now a top attack surface. In fact, Gartner's latest predictions for the next decade say that by 2028, 10% of marketing and cybersecurity budgets will be used to combat misinformation. That represents over 30 billion in spend. You may also have seen that, most recently, there have been some interesting AI trademarks from everyone from Taylor Swift to Matthew McConaughey, who are literally trademarking their likenesses and their voices to ensure that they don't get deepfaked.

Well, there are a lot of regulations happening. Many have said that the Wild West era of AI is closing. I would say it is starting to close. So there are four regulations here that I want to highlight. Probably the most progressive of these is the EU AI Act, which was overwhelmingly voted into law back in 2024. Now the implementation guidelines are actually just taking effect: in August 2026, high-risk systems powered by AI, such as financial lending, insurance, or policing, must have a rigorous set of data governance, provenance, and model evaluation procedures.

There are also quite a few laws being passed in the United States. In particular, California, which views itself as a leader in AI policy, has signed over 17 bills, including AB 2013, which is about transparency and data: generative AI transparency, meaning how the model was trained and which data was used. There are other standards, such as ISO 42001, which my company, Sama, is also working toward. And we are now beginning to see customers ask those questions as part of the RFP process. What are your AI development practices? Do you have the right protections in place? So each of these regulations, each of these policies that are put into place, really shares the same core principles.

The first is fairness: how do you avoid bias in decision-making? Transparency: how does the model work, and how does it make decisions? Accountability: how do we define clear responsibility for AI outcomes? Privacy, robustness, and, of course, human oversight. So in those higher-risk or key decisions, is there some level of human oversight, or at least an audit trail showing that a human has evaluated it? There's some good news, in that many companies have embraced responsible AI frameworks and responsible AI practices. But there is a significant execution gap. In fact, the majority of companies, including their executives, believe they would have a hard time passing an audit. Beyond that, only one in five companies actually has a governance model for autonomous AI agents, and the number of AI incidents in 2024 reached a record high.

So there's one other thing that I wanted to touch on, which is that ethical AI doesn't necessarily equal responsible AI, and these two concepts are often confused. Both, of course, are essential, but ethical AI really focuses on whether the model produces discriminatory or harmful content, while responsible AI is more of an entire development life cycle approach to ensure that users can trust the AI that they are adopting. So those are the challenges that I've outlined, but what does it actually take to build AI that works for both? I'm going to touch on three key topics. The first is to ground models in trustworthy data. Without representative data, models can often hallucinate and drift. A great example of that: if you're building a self-driving car and all of your data has been collected in summer and spring, then when winter comes and snow covers the cars, the self-driving vehicle will not know how to deal with snow-covered cars, because you don't have that dataset.

So it's critical that you have representative datasets, covering everything from demographics, environments, and languages to edge cases; that you have the right to use that data, so there's consent; that quality controls are baked in; and, of course, continuous evaluation. Models are like humans: humans learn from experience, while models learn from data. These systems are going to continuously digest new data, and it's important that they get evaluated on a continuous basis to avoid model drift as, basically, the world changes underneath our feet. So I want to talk a little bit about what diversity in data really means. We talk about representative data and diverse data, and often the focus is on demographics. For example, with an automated login, your facial ID needs to recognize different demographics, so it is very important that age, gender, ethnicity, and ability are well covered. But beyond that, it is quite a bit more. It's environments.

Lighting, weather, and geography: that example I just used of snow-covered cars. Or another example: not too long ago, there were major floods in San Francisco, and several Waymo cars stopped in the middle of an intersection, unable to move, because they didn't recognize rain and flooded streets, which are very unusual in San Francisco. So it's about looking at all those different environments. It can be language, and not just English or French or Chinese or Spanish, but what do the accents look like? What are the dialects? And what about low-resource languages? I'll touch on some of those in just a little bit. Edge cases and, of course, modalities, which cover text, image, video, sensor, and beyond. What we are seeing now, particularly in our business at Sama, where we do model evaluation and training data, is that the majority of the use cases we work on are multimodal.

Vision and text, or video and text, or text, audio, and video. So, moving on from data and having the right datasets, the second foundation is context. Humans understand context, and that is something AI, while it's getting better and better, has yet to learn. For example, a model could be trained on parents pushing strollers, but it might struggle to recognize a stroller alone, or to generate one that actually has a baby inside of it. As humans, when we see a stroller, we're going to assume that there's a baby in it. We also have the context to know that somebody is going to be pushing it. Automation is confident, but humans notice what isn't there.

We would see that, hey, there's no baby in this image, or there's nobody pushing that stroller. So this is an example I've used to demonstrate how synthetic data can be challenged, and how even synthetic data needs to be evaluated by a human. The prompt could be "a parent pushing a smiling baby in a stroller at the park." Model output: a stroller, no baby visible. Is this a stroller? Well, it is a stroller. Just because you don't have the baby in it doesn't mean it's not still a stroller. The third important foundation is that you need to align with the people affected by AI. Responsible AI extends to the supply chain you're responsible for. That's everyone from the workers who label your data to the users on the other end of the system.

One of the most interesting things is that if you're a user leveraging, for example, an LLM, one of the major challenges with adoption is just feeling safe to ask the LLM silly questions. So you have to have those different perspectives in mind. In addition, data is how you train your model. So having a diversity of data, and a diversity of people annotating, labeling, or validating that data, is really important, because people see things differently. At Sama specifically, diversity is a core part of our value system, and we focus on everything from gender diversity to ethnic diversity. I'm certainly very proud to say that women represent 50% of our company, from our entry-level positions all the way up to our senior positions. So I mentioned I was going to talk a little about rare or lower-volume languages.

When you're building a large model, say, for example, ChatGPT or Claude or others, these models are trained on the Internet, but there are going to be languages that are not readily available. So our research team put together a research paper after doing some analysis to benchmark how these large models would deal with Swahili, a very prominent East African language that tens of millions of people speak. The challenge, of course, is that both this particular African language and its perspectives are systematically underrepresented in these large language models. So what we did was create an Africa-centric dataset, with everything from basic phrases to what to eat for breakfast, lunch, and dinner, to contribute a well-rounded set of prompts and high-quality data. What that does when these models are being trained is produce a more representative model for everybody, because these models know no borders.

So engaging lower-resource languages is very important. There are some organizations working on this, including the Aquarium Project, and SEA-LION, as an example, an LLM sponsored by Google that represents a ton of rare Southeast Asian languages; it's an initiative to ensure that there's a diverse set of data.

So why, ultimately, are these responsible AI foundations and grounding factors so important? Well, it's because responsible AI is better-performing AI. It is AI that people will trust and actually adopt. So for every company that wants that level of adoption, responsible AI will ultimately lead to both better adoption and better performance. And that covers a variety of areas: societal impact, from preventing harm and discrimination; legal impact; trust and adoption, which I already mentioned; business value; and, of course, global impact. To be truly globally adoptable, it needs to be a solution that can fit anywhere in the world. So there are a number of moves that any leader can make to build responsible AI that is representative, better performing, and more trustworthy.

The first is to set principles, then operationalize them. There are common principles that are basically checklists; you can think of them as guardrails for every project that you drive. You need to make it all about the data. Models are trained on data, and it is still true that it's garbage in, garbage out. If you're missing that data, or missing a view on having representative datasets, datasets that you can legally access, that's challenging. You need to search for provenance, and you need to search for representation as well as edge cases. Leaders should design with human in the loop on purpose. Humans provide important context and validation.

It is important both to have that at multiple steps in the process and to continue to monitor and manage your systems once they're out in the wild. So this is my fifth point: monitor like it's production code. Having an evaluation dashboard and metrics, drift alerts, and checkpoints to ensure that quality is maintained is absolutely critical. You want to test like an adversary. I always say measure twice, cut once. That is the best way to plan for these systems and build the right frameworks. But at the end of the day, I don't think there's ever been a single developer who's built the absolutely perfect system.

There are going to be things that you don't anticipate. So red teaming and adversarial testing are really critical to test the boundaries. I certainly recall that when self-driving cars first came out in San Francisco, people would do things like put cones on top of them so that they couldn't operate. That is a physical example of an adversarial test. And lastly, you want to treat the supply chain as part of the AI development life cycle. You want the people who are helping to train and validate your models not only to have fair wages and ethical sourcing, but to be focused on quality as opposed to quantity. So, as I know I'm nearing the full twenty minutes here, I'll say again what I've shared.

Measure twice, cut once. The one constant is that the technology will continue to change. There will be the next version of Llama, of Musepark, of ChatGPT. They will continue to change, constantly. But what shouldn't change is evaluation, oversight, and purpose: getting the right evaluation frameworks in place, having the right level of oversight, and continuing to check what the purpose was and what we want this model to do. Having those frameworks in place is absolutely critical. You can swap out models, but you cannot swap out your purpose and the way in which you evaluate whether your model is functioning as you expect. So build those foundations, and keep humans at the center, from the individuals doing the training, data validation, and model evaluation to your users.

So how are users going to use this system? How can you build a system that reflects how they use it, and align with the people that your systems touch? That's really critical. At the end of the day, when we did our African dataset research, we had team members who understood the local Swahili context, language, and culture. Having them be part of developing that training dataset meant that the LLMs trained on it reflected the views of the people who would ultimately use them. So it was a very, very powerful thing. This is ultimately how AI works for both business and humanity at the same time, and ultimately on purpose.