Operationalising AI Ethics by Natalie Rouse

Automatic Summary

Welcome to Operational Ethics and Artificial Intelligence

Hello everyone! I'm Natalie Rouse, general manager at Eliza, a leading data science, analytics, and strategy consulting company based in Australia and Aotearoa New Zealand.

Ethics in AI is now widely recognized as a critical consideration. However, consistent, standardized methods for embedding ethical principles within the development life cycle remain elusive. In this article, we explore the practical steps needed to develop fair and ethical AI systems. By the end, we hope to equip you with meaningful insights on strategies to mitigate artificial intelligence's unintended consequences.

The Need for Ethical Artificial Intelligence

Ethics are critically important in AI development. In recent years, we have repeatedly seen AI applications lead to unintended negative consequences, fostering a worldwide understanding of the importance of ethical considerations.

This awareness has spurred various sectors into drafting and adopting ethical principles. These principles focus on fairness, transparency, accountability, contestability, privacy and security, and human-centered values such as autonomy and consent.

From Principles to Execution: Operationalizing Ethics in AI

While establishing ethical principles is vital, it doesn't ensure a seamless transition into the development life cycle. We must ensure that our considerations do not add significant administrative burdens to the process.

One aspect to emphasize is the importance of representation and diversity to ensure robust and performant models, even for underrepresented groups. The process of embedding ethics into AI should be iterative and aimed at continuous improvement.

Key Considerations in Implementing Ethical AI

  • First, avoid copying and pasting ethical principles from the internet. For successful team buy-in, these principles need to be developed in alignment with your organization's values and principles.
  • Second, risk management is crucial in operationalizing ethics in AI. Effective identification, mitigation, and management of risk increase visibility and enable collective mitigation strategies.
  • Third, defining subpopulations within your data set is essential for understanding outcomes or varying levels of accuracy for different groups.
  • Last, understand that any additional processes added alongside development (including privacy and ethics considerations) must be lightweight enough not to overburden the development team.
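To make the risk-management point above concrete, a project risk register can be kept as structured data with a simple likelihood × impact score. This is a minimal sketch, not a standard: the field names, 1–5 scales, and level thresholds are all illustrative assumptions.

```python
# Minimal risk-register sketch: likelihood x impact scoring.
# Field names, scales, and level thresholds are illustrative assumptions.
from dataclasses import dataclass


@dataclass
class Risk:
    description: str
    likelihood: int   # 1 (rare) .. 5 (almost certain)
    impact: int       # 1 (negligible) .. 5 (severe)
    mitigation: str = ""

    @property
    def score(self) -> int:
        return self.likelihood * self.impact

    @property
    def level(self) -> str:
        if self.score >= 15:
            return "high"
        if self.score >= 8:
            return "medium"
        return "low"


register = [
    Risk("Model under-performs for an underrepresented group", 4, 4,
         "Review subpopulation representation and per-group metrics"),
    Risk("Training data used without adequate consent", 2, 5,
         "Verify data-use rights during scoping"),
]

# Surface the highest-scoring risks first for collective discussion.
for r in sorted(register, key=lambda r: r.score, reverse=True):
    print(f"[{r.level}] ({r.score}) {r.description} -> {r.mitigation}")
```

Recording risks in a structured form like this is what gives the visibility the bullet describes: the register can be sorted, reviewed, and agreed on collectively rather than living in individual heads.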

The AI Development Model

The nature of AI development differs from other areas of software development in that it is iterative and probabilistic. Most practitioners use the CRISP-ML process model, which adequately describes project phases and reflects the iterative nature of development.

Model Integration for Ethical Implementation in AI

  • The Scoping Phase: Begin with an ethical impact assessment and risk level calculation to understand potential implications. If higher risk methods can be replaced with lower risk alternatives, that option should be pursued.
  • The Model Development Process: This step already includes many iterations of analysis of the data set and model performance. Adding tasks for subpopulation definition, and for reviewing representation and per-group performance, is a natural extension.
  • Deployment: Before deploying, provide a set of recommendations for using the model, for example the level of aggregation below which outputs become less accurate, or decisions that should not be based on the model's outputs.
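The per-subpopulation performance review described in the model development step can be sketched as a small helper that computes a metric per group. Accuracy is used here purely as an example metric, and the group labels are hypothetical; any metric and grouping scheme you have defined would slot in the same way.

```python
# Sketch: comparing a performance metric across defined subpopulations.
# Accuracy and the group labels are illustrative assumptions.
from collections import defaultdict


def accuracy_by_group(y_true, y_pred, groups):
    """Return {group: accuracy} for parallel lists of labels and group tags."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for t, p, g in zip(y_true, y_pred, groups):
        total[g] += 1
        correct[g] += int(t == p)
    return {g: correct[g] / total[g] for g in total}


y_true = [1, 0, 1, 1, 0, 1, 0, 1]
y_pred = [1, 0, 0, 1, 0, 1, 1, 1]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]
print(accuracy_by_group(y_true, y_pred, groups))  # {'a': 0.75, 'b': 0.75}
```

The point of slicing the metric this way is that a single aggregate accuracy can hide a group whose accuracy is far lower than the rest.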

Embedding Ethics in AI: Mitigating Risks & Performance Management

Ensuring the mitigation of risks identified during the development process is vital. Privacy and consent processes, contesting outcomes, and an adequate level of explainability are key areas to consider.

Additionally, monitoring the model's performance over time and how it varies across your subpopulations is equally important for achieving the desired outcomes.
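One way to operationalize this monitoring is a check that raises alerts when any subpopulation's metric drops below a floor or when the spread across groups grows too wide. The 0.85 floor and 0.05 maximum spread below are illustrative thresholds, not recommendations; your own acceptable tolerances come from the risk discussion earlier in the process.

```python
# Sketch: alert when any subpopulation's accuracy drifts out of tolerance.
# The 0.85 floor and 0.05 max spread are illustrative assumptions.


def check_ethics_kpis(acc_by_group, floor=0.85, max_spread=0.05):
    """Return a list of human-readable alerts; an empty list means all clear."""
    alerts = []
    for g, acc in acc_by_group.items():
        if acc < floor:
            alerts.append(f"group '{g}' accuracy {acc:.2f} below floor {floor}")
    spread = max(acc_by_group.values()) - min(acc_by_group.values())
    if spread > max_spread:
        alerts.append(f"accuracy spread {spread:.2f} exceeds {max_spread}")
    return alerts


# Group 'b' is both below the floor and far from group 'a', so two alerts fire.
print(check_ethics_kpis({"a": 0.91, "b": 0.82}))
```

A check like this can run on each batch of production predictions, turning the ethical tolerances into the same kind of alert any other performance metric would produce.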

Conclusion

The key lesson here is to start implementing ethical considerations in AI development today. An evolving framework that learns from each iteration is better than no framework at all. Stay curious, ponder how model performance impacts real-world outcomes, and make that performance visible.

While the process may not be straightforward, it is ultimately worthwhile. Thank you for your time and don't hesitate to reach out for more discussions on this exciting topic!


Video Transcription

Hello and welcome, everybody, to this Women Tech 2022 session on operational ethics and artificial intelligence. I'm Natalie Rouse and I'm the general manager of Eliza, a leading data science, analytics, and strategy consulting company based in Australia and Aotearoa New Zealand, which is where I'm coming to you from today. I think we all agree now that ethics is a critical consideration for the development of artificial intelligence, but we don't have well-defined, standardized methods for taking these principles and embedding them within our development life cycle. Today, we'll look at some of the practical steps that we can take to make sure that we're doing everything we can to develop fair and ethical AI systems.

From discriminatory hiring models to selectively poor-performing facial recognition models, and from racial profiling to deepfakes, headlines over the past few years have been overflowing with examples of AI technology with unintended negative consequences. These examples have united people across the world in the understanding of the importance of ethical considerations when it comes to the development and implementation of AI systems.

This understanding has galvanized both individual contributors and organizations into action, with many drafting and adopting a set of ethical principles. These principles have largely converged on some main areas: fairness, transparency, accountability, contestability, privacy and security, and human-centered values like autonomy and consent.

However, agreeing on and establishing these principles does not automatically bridge the gap from organizational principles to the development life cycle. Work and careful thought are needed to embed these considerations into the development and deployment processes without adding prohibitive administrative overhead to any project.

And finally, representation and diversity really count to ensure that models are robust and performant, even for underrepresented groups. And guess what? This is not a set-and-forget process; an iterative process of continuous improvement is needed to make sure that any approaches we take are constantly challenged and improved upon. I think a quote from the revered American poet and civil rights activist Maya Angelou perfectly represents the journey that we're on to do good in the world and not harm. Don't be afraid to have a go at creating a framework and a process, and to improve on it as you go. Asking questions, being really curious, and having a really good think at each stage of a project can make a huge difference to the outcomes as you embark on your journey to know better and do better.

There are some key considerations to guide your footsteps. Please don't just copy and paste some ethical principles from the internet. For these to really get buy-in from your team and really work within your organization, they need to be developed in alignment with your own values and principles.

This will allow your ethical principles to pull in the same direction as your organizational values for better outcomes all around. Risk management is a core part of the operationalization of ethics in AI: the identification, mitigation, and management of risk gives visibility.

It facilitates discussion and it enables collective agreement on mitigation strategies. Agreeing on and recording what an acceptable level of risk looks like for your organization is key, not just for the development process, but also for understanding and managing models in production.

Subpopulation definition is a critical element of embedding ethics within the modeling process. If you can't define subgroups or populations within your data set, how can you understand what different outcomes or differing levels of accuracy might mean for different groups of people?

And lastly, it's important to consider that adding additional processes alongside development, whether that be for privacy or ethics or both, needs to be done in a way that's not prohibitively onerous on the development team. As a data science consultancy, we recognize the need to embed these considerations as standard into any project that we undertake, and as we have exposure to many industries and types of use cases, we are perhaps well placed to map out some of that pathway for our clients and partners.

The nature of AI or machine learning development is different from other areas of software or application development in that it is iterative and probabilistic in nature.

But many practitioners, at a high level at least, have converged on the CRISP-ML process model, which adequately describes the phases of any project at a high level and represents the level of iterative development at each stage. This model evolved from the Cross-Industry Standard Process for Data Mining, or CRISP-DM, which has long been the accepted wisdom for data mining projects, which may be thought of as ancestors of modern AI projects. So we have three main phases of development. The first phase involves understanding the business requirements and evaluating the data available to identify a solution approach and validate the technical feasibility of the solution. The second phase is all around the development and evaluation of a model. The third phase is about not just the deployment of a model into production, but the stewardship of that model throughout its lifetime. There are clear points during this process where key ethical considerations can be embedded.

During the scoping phase, it's important to begin with an ethical impact assessment and risk level calculation to understand the potential implications of any solution that you identify. If one approach is deemed to be higher risk than another, that might impact the decision on which approach to proceed with.

The model development process already includes many iterations of analysis of the training data set and the model performance. Adding tasks for subpopulation definition, and then reviewing both the representation in the input training data set and the performance against defined metrics across those subpopulations, is a reasonable addition to this process.
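The representation review mentioned here can be sketched as a simple share-per-group report over the training set. The group labels and the 10% minimum share are illustrative assumptions; the right floor depends entirely on your use case and risk tolerances.

```python
# Sketch: reviewing subpopulation representation in a training data set.
# Group labels and the 10% minimum share are illustrative assumptions.
from collections import Counter


def representation_report(groups, min_share=0.10):
    """Return (shares, underrepresented) for a list of per-row group tags."""
    counts = Counter(groups)
    n = len(groups)
    shares = {g: c / n for g, c in counts.items()}
    underrepresented = [g for g, s in shares.items() if s < min_share]
    return shares, underrepresented


groups = ["a"] * 90 + ["b"] * 8 + ["c"] * 2
shares, flagged = representation_report(groups)
print(shares)   # {'a': 0.9, 'b': 0.08, 'c': 0.02}
print(flagged)  # ['b', 'c'] -- candidates for resampling or data collection
```

Flagged groups are candidates for targeted data collection or resampling before the per-group performance review, since underrepresentation in the input is a common source of biased outputs.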

When the model is ready for deployment, there should be a set of recommendations for the use of the model: for example, a level of aggregation below which the outputs become less accurate, or decision-making processes that shouldn't be based on the outputs of this model.

Let's dive into these activities in a little more detail. The scoping phase of any AI project is critical to ensure the right problem is being solved in the right way: not just so that maximum value is added, but also so that the ethical impacts are well understood and the minimum amount of risk is introduced.

Many organizations already undertake privacy impact assessments (PIAs) at the outset of any data project, which is a great start. But going a step further and extending a PIA into an EIA, or ethical impact assessment, is an important step up front. This should be structured in such a way that low-risk projects with little or no human or environmental impact can drop out quickly and proceed, while riskier projects are subjected to adequate due diligence. It's important to consider ethical implications at this stage alongside other considerations such as human-centered design, because the potential downstream impacts may affect aspects of the solution design such as the level of granularity the data is operated on, the explainability requirements, the type of model or method that you choose, and what data you actually have the right to use. The risk register that you produce at this stage will guide the rest of the project.

The exploratory data analysis required to construct a suitable training data set, to be used as an input to your model or AI system, is a critical part of building expectations and hypotheses around what can be expected from your system. Bias is a key quantitative element of ethical AI, and we know that bias comes from unbalanced data sets and underrepresentation. But how do we define the groups or subpopulations in order to measure representation? It turns out this is a hard question to answer.
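The triage idea above, where low-risk projects drop out of the assessment quickly while riskier ones get full due diligence, can be sketched as a short screening checklist. The questions and the two-flag threshold are purely hypothetical; a real EIA would use questions derived from your own ethical principles and risk appetite.

```python
# Sketch: triaging projects after an ethical impact assessment so low-risk
# work proceeds quickly while riskier work gets full due diligence.
# The questions and thresholds are illustrative assumptions.

SCREENING_QUESTIONS = [
    "Does the system make or inform decisions about individuals?",
    "Does it process personal or sensitive data?",
    "Could errors cause material harm to people or the environment?",
    "Are outcomes difficult to explain or contest?",
]


def triage(answers):
    """answers: list of booleans aligned with SCREENING_QUESTIONS."""
    flags = sum(answers)
    if flags == 0:
        return "proceed"             # low risk: drops out of the EIA quickly
    if flags <= 2:
        return "lightweight review"
    return "full due diligence"


print(triage([False, False, False, False]))  # proceed
print(triage([True, True, True, False]))     # full due diligence
```

Structuring the gate this way keeps the overhead proportional to risk, which is exactly the "not prohibitively onerous" property the process needs.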

If you have demographic data included in your data set, you might choose to use that, but things like ethnicity and even gender can be subjective, or an inadequate proxy at best. We have identified a few approaches for classifying image data; however, this area is still in its early stages, and I would welcome discussion with anyone who has thoughts on how best to do this step objectively and repeatably. Once you've defined your subpopulations to measure representation and performance, you can select the performance metrics that can be used to measure performance during model training and also on an ongoing basis in production. For example, if you're building a facial recognition system, you might care about just pure face detection, or you might care about other things like recognizing a known person or estimating age.

Once you have a trained model that you're happy with, you need to review the performance for your predetermined subpopulations. It's best practice anyway to interrogate your model, slicing and dicing the performance metrics in as many different cuts and finding as many edge cases as you can, to make sure the model has learned the right behavior from the right features and is robust in as many different situations as you can identify.

This step is really just an enhancement of your existing process. It's really important at this point to think about what an acceptable tolerance is for performance. And by that I mean: if you have a few percentage points' spread in accuracy across your subpopulation groups, what does that mean in reality? If accuracy for all groups is over, say, 90%, would those differences have any material difference in outcomes for any of those groups? What about if accuracy for one group drops below a certain level? What might the impact be of decisions being made for a group with much lower accuracy than the others? Or is there even a tipping point where a few percentage points' difference has a much bigger impact? Asking yourself these questions, and being really curious about the link between model performance and real-world outcomes, helps you set realistic tolerances that can be used to monitor the performance of your model in production.

Before we can deploy an AI system into production, we need to make sure that throughout the development process we've taken steps to mitigate the risks that we identified upfront. Are we happy that a process for informed consent has been baked in? Is there a clear process for contesting outcomes?

And is the level of explainability fit for purpose? All of these decisions and details need to be captured in a living report that should inform any downstream consumer of the system outputs, as well as the team monitoring and maintaining it. Once your AI system is in production, this does not mark the end of your ethics efforts.

As I mentioned back at the start, ethics is not a set-and-forget, box-ticked type of exercise. Understanding how your model is performing over time, and how the performance against your accuracy metrics might vary across your subpopulations, is really important for monitoring the outcomes of your system. Embedding your ethics KPIs within your existing MLOps framework alongside other performance metrics is the ideal way to do this. Your performance thresholds can be added as triggers or alerts for monitoring or retraining processes, and you can give visibility of performance to key stakeholders in the business.

Now, I know I've covered a lot in a short amount of time, but the key takeaways are really just these: get started today, because doing something is infinitely better than doing nothing. Start with a basic framework and evolve it over time, building in your learnings from each iteration. Be curious and really ponder how model performance impacts real-world outcomes. Monitor and make visible the performance of your model against your ethical metrics. It's not necessarily a clear-cut or straightforward process, but it's ultimately worthwhile nonetheless. OK, well, thank you all for your time.

My information is in my profile, and I'd love to chat if anybody wants to talk more about this topic. Thank you very much, and enjoy this fantastic conference!