Building Responsible and Resilient AI Products: Best Practices for Scalability and Explainability by Koyelia Ghosh Roy

Koyelia Ghosh Roy
Sr AVP - Enterprise BI Capability and Generative AI Solution Owner

Automatic Summary

Building Responsible and Resilient AI Products: Best Practices for Explainability and Scalability

Welcome to my session on building responsible and resilient AI products, where we will explore best practices in explainability and scalability. In today's world, AI is not only powering our products but also significantly shaping our lives, influencing our decisions, and impacting our trust at scale. This article aims to address a crucial question: Are we scaling our AI innovation without compromising responsibility?

Understanding the RICEA Framework

To effectively overcome challenges in AI product development, we need actionable insights and comprehensive frameworks. One such framework is known as RICEA, which stands for Reach, Impact, Confidence, Effort, and AI Complexity. By using RICEA, we can prioritize our AI products with responsibility and ethical considerations deeply ingrained in our processes. Let’s delve into each component (a brief scoring sketch follows the list):

  • Reach: How many users will benefit from the AI product? What is your target addressable market? For example, a job recommendation system on LinkedIn impacts 900 million users.
  • Impact: What transformative benefits does this product bring? Amazon's Buy Box algorithm significantly influences sales revenue, showing the importance of understanding cost versus benefits.
  • Confidence: How confident are we in the AI model's effectiveness? For instance, Apple's Face ID had to be extensively fine-tuned on more diverse datasets after early bias issues lowered confidence in it.
  • Effort: What resources are required for development and maintenance? Not all AI solutions demand the same level of effort; for example, building a simple chatbot can take weeks, while complex AI workflows may take months.
  • AI Complexity: Consideration of unique AI challenges, such as regulatory compliance and model accuracy, is essential for smooth operation and scalability.
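
The talk does not prescribe an exact scoring formula, but a common way to operationalize a RICE-style framework is to multiply reach, impact, and confidence and divide by effort. The minimal sketch below extends that with an AI-complexity divisor; the weighting scheme and all input values are illustrative assumptions, not part of the original framework description.

```python
from dataclasses import dataclass

@dataclass
class RiceaScore:
    """Illustrative RICEA prioritization inputs (all values are assumptions)."""
    reach: float          # e.g. users affected per quarter
    impact: float         # relative benefit, e.g. 0.25 (low) to 3.0 (massive)
    confidence: float     # 0.0-1.0, how sure we are of the impact estimate
    effort: float         # person-months to build and maintain
    ai_complexity: float  # >= 1.0; regulatory, accuracy, and explainability overhead

    def score(self) -> float:
        # Classic RICE numerator, discounted by the extra AI-specific complexity.
        return (self.reach * self.impact * self.confidence) / (self.effort * self.ai_complexity)

# Hypothetical comparison: a job-recommendation feature vs. an internal keyword tagger.
job_reco = RiceaScore(reach=900_000_000, impact=2.0, confidence=0.7, effort=9, ai_complexity=3.0)
keyword_tagger = RiceaScore(reach=5_000, impact=0.5, confidence=0.9, effort=1.5, ai_complexity=1.2)
print(f"job recommendations: {job_reco.score():,.0f}")
print(f"keyword tagger:      {keyword_tagger.score():,.0f}")
```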

Best Practices for Scalability

Scalability in AI does not merely mean accommodating more users but also managing increasing complexities. Here are some best practices to enhance your AI product's scalability:

  1. Model Architecture: Decouple AI components to update models efficiently without causing bottlenecks.
  2. MLOps: Implement MLOps strategies for automatic deployments, monitoring, and retraining workflows, ensuring long-term benefits.
  3. Data Management: Treat data as a core part of AI; separate data contracts and schema versioning from models to maintain functionality amidst data changes.
  4. Cloud-Native Approach: Adopt cloud-native thinking for auto-scaling infrastructure; technologies like AWS Lambda and Azure Functions can meet peak demands without overprovisioning.
  5. Metadata Storage: Store and tag models for easy management, enabling A/B testing and fast recovery from failures.
  6. Continuous Monitoring: Implement alert systems to track AI model performance, automatically triggering retraining when necessary (see the drift-check sketch below).
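
As an illustration of the continuous-monitoring practice above, a minimal drift check might compare a live feature distribution against its training distribution and flag retraining when they diverge. The sketch below uses a two-sample Kolmogorov-Smirnov test from SciPy; the p-value threshold and the trigger_retraining hook are placeholder assumptions.

```python
import numpy as np
from scipy.stats import ks_2samp

DRIFT_P_VALUE = 0.01  # assumption: below this, the live data is considered drifted

def has_feature_drifted(training_sample: np.ndarray, live_sample: np.ndarray) -> bool:
    """Two-sample KS test between training-time and live values of one feature."""
    _statistic, p_value = ks_2samp(training_sample, live_sample)
    return p_value < DRIFT_P_VALUE

def trigger_retraining() -> None:
    # Placeholder: in practice this would queue an MLOps retraining workflow.
    print("Drift detected - queuing retraining workflow")

# Synthetic example: the live distribution has shifted away from training.
rng = np.random.default_rng(42)
train_values = rng.normal(loc=0.0, scale=1.0, size=10_000)
live_values = rng.normal(loc=0.6, scale=1.0, size=2_000)

if has_feature_drifted(train_values, live_values):
    trigger_retraining()
```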

Enhancing Explainability

Explainable AI (XAI) refers to techniques that allow users to understand the workings of AI models and the outcomes they generate. Here are some benefits and methods for ensuring explainability:

  • Accountability: Explainability helps in identifying biases and inaccuracies, fostering accountability.
  • Fairness: Providing reasons for decisions (e.g., loan approval outcomes) enhances user trust and reduces conflict.
  • Model Improvement: Understanding which features led to specific outcomes can help refine AI models for better accuracy.

Two popular techniques for promoting explainability are (a brief usage sketch follows the list):

  1. SHAP (Shapley Additive Explanations): A reliable, auditable method that calculates the average contribution of each feature to a prediction across all feature combinations.
  2. LIME (Local Interpretable Model-Agnostic Explanations): Fits a simplified, interpretable surrogate model on perturbed versions of an input, providing easier-to-understand local explanations.
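
A minimal sketch of how these two libraries are typically called is shown below, assuming the shap and lime packages and a scikit-learn model trained on synthetic tabular loan data; the feature names and data are illustrative, not from the talk.

```python
import numpy as np
import shap
from lime.lime_tabular import LimeTabularExplainer
from sklearn.ensemble import RandomForestClassifier

# Synthetic loan data: credit_score, age, income, debt (illustrative only).
feature_names = ["credit_score", "age", "income", "debt"]
rng = np.random.default_rng(0)
X_train = rng.normal(size=(500, 4))
y_train = (X_train[:, 0] - X_train[:, 3] > 0).astype(int)  # approve when score outweighs debt

model = RandomForestClassifier(random_state=0).fit(X_train, y_train)
applicant = X_train[:1]

# SHAP: average (Shapley) contribution of each feature to this prediction.
shap_values = shap.TreeExplainer(model).shap_values(applicant)

# LIME: fit a simple local surrogate on perturbed copies of the same applicant.
lime_explainer = LimeTabularExplainer(X_train, feature_names=feature_names, mode="classification")
lime_explanation = lime_explainer.explain_instance(applicant[0], model.predict_proba, num_features=4)
print(lime_explanation.as_list())  # list of (feature condition, local weight) pairs
```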

Conclusion

As we design AI products, it is crucial to think beyond technological possibilities and focus on what is right and ethical. Responsible AI goes beyond compliance; it encompasses a strategy that uplifts society across various sectors, including healthcare, education, and environmental sustainability.

To summarize:

  • Incorporate the RICEA framework in your AI product development to ensure responsible, transparent prioritization.
  • Design for scalability with decoupled architectures, MLOps, data contracts, cloud-native infrastructure, and continuous monitoring.
  • Build explainability in with techniques such as SHAP and LIME so decisions stay accountable, fair, and improvable.

Video Transcription

A very good morning, good afternoon, and good evening to all. Welcome to my session, Building Responsible and Resilient AI Products, sharing best practices on explainability and scalability. I really appreciate you taking the time for this session. We are living in a world where AI is not only powering our products but shaping our lives, impacting our decisions, and affecting our trust at scale. This session is here to answer one critical question: are we scaling our AI innovation without compromising on responsibility? In this session, I aim to provide you with actionable insights and real-world strategies for prioritizing your AI products using the RICEA framework, for scaling your AI products securely and transparently, and finally, for embedding responsible AI development principles into your everyday development practices.

So whether you are a product manager, a program manager, an engineer, or a leader, you will walk away with insights that will help you ship products that not only work, but work securely, transparently, and at scale. I hope my screen is visible and I'm loud and clear. May I just get a confirmation in the chat, and then we can get started? Great. A little about myself. I have around twenty years of experience in data and AI, and I have built microservice-based, generative-AI-powered, explainable solutions, like conversational BI using data agents. I have won several global innovation awards, such as the 3AI Pinnacle Award, Inspiring Woman Leader, the Women in Cloud Empower Her Access award, and an ambassador award. I have been mentoring in global GenAI hackathons, and I have been fortunate that some of my team members have been winners in such hackathons.

And last but not least, I am a thinker and I code a lot. One of my agents has also been featured by Nee Diamond, a very famous AI agent practitioner, so it appears in his repository as well. Great, so let me move forward. As I mentioned, today we are going to discuss scalability and explainability. But we need to understand something: scalability isn't enough. A product has to be explainable, it has to be secure, and it has to be ethically grounded in order to have real, lasting impact. Now, to move beyond these buzzwords and put these principles into practice, we have to come up with a clear, actionable way to design and prioritize AI products. That's where the traditional product prioritization framework, RICE (reach, impact, confidence, and effort), had to be expanded to embed the AI-specific concerns, like model accuracy, explainability, data quality, and the ethical trade-offs that come with every decision powered by AI models.

That evolution led to the inclusion of AI complexity, which takes care of the unique considerations that AI models bring with them. It helps you ship products that are not only working or smart, but scalable, secure, and bias-mitigated. That's where the RICEA framework started taking on a lot of importance. Let's get deeper into each of these components. Let's start with R, which is reach. Reach is how many users the product is benefiting. What is your target addressable market? How do you define your global and local scope? How do you manage the diverse demographics you're handling? That reach helps define what the solution has to be. For example, we are all aware that LinkedIn has come up with a job recommendation AI model, and it impacts 900 million users.

A minor tweak to that AI model could cause opportunity misplacement and also reputational damage. So imagine the reach it has and the kind of impact it could have across those demographics and geographies. That is one area to focus on in the RICEA framework. The next component is impact. What is impact? Impact is about understanding the transformative benefit this product is bringing. How are we aligning it to our ethical goals? How are we optimizing it so that it can scale without hurting sustainability? How do we balance cost versus benefit? Take Amazon: the Amazon Buy Box algorithm influences billions in sales revenue.

A minor tweak to it can derail the stock market, because every decision it makes relates to revenue. So the impact is extremely important. The transformative benefit and the reach it has on the target market define how much importance we give this element. Another case for impact: you can have a big product like the Amazon Buy Box, but you could also have something specific to your own company. For example, UPS built a route optimization AI model, which tweaks and improves routes and helps reduce fuel and labor costs, and this enabled it to save around $300 million annually. So understanding the benefit, and how we measure cost versus benefit, is an important criterion when we prioritize AI products.

Let's go to confidence. Confidence is how sure we are that the product will have the intended impact on its use case. For example, has it been tested under real conditions? Have we tested it so that its decision making is completely transparent, and we know why the model came to this decision? Have we considered what outputs are necessary, and how we are making meaningful improvements around them? That's what confidence talks about. I'll give you the example of Apple's Face ID.

In its initial stage, because of a lack of diverse datasets covering underrepresented groups, Face ID showed bias, and hence it had a very low confidence level. It had to be extensively fine-tuned and trained on multiple diverse datasets to reach the level it is at currently. So from a confidence perspective, the effort we need to put in to make the product really deliver the outcome, and how transparently we can explain it, become very important. Another cautionary tale is COMPAS, a recidivism-prediction solution used to assess whether a convict would commit a crime again. This solution had very low explainability and a significant racial bias, which sparked regulatory debate, and it did not fly.

What this experience is trying to tell us is that if we do not make our AI models explainable, secure, and scalable, the solution will not fly. Let's come to effort. Effort is about our resource investment and complexity. How are we planning maintenance and scalability after the product goes into production? How are we optimizing the development time? These are some of the questions we definitely need to ask when we go in for prioritization. And not all solutions follow the same principles. For example, if you have to develop a GPT bot on Slack, it might just take you six weeks; it is quite simple.

However, if you go with SAP and SFDC AI workflows, where you have to take care of access rights, human-in-the-loop review, and explainability, it becomes complex and can take up to nine months. So we have to understand that high effort does not equal a bad product; it is justified by the reach and the impact the product has. Let me take a pause here to check that I'm being heard properly. Good. Great, all good, everyone with me? Any questions? Okay, I'll continue. Now comes the AI component, AI complexity. What makes a product uniquely hard in the AI context? Think about regulation, explainability, model accuracy, system integration, scalability. There are so many things we need to think about. So how do we go about it so that the product survives? There are a few questions we need to ask when we consider AI complexity. First, how will it evolve with changing technology?

That is the first question we should ask, because technology is evolving very fast. Will the product be able to keep pace with evolving technology? How do we minimize the complexity that AI models bring in? Which algorithm is appropriate for the solution? To use a technical term: do we do quantization, that is, compress the model? And if I do quantization, will it impact the accuracy of the solution? These are questions that need to be answered when our AI product is thought through. So it is very important here to understand the algorithms, the complexities, and the technologies involved, and how we are approaching them so that the product can live with the evolving environment we are currently in. I'll give you an example. If someone asks you to make a generative AI agent for their marketing, we find it pretty easy, right?

Until we realize that this agent has to avoid brand-damaging hallucinations, it has to align with a product catalog that is constantly changing, and it has to go through human approval loops. If we do not take care of these parameters, we are completely flying blind. So depending on the kind of product we are trying to build, we have to evaluate it against each element of the RICEA framework to identify the missing pieces. On the other hand, a keyword tagger for an internal file search is pretty simple; it does not need such in-depth analysis.
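
One of the AI-complexity questions raised above was whether quantization, compressing the model, would hurt accuracy. A minimal sketch of how that might be checked with PyTorch dynamic quantization is below; the toy model and input are illustrative assumptions, and the accuracy trade-off would still need to be measured on a real evaluation set.

```python
import torch
import torch.nn as nn

# Toy network standing in for a real model (illustrative only).
model = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 2))
model.eval()

# Dynamic quantization: Linear weights are stored as 8-bit integers, shrinking the model.
quantized = torch.quantization.quantize_dynamic(model, {nn.Linear}, dtype=torch.qint8)

# Compare outputs on a sample input; any accuracy drop should be measured on a held-out set.
sample = torch.randn(1, 128)
print(model(sample))
print(quantized(sample))
```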

But even something that seems simple may not be simple, and that's what RICEA helps to surface. It makes the entire process of prioritization much more transparent. Okay. In this session I said I would like to share some best practices for scalability and explainability. One thing I've already stated: scalability does not only mean increasing the number of users. It means handling change. More users means more complexity; more complexity means more edge cases and hence more risk. So to address scalability, we have to find the best practices that will help us manage this. The first thing we should think about is model architecture. By decoupling our AI components, we can update the models and even retrain them without impacting everything else.

This reduces or even eliminates the bottlenecks that happen in a monolithic system. Then comes MLOps, which is an unsung hero, I have to say. With MLOps, you can do automatic deployments, you can monitor the AI models in production, and you can have automatic retraining workflows triggered. Without MLOps, the entire solution will slowly stop giving the benefits it's supposed to give. So it's important that the MLOps strategy is well thought through and implemented from the very inception. Next, data is the heart of AI; everybody knows that. But data keeps changing. Does that mean my AI product stops working in between and then starts working again? That is going to hurt your customer experience. Keeping your data contracts and your schema versioning separate from your models ensures that your model's functioning does not get impacted when the data changes.

So focus on your schema versioning and your data contracts so that when we connect the AI models, we know which version they are connected to. Also very important today is to adopt cloud-native thinking, because that enables auto-scaling infrastructure. It gives us serverless compute, with examples like AWS Lambda and Azure Functions, which helps us meet peak demand without overprovisioning. We also need to store and tag our models with metadata, because that helps us do A/B testing, canary rollouts, and even fast recovery from failures. And finally, our work is not done after deployment. In the real world, model and data drift happens, and if we don't monitor for it, we're going to be operating on very wrong assumptions.

So there have to be alert systems planned that will trigger the retraining workflows automatically the moment any data or prediction drift from the expected outcomes is identified. Let's now talk about explainability, which is one of my favorite topics that I'd really love to discuss more. Explainable artificial intelligence is a set of techniques that allows users to comprehend the AI model and understand the outcomes and results that the machine learning algorithms generate. It ensures there is a transparent sequence of actions, and the features that contribute to the decision making are clearly traceable, detectable, and correctable. For any organization, explainable AI provides the confidence and trust to put those products into production, and that leads to trust and secure access. Because once we know what factors led to a decision, it ensures fairness and also mitigates bias.

For example, in the case of bank loan approvals, if a loan is denied but the denial comes with a proper explanation, the end user is less likely to escalate to a legal case, because they know what led to the decision. To give an example: for an applicant, credit score, age, income, and debt are some of the features. Letting the user know which feature led to the denial gives them a fairer explanation. Similarly, from a usability perspective, if an AI used in healthcare provides a diagnosis and gives doctors a clear indication of which test results or symptoms led the model to that diagnosis, it is easier for the doctors to validate it and then adhere to its recommendations.

It also helps to improve those models. There are two key techniques that enable explainable AI. One is SHAP and the other is LIME. SHAP, which stands for Shapley Additive Explanations, is a go-to, reliable, and very comprehensive explainability technique. Practically speaking, it is a library that we use, and it treats each feature like a player in a game. How does SHAP work? It calculates the average contribution of each player across all combinations of features, and it helps us understand why the model came up with a given decision, for example in a loan approval process.

As an example, it will let you know that the result is negative because, of the features, the credit score and age contributed plus 0.25, but the debt contributed minus 0.35, very clearly showing which features led to the decision. LIME, or Local Interpretable Model-Agnostic Explanations, works at a more local level. Sorry, I did not change my slides; let me check. Is everybody with me so far, following what I'm saying? Great. So LIME is a more local solution, ideally good for your demos and your POCs. How does it work? It creates slightly modified versions of your inputs and then runs the model on those inputs to understand how the predictions change. Then it fits a much simpler interpretable model to those outcomes, which is an approximation of the complex model.

So it gives you very easy-to-explain reasons for an outcome. For example, if the same loan approval use case is run through LIME, it will say that the age and the credit score impacted the outcome negatively, while the debt impacted it positively, since less debt was more positive for the applicant. It does not attach an exact score to it, and that's why LIME is more for local usage; it helps you with your POCs. SHAP, on the other hand, is more auditable because it takes into account every feature, calculates the average contribution across all combinations, and gives you a score for it. Still with me, everybody? Great. I think I'm already running over time, so let me move forward and quickly close. So what is the impact of explainability?

Explainability helps us take accountability for our actions. For example, if there is any kind of bias, explainability helps us detect and fix it with a targeted correction, thereby improving accountability and making the product more responsible. It also helps with fairness, because it lets us see which features led to a decision. For example, IBM's AI Fairness 360 toolkit uses explainability to identify where a model is biased with respect to demographic groups, thereby enabling targeted corrections. And finally, it helps developers detect and debug issues very easily.

For example, take medical imaging. If I run the model on an image and, for some reason, the model has identified a benign spot as cancerous, explainability will help us identify which features in that image led to that decision. Once that is identified, the model can be improved. So it helps not only with bias mitigation but also with targeted improvement, thereby optimizing our effort as well as increasing our reach. Finally, I'll go into responsible AI. Responsible AI is no longer a luxury; it is a strategic imperative. All AI products have to be ethical from the ground up, addressing bias and using inclusive datasets in their training workflows.

They have to have explainability ingrained into each of their outcomes, because that provides clear observability from the user's perspective. Privacy is non-negotiable: federated learning and differential privacy are techniques that help us build intelligence without compromising user data. Finally, we cannot ignore regulation. For example, India's DPDP Act, the Digital Personal Data Protection Act, and the EU AI Act have to be kept in mind while we're designing our AI products, else we will carry long-term risk. And finally, there is the community. The question we need to ask today is: who will this AI benefit? Responsible AI development means uplifting society, be that in education, healthcare, accessibility, or climate change.
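
As a small illustration of the privacy techniques mentioned above, differential privacy can be approximated by adding calibrated noise to aggregate statistics before they are released. The sketch below uses the Laplace mechanism; the epsilon value and the cohort count are illustrative assumptions, not from the talk.

```python
import numpy as np

def laplace_mechanism(true_value: float, sensitivity: float, epsilon: float) -> float:
    """Release a differentially private version of an aggregate statistic."""
    noise = np.random.laplace(loc=0.0, scale=sensitivity / epsilon)
    return true_value + noise

# Example: publish a cohort count without exposing whether any one person is in it.
true_count = 1_342
private_count = laplace_mechanism(true_count, sensitivity=1.0, epsilon=0.5)
print(round(private_count))
```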

So I'll end this session by saying: when you're building your product, please remember it is not about building what's possible, it is about building what's right. With this, I would like to end my session. I'd like to know if there are any questions. My address is up here; you can connect with me through Gmail, or you can also connect with me on LinkedIn. Thanks, Rebecca, for the question: SHAP and LIME are two tools that you recommend incorporating? Yes, you can use either of the two. When you build your AI product and are ready to move into production, you should use SHAP or LIME depending on your use case, and you can then analyze whether the solution is working as expected.

If it is not performing as expected, we can see which features led to those outcomes, and then we can fix it. It is absolutely important to use one of the two before moving to production, because that gives trust and confidence. Mentioning that you have used one of these techniques, and SHAP and LIME are technically libraries if I have to be precise, and sharing the scores they generate, gives your customers more confidence when they start using your products.