Observability for GenAI Apps by Shirsha Ray

Shirsha Ray Chaudhuri
Director, Research Engineering ,

Reviews

0
No votes yet
Automatic Summary

Understanding Observability in Generative AI Solutions

In the fast-evolving world of technology, especially within generative AI, having a robust understanding of observability is crucial for engineers and stakeholders alike. This article shares key learnings from the Women in Tech Global Conference, focusing on the importance of observability in AI solutions, ethical considerations, challenges faced, and future trends.

What is Observability?

Observability refers to the degree to which the internal states of a system can be inferred from knowledge of its external outputs. In the context of generative AI, observability encompasses:

  • Tracing
  • Debugging
  • Monitoring
  • Feedback collection

Effective observability in AI solutions is essential for managing datasets, evaluating performance, tracking user interactions, and prompt management throughout the generative AI life cycle—from proof of concept (POC) to post-production support.

Key Observability Requirements for AI Solutions

During various stages of building generative AI solutions, observability should ensure:

  • For POC: Focus on tracing, debugging, and monitoring conversational sessions.
  • During Testing: Continue tracing and debugging; incorporate evaluation through automated tests or user feedback.
  • In Production: Track user feedback, manage datasets, and ensure operational efficiency.

Understanding customer satisfaction and operational efficiency is paramount. Monitoring should evolve from improving customer delight during initial phases to maintaining system health and security post-launch.

Common Gaps in Observability

While building AI solutions, several key observability gaps often surface, including:

  • Failure to capture per-user interactions and token usage, which are vital for scaling solutions.
  • Lack of observability in managing datasets for performance improvement.
  • Inability to combine tracing data with user feedback, inhibiting a comprehensive understanding of application performance.

Ethical Considerations in Monitoring AI Solutions

When observing AI systems, ethical considerations are paramount. It's essential to monitor:

  • Fairness and Bias: Ensure solutions provide consistent answers regardless of user demographics.
  • Security: Watch out for vulnerabilities such as prompt injection attacks.
  • Data Privacy: Maintain customer confidentiality and protect sensitive information.

Building trust in AI systems requires extending observability tools into feedback classification and facilitating collaboration with users.

Challenges of AI Observability Tools

Several challenges can hinder effective observability in AI systems, such as:

  • Complexity of models and systems, making it difficult to collect and interpret data.
  • Fragmentation of the observability toolchain.
  • Difficulty correlating data across disparate systems due to silos and skill gaps.

Essential Observability Aspects for Generative AI

To enhance observability in generative AI, focus on:

  1. Latency, Throughput, and Cost: Core parameters that ensure system reliability.
  2. Real-time Feedback: Crucial for iterative improvement of performance and resource utilization.
  3. Alert Classification: Distinguish between critical security concerns and routine issues.
  4. Scenario Tracing: Ability to trace across different submodules for comprehensive monitoring.

Recommended Tools for Observability

Here are some observability tools that professionals in the field have found useful:

  • Datadog
  • New Relic
  • Langfuse

Engaging with your peers to exchange insights about effective tools and best practices can enhance the overall efficiency of your observability strategy.

Conclusion

As generative AI continues to advance, understanding and implementing robust observability is essential for ensuring effective performance management, ethical considerations, and continuous improvement. By addressing the challenges and leveraging the right tools, organizations can create efficient and reliable AI-driven solutions that meet user needs and exceed expectations.

What observability tools have you found effective in your work? Let's collaborate and share our experiences to enhance the future of AI


Video Transcription

What I'm looking to share with you today is some of my learnings from standing up solutions which embed the Gen AI tech stack.And, as part of doing this, I welcome suggestions, recommendations from you in terms of tooling and capabilities that you use, perhaps in house solutions that you've built, and we could exchange ideas. What I'm presenting to you is what I have missed and, you know, the challenges with, you know, missing some of these, observability tools with Gen AI apps. Right? Very briefly about me, this is my second time presenting at the Women in Tech Global Conference. I'm super excited to be sharing, with all of you today, a topic which is close to my heart as an engineer. What do we need to observe the performance of solutions with the generated AI stack? Right?

What we will be doing, very quickly over the next fifteen minutes is lay down a basic construct of what is observability, and then, you know, maybe lead into what is observability for AI solutions. And then we will talk about what are we missing. If time permits, I could share a few case studies. And finally, we will look at ethical considerations, gaps, and future trends. Right? What do we what do we mean by observability? Let's let's see if we can all arrive at, you know, the same, definition for this. Observability refers to the, the the tracing, the debugging, the monitoring, the feedback collection of solutions into this because there's an element of the generative AI stack. We want to be able to manage datasets that have been used. We wanna be able to evaluate. We wanna have, manage conversational sessions and, of course, the prompt management.

Without observability, debugging applications, understanding how do we monitor them, tracking latency, not just with respect to the large language model, but even otherwise, managing user prompts and performing evaluations is really challenging. Observability is crucial throughout the generative AI life cycle right from the POC and the research phase to once the solution is, tested and, deployed in production, and we wanna understand the postproduction support. Let's understand how should observability be any different with, generative AI, though. Right? At the time of, you know, when we're doing the proof of concept, we definitely want, to see the tracing. We wanna see the debugging. We wanna see, the conversational sessions and the monitoring. And when we are testing, right, when we are integrated it with the rest of the solution stack, we want to continue to see the tracing, the debugging, the conversational sessions, as well as, the prompt management.

Right? But when we are also testing, we because perhaps we are using automated tests or we are using subject matter experts or beta users, we probably also want to, extend our observability to have elements of evaluation. And finally, what is the minimum viable product, that is getting built? We want to, hence, get the user feedback as well. When the solution is eventually moved to general availability, apart from, you know, tracing, debugging, the conversational aspect, the monitoring aspect, the user feedback, we also want to collect datasets, which we could then use to understand how we can improve the solution.

At the time of building the POC, we are starting with improving the customer delight, and, hence, all our monitoring and observability requirements are are around improving that element. But when the solution is live in production, the observability has a larger role to play, and we want to create a solution which is operationally efficient. I thought I could share with you videos of what observability tools look like, but I will skip that given our lack of time today. Instead, switch to, you know, where did I miss observability when I was building my solutions? The need to capture users per users' token usage, their feedback, and, you know, what experience that they are having with the API at a per user level. And why is this important? This is important for us to understand how do we scale the solution then. So to plan, observe plan and eventually to scale.

And, of course, this also helps us understand what are those points of optimization. I also missed observability for managing datasets between testing, referencing, iterating, and collecting one from the live environment to, to improve the performance. And oftentimes, we don't talk about managing datasets as an element of observability, but perhaps it is because it helps us debug and improve performance. Combining traces with user feedback. Not too many observability solutions, APM solutions out there allow us to combine traces with user feedback because user feedback is something which is, at an application level, and sometimes tracing sits in different, ecosystems. And finally, prompt management. Prompt management may or may not have to may not be needed to be combined with observability, but it allows us to simulate, you know, causes for failure.

Let's look at a few ethical considerations. When you're monitoring AI solutions, fairness and bias are, you know, pillars of the responsible AI ecosystem that need to be monitored. Is my solution giving different answer based on the gender of the user? Or, just because it interpreted, values differently, is it giving different results? The security and vulnerability of the solution, also needs to be monitored, prompt injection attacks. I'm sure it's not a term that, there's anyone around who hasn't heard about it. And then, of course, there are data driven improvements. You want to look at, you know, your performance outcomes, profile usage, and then look at where you want to bring in optimizations, or, do you want to, stage datasets and answers differently so as to be able to give, you know, better answers. So there are ethical considerations which keep the AI algorithm in mind, and there are ethical considerations which keep the human that is using the system in mind.

You want to ensure that you're building trust in the system, and so you want to evaluate feedback from that point of view. And, hence, observability tools extending into feedback classification will definitely help. Sometimes, observability tools also helps us collaborate and build solutions together with the user. So there's this element of collaboration. And finally, handling data privacy and audit. As we investigate issues, we definitely want customers' information to be, masked. Let's look at, certain challenges that come up when we are considering observability tools for AI solutions. The model complexity and, therefore, the system complex city, is manifold, therefore, causing, a concern in terms of, like, you know, allow not allowing us to transparently investigate or collect information. The interpretability of the solution, the flawed feedback mechanisms also sometimes give challenges to correlate behavior across systems.

The tools the tools the toolbox the tool chain that we have to use for observability is also highly fragmented. So you, you you remember in the previous slide, we were talking about tracing your, you know, to be able to, correlate between the traces, the logs, and the performance, and the user feedback is a very important aspect of, monitoring AI solutions. And so having a consistent platform where all of this can be, collected to then analyze is, somewhat crucial. Otherwise, you it's you know, correlating between systems becomes very tricky. You also don't want alerts for, too many trivial issues. Therefore, you know, the important ones getting classified as noise. And finally, there are silos and skill gaps which exist between teams. And, therefore, what we put as system requirements when we are going live with the Gen AI solution keeps changing.

And finally, I think I want to kind of distill the information that we've collected so far into into some key observability aspects for generative AI solutions. The latency, throughput, and cost are primary. They are the most important aspects, like, you know, they are your uptime. The real time feedback allows us to improve the performance and the resource, hand in hand with the resource utilization. Right? Like, the token usage per user that, we were talking about. And then comes your p three is you you want to classify your alerts. You want to look at the security and quality aspects. By no means are we saying security is less important than latency. But what what we're trying to say is it's more important to keep a system up and then, look at security concerns and prompt injection attacks.

And finally, your p four is to be able to trace scenarios across different, submodules in the AI system. Few tools that I have used, Datadog. I've seen, you know, demos of New Relic, Langfuse, as well as. I'm very happy to, network with all of you and understand if there are other tools that you would recommend in this space. Thank you.