Session: Cutting LLM Inference Costs by 50–90%: Introduction to Caching in AI Systems
Enterprise AI systems are rapidly encountering scaling challenges: rising costs, slower responses, and increasing complexity from long-context and multimodal inputs. Caching has emerged as one of the most effective strategies to address these issues, delivering dramatic improvements in both performance and cost efficiency.
In this session, we explore how modern caching techniques, ranging from model-level KV caching to prompt and semantic caching, are transforming LLM system design. Through real-world examples and vendor benchmarks, we demonstrate how organizations are achieving up to 90% cost savings and significant latency reductions.
We’ll also cover how these techniques can be implemented and integrated into existing AI pipelines, along with best practices for monitoring, evaluation, and production readiness.
Attendees will gain a clear understanding of where caching delivers the highest ROI and how to apply it effectively in enterprise AI environments.
Bio
Sowmya Podila is a Senior Applied AI Scientist at Target with a decade of experience in AI/ML at organizations such as AWS and Gartner. She led the widely recognized TrendBrain initiative at Target, featured in RetailDive and CNBC for leveraging AI in fashion trend analysis to elevate the style and design of the company's owned brands.
Beyond her industry work, Sowmya is an AI advisor to not-for-profits, Program Chair at RecSys 2026, an IEEE Senior Member, an IEEE Access reviewer, and an active voice in the AI community. She creates content to share practical insights and emerging trends in artificial intelligence, runs a LinkedIn-based mini podcast series exploring AI applications across sectors, and hosts AI events (including the Generative AI Summit, DC 2026).
Outside of her professional pursuits, Sowmya is a new mom and an avid travel enthusiast.
Connect with her:
LinkedIn: https://linkedin.com/in/sowmyapodila
Instagram: @indigirl.ai