Sowmya Podila

Senior Applied AI Scientist at Fortune 50 Retail

"Cutting LLM Inference Costs by 50–90%: Introduction to Caching in AI systems"

Wed May 13, 2026, 12:10 PM to 12:30 PM EDT (America/New_York)

#WTGC2026

https://www.womentech.net/ringcentral

Session: Cutting LLM Inference Costs by 50–90%: Introduction to Caching in AI systems

Enterprise AI systems are rapidly running into scaling challenges: rising costs, slower responses, and growing complexity from long-context and multimodal inputs. Caching has emerged as one of the most effective strategies for addressing these issues, improving both performance and cost efficiency.

In this session, we explore how modern caching techniques, ranging from model-level KV caching to prompt and semantic caching, are transforming LLM system design. Through real-world examples and vendor benchmarks, we show how organizations are achieving up to 90% cost savings along with significant latency reductions.

We’ll also cover how these techniques can be implemented and integrated into existing AI pipelines, along with best practices for monitoring, evaluation, and production readiness.

Attendees will gain a clear understanding of where caching delivers the highest ROI and how to apply it effectively in enterprise AI environments.
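To make the response-level techniques named above concrete, here is a minimal sketch of an exact-match cache with a semantic fallback. It is an illustration under stated assumptions, not the speaker's implementation: `ResponseCache`, `answer`, and `call_model` are hypothetical names, and the bag-of-words `embed` function is a toy stand-in for a real embedding model. Model-level KV caching and provider-side prompt caching happen inside the serving stack and are not shown here.

```python
import hashlib
import math
import re
from collections import Counter


def normalize(prompt: str) -> str:
    """Lowercase and collapse whitespace so trivially different prompts match."""
    return " ".join(prompt.lower().split())


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model (assumption): word-count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class ResponseCache:
    """Hypothetical cache: exact-match lookup first, then semantic lookup."""

    def __init__(self, threshold: float = 0.8):
        self.threshold = threshold  # minimum similarity for a semantic hit
        self.exact = {}             # sha256(normalized prompt) -> response
        self.entries = []           # list of (vector, response) pairs
        self.hits = 0
        self.misses = 0

    @staticmethod
    def _key(prompt: str) -> str:
        return hashlib.sha256(normalize(prompt).encode("utf-8")).hexdigest()

    def get(self, prompt: str):
        # 1) Exact match on the normalized prompt.
        key = self._key(prompt)
        if key in self.exact:
            self.hits += 1
            return self.exact[key]
        # 2) Semantic match: most similar stored prompt above the threshold.
        query = embed(prompt)
        best_score, best_response = 0.0, None
        for vector, response in self.entries:
            score = cosine(query, vector)
            if score > best_score:
                best_score, best_response = score, response
        if best_response is not None and best_score >= self.threshold:
            self.hits += 1
            return best_response
        self.misses += 1
        return None

    def put(self, prompt: str, response: str) -> None:
        self.exact[self._key(prompt)] = response
        self.entries.append((embed(prompt), response))


def answer(prompt: str, cache: ResponseCache, call_model) -> str:
    """Return a cached response when possible; otherwise call the model."""
    cached = cache.get(prompt)
    if cached is not None:
        return cached
    response = call_model(prompt)  # the expensive LLM call (hypothetical)
    cache.put(prompt, response)
    return response
```

In a sketch like this, the similarity threshold is the main design choice: a lower value raises the hit rate but increases the risk of returning an answer to a question that only looks similar, so the hit and miss counters are worth monitoring alongside answer quality.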

Key Takeaways

Optimizing AI pipelines

Bio

Sowmya Podila is a Senior Applied AI Scientist currently at Target, with a decade of experience in AI/ML at organizations such as AWS and Gartner. She led the widely recognized TrendBrain initiative at Target, which was featured in Retail Dive and CNBC for using AI-driven fashion trend analysis to elevate the style and design of Target's owned brands.

Beyond her industry work, Sowmya is an AI advisor to not-for-profits, Program Chair at RecSys 2026, an IEEE Senior Member, an IEEE Access reviewer, and an active voice in the AI community. She creates content sharing practical insights and emerging trends in artificial intelligence, runs a LinkedIn-based mini podcast series exploring AI applications across sectors, and hosts AI events (including the Generative AI Summit, DC 2026).

Outside of her professional pursuits, Sowmya is a new mom and an avid travel enthusiast.

Connect with her:
LinkedIn: https://linkedin.com/in/sowmyapodila
Instagram: @indigirl.ai
