Session: How I Turned Vendor Outages Into Innovation: A Framework for Resilience
Cloud outages are usually viewed as disruptions, but they can also be powerful catalysts for innovation. I share how repeated vendor outages pushed me to rethink reliability engineering and build a proactive resilience framework that transformed the way our systems and teams respond to failures.
Drawing from real incidents across major cloud providers, I break down the patterns behind vendor tool failures, DNS disruptions, and service degradations, and explain how I designed automation and detection strategies that substantially reduced operational noise, accelerated incident response, and improved system stability. I will walk through practical techniques including external service-health monitoring, automated escalation workflows, smarter alerting, and event-driven remediation.
This session blends technical depth with real-world lessons learned, focusing on how unexpected outages became opportunities to innovate, build confidence, and drive a reliability culture across teams. You'll leave with actionable strategies for designing resilient cloud-native architectures and a mindset for turning challenges into leadership moments in your own career.
Bio
I am a DevOps and cloud infrastructure engineer specializing in SRE, incident management, and cloud-native automation. My work focuses on building reliable, scalable systems across AWS, Azure, and GCP, with deep experience in Kubernetes, Terraform, CI/CD pipelines, and observability platforms.