Session: Model Calibration, the Hidden Key to Trustworthy AI
In high-stakes domains like finance and healthcare, getting the right answer isn't enough—your AI system needs to know how confident it should be. This presentation explores model calibration, the critical but often overlooked bridge between statistical predictions and real-world decision-making.
We'll examine why a model with 95% accuracy can still cause catastrophic harm when its probability estimates are unreliable. Through concrete examples from credit risk management, fraud detection, clinical decision support, and cancer screening, attendees will see how miscalibration can lead to billions in financial losses, unnecessary medical interventions, and life-threatening delays in treatment.
The talk covers:
Core Concepts: Understanding calibration vs. accuracy vs. discrimination, and why all three matter
Visualization Techniques: Reading reliability diagrams and interpreting calibration curves
Measurement Metrics: Brier Score, Expected Calibration Error (ECE), and Maximum Calibration Error (MCE); a minimal ECE sketch follows this list
Real-World Impact: Case studies from Basel-compliant credit risk models, fraud detection systems, sepsis prediction, and hospital readmission forecasting
Practical Implementation: Step-by-step Python code examples using scikit-learn, with before/after comparisons showing dramatic improvements (ECE reduction from 0.184 to 0.012); a hedged sketch of this workflow also follows the list
Calibration Techniques: When to use Platt Scaling, Isotonic Regression, or Temperature Scaling, with pros/cons for each approach
Production Best Practices: Data splitting strategies, monitoring for calibration drift, stratified fairness checks, and recalibration schedules (see the drift-monitoring sketch below)
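As a preview of the metrics segment, here is a minimal sketch of how ECE can be computed. It is illustrative only: the function name, binning choice, and toy data are ours, not the session's actual code, and it assumes the standard equal-width-bin definition of ECE.

```python
# Illustrative ECE sketch (equal-width bins); not the session's actual code.
import numpy as np

def expected_calibration_error(y_true, y_prob, n_bins=10):
    """Weighted average gap between mean predicted probability and
    observed positive frequency across equal-width probability bins."""
    y_true = np.asarray(y_true, dtype=float)
    y_prob = np.asarray(y_prob, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    # Assign each prediction to a bin; clip so p == 1.0 lands in the last bin.
    bin_ids = np.clip(np.digitize(y_prob, edges) - 1, 0, n_bins - 1)
    ece = 0.0
    for b in range(n_bins):
        mask = bin_ids == b
        if mask.any():
            gap = abs(y_prob[mask].mean() - y_true[mask].mean())
            ece += (mask.sum() / len(y_prob)) * gap
    return ece

# Overconfident toy predictions: high probabilities, few actual positives.
y_true = [0, 0, 1, 0, 1, 0, 0, 1, 0, 0]
y_prob = [0.9, 0.8, 0.95, 0.7, 0.9, 0.85, 0.8, 0.9, 0.75, 0.8]
print(f"ECE: {expected_calibration_error(y_true, y_prob):.3f}")  # large gap
```

Maximum Calibration Error (MCE) follows the same recipe but takes the largest per-bin gap instead of the weighted average.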
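Similarly, here is a hedged sketch of the before/after workflow using scikit-learn's CalibratedClassifierCV. The synthetic dataset, random-forest model, and any numbers it prints are stand-ins for the session's real examples, not the ECE figures quoted above.

```python
# Hedged before/after sketch with scikit-learn; dataset and model are stand-ins.
from sklearn.calibration import CalibratedClassifierCV, calibration_curve
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import brier_score_loss
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=20_000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# "Before": raw scores from many models are over- or under-confident.
rf = RandomForestClassifier(n_estimators=200, random_state=0)
rf.fit(X_train, y_train)
p_raw = rf.predict_proba(X_test)[:, 1]

# "After": fit a monotone mapping on held-out folds of the training data.
# method="sigmoid" gives Platt scaling; method="isotonic" gives isotonic
# regression (temperature scaling is not built into scikit-learn).
calibrated = CalibratedClassifierCV(
    RandomForestClassifier(n_estimators=200, random_state=0),
    method="isotonic",
    cv=3,
)
calibrated.fit(X_train, y_train)
p_cal = calibrated.predict_proba(X_test)[:, 1]

for name, p in [("raw", p_raw), ("isotonic", p_cal)]:
    print(f"{name:8s} Brier score: {brier_score_loss(y_test, p):.4f}")

# Reliability-diagram data: plot prob_pred (x) against prob_true (y);
# a perfectly calibrated model lies on the diagonal.
prob_true, prob_pred = calibration_curve(y_test, p_cal, n_bins=10)
```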
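Finally, for the production-monitoring bullet, one hypothetical pattern: recompute a calibration metric on each rolling window of labeled outcomes and flag drift past a chosen threshold. The window size, threshold, and baseline value below are placeholders, not recommendations.

```python
# Hypothetical calibration-drift monitor; all numbers are placeholders.
from collections import deque
from sklearn.metrics import brier_score_loss

WINDOW = 5_000          # outcomes per evaluation window (placeholder)
DRIFT_THRESHOLD = 0.02  # allowed Brier-score degradation (placeholder)
baseline_brier = 0.10   # measured on the test set at deployment (assumed)

window_probs = deque(maxlen=WINDOW)
window_labels = deque(maxlen=WINDOW)

def on_new_outcome(prob: float, label: int) -> None:
    """Call whenever a ground-truth label arrives in production."""
    window_probs.append(prob)
    window_labels.append(label)
    if len(window_probs) == WINDOW:
        current = brier_score_loss(list(window_labels), list(window_probs))
        if current - baseline_brier > DRIFT_THRESHOLD:
            print(f"Calibration drift: Brier {current:.3f} vs baseline "
                  f"{baseline_brier:.3f}; trigger recalibration.")
```

The same loop works with ECE in place of the Brier score, and in practice the check would also run per stratum to support the fairness checks mentioned above.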
Attendees will leave with actionable knowledge to audit their existing models for calibration failures, implement calibration fixes in production systems, and establish monitoring frameworks to maintain calibration over time. Whether you're deploying ML for regulatory compliance, clinical decisions, or customer-facing applications, this presentation provides the tools to build AI systems that know what they don't know.
Target Audience: Data scientists, ML engineers, risk managers, healthcare AI practitioners, and anyone deploying machine learning in regulated or high-stakes environments.
Key Takeaway: Discrimination tells you who is at risk. Calibration tells you how much risk there actually is. In finance and healthcare, you need both.
Bio
Swati Tyagi is an AI/ML leader and researcher specializing in responsible AI, generative AI, and data-driven decision systems for highly regulated industries. With a PhD in Statistics and extensive industry experience, she has led impactful work in bias mitigation, model evaluation, and large-scale AI deployment. Swati is an active speaker, mentor, and community builder, contributing to global tech forums, academic research, and professional communities to advance ethical and trustworthy AI.