Session: Who Gets to Sound Natural? Accent, Tone, and Bias in Voice AI Systems
Voice technologies increasingly shape access to information, services, and opportunities, from automated captions and voice assistants to speech recognition in workplaces and public institutions. Yet many of these systems implicitly assume a “default” way of speaking, often privileging Western accents and non-tonal speech patterns.
In this talk, I examine how design and modelling choices in modern voice AI systems influence who is understood, who is misinterpreted, and who is excluded. Drawing on research in speech representation learning and real-world deployment examples, I show how accent variation, tonal languages, and data imbalance affect system performance, often in ways that remain invisible until deployment.
Rather than treating these challenges as purely data limitations, I reframe them as representation and evaluation problems. I outline practical strategies for building more inclusive voice AI systems, including better evaluation practices, representation-aware modelling decisions, and design principles that centre linguistic diversity from the outset.
This session is for AI practitioners, engineers, researchers, and product leaders who want to build voice technologies that work for users worldwide, not just those who match the assumed default.
Bio
Opeyemi Osakuade is a PhD researcher in speech and language processing at the University of Edinburgh. Her work focuses on how modern speech and voice AI systems represent tone, accent, and linguistic variation, with particular attention to low-resource and tonal languages. She researches speech representation learning, evaluation methodologies, and bias analysis in voice technologies, aiming to bridge technical model design with real-world impact and more inclusive AI systems.