Having a phone conversation with an AI-powered voice agent is a common and frustrating experience, particularly for people who stutter or have other diverse patterns of speech.
To address the poor performance of speech AI systems for this community, a team of researchers is studying the technical and design challenges of speech AI, along with the policies and norms that reinforce the difficulties already associated with variations in speech patterns, and seeking solutions to both.
The team will hold a workshop on Speech AI for All at the Association for Computing Machinery (ACM) CHI Conference on Human Factors in Computing Systems, the premier international conference on human-computer interaction. The group is led by Shaomei Wu, CEO and founder of tech nonprofit AImpower.org and a person who stutters, and includes UC Santa Cruz Associate Professor of Computational Media Norman Makoto Su. The workshop is part of an ongoing project supported by the National Science Foundation to destigmatize speech differences in speech AI systems.
“AI is not accommodating,” Wu said. “There’s a huge need to think about why they are designed and trained that way, and how we can make them better. This is not purely a technical problem, it has a lot of root causes embedded with normative expectations of what human speech should be like.”
Varied speech
Speech AI systems, found in smartwatches, in-car navigation, and more, can be difficult or impossible for people with varied speech to use because they are designed around assumptions about human speech that do not include the full range of possible patterns.
“A lot of these speech AI systems are trained and optimized for more typical speech patterns, so they have a lot of trouble processing or working with people with more diverse patterns of speech, including but not limited to people who stutter,” Wu said. “This is both based on my lived experience with those technologies, but also based on AImpower’s engagement and research with the stuttering community as a whole. We’ve heard a lot of complaints, especially now as these kinds of technology have been deployed widely.”
AImpower.org, which creates empowering technologies for, with, and by marginalized communities, began collaborating with the other researchers at a research summit held by Apple, where the team presented its work on speech diversity. In attendance was Raja Kushalnagar, a professor of information technology at Gallaudet University who is deaf and brought the perspective of a community deeply impacted by challenges with speech-based technologies. At UC Santa Cruz, Su’s lab focuses on human-computer interaction and on designing technologies for communities often marginalized in tech, a natural fit for this project.
This motivated a working group to come together to tackle these challenges holistically and to hold the upcoming workshop at CHI. The conference offers a productive setting for addressing the issue from many perspectives: it is one of the most interdisciplinary venues within the ACM, drawing speakers from well beyond the technical field.
Finding solutions
At the workshop, participants will explore the technical aspects of making speech AI more accessible, along with the related policies and norms. The organizers aim to ensure that people affected by failures in speech technology have an active voice in the workshop: the solutions impacted communities create often differ from those introduced by developers and designers outside the community, who may be well-intentioned but can reinforce normative expectations about fluency. The sessions will include a community panel where those affected can share their stories and their goals for what they want the technology to be.
On the technical side, participants plan to discuss what kinds of data can be used to train and test speech models so that they recognize diverse speech patterns. On the architecture side, they plan to explore how parameters within the models can be set to accommodate varied cadences of speech.
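To make that idea concrete, the sketch below shows one way recognizer parameters might be adjusted in practice, using the open-source Whisper speech recognizer as an example. This is purely illustrative and not part of the workshop's agenda or the team's method; the input file name and the specific values are hypothetical starting points, though the parameters themselves are real options in Whisper's transcribe() API.

```python
# A minimal, illustrative sketch (assumption: openai-whisper is installed):
# loosening decoding thresholds so speech with long pauses, blocks, and
# repetitions is less likely to be truncated or discarded as silence.
import whisper

model = whisper.load_model("base")

result = model.transcribe(
    "sample_with_disfluencies.wav",  # hypothetical input file
    # Raise the "no speech" threshold (default 0.6) so long pauses and
    # blocks are less likely to cause a segment to be dropped as silence.
    no_speech_threshold=0.8,
    # Don't condition on previous text, reducing the chance the decoder
    # smooths over repeated syllables as if they were transcription errors.
    condition_on_previous_text=False,
    # Lower the log-probability cutoff (default -1.0) so quieter,
    # hesitant speech is not rejected outright.
    logprob_threshold=-1.5,
)
print(result["text"])
```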
They will also discuss how grassroots organizations might mobilize impacted communities to take these issues into their own hands.
“Most of AI is controlled by big companies with lots of resources, money, and computing power, so how can grassroots movements build their own models that are more accessible and inclusive? We are trying to understand the mechanisms of that, and how online communities and cultures can enable that,” Su said.
The workshop will also explore how policy and norms might be shifted to avoid ostracizing varied speech. For example, there are currently no laws requiring performance parity of speech AI models before they can be deployed on government sites. On the advocacy side, the organizers are interested in how people outside the community can notice these accessibility challenges and intervene.
After the workshop, the group plans to produce a white paper outlining potential solutions and to continue facilitating a research community that fosters inter-institutional collaboration through regular meetups and discussions.
“We're hoping this will be just the beginning of a long-term collective,” Wu said. “Everyone experiences speech disfluencies at times. Our pauses, hesitations, and silences can carry meanings as much as our words do. Together, we can build AI that understands and embraces these disfluent moments, rather than rejecting them.”
