Regularization improves models of audiovisual integration in speech perception
Last modified: 2013-05-05
Abstract
Visual speech, the speech information mediated by the sight of articulatory mouth movements, can influence the auditory phonetic speech percept. This is demonstrated by the McGurk illusion, in which an acoustic speech signal (e.g. /ba/) is perceived differently (as /da/) when presented audiovisually, dubbed onto the video of an incongruent talking face (articulating /ga/).
A computational account of the integration of information across the senses underlying the McGurk illusion has long been sought. One account, the Fuzzy Logical Model of Perception (FLMP; Massaro, 1998), posits that integration is based on fuzzy truth-values. Here we present alternative accounts in which integration is based on continuous feature values.
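As an illustration of the FLMP integration rule (Massaro, 1998), the Python sketch below combines per-category degrees of support from the two modalities multiplicatively and normalizes them; the support values are purely illustrative, not fitted parameters.

import numpy as np

def flmp_integrate(a, v):
    # FLMP integration: multiply per-category auditory (a) and visual (v)
    # fuzzy truth-values and normalize (relative goodness rule):
    # P(k) = a_k * v_k / sum_j a_j * v_j
    support = np.asarray(a) * np.asarray(v)
    return support / support.sum()

# Illustrative support values for /ba/, /da/, /ga/: an acoustic /ba/ dubbed
# onto a visual /ga/ can yield a fused /da/ percept, because /da/ receives
# moderate support from both modalities.
p = flmp_integrate(a=[0.60, 0.35, 0.05], v=[0.05, 0.40, 0.55])
print(dict(zip(["ba", "da", "ga"], np.round(p, 3))))   # /da/ dominates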
We show that such models can provide a better fit to observed data than the FLMP. To take model flexibility into account, we cross-validate model fits and show that although feature-based models have more predictive power than the FLMP, both types of models perform rather poorly. Finally, we show that the predictive power of both types of models improves when the models are regularized by Bayesian priors and that, after regularization, feature-based models have significantly better predictive power than the FLMP.
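One way to regularize such fits is to place priors on the model parameters and maximize the penalized (maximum a posteriori) likelihood, scoring held-out conditions with the likelihood term alone when cross-validating. The sketch below assumes a two-alternative FLMP-style model with independent Beta priors on the auditory and visual truth-values; the design, prior, and parameter names are illustrative assumptions, not the specific scheme used in the study.

import numpy as np
from scipy.optimize import minimize
from scipy.stats import beta

def neg_log_posterior(params, counts, prior_a=2.0, prior_b=2.0):
    # Penalized (MAP) objective for a two-alternative FLMP-style model.
    # params: logit-scale auditory and visual truth-values for /da/,
    #         one per stimulus level (here 3 + 3).
    # counts: (3, 3, 2) array of (/da/, other) response counts per cell.
    # A Beta(prior_a, prior_b) prior on each truth-value is the regularizer;
    # Beta(1, 1) reduces the objective to plain maximum likelihood.
    vals = np.clip(1.0 / (1.0 + np.exp(-params)), 1e-6, 1 - 1e-6)
    a, v = vals[:3], vals[3:]
    nll = 0.0
    for i in range(3):
        for j in range(3):
            p_da = a[i] * v[j] / (a[i] * v[j] + (1 - a[i]) * (1 - v[j]))
            p_da = np.clip(p_da, 1e-9, 1 - 1e-9)
            nll -= counts[i, j, 0] * np.log(p_da) + counts[i, j, 1] * np.log(1 - p_da)
    return nll - beta.logpdf(vals, prior_a, prior_b).sum()

# Synthetic 3 x 3 audiovisual design with 24 trials per cell, for illustration.
rng = np.random.default_rng(0)
counts = rng.multinomial(24, [0.5, 0.5], size=(3, 3))
fit = minimize(neg_log_posterior, x0=np.zeros(6), args=(counts,), method="BFGS")
print(fit.x)   # fitted logit-scale truth-values; held-out cells would be
               # scored with the likelihood term alone to cross-validate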
Keywords
audiovisual; speech perception; modeling
References
Massaro, D. W. (1998). Perceiving talking faces. Cambridge, Massachusetts: MIT Press.