Auditory and Visual Conditions That Make Speech a Multisensory Event

Kevin G. Munhall, Queen's University, Canada

Abstract
It has been recognized for many years that viewing the face during auditory speech perception can influence phonetic categorization. The intelligibility of degraded auditory speech is enhanced when listeners view a talker's face movements. Watching these face movements can also influence the perception of perfectly audible speech or serve as the sole basis of speech perception. In this talk I will present data on the perceptual processes involved in the McGurk effect, the perception of speech in noise, and speechreading. In a series of studies I will show that the visual information used for speech perception is matched to the characteristics of speech production (low temporal and spatial frequency, low dimensionality, and separable prosodic and segmental streams). Using data from eye-tracking studies and spatial-frequency-filtered images, we have found that low- to mid-range spatial frequency information is sufficient to increase the intelligibility of speech in noise and to produce the McGurk effect. Further, our eye-tracking studies indicate that high-resolution information may not always be processed because of gaze tendencies. On the other hand, results from visual-only speechreading show a different pattern, with specific phonemes benefiting perceptually from subtle information available at higher spatial frequencies. Drawing on these results and on lessons learned from our work creating facial animation, I will discuss the interaction between the visual and auditory modalities in speech perception and the linkage between speech production and perception.
