CROSSMODAL PREDICTION DURING PERCEPTION OF AUDIOVISUAL SPEECH

Carolina Sánchez, Agnès Alsius, James T. Enns, Salvador Soto-Faraco
Poster
Time: 2009-06-30  09:00 AM – 10:30 AM
Last modified: 2009-06-04

Abstract


Carolina Sánchez1, Agnès Alsius2, James T. Enns3 & Salvador Soto-Faraco1,2,4

1 CBC, Dept. de Tecnologies de la Informació i les Comunicacions, UPF (Barcelona), Spain; 2 Dept. de Psicología Básica, Universidad de Barcelona, Spain; 3 Department of Psychology, University of British Columbia (Vancouver), Canada; 4 Institució Catalana de Recerca i Estudis Avançats (ICREA) & Dept. de Tecnologies de la Informació i les Comunicacions, UPF (Barcelona), Spain

Information about past events can be used to make predictions about what is coming next. For instance, human perception capitalizes on this type of predictive coding to speed up information processing in a variety of domains, including vision and speech perception. The present study addressed whether predictive coding can occur across sensory modalities. In particular, we hypothesized that input in one sensory modality (e.g., an auditory speech stream) might contribute to predictions about upcoming events in a different sensory modality (e.g., the video of a speaker’s face). To test this, we used an audio-visual speech-matching task in which observers made speeded classifications of spoken sentence fragments as either matching or mismatching the moving face seen in a video. We provided prior context to the combined audio-visual fragments using only one of the modalities (i.e., the lead-in was audio alone or video alone), so that the test fragment could be experienced as a continuation of the lead-in. We tested for predictive abilities in both the auditory-to-visual and the visual-to-auditory directions. The results supported the existence of both within-modal and cross-modal predictions. However, asymmetries were also observed: predictions based on audio information alone impaired performance in the speeded matching task, whereas predictions based on video information alone led to improvements.
