Audiovisual integration of emotional and linguistic signals in voice and face
Laura S. Copeland, Shari R. Baum, Vincent L. Gracco
Poster
Time: 2009-06-29 11:00 AM – 12:30 PM
Last modified: 2009-06-04
Abstract
Purpose: Speech prosody serves numerous functions, including conveying information about meaning as well as emotion or affect. In addition, during audiovisual speech processing, prosody interacts with visual cues to enrich and inform the meaning of the utterance. Although there has been a surge of interest in the integration of auditory and visual cues in speech (phonetic) perception (Calvert & Campbell, 2003; Calvert, Campbell & Brammer, 2000) and in the perception of emotion (Ethofer et al., 2006; Kreifelts et al., 2007; Pourtois et al., 2005), there has been little direct comparison of the neural substrates for linguistic and affective processing. In the current report, we examine the neural substrate associated with unisensory and multimodal integration of cues in linguistic and affective judgments.
Methods: Ten healthy volunteers (5 males) were presented with semantically neutral sentences expressing affective or linguistic prosody solely through non-verbal cues (intonation, facial expressions) while undergoing fMRI. The sentences were presented under auditory, visual, and audio-visual conditions. The emotional prosody task required participants to identify the emotion of the utterance (happy or angry), while the linguistic prosody task required participants to identify the type of utterance (question or statement).
Results: Affective and linguistic processing appear to rely on a common neural substrate under both unisensory and multisensory conditions. The multisensory network included bilateral occipital areas, multiple bilateral areas on the superior temporal gyrus (STG), the supramarginal gyrus, the right superior temporal sulcus (STS), the bilateral fusiform region, and the pre-supplementary motor area (pre-SMA). Within this common network, affective processing resulted in increased activation in the pre-SMA, the bilateral fusiform region, the left inferior occipital region, the caudomedial and lateral portions of the posterior STG (pSTG), and an area around the right STS. In contrast, linguistic processing resulted in increased activation in the right lateral pSTG, the left middle STG, the left superior temporal plane, and the right inferior occipital region. Two areas in the right hemisphere (the middle frontal gyrus and the lower border of the inferior frontal gyrus) were activated only when judging affect. Unisensory differences were found in an area around the left supramarginal gyrus (SMG): this region was associated with the affective judgment in both the visual and auditory modalities, whereas only the auditory modality activated the SMG for the linguistic judgment.
Conclusions: The model of speech prosody processing that emerges is, for the most part, one of overlapping bilateral activation for affective and linguistic prosody, with the strength of activation modulated by task demands and modality of presentation.
References
Calvert, G. A. & Campbell, R. (2003). Reading speech from still and moving faces: The neural substrates of visible speech. Journal of Cognitive Neuroscience, 15, 57-70.
Calvert, G. A., Campbell, R. & Brammer, M. J. (2000). Evidence from functional magnetic resonance imaging of crossmodal binding in the human heteromodal cortex. Current Biology, 10, 649-657.
Ethofer, T., Pourtois, G. & Wildgruber, D. (2006). Investigating audiovisual integration of emotional signals in the human brain. Progress in Brain Research, 156, 345-361.
Kreifelts, B., Ethofer, T., Grodd, W., Erb, M. & Wildgruber, D. (2007). Audiovisual integration of emotional signals in voice and face: An event-related fMRI study. NeuroImage, 37, 1445-1456.
Pourtois, G., de Gelder, B., Bol, A. & Crommelinck, M. (2005). Perception of facial expression and voices and of their combination in the human brain. Cortex, 41, 49-59.