Neural architecture of auditory object categorization
Yune-Sang Lee, Michael Hanke, David Kraemer, Samuel Lloyd, Richard Granger

Last modified: 2011-09-03

Abstract


We can identify objects by sight or by sound, yet far less is known about auditory object recognition than about visual recognition. Any exemplar of an object (e.g., a picture of a dog) can be recognized at multiple categorical levels (e.g., animal, dog, poodle). Using fMRI combined with machine-learning techniques, we studied these levels of categorization with sounds rather than images. Subjects heard sounds of various animate and inanimate objects, as well as unrecognizable control sounds. We report four primary findings: (1) some distinct brain regions selectively coded for basic ("dog") versus superordinate ("animal") categorization; (2) classification at the basic level entailed more extended cortical networks than did superordinate categorization; (3) human voices were recognized far better by multiple brain regions than any other sound category; (4) regions beyond the auditory areas of the temporal lobe were able to distinguish and categorize auditory objects. We conclude that multiple representations of an object exist at different categorical levels. This neural instantiation of object categories is distributed across multiple brain regions, including so-called "visual association areas," indicating that these regions support object knowledge even when the input is auditory. Moreover, our findings appear to conflict with well-established theories of category-specific modules in the brain.
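The abstract does not detail the classification pipeline. As a rough illustration of what decoding the same responses "at multiple categorical levels" can mean, here is a minimal pure-Python sketch. It assumes a correlation-based nearest-centroid classifier with leave-one-run-out cross-validation, a common multivariate pattern analysis scheme that is not necessarily the authors' method. The data are synthetic, and the category names, run count, trial count, and noise level are all hypothetical. The same voxel patterns are classified twice, once with basic-level labels and once with superordinate labels.

```python
import random
import statistics


def pearson(a, b):
    """Pearson correlation between two equal-length voxel patterns."""
    ma, mb = statistics.fmean(a), statistics.fmean(b)
    da = [x - ma for x in a]
    db = [y - mb for y in b]
    num = sum(x * y for x, y in zip(da, db))
    den = (sum(x * x for x in da) * sum(y * y for y in db)) ** 0.5
    return num / den if den else 0.0


def centroid(patterns):
    """Voxel-wise mean of a list of patterns."""
    return [statistics.fmean(col) for col in zip(*patterns)]


def loro_accuracy(samples, level):
    """Leave-one-run-out accuracy of a correlation nearest-centroid classifier.

    samples: list of dicts with keys 'run', 'pattern', 'basic', 'superordinate'.
    level: which label key to decode ('basic' or 'superordinate').
    """
    runs = sorted({s["run"] for s in samples})
    correct = total = 0
    for test_run in runs:
        train = [s for s in samples if s["run"] != test_run]
        test = [s for s in samples if s["run"] == test_run]
        labels = sorted({s[level] for s in train})
        # One mean pattern per class, estimated from the training runs only.
        cents = {lab: centroid([s["pattern"] for s in train if s[level] == lab])
                 for lab in labels}
        for s in test:
            # Predict the class whose centroid correlates best with the pattern.
            pred = max(labels, key=lambda lab: pearson(s["pattern"], cents[lab]))
            correct += pred == s[level]
            total += 1
    return correct / total


def make_synthetic(n_runs=4, n_voxels=50, noise=1.0, seed=0):
    """Hypothetical data: basic categories under the same superordinate label
    share a common pattern component plus their own random component."""
    rng = random.Random(seed)
    taxonomy = {"dog": "animal", "cat": "animal", "car": "tool", "saw": "tool"}
    supers = {sup: [rng.gauss(0, 1) for _ in range(n_voxels)]
              for sup in sorted(set(taxonomy.values()))}
    protos = {b: [x + rng.gauss(0, 1) for x in supers[sup]]
              for b, sup in taxonomy.items()}
    samples = []
    for run in range(n_runs):
        for basic, sup in taxonomy.items():
            for _ in range(3):  # hypothetical: 3 trials per category per run
                pattern = [x + rng.gauss(0, noise) for x in protos[basic]]
                samples.append({"run": run, "pattern": pattern,
                                "basic": basic, "superordinate": sup})
    return samples


if __name__ == "__main__":
    data = make_synthetic()
    # Chance is 1/4 for the basic level and 1/2 for the superordinate level here.
    for level in ("basic", "superordinate"):
        print(level, round(loro_accuracy(data, level), 3))
```

Because the two label sets have different numbers of classes, raw accuracies are not directly comparable across levels; comparing each to its own chance level is one simple way to handle that.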
