Multimodal integration

Multimodal integration is a sub-discipline of perceptual psychology primarily concerned with how the different sensory modalities (e.g. sight, sound, touch) become integrated into the coherent, unified, conscious representation we experience. The field is not essentially concerned with consciousness, however, and often investigates how the different sense modalities interact and alter the processing of each other.

Most psychologists interested in perceptual processes investigate each sense modality independently, and these traditional unimodal approaches labour under a number of assumptions. Jerry Fodor (1983) popularized the notion that sense modalities are processed independently in his monograph Modularity of Mind. Fodor's modular theory of perception states that perceptual processes are simple, reflex-like computations that create perceptual inferences about the environment. Importantly, these computations are modular: they are separated into parallel processing channels that are not affected by higher-level cognitive processes, which Fodor labeled 'central processes'. In essence, Fodor highlighted the functional independence of each of the senses and asserted that perception follows a purely feedforward information-processing scheme.

If color, motion, depth, form, sound, etc. are processed independently, where does the unified, coherent conscious experience of the world come from? This is known as the binding problem. It is usually studied entirely within visual processes, but it is clearly central to multimodal perception as well.

However, considerations of how unified conscious representations are formed are not the full focus of multimodal integration research. The senses must interact in order to maximize how efficiently people interact with the environment. For perceptual experience and behavior to benefit from the simultaneous stimulation of multiple sensory modalities, the information from these modalities must be integrated. The mechanisms mediating this phenomenon and its subsequent effects on cognitive and behavioural processes are examined below. Perception is conventionally defined as one's conscious experience, and thereby combines inputs from all relevant senses and prior knowledge. However, despite the existence of Gestalt schools of psychology that advocated a holistic approach to the operation of the brain, the physiological processes underlying the formation of percepts were vastly understudied until recently. Neural structures implicated in multisensory integration include the superior colliculus (SC) and, somewhat dissociated from this, various cortical structures such as the superior temporal gyrus (STG) and the visual and auditory association areas. Although the structure and function of the SC are well known, the cortex and the relationship between its constituent parts are presently the subject of much investigation. Concurrently, the recent impetus toward integration has enabled investigation into perceptual phenomena such as the ventriloquism effect, rapid localisation of stimuli and the McGurk effect, culminating in a more thorough understanding of the human brain and its functions.

Perceptual and behavioral consequences
A unimodal approach dominated the scientific literature until the beginning of this century. Although this enabled rapid progress in neural mapping and an improved understanding of neural structures, the investigation of perception remained relatively stagnant. The recent revitalized enthusiasm for perceptual research is indicative of a substantial shift away from reductionism and toward gestalt methodologies. Gestalt theory, dominant in the late 19th and early 20th centuries, espoused two general principles: the ‘principle of totality’, in which conscious experience must be considered globally, and the ‘principle of psychophysical isomorphism’, which states that perceptual phenomena are correlated with cerebral activity. These ideas are particularly relevant in the current climate and have driven researchers to investigate the behavioural benefits of multisensory integration.

Improving sensory uncertainty
It has been widely acknowledged that uncertainty in sensory domains results in an increased dependence on multisensory integration (Alais & Burr, 2004). It follows that cues from multiple modalities that are both temporally and spatially synchronous are treated neurally and perceptually as emanating from the same source. The degree of synchrony required for this ‘binding’ to occur is currently being investigated using a variety of approaches. Integration occurs only up to a point, beyond which the subject can differentiate the cues as two separate stimuli. A significant intermediate conclusion can nevertheless be drawn from the research thus far: multisensory stimuli that are bound into a single percept are also bound on the same receptive fields of multisensory neurons in the SC and cortex (Alais & Burr, 2004). Considering the lack of empirical evidence for the psychophysical isomorphism principle during the period of dominance of the Gestalt school, this highly nontrivial finding supports the principle and further emphasizes the increasing role of Gestalt psychology in modern science.

However, strict perceptual-to-neural correlation is also not accurate, and may itself be considered over-reductionist. Two converging bimodal stimuli can produce a perception that differs from the sum of its parts not only in magnitude but also in quality. In a classic study of what is now labeled the McGurk effect (McGurk & MacDonald, 1976), the audio of a person producing one phoneme was dubbed onto a video of that person speaking a different phoneme. The result was the perception of a third, different phoneme. McGurk and MacDonald (1976) explained that phonemes such as ba, da, ka, ta, ga and pa can be divided into four groups: those that can be visually confused, i.e. (da, ga, ka, ta) and (ba, pa), and those that can be audibly confused. Hence, when a ba voice and ga lips are processed together, the visual modality sees ga or da and the auditory modality hears ba or da, and the two combine to form the percept da. This result can be generalized to a far-reaching conclusion: multiple modalities compute the most parsimonious interpretation of the information provided in order to produce a single percept. Although the process is maladaptive in this case, in a natural context it would disambiguate slightly differing perceptual cues and so improve understanding of the perceptual world.
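The ba-voice and ga-lips example can be pictured as a toy set intersection. This is only an illustrative sketch: the candidate sets are taken from the example in the text, while the function name and the idea of modelling fusion as a plain intersection are simplifying assumptions, not a model proposed by McGurk and MacDonald.

```python
# Candidate phonemes each modality is described as reporting in the
# ba-voice / ga-lips example (sets taken from the text; toy model only).
visual_candidates = {"ga", "da"}    # what the lips could be saying
auditory_candidates = {"ba", "da"}  # what the voice could be saying


def fused_percept(visual, auditory):
    """Return the phonemes compatible with both modalities.

    Hypothetical helper: treats fusion as a simple set intersection.
    """
    return visual & auditory


print(fused_percept(visual_candidates, auditory_candidates))
```

The only candidate shared by both modalities is da, which matches the percept reported in the example.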

Ventriloquism
Prior to the formulation of the inverse effectiveness rule, ventriloquism was the primary evidence base for the modality appropriateness hypothesis. Ventriloquism describes the situation in which auditory location perception is shifted toward a visual cue. The original study describing this phenomenon was conducted by Howard and Templeton (1966), after which several studies replicated and built upon their conclusions (Hairston et al., 2003). In conditions in which the visual cue is unambiguous, visual capture reliably occurs. Thus, to test the influence of sound on perceived location, the visual stimulus must be progressively degraded (Alais & Burr, 2004). Furthermore, given that audition is more attuned to temporal changes, recent studies have tested the ability of temporal characteristics to influence the perceived spatial location of visual stimuli. Some types of electronic voice phenomenon (EVP), mainly those using sound bubbles, are considered a kind of modern ventriloquism technique, produced with sophisticated software, computers and sound equipment.

Reaction time benefits
Alongside the benefits of disambiguation and increased salience, a relatively well-established phenomenon is the ability of multiple sensory inputs to increase the speed of responses. Hershenson (1962) performed a basic yet revealing experiment in which a light and a tone were presented simultaneously and separately while reaction times were measured. As the asynchrony between the onsets of the two stimuli was varied, it was observed that for certain degrees of asynchrony reaction times were reduced. These levels of asynchrony were quite small, reflecting the temporal window that exists in multimodal neurons of the SC. Further studies have analysed the reaction times of saccadic eye movements (Hughes et al., 1994) and more recently correlated these findings with neural phenomena (Wallace, 2004). The behavioural implications of these results, including faster reflexes and smoother control of motion through the combination of kinesthesia and vision, are significant for the survival of a species in a diverse perceptual world.

Modality appropriateness vs. the inverse effectiveness rule
Welch and Warren (1980) asserted that multisensory processes follow a modality appropriateness hypothesis: because vision dominates spatial tasks, a phenomenon also known as visual capture, one will always depend on vision over audition or touch to solve spatial problems. On this view, auditory stimuli cannot influence one's perception of the location of a visual stimulus at all. Conversely, audition was considered dominant in temporal tasks.

However, more recent studies have generated results that contradict this hypothesis. Alais and Burr (2004) found that, following progressive degradation in the quality of a visual stimulus, participants’ perception of spatial location was determined progressively more by a simultaneous auditory cue. A further study reached a similar finding while also progressively varying the temporal uncertainty of the auditory cue; it concluded that the uncertainty of each individual modality determines the extent to which information from that modality is considered when forming a percept.

This conclusion has become known as the ‘inverse effectiveness rule’ and has significant physiological correlates in the SC. It is known that there is a multiplicative excitation effect when the spatial and temporal attributes of more than one modality combine. However, the extent to which excitation is multiplied varies according to the ambiguity of the relevant stimuli (Heron et al., 2004). Unimodal neurons in the superficial layers of the SC can also influence orienting behaviour. When one modality receives unambiguous stimulation, these neurons operate without the input of multisensory neurons (Patton, Belkacem-Boussaid & Anastasio, 2002). Conversely, since multisensory cues can be more salient than unimodal cues of the same magnitude, increasing ambiguity in one sense results in an increasing dependence on multisensory neurons with the same receptive field, and hence in a multiplicative excitation effect. Projections to multisensory layers from stimuli in non-adjacent receptive fields, by contrast, lead to an inhibitory effect. To calculate the level of increased excitation or inhibition, formulae utilising the inverse of each relevant modality’s variance have been generated (Anastasio, Patton & Belkacem-Boussaid, 2000).
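The idea of weighting each modality by the inverse of its variance can be illustrated with a minimal sketch. It assumes that each modality supplies an independent, Gaussian-distributed estimate of location, which is the standard maximum-likelihood formulation and not necessarily the exact formulae of the cited authors; the function name and all numbers are hypothetical.

```python
def combine_cues(estimates, variances):
    """Combine unimodal estimates by inverse-variance weighting.

    Assumes independent Gaussian noise in each modality (an assumption
    of this sketch, not a claim about the cited models).
    Returns (combined_estimate, combined_variance, weights).
    """
    if len(estimates) != len(variances) or not estimates:
        raise ValueError("need one variance per estimate")
    if any(v <= 0 for v in variances):
        raise ValueError("variances must be positive")
    reliabilities = [1.0 / v for v in variances]  # reliability = 1 / variance
    total = sum(reliabilities)
    weights = [r / total for r in reliabilities]
    combined = sum(w * e for w, e in zip(weights, estimates))
    return combined, 1.0 / total, weights


# Hypothetical numbers: vision reports 0 deg, audition reports 10 deg.
# Auditory variance is fixed at 4; visual variance is 1 (sharp) or 16 (blurred).
sharp = combine_cues([0.0, 10.0], [1.0, 4.0])
blurred = combine_cues([0.0, 10.0], [16.0, 4.0])
print(sharp[0], blurred[0])
```

With the sharp visual cue the combined estimate stays close to the visual location (about 2 deg); with the blurred cue it shifts toward the auditory location (about 8 deg), which mirrors the pattern attributed to Alais and Burr (2004). Under these assumptions the combined variance is also lower than that of either modality alone.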

Superior colliculus (SC)
The SC is part of the tectum, located in the midbrain, superior to the brainstem and inferior to the thalamus. It contains seven layers of alternating white and grey matter, of which the superficial layers contain topographic maps of the visual field and the deeper layers contain overlapping spatial maps of the visual, auditory and somatosensory modalities (Afifi & Bergman, 2005). The structure receives afferents directly from the retina, as well as from various regions of the cortex (primarily the occipital lobe), the spinal cord and the inferior colliculus. It sends efferents to the spinal cord, cerebellum, thalamus and occipital lobe via the lateral geniculate nucleus (LGN). The structure contains a high proportion of multisensory neurons and plays a role in the motor control of orientation behaviours of the eyes, ears and head (Wallace, 2004).

Receptive fields from the somatosensory, visual and auditory modalities converge in the deeper layers to form a two-dimensional multisensory map of the external world. Here, objects straight ahead are represented rostrally and objects on the periphery are represented caudally. Similarly, locations in superior sensory space are represented medially, and inferior locations are represented laterally (Stein & Meredith, 1993).

However, in contrast to simple convergence, the SC integrates information to create an output that differs from the sum of its inputs. Following a phenomenon labelled the ‘spatial rule’, neurons are excited if stimuli from multiple modalities fall on the same or adjacent receptive fields, but are inhibited if the stimuli fall on disparate fields (Giard & Peronnet, 1999). Excited neurons may then proceed to innervate various muscles and neural structures to orient an individual’s behaviour and attention toward the stimulus. Neurons in the SC also adhere to the ‘temporal rule’, in which stimulation must occur within close temporal proximity to excite neurons. However, due to the varying processing time between modalities and the slower speed of sound relative to light, it has been found that neurons may be optimally excited when stimulated some time apart (Miller & D’Esposito, 2005).
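The physical part of this timing offset can be made concrete with a short arithmetic sketch. It covers only the travel-time difference between sound and light, not the differing neural processing times; the speed of sound used (343 m/s, roughly dry air at 20 degrees C), the function names and the window width are assumptions made for illustration, since the text gives no values.

```python
SPEED_OF_SOUND_M_PER_S = 343.0  # assumed: dry air at about 20 degrees C


def audio_lag_ms(distance_m):
    """Delay of sound arrival relative to light from the same event.

    Light travel time is treated as negligible at these distances.
    """
    if distance_m < 0:
        raise ValueError("distance must be non-negative")
    return 1000.0 * distance_m / SPEED_OF_SOUND_M_PER_S


def within_window(visual_onset_ms, auditory_onset_ms, window_ms):
    """True if the two onsets fall inside a symmetric temporal window.

    window_ms is a hypothetical parameter; the text gives no value.
    """
    return abs(auditory_onset_ms - visual_onset_ms) <= window_ms


for d in (1.0, 10.0, 30.0):
    print(f"{d:5.1f} m -> sound lags by {audio_lag_ms(d):.1f} ms")
```

At a distance of 10 m, for example, the sound arrives roughly 29 ms after the light, so a neuron that responded only to strictly simultaneous input would miss events that in fact share a source.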

Cortical structures and the superior colliculus
The most significant interaction between these two systems (corticotectal interactions) is the connection between the anterior ectosylvian sulcus (AES), which lies at the junction of the parietal, temporal and frontal lobes, and the SC. The AES is divided into three unimodal regions with multimodal neurons at the junctions between these sections (Jiang & Stein, 2003). Neurons from the unimodal regions project to the deep layers of the SC and influence the multiplicative integration effect. That is, although SC neurons can receive inputs from all modalities as normal, the SC cannot enhance or depress the effect of multimodal stimulation without input from the AES (Jiang & Stein, 2003).

The multisensory neurons of the AES, although integrally connected to unimodal AES neurons, are not directly connected to the SC. This pattern of division is reflected in other areas of the cortex, resulting in the observation that cortical and tectal multisensory systems are somewhat dissociated (Wallace, Meredith & Stein, 1993). Stein, London, Wilkinson and Price (1996) analysed the perceived luminance of an LED in the context of spatially disparate auditory distracters of various types. A significant finding was that a sound increased the perceived brightness of the light, regardless of their relative spatial locations, provided the light’s image was projected onto the fovea. Here, the apparent absence of the spatial rule further differentiates cortical and tectal multisensory neurons. Little empirical evidence exists to justify this dichotomy. Nevertheless, the idea of cortical neurons governing perception and a separate subcortical system governing action (orientation behavior) is consistent with the perception-action hypothesis of the visual streams (Goodale & Milner, 1995). Further investigation into this field is necessary before any substantial claims can be made.

Multisensory properties of the cerebral cortex
Multisensory neurons exist in a large number of locations, often integrated with unimodal neurons. They have recently been discovered in areas previously thought to be modality specific, such as the somatosensory cortex, as well as in clusters at the borders between the major cerebral lobes, such as the occipito-parietal space and the occipito-temporal space (Wallace, Ramachandran & Stein, 2004; Wallace, 2004). Two properties they exhibit are their adaptability, or plasticity, and their ability to communicate through feedforward and feedback mechanisms.

Audiovisual cross-modal interactions are known to occur in the auditory association cortex, which lies directly inferior to the Sylvian fissure in the temporal lobe (Sadato et al., 2004). Plasticity was observed in the superior temporal gyrus (STG), the uppermost gyrus of the temporal lobe, by Petitto et al. (2000), who found that the STG was more active during stimulation in native deaf signers than in hearing non-signers. Further research has revealed differences between the hearing and the deaf in the activation of the planum temporale (PT) in response to non-linguistic lip movements, as well as progressively increasing activation of the auditory association cortex as previously deaf participants gain hearing experience via a cochlear implant (Sadato et al., 2004). Similar examples of cross-modal plasticity exist in the visual association cortex (Kujala et al., 1997; Theoret & Pascual–Leone, 2006). This ability has obvious benefits for human behavior, including increased efficiency and optimal utilization of the brain as well as greater resilience to changes in circumstance.

However, in order to undergo such physiological changes, there must exist continuous connectivity between these multisensory structures. It is generally agreed that information flow within the cortex follows a hierarchical configuration (Clavagnier, Falchier & Kennedy, 2004). Hubel and Wiesel (as cited in Clavagnier et al., 2004) showed that receptive fields, and thus the functions of cortical structures, become increasingly complex and specialized as one proceeds out from V1 along the visual pathways. From this it was postulated that information flows outwards in a feedforward fashion, with the complex end products eventually binding to form a percept. However, fMRI and intracranial recording technologies have shown that the activation times of successive levels of the hierarchy do not match a feedforward structure. That is, late activation has been observed in the striate cortex, markedly after activation of the prefrontal cortex in response to the same stimulus (Foxe & Simpson, 2002).

Complementing this, afferent nerve fibres have been found that project to early visual areas such as the lingual gyrus from late in the dorsal (action) and ventral (perception) visual streams, as well as from the auditory association cortex (Macaluso, Frith & Driver, 2000). Feedback projections have also been observed in the opossum directly from the auditory association cortex to V1 (Clavagnier et al., 2004). This last observation highlights a current point of controversy within the neuroscientific community. Sadato et al. (2004) concluded, in line with Bernstein et al. (2002), that the primary auditory cortex (A1) is functionally distinct from the auditory association cortex, in that it is devoid of any interaction with the visual modality. They hence concluded that A1 would not be affected at all by cross-modal plasticity. This concurs with Jones and Powell’s (1970) contention that primary sensory areas are connected only to other areas of the same modality.

In contrast, the dorsal auditory pathway, projecting from the temporal lobe, is largely concerned with processing spatial information and contains receptive fields that are topographically organized. Fibers from this region project directly to neurons governing corresponding receptive fields in V1 (Clavagnier et al., 2004). The perceptual consequences of this have not yet been established empirically. However, it can be hypothesized that these projections underlie increased acuity and emphasis of visual stimuli in relevant areas of perceptual space. This finding contradicts Jones and Powell’s (1970) hypothesis and thus conflicts with the findings of Sadato et al. (2004). One possible resolution is that primary sensory areas cannot be classified as a single group, and may differ from one another far more than previously thought. Regardless, further research is necessary for a definitive resolution.

Development of multimodal operations
All species equipped with multiple sensory systems utilize them in an integrative manner to achieve action and perception (Stein & Meredith, 1993). However, in most species, especially higher mammals, the ability to integrate develops in parallel with physical and cognitive maturity. Classically, two opposing views have been put forth, which are essentially modern manifestations of the nativist/empiricist dichotomy. The integration (empiricist) view states that at birth the sensory modalities are not connected at all. Hence, it is only through active exploration that plastic changes can occur in the nervous system to initiate holistic perceptions and actions. Conversely, the differentiation (nativist) perspective asserts that the young nervous system is highly interconnected, and that during development the modalities are gradually differentiated as relevant connections are rehearsed and irrelevant ones are discarded (Lewkowicz & Kraebel, 2004).

Using the SC as a model, the nature of this dichotomy can be analysed. In the newborn cat, deep layers of the SC contain only neurons responding to the somatosensory modality. Within a week, auditory neurons begin to appear, but it is not until two weeks after birth that the first multimodal neurons appear. Further changes continue, with the arrival of visual neurons after three weeks, until the SC achieves its fully mature structure after three to four months. In monkeys, by contrast, newborns are endowed with a significant complement of multisensory cells; however, as in cats, no integration effect is apparent until much later (Wallace, 2004). This delay is thought to result from the relatively slower development of cortical structures including the AES, which, as stated above, is essential for the existence of the integration effect (Jiang & Stein, 2003).

Furthermore, Wallace (2004) found that cats raised in a light-deprived environment had severely underdeveloped visual receptive fields in deep layers of the SC. Although receptive field size has been shown to decrease with maturity, this finding suggests that integration in the SC is a function of experience. Nevertheless, the existence of visual multimodal neurons despite a complete lack of visual experience highlights the apparent relevance of nativist viewpoints. Multimodal development in the cortex has been studied to a lesser extent; however, a study similar to that presented above was performed on cats whose optic nerves had been severed. These cats displayed a marked improvement in their ability to localize stimuli through audition, and also showed increased neural connectivity between V1 and the auditory cortex (Clavagnier et al., 2004). Such plasticity in early childhood allows for greater adaptability, and thus more normal development in other areas for those with a sensory deficit.

In contrast, following the initial formative period, the SC does not appear to display any neural plasticity. Despite this, long-term habituation and sensitisation are known to exist in orientation behaviors. This apparent plasticity in function has been attributed to the adaptability of the AES. That is, although neurons in the SC have a fixed magnitude of output per unit input, and essentially operate an all-or-nothing response, the level of neural firing can be more finely tuned by variations in the input from the AES.

Although there is evidence for each perspective of the integration/differentiation dichotomy, a significant body of evidence also exists for a combination of factors from both views. Thus, analogous to the broader nativist/empiricist argument, it is apparent that there exists a continuum rather than a dichotomy, with the integration and differentiation hypotheses as the extremes at either end.