Localisation
Binaural and monaural hearing
As hearing is primarily processed by a pair of organs (the two ears), a distinction is made between listening with a single ear (monaural hearing) and with both ears (binaural hearing).
Binaural processing
When both ears receive identical input, the stimulation is called ‘diotic’. In contrast, when the inputs to the two ears differ, the stimulation is called ‘dichotic’.
In natural environments, stimulation is essentially always dichotic, and it is this difference between the two ears that makes binaural processing possible. If both ears received the same information, we would be unable to follow conversations in noisy situations, locate sound sources or accurately define our sound environment.
Localisation of sound sources
Localisation works best for the primary sound sources that dominate the auditory environment. It becomes less precise when several sound sources are superimposed, and when reverberation creates secondary sources.
Generally speaking, the human ear uses various cues to localise a sound in space. In the horizontal plane (azimuth), the primary cues result from the differences between the signals arriving at the two ears (binaural cues). In the vertical plane (elevation), the primary cues are monaural: they are generated by the modification of a sound by the torso, head and outer ear of the listener.
Localisation in the horizontal plane
Interaural time difference (ITD)
The time difference between the arrival of a sound wave at each ear is an important cue for estimating the position of a sound in the horizontal plane.
Consider a sound source ‘S’ located in the horizontal plane at an angle α and a distance r from the centre of the head. The sound travels a different path to reach the right (R) and left (L) ears.
In this case, the distance SL is greater than SR and the sound will therefore arrive at the right ear sooner than at the left. The difference in the arrival time of the sound wave at each ear is called the interaural time difference (ITD). When the sound source is at equal distance from both ears (at 0° or 180°), the ITD is equal to 0.
Conversely, the ITD is at its maximum when the sound source is at ±90°. At this point, it is equal to around 0.7 ms for an average-sized human head.
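The dependence of the ITD on the angle of the source can be sketched numerically. The example below is only an illustration: it assumes a distant source and a rigid spherical head (a classic textbook approximation, not a model given in this article), and the head radius and speed of sound are assumed typical values.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, air at about 20 °C (assumed value)
HEAD_RADIUS = 0.0875    # m, a commonly quoted average (assumed value)


def itd_seconds(azimuth_deg, head_radius=HEAD_RADIUS, c=SPEED_OF_SOUND):
    """Approximate ITD for a distant source around a spherical head.

    azimuth_deg: 0 = straight ahead, 90 = directly to the right,
    180 = directly behind. A positive result means the sound
    reaches the right ear first.
    """
    theta = math.radians(azimuth_deg)
    # Fold rear angles onto the front: only the lateral angle matters,
    # so a source at 150° gives the same ITD as one at 30°.
    lateral = math.asin(math.sin(theta))
    # Extra path around the sphere: arc (radius * angle) plus the
    # straight segment (radius * sin(angle)).
    return head_radius / c * (lateral + math.sin(lateral))


if __name__ == "__main__":
    for angle in (0, 30, 60, 90, 150, 180):
        print(f"{angle:4d}°  ITD = {itd_seconds(angle) * 1000:.3f} ms")
```

With these assumed values the ITD is zero at 0° and 180° and reaches roughly 0.66 ms at 90°, in line with the figure of around 0.7 ms quoted above. Front and rear angles give the same value, which is the ambiguity discussed later in this section.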
ITD is a fundamental cue for localising the source of a sound that has a frequency of less than 1500 Hz. Above 1500 Hz, ITD cues become ambiguous. However, in the case of complex sounds, the ITD of the envelope (slow modulation) of the high frequencies can be perceived. This is known as the interaural envelope time difference.
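One way to see why the ITD becomes ambiguous at high frequencies is to treat the ITD of a pure tone as a phase difference between the ears. The sketch below assumes exactly that (a steady pure tone with no onset or envelope information); the function name is hypothetical and chosen for illustration.

```python
def interaural_phase_cycles(frequency_hz, itd_s):
    """Interaural phase difference in cycles, wrapped into [-0.5, 0.5).

    For a steady pure tone, only this wrapped value is available:
    whole cycles of delay cannot be told apart.
    """
    cycles = frequency_hz * itd_s
    return (cycles + 0.5) % 1.0 - 0.5


if __name__ == "__main__":
    itd = 0.0004  # 0.4 ms, sound leading at the right ear (assumed example)
    for freq in (500, 2000):
        phase = interaural_phase_cycles(freq, itd)
        apparent_itd_ms = phase / freq * 1000
        print(f"{freq} Hz: phase = {phase:+.2f} cycles, "
              f"apparent ITD = {apparent_itd_ms:+.2f} ms")
```

At 500 Hz the wrapped phase corresponds to the true delay of 0.4 ms. At 2000 Hz the same delay wraps to a negative phase, so the tone could just as well have come from the other side with a delay of about 0.1 ms.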
Interaural level difference (ILD)
The head-shadowing effect refers to the absorption of some of a sound’s energy by the head itself. It causes an interaural level difference (ILD), which corresponds to the difference in the intensity of the sound at each ear. However, this cue depends heavily on the frequency content of the sound. For frequencies below 1500 Hz, the ILD is almost non-existent. In contrast, for frequencies over 1500 Hz, the ILD is a useful cue.
Thus, combining ILD and ITD allows fairly precise localisation in the horizontal plane. Nevertheless, an ambiguity persists at azimuths 0° and 180°, where the ILD and ITD are almost identical, creating the so-called cone of confusion.
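A level difference between the ears is usually expressed in decibels. The following minimal sketch assumes that a short block of pressure samples is available for each ear (the sample values here are invented for illustration) and compares their RMS amplitudes.

```python
import math


def rms(samples):
    """Root-mean-square amplitude of a sequence of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def ild_db(right_samples, left_samples):
    """ILD in dB; positive when the right-ear signal is stronger."""
    return 20.0 * math.log10(rms(right_samples) / rms(left_samples))


if __name__ == "__main__":
    # Made-up example: the left-ear signal has half the amplitude.
    right = [1.0, -1.0, 1.0, -1.0]
    left = [0.5, -0.5, 0.5, -0.5]
    print(f"ILD = {ild_db(right, left):.1f} dB")
```

Halving the amplitude at one ear corresponds to an ILD of about 6 dB, and identical signals give 0 dB.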
Vertical localisation
The ability to localise a sound in the vertical plane is often attributed to the analysis of the spectral composition of the sound at each ear. The sound waves arriving at the ears are reflected by structures such as the shoulders and pinnae, and these reflections interfere with the sound as it enters the ear canal. This interference causes spectral modifications, either reinforcements (spectral peaks) or attenuations (spectral gaps) in certain frequency zones, which allow the localisation of a sound source in the vertical plane.
Over the course of a lifetime, a multitude of transfer functions are learnt, each corresponding to a different direction of the sound source. These memorised filters are used to reweight the sound spectrum and to disambiguate the location of sounds in the cone of confusion.
This diagram (from J. Garas) shows the spectral modification of the original sound wave as a function of the azimuth of the source (from top to bottom: -10°, 0°, 10°). It is apparent that the spectral gap moves from left to right.
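How a reflection produces spectral gaps can be illustrated with a deliberately simplified model: the direct sound plus a single delayed, attenuated copy. Real pinna filtering involves many reflections and is far more complex; the extra path length and reflection strength below are assumed values chosen only for illustration.

```python
import cmath
import math

SPEED_OF_SOUND = 343.0  # m/s (assumed value)


def reflection_gain(frequency_hz, delay_s, reflection=0.5):
    """Amplitude gain of direct sound plus one delayed reflection."""
    phase = -2j * math.pi * frequency_hz * delay_s
    return abs(1 + reflection * cmath.exp(phase))


def notch_frequencies(extra_path_m, max_hz=20000.0, c=SPEED_OF_SOUND):
    """Frequencies at which the reflection arrives in anti-phase.

    These fall at odd multiples of 1 / (2 * delay).
    """
    delay = extra_path_m / c
    notches = []
    k = 0
    while (2 * k + 1) / (2 * delay) <= max_hz:
        notches.append((2 * k + 1) / (2 * delay))
        k += 1
    return notches


if __name__ == "__main__":
    extra_path = 0.02  # 2 cm longer path for the reflection (assumed)
    delay = extra_path / SPEED_OF_SOUND
    for f in notch_frequencies(extra_path):
        print(f"gap near {f:.0f} Hz, gain {reflection_gain(f, delay):.2f}")
```

In this model, a change in the extra path length shifts the frequency of the gap, which is one way a change in source direction can leave a readable trace in the spectrum.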
Note: The localisation of the source in the vertical plane remains less precise than in the horizontal plane.
Distance
Variations in intensity level and the relationship between the energy of direct sounds and that of reflected sounds are the principal cues that allow us to estimate the distance of a sound source.
Variations in the intensity level
A moving sound source has a variable intensity level. If it comes closer, the intensity level increases; if it moves further away, the intensity level decreases. This cue can only be used in a non-reverberant environment (such as an anechoic chamber). In a reverberant context, the distribution of sound waves, and as a result their respective intensity levels, depends on the reflective properties of the room.
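In the non-reverberant case, the change in level with distance can be estimated with the inverse-square law. The sketch below assumes a small (point-like) source radiating freely in all directions, which is an idealisation not stated in this article.

```python
import math


def level_change_db(old_distance_m, new_distance_m):
    """Change in sound level when a point source moves, free field only.

    Positive when the source comes closer, negative when it recedes.
    """
    return 20.0 * math.log10(old_distance_m / new_distance_m)


if __name__ == "__main__":
    print(f"4 m -> 2 m: {level_change_db(4.0, 2.0):+.1f} dB")
    print(f"2 m -> 4 m: {level_change_db(2.0, 4.0):+.1f} dB")
```

Under this assumption, each halving of the distance raises the level by about 6 dB and each doubling lowers it by the same amount.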
Relationship between the direct and reverberant sound fields
Close to a sound source, the direct sound field dominates. The further one moves from the sound source, the more the reverberant field, which is created by the reflection of the direct waves on the ceiling and walls of the environment, dominates. Therefore, the change in the ratio between the direct and reverberant fields gives an indication of the movement, and therefore the distance, of the sound source.
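The ratio between the two fields can be sketched with a simple room model. It assumes that the direct energy falls with the square of the distance while the reverberant energy is the same everywhere in the room, and it uses a hypothetical parameter, the distance at which both are equal (often called the critical distance), whose value here is invented.

```python
import math


def direct_to_reverberant_db(distance_m, critical_distance_m):
    """Direct-to-reverberant energy ratio in dB under the simple model.

    Positive values: the direct field dominates.
    Negative values: the reverberant field dominates.
    """
    return 20.0 * math.log10(critical_distance_m / distance_m)


if __name__ == "__main__":
    critical = 2.0  # metres, made-up value for an ordinary room
    for d in (0.5, 1.0, 2.0, 4.0, 8.0):
        ratio = direct_to_reverberant_db(d, critical)
        print(f"{d:4.1f} m: {ratio:+.1f} dB")
```

In this model the ratio falls steadily as the source moves away and changes sign at the critical distance, so it remains informative in rooms where the overall level is not.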
Resolution of confusing situations
Human beings instinctively make small head movements that create new binaural and monaural cues. These new cues are used to increase the accuracy of sound localisation and to reduce certain localisation ambiguities, such as the cone of confusion found at 0° and 180° azimuth.