November 06, 2011

Perception and hearing explained by new research (1)


The human hearing mechanism is a biological transducer, and psychoacousticians have studied it extensively in an attempt to model its behavior mathematically. The frequency analysis of the ear is carried out by the basilar membrane, which is enclosed within the spiral cochlea, or inner ear. The basilar membrane resonates to continuous tones at different frequencies along its length, and this determines the frequency limits of human hearing. Once resonance has begun, the physical position, or place, of the resonance along the membrane allows the listener to determine the pitch. In the case of a timbral sound, the harmonics excite a specific pattern of spaced resonances that is unique to each type of instrument, allowing the listener to recognize it.

The human hearing mechanism does not just detect the existence of sound: it also estimates the direction of the source and analyzes the content of the sound to determine its most likely cause. In musical sounds, the pitch is also determined. Josef Manger has been studying these mechanisms for over 20 years. He has found that each mechanism takes a different time to operate following an initial transient: the location and nature of the sound source are completely discerned before the pitch is recognized.

Pitch and timbral recognition are described by the well-established place theory, in which different parts of the basilar membrane resonate according to the frequencies present in the sound. However, various authorities, such as Keidel, Spreng, Klinke and Zenner, have suggested that there is another, faster-acting mechanism which works in the time domain.
This theory could not be tested with conventional loudspeakers; confirmation became possible only when Josef Manger used his newly developed transducer as the sound source.


Fig. 3 illustrates this principle of transient analysis and shows an idealized transient pressure waveform following an acoustic event.
There are three important points made in the figure:

1/ A complete cycle is quite unnecessary for the recognition of the sound source; only the initial transient pressure change A-B is required. The time of arrival of the transient differs between the two ears, and this locates the source laterally within around a millisecond (see the sketch after this list).
2/ Following the event which generated the transient, the air pressure equalizes itself along the line B-F. The period of time between B and F varies and allows the listener to establish the likely size of the sound source.
3/ Only after the source has been recognized from the transient is the pitch recognized, according to the place theory of the basilar membrane, from the part of the waveform beyond F.
The information in the initial transient pressure waveform goes beyond locating the source.
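
To put a rough number on that millisecond, here is a minimal sketch of the interaural time difference using the textbook sine-law approximation; the 0.2 m ear spacing and 343 m/s speed of sound are assumed round figures, not values from the text:

```python
import math

SPEED_OF_SOUND = 343.0   # m/s in air at roughly 20 degrees C
EAR_SPACING = 0.2        # m, assumed round figure for the human head

def itd_seconds(azimuth_deg):
    """Interaural time difference for a source at the given azimuth
    (0 = straight ahead), using the simple sine-law approximation."""
    return (EAR_SPACING / SPEED_OF_SOUND) * math.sin(math.radians(azimuth_deg))

for az in (0, 15, 45, 90):
    print(f"azimuth {az:3d} deg -> ITD {itd_seconds(az) * 1e6:6.1f} microseconds")
# A source fully to one side arrives ~0.6 ms earlier at the near ear,
# consistent with lateral localization "within around a millisecond".
```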

Fig. 4 illustrates how the size of a sound source affects the pressure equalization time.

Pressure waveforms from a hand gun, a rifle and a cannon are shown. It will be seen that the larger the source, the longer the pressure equalization time.



October 29, 2011

"THE RELATIONSHIP BETWEEN AUDIENCE ENGAGEMENT AND THE ABILITY TO PERCEIVE PITCH, TIMBRE, AZIMUTH AND ENVELOPMENT OF MULTIPLE SOURCES"



2. THE PHYSICS OF HEARING   by David Griesinger

2.1 What Do We Already Know?

1. The sounds we want to hear in a performance space are speech and music, both of which consist of segments of richly harmonic tones 25ms to 500ms long, interspersed with bursts of broadband high frequency energy. It is likely we will not understand hearing or acoustics without understanding the necessity of harmonic tones.

2. Survival requires the detection of the pitch, timbre, direction, and distance of each sound source in a complex sound field. Natural selection has driven our skill at performing these tasks. There is a tremendous improvement in signal to noise ratio (S/N) if an organism possesses the ability to analyze the frequency of incoming sound with high precision, as then most of the background noise can be filtered out. Pitch and timbre allow us to identify potential threats, the vowels in speech, and the complexities of music. Location and distance tell us how quickly we must act.
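
The size of that S/N improvement is easy to estimate: against white background noise, listening through a band of width B instead of the full noise bandwidth W gains roughly 10·log10(W/B) dB. A minimal numerical sketch, with purely illustrative (not physiological) bandwidths:

```python
import numpy as np

rng = np.random.default_rng(0)
fs = 16000                           # sample rate (Hz), illustrative
t = np.arange(fs) / fs               # one second of signal
tone = np.sin(2 * np.pi * 1000 * t)  # 1 kHz "signal of interest"
noise = rng.standard_normal(fs)      # white background noise, 0-8 kHz

def band_power(sig, lo, hi):
    """Power of sig inside the band [lo, hi] Hz (FFT brick-wall filter)."""
    spec = np.fft.rfft(sig)
    freqs = np.fft.rfftfreq(len(sig), 1 / fs)
    spec[(freqs < lo) | (freqs > hi)] = 0.0
    band = np.fft.irfft(spec, len(sig))
    return np.mean(band ** 2)

full = 10 * np.log10(np.mean(tone ** 2) / np.mean(noise ** 2))
narrow = 10 * np.log10(band_power(tone, 950, 1050) /
                       band_power(noise, 950, 1050))
print(f"S/N over the full 0-8 kHz band: {full:5.1f} dB")
print(f"S/N inside a 100 Hz band      : {narrow:5.1f} dB")
# Expected gain ~ 10*log10(8000/100) ~ 19 dB: the narrower the analysis
# band, the more of the background noise is excluded.
```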

3. We need to perceive pitch, timbre, direction and distance of multiple sources at the same time, and in the presence of background noise. This is the well-known cocktail party effect, essential to our successful navigation of difficult and dangerous social situations.

4. Perhaps as a consequence, human hearing is extraordinarily sensitive to pitch. A musician can tune an instrument to one part in one thousand, and the average music lover can perceive tuning to an accuracy of at least one percent. This is amazing, given that the frequency selectivity of the basilar membrane is only about one part in five. Such pitch acuity did not evolve by accident. It must play a fundamental role in our ability to hear – and might help us understand how to measure acoustics.

5. Acuity to the pitch of sine tones is greatest at about 1000Hz. The fact that the pitch of low frequency sine tones varies with the loudness of the tone would seem to make playing music difficult. But we perceive the pitch of low tones primarily from the frequencies of their upper harmonics, and the perceived pitch of these harmonics is stable with level. So it is clear that the harmonics of complex tones at 1000Hz and above carry most of the information we need to perceive pitch. The mystery we must solve is: how do we perceive the pitches of the upper harmonics of several instruments at the same time, when such harmonics are typically unresolved by the basilar membrane?
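
One common way to model this, offered here only as a sketch and not necessarily the mechanism the author has in mind, is periodicity (autocorrelation) analysis: a tone built solely from upper harmonics still repeats at the period of its absent fundamental, and that period is easy to recover:

```python
import numpy as np

fs = 16000
t = np.arange(int(0.1 * fs)) / fs            # 100 ms of signal
f0 = 200.0                                    # fundamental, deliberately absent
# A "complex tone" made only of upper harmonics (3rd through 8th):
x = sum(np.sin(2 * np.pi * k * f0 * t) for k in range(3, 9))

ac = np.correlate(x, x, mode="full")[len(x) - 1:]  # autocorrelation, lags >= 0
ac[:int(fs / 2000)] = 0            # ignore lags shorter than 0.5 ms (> 2 kHz)
best_lag = np.argmax(ac)
print(f"strongest periodicity at {best_lag / fs * 1000:.2f} ms "
      f"-> perceived pitch ~ {fs / best_lag:.0f} Hz")
# Prints ~5.00 ms / ~200 Hz: the pitch of the missing fundamental,
# carried entirely by the upper harmonics.
```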

6. Physics tells us that the accuracy with which we can measure the frequency of a periodic waveform depends on the product of the signal to noise ratio (S/N) of the signal and the length of time we measure it. If we assume the S/N of the auditory nerve is about 20dB, we can predict that the brain needs about 100ms to achieve the pitch acuity of a musician at 1000Hz. So we know there is a neural structure that can analyze sound over this time period – and it seems to be particularly effective at frequencies above 700Hz.
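
The arithmetic behind the 100ms figure, under the stated assumptions (20dB taken as an amplitude ratio of 10, and the rule of thumb that frequency uncertainty is roughly 1/(T · S/N)):

```python
# Back-of-envelope check of the 100 ms estimate. Assumed rule of thumb:
# frequency uncertainty df ~ 1 / (T * S/N), with S/N as an amplitude ratio.
f = 1000.0             # tone frequency, Hz
acuity = 1.0 / 1000    # musician's pitch acuity: one part in a thousand
snr_db = 20.0          # assumed S/N of the auditory nerve
snr = 10 ** (snr_db / 20)       # 20 dB -> amplitude ratio of 10

df = f * acuity                 # required frequency resolution: 1 Hz
T = 1.0 / (df * snr)            # integration time needed
print(f"required analysis window: {T * 1000:.0f} ms")   # -> 100 ms
```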

7. Physics also tells us that the amount of information that any channel can carry is roughly the product of the S/N and the bandwidth. The basilar membrane divides sound pressure into more than 40 overlapping channels, each with a bandwidth proportional to its frequency. So a critical band at 1000Hz is inherently capable of carrying ten times as much information as a critical band at 100Hz. Indeed, we know that most of the intelligibility of speech lies in frequencies between 700 and 4000Hz. We need to know the physics of how information is encoded into sound waves at these frequencies.
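
For the record, Shannon's formula C = B·log2(1 + S/N) is the precise version of "S/N times bandwidth". A sketch assuming constant-Q critical bands (bandwidth proportional to center frequency, with an illustrative Q of 6) and equal S/N in both bands:

```python
import math

def capacity_bits_per_s(bandwidth_hz, snr_power):
    """Shannon capacity of a noisy channel: C = B * log2(1 + S/N)."""
    return bandwidth_hz * math.log2(1 + snr_power)

SNR = 100.0     # assumed 20 dB power S/N, identical in both bands
Q = 6.0         # assumed constant-Q factor: bandwidth = f / Q

for f in (100.0, 1000.0):
    b = f / Q
    print(f"{f:6.0f} Hz band: B = {b:6.1f} Hz, "
          f"C ~ {capacity_bits_per_s(b, SNR):8.0f} bit/s")
# With bandwidth proportional to frequency and equal S/N, the ratio of
# capacities is exactly the ratio of center frequencies: 10x.
```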

8. The cocktail party effect implies that we can detect the vocal formants of three or more speakers independently, even when the sounds arrive at our ears at the same time. Pitch is known to play a critical role in this ability. Two speakers speaking in monotones can be heard independently if their pitch is different by half a semitone, or three percent.[2] If they whisper, or speak at the same pitch, they cannot be separated. The vocal formants of male speakers are composed of numerous harmonics of low frequency fundamentals. When two people are speaking at once the formant harmonics will mix together on the basilar membrane, which is incapable of separating them. We should hear a mixture of formants, and be unable to understand either speaker. But we can, so it is clear that the brain can separate the harmonics from two or more speakers, and this separation takes place before the timbre – and thus the identity of the vowel – is detected. I believe that our acuity to pitch evolved to enable this separation.
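
The three percent figure follows directly from equal temperament, where a semitone is a frequency ratio of 2^(1/12):

```python
semitone = 2 ** (1 / 12)        # equal-tempered semitone ratio
half = 2 ** (1 / 24)            # half a semitone
print(f"semitone:      +{(semitone - 1) * 100:.1f} %")   # ~5.9 %
print(f"half semitone: +{(half - 1) * 100:.1f} %")       # ~2.9 %, i.e. ~3 %
```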

9. Onsets of the sound segments that make up speech and music are far more important to comprehension than the ends of such segments. Convolving a sentence with time-reversed reverberation smooths over the onset of each syllable while leaving the end clear. The modulation transfer function – the basis of STI and other speech measures – is unchanged. But the damage wrought to comprehension is immensely greater when the reverberation is reversed.
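
This is straightforward to demonstrate numerically. In the sketch below the "room" is an assumed synthetic impulse response (a direct-sound spike plus decaying noise), not a measured one: reversal leaves the magnitude spectrum, and hence MTF-style measures, unchanged, yet it moves the reverberant energy from after each onset to before it:

```python
import numpy as np

rng = np.random.default_rng(1)
fs = 8000
# Crude synthetic room impulse response: a direct-sound spike followed
# by 0.5 s of exponentially decaying noise (an illustrative assumption).
n = int(0.5 * fs)
tail = rng.standard_normal(n) * np.exp(-np.arange(n) / (0.1 * fs))
ir = np.concatenate(([5.0], tail))
ir_rev = ir[::-1]                          # time-reversed reverberation

# Reversal leaves the magnitude spectrum untouched:
diff = np.abs(np.fft.rfft(ir)) - np.abs(np.fft.rfft(ir_rev))
print("max spectral magnitude difference:", np.max(np.abs(diff)))  # ~0

# But it moves reverberant energy from after the onset to before it.
# Convolve a click with both and see where the energy sits relative to
# the loudest arrival:
click = np.zeros(fs); click[fs // 4] = 1.0
for name, h in (("forward ", ir), ("reversed", ir_rev)):
    out = np.convolve(click, h)
    peak = np.argmax(np.abs(out))          # the main (loudest) arrival
    frac = np.sum(out[:peak] ** 2) / np.sum(out ** 2)
    print(f"{name}: fraction of energy arriving before the peak = {frac:.2f}")
# Forward: ~0 (clean onset, then a decaying tail). Reversed: most of the
# energy piles up ahead of the peak, smearing over the onset.
```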

10. When there are too many reflections we can sometimes still understand speech from a single source, but in the presence of multiple sources our ability to perform the cocktail party effect is nullified and the result is babble. In the presence of reflections our ability to detect the timbre, distance, and direction of a single source is reduced, and the ability to detect these properties separately for multiple sources is greatly reduced.

11. We have found that accurate horizontal localization of sound sources in the presence of reverberation depends on frequencies above 1000Hz, and that accuracy drops dramatically when the direct to reverberant ratio (D/R) decreases only one or two dB below a certain value. The threshold for accurate horizontal localization as a function of the D/R and the time delay of reflections can be predicted from a binaural impulse response using a relatively simple formula, which will be discussed later in this paper.
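
That formula appears later in the paper; purely to fix ideas, here is a minimal sketch of the kind of quantity it works from: a D/R estimate taken from one channel of an impulse response, with the direct sound assumed (arbitrarily) to occupy the first 5ms after the first arrival. This is not the author's definition, only an illustration:

```python
import numpy as np

def direct_to_reverberant_db(ir, fs, direct_window_ms=5.0):
    """D/R from an impulse response: energy in a short window after the
    first arrival vs everything that follows. The 5 ms window is an
    illustrative assumption, not the paper's definition."""
    onset = np.argmax(np.abs(ir) > 0.1 * np.max(np.abs(ir)))  # first arrival
    split = onset + int(direct_window_ms * fs / 1000)
    direct = np.sum(ir[onset:split] ** 2)
    reverb = np.sum(ir[split:] ** 2)
    return 10 * np.log10(direct / reverb)

# Toy example: a direct spike followed by a decaying noise tail.
fs = 48000
rng = np.random.default_rng(2)
tail = rng.standard_normal(fs // 2) * np.exp(-np.arange(fs // 2) / (0.05 * fs))
ir = np.concatenate(([1.0], 0.1 * tail))
print(f"D/R = {direct_to_reverberant_db(ir, fs):.1f} dB")
```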

October 16, 2011

A little bit of history...




28 January 1978

Dear Mr. Manger,

In order to soothe my bad conscience a little, I examined your loudspeakers theoretically last weekend. I was surprised to find that the radiation principle you have chosen gives a radiation time history that corresponds to the time history of the current (at least in the idealization I examined); i.e. transient oscillations and other disturbing effects do not occur.

If I interpret the equations correctly, it seems feasible that a plate (membrane) with a stiffness decreasing towards the edge (a thickness variation) leads to slightly improved radiation efficiency.

With many thanks for your Christmas surprise,

Yours,


Prof. Dr. Manfred Heckl