October 29, 2011

"THE RELATIONSHIP BETWEEN AUDIENCE ENGAGEMENT AND THE ABILITY TO PERCEIVE PITCH, TIMBRE, AZIMUTH AND ENVELOPMENT OF MULTIPLE SOURCES"



2. THE PHYSICS OF HEARING by David Griesinger

2.1 What Do We Already Know?

1. The sounds we want to hear in a performance space are speech and music, both of which consist of segments of richly harmonic tones 25ms to 500ms long, interspersed with bursts of broadband high frequency energy. It is likely we will not understand hearing or acoustics without understanding the necessity of harmonic tones.

2. Survival requires the detection of the pitch, timbre, direction, and distance of each sound source in a complex sound field. Natural selection has driven our skill at performing these tasks. There is a tremendous improvement in signal to noise ratio (S/N) if an organism possesses the ability to analyze the frequency of incoming sound with high precision, as then most of the background noise can be filtered out. Pitch and timbre allow us to identify potential threats, the vowels in speech, and the complexities of music. Location and distance tell us how quickly we must act.

3. We need to perceive pitch, timbre, direction and distance of multiple sources at the same time, and in the presence of background noise. This is the well-known cocktail party effect, essential to our successful navigation of difficult and dangerous social situations.

4. Perhaps as a consequence, human hearing is extraordinarily sensitive to pitch. A musician can tune an instrument to one part in one thousand, and the average music lover can perceive tuning to an accuracy of at least one percent. This is amazing, given that the frequency selectivity of the basilar membrane is only about one part in five. Such pitch acuity did not evolve by accident. It must play a fundamental role in our ability to hear – and might help us understand how to measure acoustics.

5. Pitch acuity for sine tones is greatest at about 1000Hz. The fact that the pitch of low frequency sine tones varies with the loudness of the tone would seem to make playing music difficult. But we perceive the pitch of low tones primarily from the frequencies of their upper harmonics, and the perceived pitch of these harmonics is stable with level. So it is clear that harmonics of complex tones at 1000Hz and above carry most of the information we need to perceive pitch. The mystery we must solve is: how do we perceive the pitches of the upper harmonics of several instruments at the same time, when such harmonics are typically unresolved by the basilar membrane?

6. Physics tells us that the accuracy with which we can measure the frequency of a periodic waveform depends on the product of the signal to noise ratio (S/N) of the signal and the length of time we measure it. If we assume the S/N of the auditory nerve is about 20dB, we can predict that the brain needs about 100ms to achieve the pitch acuity of a musician at 1000Hz. So we know there is a neural structure that can analyze sound over this time period – and it seems to be particularly effective at frequencies above 700Hz.

7. Physics also tells us that the amount of information that any channel can carry is roughly the product of the S/N and the bandwidth. The basilar membrane divides sound pressure into more than 40 overlapping channels, each with a bandwidth proportional to its frequency. So a critical band at 1000Hz is inherently capable of carrying ten times as much information as a critical band at 100Hz. Indeed, we know that most of the intelligibility of speech lies in frequencies between 700 and 4000Hz. We need to know the physics of how information is encoded into sound waves at these frequencies.

8. The cocktail party effect implies that we can detect the vocal formants of three or more speakers independently, even when the sounds arrive at our ears at the same time. Pitch is known to play a critical role in this ability. Two speakers speaking in monotones can be heard independently if their pitch is different by half a semitone, or three percent.[2] If they whisper, or speak at the same pitch, they cannot be separated. The vocal formants of male speakers are composed of numerous harmonics of low frequency fundamentals. When two people are speaking at once the formant harmonics will mix together on the basilar membrane, which is incapable of separating them. We should hear a mixture of formants, and be unable to understand either speaker. But we can, so it is clear that the brain can separate the harmonics from two or more speakers, and this separation takes place before the timbre – and thus the identity of the vowel – is detected. I believe that our acuity to pitch evolved to enable this separation.

9. Onsets of the sound segments that make up speech and music are far more important to comprehension than the ends of such segments. Convolving a sentence with time-reversed reverberation smooths over the onset of each syllable while leaving the end clear. The modulation transfer function – the basis of STI and other speech measures – is unchanged by the reversal. But the damage wrought to comprehension is immensely greater when reverberation is reversed.

10. When there are too many reflections we can sometimes understand speech from a single source, but in the presence of multiple sources our ability to perform the cocktail party effect is nullified and the result is babble. In the presence of reflections our ability to detect the timbre, distance, and direction of single sources is reduced, and the ability to separately detect these properties from multiple sources is greatly reduced.

11. We have found that accurate horizontal localization of sound sources in the presence of reverberation depends on frequencies above 1000Hz, and accuracy drops dramatically when the direct to reverberant ratio (D/R) decreases only one or two dB below a certain value. The threshold for accurate horizontal localization as a function of the D/R and the time delay of reflections can be predicted from a binaural impulse response using a relatively simple formula. This formula will be discussed later in this paper.
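
Some of the figures quoted in items 4, 7 and 8 can be checked with a few lines of arithmetic. The Python sketch below is only an illustration, not part of the original paper. It assumes equal-tempered semitones, takes the Shannon-Hartley formula as the "physics" behind item 7, and models each basilar membrane channel as having a bandwidth of one fifth of its centre frequency (the "one part in five" of item 4) with the 20dB S/N of item 6. The function and constant names are hypothetical.

```python
import math


def interval_to_percent(semitones: float) -> float:
    """Frequency change, in percent, of an equal-tempered interval."""
    return (2.0 ** (semitones / 12.0) - 1.0) * 100.0


def fraction_to_cents(fraction: float) -> float:
    """Size in cents of a fractional frequency change (0.01 = one percent)."""
    return 1200.0 * math.log2(1.0 + fraction)


def channel_capacity(bandwidth_hz: float, snr_db: float) -> float:
    """Shannon-Hartley capacity in bits per second for a noisy channel."""
    snr_linear = 10.0 ** (snr_db / 10.0)
    return bandwidth_hz * math.log2(1.0 + snr_linear)


# Assumption: channel bandwidth is a fixed fraction of centre frequency (item 4).
BAND_FRACTION = 1.0 / 5.0
# Assumption: the same S/N in every channel, using the 20dB figure from item 6.
SNR_DB = 20.0

half_semitone_pct = interval_to_percent(0.5)   # item 8: "half a semitone"
musician_cents = fraction_to_cents(0.001)      # item 4: one part in one thousand
listener_cents = fraction_to_cents(0.01)       # item 4: one percent

capacity_100 = channel_capacity(100.0 * BAND_FRACTION, SNR_DB)
capacity_1000 = channel_capacity(1000.0 * BAND_FRACTION, SNR_DB)
capacity_ratio = capacity_1000 / capacity_100  # item 7: 1000Hz band vs 100Hz band

print(f"half a semitone    = {half_semitone_pct:.2f} percent")
print(f"one part in 1000   = {musician_cents:.2f} cents")
print(f"one percent        = {listener_cents:.2f} cents")
print(f"capacity ratio     = {capacity_ratio:.1f}")
```

Under these assumptions half a semitone comes out slightly under three percent, consistent with the rounded figure in item 8, and the capacity ratio between the two bands equals the ratio of their bandwidths, which is the factor of ten claimed in item 7. The absolute capacities depend entirely on the assumed bandwidth fraction and S/N and should not be read as measured values.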

October 16, 2011

A little bit of history...




28 January 1978

Dear Mr. Manger,

In order to soothe my bad conscience a little, I examined your loudspeakers theoretically last weekend. I was surprised to find that the radiation principle chosen by you gives a radiation time history that corresponds to the time history of the current (at least in the idealisation I examined); i.e. transient oscillations and other disturbing effects do not occur.

If I interpret the equations correctly, it seems feasible that a plate (membrane) with a stiffness decreasing towards the edge (thickness variation), leads to a slightly improved radiation efficiency.

With many thanks for your Christmas surprise,

Yours,


Prof. Dr. Manfred Heckl