Motto 1:
As one calls into the forest, so does it sound outwards.

Motto 2:
Theoretically speaking, the terms 'theory' and 'practice'
are exactly defined and different from one another. In practice, they are frequently interchangeable.

During our workshop (from August 28 to September 3, 2010), the idea was swirling around the room from the first day on to record the sound of the forest clearing in a way that it could be available as an 'impulse response' for any desired reverberation.
The weather did not play along; wind and rain thwarted such an undertaking. First on the final evening, as everything had already been taken down and the workshop was slowly ending, there was a favorable moment which Wolfgang Musil used to assemble an appropriate test set-up.

Foto: M. Zacherl
The procedure goes as follows:
One generates a 'sweep' – a sine wave which begins as low as possible and reaches a high enough frequency within a certain timeframe. (In this case, within 20 seconds from around 38 Hz to 19 kHz).
This sweep will be played as strongly as possible through a well-placed loudspeaker, while one or several microphones (beside and behind) record the resulting spatial sound


So far, so good. The only thing was that we did not want the sound of the sweep in the forest clearing, but a pure spatial sound by itself.
The method with which the impulse response from both given sounds (the original sweep and the recording on site) can be, so to speak, distilled is called 'correlation'.
In principle, it is a procedure that has already been known for decades in communications engineering, in order to separate the message information from the contaminated background noises. (I remember a photo of a monstrous apparatus with loops of tape and rotating recording heads with which something like this could, to a certain extent, also be contrived analogously ...).
That is relatively simple in VASP: One makes a convolution of the one sound with the time reversal of the other. (Convolution is, as we know, a modulation of the spectrums). Finished – or almost finished.

sfload original_sweep.wav
sfload raumaufnahme.wav

The result looks like this:

All the way to the left, we actually have the graphically less conspicuous impulse response of the spatial situation. All the way to the right, however, we still see quite some mirrored material. When listening to it, it turns out to be a type of 'synopsis' of the sweep followed by several echoes. That could be the distortions of the loudspeaker, which now appear as a 'pre-echo'. (In such processes, the buffer is to be understood as cyclical).
How ever that came about, there cannot be a pre-echo, so get rid of it and put an envelope over it right away that definitively ends the sound at 5.5 seconds, because the rest is noise:


What actually results is a sound type that one can evoke through an impulse, for instance, by smashing two boards together – as Michael Zacherl had recently demonstrated. What cannot be overheard is the portion of the original sweep that is still contained in it.
How does that come about? Our assumption is this: In the open area there always are air movements, a background noise is always present. The correlation (even if it is formally and mathematically solved in a completely different way) basically compares each moment of the one sound with each moment of the other and finds out to what extent the one sound is contained in the other, whereby it allows itself to be eliminated from the other. Possibly the original sweep finds itself again in the diffused background noise-

No problem.
A few months ago I was once again dealing with digital filters and developed a type of filter which allows amazingly good, precise and controllable filtering to be carried out (ceir - 'complex exponential impulse response' – more about that another time).
There is also a variable notch filter which allows individual frequencies to be suppressed specifically and with adjustable steepness. (The filter type has already been implemented in VASP and in AMP, but has not been honed yet, and, therefore, remains to be documented). In any case, one can trace the course of the sweep with the notch filter and thus largely eliminate any vestiges of it.


We could be satisfied with it. The end result sounds reasonable, and it creates a beautiful space in the application as a reverb function.

* * *

But is it really the 'eigensound' of the forest – at least from one position??

Foto: G.Rabl

We have no comparison, because we do not have this eigensound itself. We gain it only through such or similar methods. But one can examine the method.

* * *

I generate a simple model of a spatial sound through white noise, overlaid by an envelope (attack-decay) that rises 1.5 seconds and fades 3.5 seconds. Then I generate the said sweep in a second buffer and 'reverb' both of them by means of convolution:

shape.attdec 1.5sec,3.5sec
gen.sweep 38hz,19000hz,20sec

By applying the correlation method, the original, this time synthetic spatial sound would have to come out again now. The VASP script above continues:

load.sweep 38hz,19000hz,20sec

The result apparently has the same course as the original reverb function, no pre-echo, but is amazingly dull.
How so? – A glance at the spectrums shows it. Sure, we have used an exponentially rising sweep. That also appeared to me as plausible at first sight. The frequency change races through the same intervals at the same times. However, that means that the spectral density continuously decreases upwards and a frequency response, as with pink noise, develops, with a high frequency roll-off of 3dB/octave! This is the spectrum of the sweep:

So, another frequency response correction at the appropriate position. If the frequency response that we created is pink (3dB high frequency roll-off per octave), then the appropriate correction is 'blue' (3dB high frequency boost).
In VASP there are these standards as noise (nois.pink, noise.blue), as well as filters (eir.pink, eir.blue) and as spectral envelopes (env.pink, env.blue ....).
Here is the whole process with correction:

sfload original_sweep.wav "load the original sweep in A"
mirr "time mirroring"
FFT "Fourier transformation"
sfload raumaufnahme.wav "load the space recording in B"
FFT "Fourier transformation"
vmul "spectral multiplication"
env.blue "frequency response correction"
FFT- "inverse transformation"

shape.fall 5.5sec "limited envelope"
opt "optimal level"


(Now one clearly hears the chirping of the crickets, which had no respect for our recording ...).

Important: The buffer size must be chosen to be so large that each of the two sounds only fills half a buffer at the most. Otherwise, convoluted mirrorings in time develop, which cannot be separated.
(In this case, size=21, approx. 48 seconds in total).

If one listens exactly, or the sound transposes, then one still notices modulations with the sweep, particularly at the beginning. I do not know how this can be completely eliminated during a field recording. Nonetheless, I viewed our recording process with subsequent correlations as a successful experiment to close in on sounds that slumber, so to speak, in the landscape and in the buildings. They are countless; at every position in the space different!
Naturally, they can be used trivially for 'convolution reverb'.
However, we can just as well view them as individual types of sound and compose with them. A piece only with spatial sounds! The notion alone already makes me sit up and take notice ...

* * *

Our process is good enough for collecting material. Material does not have to be perfect. But some of it can perhaps be improved or simplified. The frequency response correction is certainly a weak point. That can already be possibly avoided to some extent.

A suggestion would be a linear sweep, instead of an exponential one. That probably sounds less impressive, but the frequency response would also be linear as a consequence.
A second possibility would be to incorporate an amplitude correction right away in the exponential sweep, so that the tone proportionally becomes louder upwards. (However, this is a stress test for the loudspeaker).

In order to be able to further experiment, I have implemented the functions, which were already contained in VASP in another manner, somewhat more comfortably:


An exponential sweep (3 arguments: start frequency, target frequency and duration), one linear, one exponential with level correction, as well as the associated notch filters for the exponential and for the linear sweep.

The ideal situation, to stimulate the spatial sound with a 'Dirac impulse'
in which all frequencies are equally contained, is beautiful theory. Even if one produces it and can transfer it to the air, it would have to be extremely strong. Even a pistol shot presumably has too much of an individual sound character. Nonetheless, one should keep this method in view. With a mechanical device that can more or less reliably produce always the same, loud impulse (bang), one can record spatial sounds everywhere with a portable recorder. Especially in larger space situations where the initial reflection first arrives after a few milliseconds, the triggered bang can easily be separated from the impulse response. This could suffice to collect material.
Upcoming are a few considerations about how one could apply such a gained spatial sound in real-time.


(c) Günther Rabl 2010

I have also looked up how others do that. By the way, "do" is good! – Formulas, formulas, formulas ...
(Two pages of formulas can be written about even such a simple thing as a linear sweep).
That naturally scares people away who do not have a knack for mathematics – and it can only annoy all others to have to stumble over variants of the same integral again and again.
There has to be a type of typographical dress code here: The integral sign is the tie of scientific earnestness.
And are the formulas even correct at all?? ...
Strangely enough, the process, as we have performed it, is described in several papers as 'autocorrelation'. However, autocorrelation is the convolution (reverberation, if one likes) of a sound with its own time reversal. But we are dealing with two different sounds, one of which is merely somehow contained in the other. Correlation is enough.
The problematic nature of the frequency response of an exponential sweep is known. Of course, half of all papers call the sweep 'logarithmic', actually the complete opposite.
[This unending quis pro quo of 'exponential' and 'logarithmic' can drive me to desperation! It implies that one once learned sometime that hearing perception is 'logarithmic' according to its nature. Naturally, it is also only half the truth, but it opines that pitch rows have to be exponential so that we feel them as linear, which is why such rows or progressions are then called logarithmic (?). Wherever the mix-up is to be feared, I rather prefer to use the description 'proportional', such as, e.g., when controlling the parameters in AMP].

The problem of a linear sweep is also known. (It passes through the same frequency intervals, not pitch intervals, at the same times). Its frequency response would be perfectly linear, but the lower frequencies would pass through so quickly that they could, under circumstances, no longer stimulate the space properly. That must have already been tried out.
Also the suggestion of an amplitude correction of the exponential sweeps is to be found. Has anyone already tried that out? It could namely be that the low frequencies, in turn, are so faint that they likewise cannot stimulate the space.
In practice, namely.
The pre-echoes will actually be blamed for the distortions in the transmission path. But does that also hold true for distortions of odd-numbered order? (In my opinion, no).
All of that and more has to simply be tried out – in a model case as in the application. Practice already begins where theory still has not stopped. (See Motto 2 above).