
genbmm


Manual Reference Pages  - GENBMM (1)

NAME

genbmm - generate basilar membrane motion

CONTENTS

Synopsis
Description
Options
Further Description
Motivation
Examples
References
Files
See Also
Bugs
Copyright
Acknowledgements

SYNOPSIS

genbmm [ option=value | -option ] [ filename ]

DESCRIPTION

The genbmm module of the AIM software simulates the spectral analysis performed by the auditory system using a bank of auditory filters. Specifically, genbmm converts an input wave into an array of filtered waves, one for each channel of the filterbank. The surface of the array of filtered waves is AIM’s representation of basilar membrane motion (BMM) as a function of time. AIM provides two alternative methods for generating the BMM: a linear gammatone filterbank (Patterson et al., 1988; Slaney, 1993; Cooke, 1993) or a nonlinear transmission-line filterbank (Giguere and Woodland, 1994). For convenience, they are referred to as the ‘functional’ filterbank and the ‘physiological’ filterbank, respectively.

OPTIONS

There are three sets of options for genbmm; they are grouped by function and identified by the suffixes _afb, _gtf and _tlf. The first set controls the distribution of the filtered waves across frequency (suffix _afb); the second specifies the shape of the gammatone filter (suffix _gtf); and the third specifies the shape of the transmission line filter (suffix _tlf). These three groups of options are the subject of this manual entry, together with an option that specifies the filter choice (gtf or tlf), and an option that specifies whether a middle ear function should be used with the gtf filterbank. There are also a number of Silent Options which control less commonly used functions in the transmission-line filterbank. They are listed in docs/aimSilentOptions.

    The Outer/Middle Ear function: middle_ear

In the auditory system the middle ear causes a progressive attenuation of sound energy in the region below about 500 Hz and a progressive attenuation in the region above about 4000 Hz. There is also a primary auditory canal resonance around 2700 Hz that provides a boost in sound transmission. The resulting transfer function is a normal aspect of auditory processing and precedes spectral analysis. If the functional filterbank is chosen (gtf), the outer/middle ear filter acts directly on the input wave, and the stapes velocity wave it generates is the input to the spectral filtering stage. If the physiological filterbank is chosen (tlf), the outer/middle ear filtering and the cochlear filtering are performed simultaneously, as in the auditory system. The only parameter associated with this function is the middle_ear switch, which makes it possible to turn the outer/middle ear filtering off when the functional filterbank is chosen.

middle_ear    Outer/middle ear switch. Switch. Default: on.

It is also possible to specify a floating point number, in which case the middle ear output is multiplied by that value.

Note: The middle_ear option is ignored if option filter (see below) is set to tlf. This is because the outer/middle stage and the cochlear stage are bidirectionally coupled in the transmission line filter implementation, and cannot be separated.

    I. THE AUDITORY FILTERBANK OPTIONS: _afb

The distribution of the filters across frequency and the total number of output filters in the bank are determined by four parameters: channels_afb, mincf_afb, maxcf_afb, and dencf_afb.

channels_afb    The number of channels in the filterbank. Default unit: filters. Default value: 75.
mincf_afb       The minimum centre frequency. Default unit: Hz. Default value: 100 Hz.
maxcf_afb       The maximum centre frequency. Default unit: Hz. Default value: 6000 Hz.
dencf_afb       The density of the filters in the filterbank. Default unit: filters/critical band. Default value: off.

dencf_afb provides an alternative method of specifying the number of channels in terms of the density of filters along the frequency scale.

Note: channels_afb overrides dencf_afb whenever it has a non-zero value. To use dencf_afb, you must therefore set channels_afb=off.

WARNING: When using the transmission line filter (filter=tlf), the channel density should be 3 or more filters/erb. Using a lower density may lead to excessive spatial discretization errors (see Giguere and Woodland (1994) for a discussion). To view a small number of channels, use a reasonable density and reduce the number of displayed channels using option downchannel.

The ERB scale for the gammatone auditory filterbank is specified with three options: bwmin_afb, quality_afb, and mmerb_afb.

bwmin_afb       The minimum bandwidth for an auditory filter. Default unit: Hz. Default value: 24.7.
quality_afb     The limiting quality factor for high frequency auditory filters. Units: scalar. Default: 9.265.
mmerb_afb       The length of one erb-rate unit along the basilar membrane. Units: mm. Default: 0.89.

info_afb        Print filterbank information to stderr. Switch. Default: off.

A listing of the parameters for the filters in the bank can be directed to the terminal at run time by setting info_afb=on.

The physiological data on human cochlear frequency-position function (Greenwood, 1990) and the psychoacoustic data on auditory filter bandwidth (Patterson and Moore, 1986) indicate that the spectral analysis performed in the cochlea is like a ’constant Q’ system (quality_afb) that asymptotes to a minimum filter bandwidth (bwmin_afb) at low centre frequencies. That is,

erb = bwmin_afb + centre-frequency/quality_afb.

If we assume, as Greenwood suggests, that each filter bandwidth corresponds to a constant distance (mmerb_afb) along the basilar membrane, it is possible to scale frequency in terms of erb units (or position along the basilar membrane) by integrating the inverse of the erb function above.

Glasberg and Moore (1990) have reviewed the available human filter shape data and concluded that the optimum values for bwmin_afb and quality_afb are 24.7 and 9.265, respectively, together with mmerb_afb of 0.89. (As a rule of thumb for rapid estimation, erb = 25 Hz + 10% of cf.) The auditory scale used by Greenwood (1990) can be specified by setting bwmin_afb=22.85, quality_afb=7.238 and mmerb_afb=1.0. A reasonable approximation to the Bark scale (Zwicker, 1961) is obtained by setting bwmin_afb=80, quality_afb=6.5 and mmerb_afb=0.89.
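The relation between the erb function and the erb-rate scale can be sketched numerically. The following Python fragment is an illustration only, not part of AIM: the function names are hypothetical, and the closed form for the erb-rate is simply the integral of 1/erb from 0 Hz to cf, which also assumes that the scale starts at 0 Hz.

```python
import math

# Hypothetical helper names; defaults are the documented genbmm defaults.
def erb(cf_hz, bwmin=24.7, quality=9.265):
    """erb = bwmin_afb + centre-frequency/quality_afb (in Hz)."""
    return bwmin + cf_hz / quality

def erb_rate(cf_hz, bwmin=24.7, quality=9.265):
    """Number of erb units below cf_hz: integral of 1/erb(f) from 0 to cf_hz.

    Integrating 1/(bwmin + f/quality) gives quality * ln(1 + f/(bwmin*quality)).
    """
    return quality * math.log(1.0 + cf_hz / (bwmin * quality))

def position_mm(cf_hz, bwmin=24.7, quality=9.265, mmerb=0.89):
    """Distance along the membrane, assuming mmerb_afb mm per erb-rate unit
    and an origin at the place tuned to 0 Hz (an assumption of this sketch)."""
    return mmerb * erb_rate(cf_hz, bwmin, quality)

if __name__ == "__main__":
    for cf in (100.0, 1000.0, 6000.0):
        print(cf, erb(cf), erb_rate(cf), position_mm(cf))
```

Passing bwmin=22.85, quality=7.238 and mmerb=1.0 to the same functions gives the corresponding values for the Greenwood (1990) settings quoted above.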

    Auditory filter design: filter

The choice of filterbank -- linear gammatone or nonlinear transmission line -- is determined by option filter.

filter    The auditory filter design. Default: gtf. Choices: gtf, tlf, off.

When gtf is specified, the options below with suffix _gtf apply, and when tlf is specified, the options below with suffix _tlf apply. When off is specified, the input wave (or the stapes velocity) is passed on directly to the next stage. This provides for non-auditory use of the modules following the filterbank with their associated displays. For example, the envelope of the input wave (or stapes velocity) can be extracted using the rectification and integration modules that follow genbmm. The entry point genasa has the most convenient default settings for this purpose. The default value for the filter option is gtf.

    II. THE GAMMATONE FILTER OPTIONS: _gtf

order_gtf    The order of the gammatone filter. Units: none. Default: 4.

The order of the filter, order_gtf, determines the number of filtering stages and so it determines the slope of the skirts of the attenuation function and their extent. The default value is 4 and the range of useful values is from about 2 to 8. The processing time increases linearly with order above about order 2.

Note that the bandwidth calculation takes account of the fact that changes in order_gtf affect bandwidth. Thus, as long as bwmin_afb is fixed, changing the order will not affect the bandwidths of the resulting filters. Increasing the order of the filter increases the delay of the onset of the impulse response but it has little effect on the shape of the envelope of the impulse response for orders greater than three. The human auditory system is not sensitive to small phase changes between filter channels (Patterson, 1987) and so filter order is not well constrained by human experimental data. The default value (4) is used because this value provides the best match between the amplitude characteristics of the gammatone and roex filters for humans (Patterson et al., 1988).
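The way a bandwidth calculation can compensate for filter order is sketched below. This is not taken from the genbmm source: it assumes the standard gammatone envelope t^(n-1) exp(-2 pi b t) from the gammatone literature, for which the ERB equals a(n) times the bandwidth parameter b, and all function names are hypothetical.

```python
import math

def order_factor(order):
    """a(n) = pi * (2n-2)! * 2^-(2n-2) / ((n-1)!)^2, so that ERB = a(n) * b
    for a gammatone filter of order n with bandwidth parameter b."""
    n = order
    return (math.pi * math.factorial(2 * n - 2) * 2.0 ** (-(2 * n - 2))
            / math.factorial(n - 1) ** 2)

def bandwidth_parameter(cf_hz, order=4, bwmin=24.7, quality=9.265):
    """Choose b so that the filter ERB stays at bwmin + cf/quality for any order."""
    target_erb = bwmin + cf_hz / quality
    return target_erb / order_factor(order)

def impulse_response(t, cf_hz, order=4):
    """Unnormalised gammatone impulse response at time t (seconds, t >= 0)."""
    b = bandwidth_parameter(cf_hz, order)
    return (t ** (order - 1) * math.exp(-2.0 * math.pi * b * t)
            * math.cos(2.0 * math.pi * cf_hz * t))

if __name__ == "__main__":
    for n in (2, 4, 8):
        b = bandwidth_parameter(1000.0, order=n)
        # The product a(n) * b is the same ERB for every order.
        print(n, b, order_factor(n) * b)
```

Under these assumptions, b is about 1.019 times the ERB for the default order of 4, and changing the order changes b while leaving the ERB fixed.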

gain_gtf    Filter output amplification. Units: scalar. Default: 4.

gain_gtf sets the level ratio between the input and the output of the auditory filter when the input is a sinusoid at the cf of the filter.

phase_gtf    The phase of the impulse response. Units: none. Default: 0.

In the absence of phase compensation, the surface of basilar membrane motion has a strong rightward skew in the low-frequency channels because the filters get progressively narrower as centre frequency decreases, and this narrowing is accompanied by a slower filter response. There are occasionally non-auditory reasons for wanting to align the channels across frequency in one way or another. The software provides four alignment systems, which are discussed at the end of this entry, just before the references, under the title Phase Alignment.

    III. THE TRANSMISSION LINE FILTER OPTIONS: _tlf

motion_tlf    The basilar membrane output motion variable. Default: vel. Choices: vel, disp.

If vel (velocity) is specified, the output of genbmm is the basilar membrane velocity. If disp (displacement) is specified, the output of genbmm is the basilar membrane displacement. The default value is vel.

outdencf_tlf    The density of the filters outside the display range. Units: filters/critical band. Default: 4.

In the transmission line filter implementation, it is necessary to simulate the basilar membrane over its entire length. The option outdencf_tlf provides a means of specifying the number of additional channels that must be computed at the basal and apical ends of the cochlea, i.e. outside the range specified by mincf_afb and maxcf_afb (see above). These additional channels are only computed for internal use and are not passed to the next stage of processing.

qref_tlf    The local quality factor of each basilar membrane channel. Units: scalar. Default: 2.

Note: With the transmission line filter, the bandwidth is not determined by options bwmin_afb and quality_afb at high levels but rather by option qref_tlf (see above).

feedback_tlf    The feedback gain of the outer hair cell circuit. Units: scalar. Default: 0.99.

WARNING: A value for feedback_tlf greater than or equal to 1.0 can lead to unstable behaviour (i.e. oscillation) at low levels. However, the model output will not grow without bound. The growth of the oscillations will be limited by the saturating nonlinearity of the outer hair cell circuit, and the model output will go into a kind of limit cycle. These model oscillations have not yet been studied in detail and are likely to deviate substantially from real cochlear emissions.

dsat_tlf    The basilar membrane displacement at the half-saturation point of the outer hair cell circuit. Units: cm. Default: 5.75e-6.
gain_tlf    Filter output amplification. Units: scalar. Default: 4.

Note: There is an internal gain of 4.0 within the software of the transmission line model itself. The total gain is therefore 4.0 times the value for gain_tlf.

NOTE: A linearized version of the transmission line filter with roughly the same bandwidth as the gammatone filter can be obtained by setting feedback_tlf=0 and qref_tlf to about 10. The main difference is that the low-frequency skirt of the transmission line filter is less steep than that of the gammatone.

NOTE: There are a number of Silent Options for the transmission-line filterbank. They are listed in docs/aimSilentOptions.

FURTHER DESCRIPTION

    The distribution of filter centres along the ERB scale.

Given values for mincf_afb, maxcf_afb, and channels_afb (or dencf_afb), the program creates an array of centre frequencies in three steps:

1. It centres a filter at 1.0 kHz.

2. Then it centres filters below 1.0 kHz, one after another, until it encounters mincf_afb. (Thus, mincf_afb is actually the frequency below which no filters are centred). The step size, that is the distance between centre frequencies, is determined by dencf_afb. When dencf_afb is equal to one, the centre frequencies are 1 ERB apart. The ERB is the Equivalent Rectangular Bandwidth of the filter (about 14% larger than the 3 dB bandwidth of the filter). The function relating the ERB to the centre frequency of the filter is taken from a ‘critical band’ equation introduced by Greenwood (1961) and adapted to human auditory masking by Glasberg and Moore (1990).

3. Finally, the program centres filters one after another in the region above 1 kHz until it encounters maxcf_afb (which is, actually, the frequency above which no filters are centred). When dencf_afb is increased, say to two, the program allocates two filters per critical band and spaces them at half ERB steps.

Note: It is not the bandwidths of the filters that are controlled by dencf_afb but rather the density of filters along the frequency axis. Thus, doubling dencf_afb does not cause the bandwidth of the filters to be halved; rather it results in more overlap between adjacent filters. With regard to the images produced by genbmm, dencf_afb determines the density of lines on the surface rather than the shape of the features that appear on the surface.
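The three steps above can be sketched as follows. This is an illustration under stated assumptions rather than the genbmm algorithm itself: it assumes that a step of 1/dencf_afb ERB means equal steps on the erb-rate scale, that mincf_afb lies below and maxcf_afb above 1 kHz, and the function names are hypothetical.

```python
import math

def erb_rate(cf_hz, bwmin=24.7, quality=9.265):
    """erb-rate of cf_hz: integral of 1/(bwmin + f/quality) from 0 to cf_hz."""
    return quality * math.log(1.0 + cf_hz / (bwmin * quality))

def erb_rate_to_hz(e, bwmin=24.7, quality=9.265):
    """Inverse of erb_rate."""
    return bwmin * quality * (math.exp(e / quality) - 1.0)

def centre_frequencies(mincf=100.0, maxcf=6000.0, dencf=1.0,
                       bwmin=24.7, quality=9.265):
    """Centre a filter at 1 kHz, then step down to mincf and up to maxcf.

    Assumes mincf <= 1000 <= maxcf. The step is 1/dencf erb-rate units.
    """
    step = 1.0 / dencf
    e_ref = erb_rate(1000.0, bwmin, quality)
    e_min = erb_rate(mincf, bwmin, quality)
    e_max = erb_rate(maxcf, bwmin, quality)
    # Whole steps that fit below and above the 1 kHz reference filter.
    n_below = int(math.floor((e_ref - e_min) / step + 1e-9))
    n_above = int(math.floor((e_max - e_ref) / step + 1e-9))
    return [erb_rate_to_hz(e_ref + k * step, bwmin, quality)
            for k in range(-n_below, n_above + 1)]

if __name__ == "__main__":
    one_per_erb = centre_frequencies(dencf=1.0)
    two_per_erb = centre_frequencies(dencf=2.0)
    print(len(one_per_erb), one_per_erb[0], one_per_erb[-1])
    print(len(two_per_erb), two_per_erb[0], two_per_erb[-1])
```

In this sketch, doubling dencf roughly doubles the number of centre frequencies without changing the erb of any filter, which is the point made in the note above.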

MOTIVATION

The motivation for adopting the gammatone filter shape is threefold:

1. It provides an excellent summary of physiological data concerning the impulse response of primary auditory neurons in small mammals such as cats (de Boer and de Jongh, 1978; Carney and Yin, 1988).

2. The amplitude characteristic of the gammatone filter is very similar to that of the Roex filter commonly used to represent the human auditory filter (Patterson et al., 1982; Schofield, 1985; Patterson and Moore, 1986; Patterson et al., 1988).

3. There are recursive gammatone filters that make the calculation particularly fast both on general purpose computers and special purpose DSP chips (Holdsworth et al, 1988; Cooke, 1993; Slaney, 1993).

In summary, the gammatone filter is designed to provide a reasonable trade-off between accuracy in simulating basilar membrane motion, and computational load.

The motivation for adopting the transmission line filter is as follows:

1. The outer hair cell circuit of the transmission line filter is level dependent and so this design produces level-dependent basilar membrane tuning curves (Giguere and Woodland, 1994). There is now ample evidence that the basilar membrane motion is indeed highly nonlinear and a major source of level compression (e.g. Johnstone et al., 1986).

2. The internal structure of the transmission line filter model is based on the physics of the auditory periphery and therefore provides a more realistic cochlear simulation than parallel filterbanks. It generates combination tones of the form 2f1-f2 as observed in the auditory system and it has the potential to generate cochlear echoes.

3. The wave-digital-filter implementation of the transmission line filterbank is only about twice as slow as the gammatone filterbank for an equivalent number of channels.

Phase Alignment

There is no question that the output of the cochlea has a phase lag corresponding to the strong rightward skew. However, perceptual evidence indicates that this phase lag has to be enormous (> 4 ms) to affect what we hear; indeed, reversing the phase lag with synthetic stimuli does not change what we hear (Patterson, 1987). Phase information that appears in the basilar membrane motion but which we do not hear is removed in the third module by the strobe mechanism of the temporal integration process. As a result, the stabilised auditory images are always phase aligned even though the basilar membrane motion and the neural activity patterns are not.

Prior to discovering the integration mechanism, we wanted to find a way of reducing the skew in the basilar membrane image, in order to provide a visual representation that was more like what we hear. The genbmm program provides the following options for phase aligning the responses of successive filters, determined by the value of the option phase_gtf:

Value    Effect

-1       Envelope alignment. Shift the output channels horizontally so that the points of maximum response to an impulse (i.e. the envelope maxima) are aligned.
-2       Envelope plus fine-structure alignment. Perform envelope-peak alignment as for value -1 and then shift the fine-structure phase in each channel so that a fine-structure peak coincides with the envelope peak.
-4       Envelope plus peak alignment, ‘left justified’. Align the envelopes and fine structure of all of the impulse responses along the left edge of the image.
0        No phase compensation.
+n       Advance each channel by n cycles of the centre frequency of the channel. Approximate envelope alignment is achieved using phase_gtf = 3 or 4.

We experimented with a number of phase compensation schemes (Patterson et al., 1989) and concluded that the best option was envelope plus peak alignment, which corresponds to a value of phase_gtf = -4. Accordingly, we recommend the use of phase_gtf values of 0 (i.e. no phase compensation) or -4 (envelope plus peak alignment). The remaining options are occasionally useful and so they have been left in the software. Note that for any phase compensation option other than 0 the time scale is strictly correct only for the lowest channel. For any other channel, the origin of the abscissa is offset to the right by an amount equal to the difference between ‘the envelope peak time of the lowest-frequency channel’ and ‘the envelope peak time of the given channel’.
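The size of the shifts involved can be estimated with a short calculation. The sketch below is not the genbmm alignment code: it assumes the standard gammatone envelope t^(n-1) exp(-2 pi b t), whose maximum falls at (n-1)/(2 pi b), with b chosen so that the filter ERB equals bwmin_afb + cf/quality_afb. All function names are hypothetical.

```python
import math

def envelope_peak_time(cf_hz, order=4, bwmin=24.7, quality=9.265):
    """Time (s) of the envelope maximum of an assumed gammatone impulse response."""
    n = order
    # ERB = a_n * b for a gammatone of order n (standard result, assumed here).
    a_n = (math.pi * math.factorial(2 * n - 2) * 2.0 ** (-(2 * n - 2))
           / math.factorial(n - 1) ** 2)
    b = (bwmin + cf_hz / quality) / a_n
    return (n - 1) / (2.0 * math.pi * b)

def peak_delay_in_cycles(cf_hz, order=4):
    """Envelope peak delay expressed in cycles of the centre frequency."""
    return envelope_peak_time(cf_hz, order) * cf_hz

def advance_seconds(cf_hz, n_cycles):
    """Time advance corresponding to n cycles of cf (as for phase_gtf = +n)."""
    return n_cycles / cf_hz

if __name__ == "__main__":
    lowest = 100.0
    for cf in (100.0, 1000.0, 6000.0):
        # Offset of this channel relative to the lowest-frequency channel.
        offset = envelope_peak_time(lowest) - envelope_peak_time(cf)
        print(cf, envelope_peak_time(cf), peak_delay_in_cycles(cf), offset)
```

Under these assumptions the envelope peak lies roughly 3.5 to 4 cycles after onset for centre frequencies of 1 kHz and above, and noticeably fewer cycles in the lowest channels, which is consistent with an advance of 3 or 4 cycles giving only approximate envelope alignment.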

EXAMPLES

The following command generates basilar membrane motion using the gammatone filter design (the default) for an input filename cegc:

> genbmm cegc

The following command generates basilar membrane motion using the gammatone filter design (the default) for a filterbank with cf from 200 Hz to 5000 Hz at a density of 4 filters/critical band for the same input filename:

> genbmm channels=0 mincf=200 maxcf=5000 dencf=4. cegc

The following command generates basilar membrane motion using the gammatone filter design (the default) and the audiogram function instead of the outer/middle ear filter:

> genbmm middle_ear=off audiogram=on cegc

The following command generates the basilar membrane motion using the transmission line filter design instead of the default gammatone filter:

> genbmm filter=tlf cegc

The following command generates the basilar membrane motion using the transmission line filter design and the auditory scale of Greenwood (1990):

> genbmm filter=tlf bwmin=22.85 quality=7.238 mmerb=1.0 cegc

The following command generates the basilar membrane motion using the transmission line filter design, but with the nonlinear outer hair cell feedback mechanism turned off:

> genbmm filter=tlf feedback=off cegc

REFERENCES

de Boer, E., and de Jongh, H.R. (1978). "On cochlear encoding: potentialities and limitations of the reverse-correlation technique," J. Acoust. Soc. Am., 63, 115-135.

Carney, L.H. and Yin, C.T. (1988). "Temporal coding of resonances by low-frequency auditory nerve fibers: Single fibre responses and a population model," J.Neurophysiology, 60, 1653-1677.

Cooke, M.P. (1993). Modelling Auditory Processing and Organisation, Cambridge University Press.

Giguere, C. and Woodland, P.C. (1994). "A computational model of the auditory periphery for speech and hearing research: I. Ascending path," J.Acoust. Soc. Am. 95: 331-342.

Glasberg, B.R. and B.C.J. Moore (1990). "Derivation of auditory filter shapes from notched-noise data," Hearing Research, 47, 103-138.

Greenwood, D.D. (1961). "Critical bandwidth and the frequency coordinates of the basilar membrane," J. Acoust. Soc. Am. 33, 1344-1356.

Greenwood, D.D. (1990). "A cochlear frequency position function for several species - 29 years later," J. Acoust. Soc. Am., 87, 2592-2605.

Holdsworth, J., Nimmo-Smith, I., Patterson, R.D. and Rice, P. (1988). Annex C of ’Spiral Vos Final Report, Part A: The Auditory Filterbank,’ APU contract report 2341.

Johnstone, B.M. et al. (1986). Hear Res. 22: 147-153.

Moore, B.C.J. and Glasberg, B.R. (1983). "Suggested formulae for calculating auditory filter bandwidths and excitation patterns," J. Acoust. Soc. Am. 74, 750-753.

Patuzzi, R., and Robertson, D. (1988). "Tuning in the mammalian cochlea," Physiological Reviews 68, 1009-1082.

Patterson, R.D. (1976). "Auditory filter shapes derived with noise stimuli," J. Acoust. Soc. Am., 59, 640-654.

Patterson, R.D. (1987). "A pulse ribbon model of monaural phase perception," J. Acoust. Soc. Am., 82, 1560-1586.

Patterson, R.D., Nimmo-Smith, I., Weber, D.L., and Milroy, R. (1982). "The deterioration of hearing with age: Frequency selectivity, the critical ratio, the audiogram, and speech threshold," J. Acoust. Soc. Am., 72, 1788-1803.

Patterson, R.D., Allerhand, M.H. and Holdsworth, J. (1992). "Auditory representations of speech sounds," In Visual representations of speech signals, Eds. Martin Cooke and Steve Beet, John Wiley & Sons. 307-314.

Patterson, R. D., Holdsworth, J., Nimmo-Smith, I., and Rice, P. (1988). SVOS Final Report: The Auditory Filterbank. APU Contract report 2341.

Patterson, R.D. and B.C.J. Moore (1986). "Auditory filters and excitation patterns as representations of frequency resolution," In: Frequency Selectivity in Hearing (B. C. J. Moore, ed.), pp. 123-177. Academic Press, London.

Schofield, D. (1985) "Visualisations of speech based on a model of the peripheral auditory system," NPL Report DITC 62/85.

Slaney, M. (1993). An efficient implementation of the Patterson Holdsworth auditory filter bank. Apple Computer Technical Report No 35.

Zwicker, E. (1961). "Subdivision of the audible frequency range into critical bands (frequenzgruppen)," J. Acoust. Soc. Am. 33, 248.

FILES

.genbmmrc The options file for genbmm.

SEE ALSO

genasa, gensgm

BUGS

There is a bug in the hiddenline plotting of genbmm. It shows up when the surface has deep valleys and there is a large phase delay. The negative peaks show through on surfaces where they should be hidden.

COPYRIGHT

Copyright (c) Applied Psychology Unit, Medical Research Council, 1995

Permission to use, copy, modify, and distribute this software without fee is hereby granted for research purposes, provided that this copyright notice appears in all copies and in all supporting documentation, and that the software is not redistributed for any fee (except for a nominal shipping charge). Anyone wanting to incorporate all or part of this software in a commercial product must obtain a license from the Medical Research Council.

The MRC makes no representations about the suitability of this software for any purpose. It is provided "as is" without express or implied warranty.

THE MRC DISCLAIMS ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT SHALL THE A.P.U. BE LIABLE FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.

ACKNOWLEDGEMENTS

The AIM software was developed for Unix workstations by John Holdsworth and Mike Allerhand of the MRC APU, under the direction of Roy Patterson. The physiological version of AIM was developed by Christian Giguere. The options handler is by Paul Manson. The revised SAI module is by Jay Datta. Michael Akeroyd extended the postscript facilities and developed the xreview routine for auditory image cartoons.

The project was supported by the MRC and grants from the U.K. Defense Research Agency, Farnborough (Research Contract 2239); the EEC Esprit BR Programme, Project ACTS (3207); and the U.K. Hearing Research Trust.


SunOS 5.6 GENBMM (1) 5 Sept 1995