Manual Page - genbmm(1)

Manual Reference Pages - GENBMM (1)

NAME

genbmm - generate basilar membrane motion

Synopsis
Description
Options
Further Description
Motivation
Examples
References
Files
See Also
Bugs
Copyright
Acknowledgements

SYNOPSIS

genbmm [ option=value | -option ] [ filename ]

DESCRIPTION

The genbmm module of the AIM software simulates the spectral analysis
performed by the auditory system using a bank of auditory filters.
Specifically, genbmm converts an input wave into an array of filtered
waves, one for each channel of the filterbank. The surface of the
array of filtered waves is AIM's representation of basilar membrane
motion (BMM) as a function of time. AIM provides two alternative
methods for generating the BMM: a linear, gammatone filterbank
(Patterson et al., 1988; Slaney, 1993; Cooke, 1993), or a nonlinear,
transmission-line filterbank (Giguere and Woodland, 1994). For
convenience, they are referred to as the functional filterbank and
the physiological filterbank, respectively.

OPTIONS

There are three sets of options for genbmm; they are grouped by
function and identified by the suffixes _afb, _gtf and _tlf. The first
set controls the distribution of the filtered waves across frequency
(suffix _afb); the second specifies the shape of the gammatone filter
(suffix _gtf); and the third specifies the shape of the transmission
line filter (suffix _tlf). These three groups of options are the
subject of this manual entry, together with an option that specifies
the filter choice (gtf or tlf), and an option that specifies whether a
middle ear function should be used with the gtf filterbank. There are
also a number of Silent Options which control less frequently used
functions in the transmission-line filterbank. They are listed in
docs/aimSilentOptions.

The Outer/Middle Ear function: middle_ear

In the auditory system, the middle ear causes a progressive attenuation
of sound energy in the region below about 500 Hz and a progressive
attenuation in the region above about 4000 Hz. There is also a
primary resonance of the auditory canal around 2700 Hz that boosts
sound transmission. The resulting transfer function is a normal
aspect of auditory processing and precedes spectral analysis. If the
functional filterbank is chosen (gtf), the outer/middle ear filter
acts directly on the input wave, and the stapes velocity wave it
generates is the input to the spectral filtering stage. If the
physiological filterbank is chosen (tlf), the outer/middle ear and
cochlear filtering are performed simultaneously, as in the auditory
system. The only parameter associated with this function is the
middle_ear switch, which makes it possible to turn the outer/middle ear
filtering off when the functional filterbank is chosen.

middle_ear Outer/middle ear switch
Switch. Default: on.

It is also possible to specify a floating point number, in which
case the middle ear output is multiplied by that value.

Note: The middle_ear option is ignored if option filter (see below)
is set to tlf. This is because the outer/middle ear stage and the
cochlear stage are bidirectionally coupled in the transmission
line filter implementation, and cannot be separated.

I. THE AUDITORY FILTERBANK OPTIONS: _afb

The distribution of the filters across frequency and the total
number of output filters in the bank are determined by four parameters:
channels_afb, mincf_afb, maxcf_afb, and dencf_afb.

channels_afb The number of channels in the filterbank.
Default unit: filters. Default value: 75

mincf_afb The minimum centre frequency
Default unit: Hz. Default value: 100 Hz.

maxcf_afb The maximum centre frequency
Default unit: Hz. Default value: 6000 Hz.

dencf_afb The density of the filters in the filterbank.
Default units: filters/critical band. Default value: off

dencf_afb provides an alternative method of specifying the number of
channels in terms of the density of filters along the frequency scale.

Note: channels_afb overrides dencf_afb whenever it has a non-zero
value, in which case dencf_afb is ignored. To use dencf_afb, you must
therefore set channels_afb=off.

WARNING: When using the transmission line filter (filter=tlf), the
channel density should be 3 or more filters/erb. Using a lower
density may lead to excessive spatial discretization errors (see
Giguere and Woodland (1994) for a discussion). To view a small number
of channels, use a reasonable density and reduce the number of
displayed channels using option downchannel.

The ERB scale for the gammatone auditory filterbank
is specified with three options: bwmin_afb, quality_afb,
and mmerb_afb.

bwmin_afb The minimum bandwidth for an auditory filter.
Default unit: Hz. Default value: 24.7

quality_afb The limiting quality factor for high frequency auditory filters.
Units: scalar. Default: 9.265

mmerb_afb The length of one erb-rate unit along the basilar membrane.
Units: mm. Default: 0.89

info_afb Print filterbank information to stderr.
Switch. Default: off.

A listing of the parameters for the filters in the bank can be directed
to the terminal at run time by setting info_afb=on.

The physiological data on the human cochlear frequency-position
function (Greenwood, 1990) and the psychoacoustic data on auditory
filter bandwidth (Patterson and Moore, 1986) indicate that the
spectral analysis performed in the cochlea is like a constant Q
system (quality_afb) that asymptotes to a minimum filter bandwidth
(bwmin_afb) at low centre frequencies. That is,

erb = bwmin_afb + centre-frequency/quality_afb.

If we assume, as Greenwood suggests, that each filter bandwidth
corresponds to a constant distance (mmerb_afb) along the basilar
membrane, it is possible to scale frequency in terms of erb units (or
position along the basilar membrane) by integrating the inverse of the
erb function above.
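When erb(f) = bwmin_afb + f/quality_afb, the integration described above has a simple closed form: integrating 1/erb from 0 Hz up to f gives quality_afb * ln(1 + f/(quality_afb * bwmin_afb)) ERB-rate units, and multiplying by mmerb_afb converts that count to millimetres along the basilar membrane. The Python sketch below computes these quantities. It is an illustration, not code taken from AIM: the function names are hypothetical, and taking 0 Hz as the origin of the scale is an assumption.

```python
import math

def erb(cf_hz, bwmin=24.7, quality=9.265):
    """Equivalent rectangular bandwidth (Hz): erb = bwmin_afb + cf / quality_afb."""
    return bwmin + cf_hz / quality

def erb_rate(f_hz, bwmin=24.7, quality=9.265):
    """Number of ERBs between 0 Hz and f_hz, i.e. the integral of 1/erb(f) df,
    which evaluates to quality * ln(1 + f / (quality * bwmin))."""
    return quality * math.log(1.0 + f_hz / (quality * bwmin))

def erb_rate_inverse(e, bwmin=24.7, quality=9.265):
    """Frequency (Hz) lying e ERB-rate units above 0 Hz (inverse of erb_rate)."""
    return quality * bwmin * (math.exp(e / quality) - 1.0)

def bm_position_mm(f_hz, bwmin=24.7, quality=9.265, mmerb=0.89):
    """Distance (mm) from the 0 Hz point, assuming each ERB spans mmerb_afb mm."""
    return mmerb * erb_rate(f_hz, bwmin, quality)
```

With the default parameters, erb_rate(1000.0) comes out at about 15.6 ERB-rate units, so the 1 kHz place lies roughly 13.9 mm from the 0 Hz point under these assumptions.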

Glasberg and Moore (1990) have reviewed the available human filter
shape data and concluded that the optimum values for bwmin_afb and
quality_afb are 24.7 and 9.265, respectively, together with mmerb_afb
of 0.89. (As a rule of thumb for rapid estimation, erb = 25 + 10% of
cf.) The auditory scale used by Greenwood (1990) can be specified by
setting bwmin_afb=22.85, quality_afb=7.238 and mmerb_afb=1.0. A
reasonable approximation to the Bark scale (Zwicker, 1961) is obtained
by setting bwmin_afb=80, quality_afb=6.5 and mmerb_afb=0.89.
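The Python sketch below shows how the three parameter sets quoted above differ. For each set it evaluates the erb formula at a few centre frequencies next to the 25 Hz + 10% rule of thumb, and it counts the ERB-rate units (and millimetres, via mmerb_afb) between the default mincf_afb and maxcf_afb. Only the numerical values come from this entry. The preset labels and function names are hypothetical, and the closed-form ERB-rate integral assumes the scale starts at 0 Hz.

```python
import math

# Parameter sets quoted in this entry; the dictionary keys are illustrative labels.
PRESETS = {
    "glasberg_moore_1990": {"bwmin": 24.7, "quality": 9.265, "mmerb": 0.89},
    "greenwood_1990": {"bwmin": 22.85, "quality": 7.238, "mmerb": 1.0},
    "bark_approx": {"bwmin": 80.0, "quality": 6.5, "mmerb": 0.89},
}

def erb(cf, bwmin, quality):
    """erb = bwmin_afb + cf / quality_afb (Hz)."""
    return bwmin + cf / quality

def rule_of_thumb_erb(cf):
    """Quick estimate: 25 Hz plus 10% of the centre frequency."""
    return 25.0 + 0.10 * cf

def erb_span(lo, hi, bwmin, quality):
    """ERB-rate units between lo and hi Hz (closed-form integral of 1/erb)."""
    def rate(f):
        return quality * math.log(1.0 + f / (quality * bwmin))
    return rate(hi) - rate(lo)

if __name__ == "__main__":
    for name, p in PRESETS.items():
        erbs = ", ".join(f"{erb(cf, p['bwmin'], p['quality']):.1f}" for cf in (100, 1000, 4000))
        span = erb_span(100.0, 6000.0, p["bwmin"], p["quality"])
        print(f"{name}: erb at 100/1000/4000 Hz = {erbs} Hz; "
              f"100-6000 Hz spans {span:.1f} ERBs = {span * p['mmerb']:.1f} mm")
    print("rule of thumb at 1000 Hz:", rule_of_thumb_erb(1000.0), "Hz")
```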

Auditory filter design: filter

The choice of filterbank -- linear gammatone or nonlinear transmission
line -- is determined by option filter.

filter The auditory filter design
Default: gtf. Choices: gtf, tlf, off.

When gtf is specified, the options below with suffix _gtf apply, and
when tlf is specified, the options below with suffix _tlf apply. When
off is specified, the input wave (or the stapes velocity) is passed on
directly to the next stage. This provides for non-auditory use of the
modules following the filterbank with their associated displays. For
example, the envelope of the input wave (or stapes velocity) can be
extracted using the rectification and integration modules that follow
genbmm. The entry point genasa has the most convenient default
settings for this purpose. The default value for the filter option is
gtf.

II. THE GAMMATONE FILTER OPTIONS: _gtf

order_gtf The order of the gammatone filter
Units: none. Default: 4

The order of the filter, order_gtf, determines the number of filtering
stages and so it determines the slope of the skirts of the attenuation
function and their extent. The default value is 4 and the range of
useful values is from about 2 to 8. The processing time increases
linearly with order above about order 2.

Note that the bandwidth calculation takes account of the fact that
changes in order_gtf affect bandwidth. Thus, as long as bwmin_afb is
fixed, changing the order will not affect the bandwidths of the
resulting filters. Increasing the order of the filter increases the
delay of the onset of the impulse response but it has little effect on
the shape of the envelope of the impulse response for orders greater
than three. The human auditory system is not sensitive to small phase
changes between filter channels (Patterson, 1987) and so filter order
is not well constrained by human experimental data. The default value
(4) is used because this value provides the best match between the
amplitude characteristics of the gammatone and roex filters for humans
(Patterson et al., 1988).
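The order compensation can be illustrated with the textbook gammatone impulse response, g(t) = t^(n-1) exp(-2 pi b t) cos(2 pi cf t + phase). For this shape, the ERB is a fixed multiple a(n) of the bandwidth parameter b, with a(n) = pi (2n-2)! / (2^(2n-2) ((n-1)!)^2). Choosing b = erb/a(n) therefore keeps each filter's ERB on the erb scale whatever the order; for n = 4, this gives b of about 1.019 erb. The Python sketch below assumes this formulation. It evaluates the formula directly rather than implementing a fast recursive filter, and the function names are hypothetical. The envelope peak, at (n-1)/(2 pi b), moves later as the order rises, consistent with the increasing onset delay described above.

```python
import math

def gammatone_bandwidth_factor(order):
    """a(n): ratio of an order-n gammatone's ERB to its bandwidth parameter b,
    a(n) = pi * (2n-2)! / (2**(2n-2) * ((n-1)!)**2)."""
    n = order
    return math.pi * math.factorial(2 * n - 2) / (
        2 ** (2 * n - 2) * math.factorial(n - 1) ** 2)

def gammatone_b(cf, order=4, bwmin=24.7, quality=9.265):
    """Bandwidth parameter b (Hz) chosen so the filter's ERB equals bwmin + cf/quality."""
    erb = bwmin + cf / quality
    return erb / gammatone_bandwidth_factor(order)

def gammatone_ir(cf, fs, duration, order=4, phase=0.0, **erb_kw):
    """Sampled, unnormalised impulse response
    g(t) = t^(n-1) * exp(-2 pi b t) * cos(2 pi cf t + phase)."""
    b = gammatone_b(cf, order, **erb_kw)
    n_samples = int(round(duration * fs))
    out = []
    for k in range(n_samples):
        t = k / fs
        out.append(t ** (order - 1) * math.exp(-2 * math.pi * b * t)
                   * math.cos(2 * math.pi * cf * t + phase))
    return out

def envelope_peak_time(cf, order=4, **erb_kw):
    """Time (s) at which t^(n-1) exp(-2 pi b t) is largest: t = (n-1) / (2 pi b)."""
    return (order - 1) / (2 * math.pi * gammatone_b(cf, order, **erb_kw))
```

For example, gammatone_ir(1000.0, 16000, 0.025) returns 400 samples whose largest magnitude falls close to envelope_peak_time(1000.0), about 3.5 ms.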

gain_gtf Filter output amplification
Units: scalar. Default: 4.

The ratio of output level to input level across the auditory filter
when the input is a sinusoid at the cf of the filter.

phase_gtf The phase of the impulse response
Units: none. Default: 0.

In the absence of phase compensation, the surface of basilar membrane
motion has a strong rightward skew in the low-frequency channels
because the filters get progressively narrower as centre frequency
decreases, and this narrowing is accompanied by a slower filter
response. There are occasionally non-auditory reasons for wanting to
align the channels across frequency in one way or another. The
software provides four alignment systems, which are discussed near
the end of this entry under the title Phase Alignment.

III. THE TRANSMISSION LINE FILTER OPTIONS: _tlf

motion_tlf The basilar membrane output motion variable
Default: vel. Choices: vel, disp.

If vel (velocity) is specified, the output of genbmm
is the basilar membrane velocity. If disp (displacement)
is specified, the output of genbmm is the basilar membrane
displacement. The default value is vel.

outdencf_tlf The density of the filters outside the display
range.
Units: filters/critical band. Default: 4.

In the transmission line filter implementation, it is necessary to
simulate the basilar membrane over its entire length. The option
outdencf_tlf provides a means of specifying the number of additional
channels that must be computed at the basal and apical ends of the
cochlea, ie. outside the range specified by mincf_afb and maxcf_afb
(see above). These additional channels are only computed for internal
use and are not passed to the next stage of processing.

qref_tlf The local quality factor of each basilar membrane channel
Units: scalar. Default: 2.

Note: With the transmission line filter, the bandwidth is not
determined by options bwmin_afb and quality_afb at high levels but
rather by option qref_tlf (see above).

feedback_tlf The feedback gain of the outer hair cell circuit
Units: scalar. Default: 0.99

WARNING: A value for feedback_tlf greater than or equal to 1.0 can
lead to unstable behaviour at low levels (ie. oscillation). However,
the model output will not grow without bound. The growth of the
oscillations will be limited by the saturating nonlinearity of the outer
hair cell circuit, and the model output will go into a kind of limit cycle.
These model oscillations have not yet been studied in detail and are
likely to deviate substantially from real cochlear emissions.

dsat_tlf The basilar membrane displacement at the half-saturation point
of the outer hair cell circuit
Units: cm. Default: 5.75e-6

gain_tlf Filter output amplification
Units: scalar. Default: 4.

Note: There is an internal gain of 4.0 within the software of
the transmission line model itself. The total gain is therefore
4.0 times the value for gain_tlf.

NOTE: A linearized version of the transmission line filter with
roughly the same bandwidth as the gammatone filter can be obtained by
setting feedback_tlf=0 and qref_tlf to about 10. The main difference
is that the low-frequency skirt of the transmission line filter is
less steep than that of the gammatone.

NOTE: There are a number of Silent Options for the transmission-line
filterbank. They are listed in docs/aimSilentOptions.

FURTHER DESCRIPTION

The distribution of filter centres along the ERB scale.


Given values for mincf_afb, maxcf_afb, and channels_afb (or
dencf_afb), the program creates an array of centre frequencies
in three steps:

1. It centres a filter at 1.0 kHz.

2. Then it centres filters below 1.0 kHz, one after another,
until it encounters mincf_afb. (Thus, mincf_afb is actually the
frequency below which no filters are centred). The step size,
that is the distance between centre frequencies, is determined
by dencf_afb. When dencf_afb is equal to one, the centre
frequencies are 1 ERB apart. The ERB is the Equivalent
Rectangular Bandwidth of the filter (about 14% larger than the 3
dB bandwidth of the filter). The function relating the ERB to the
centre frequency of the filter is taken from a critical band
equation introduced by Greenwood (1961) and adapted to human
auditory masking by Glasberg and Moore (1990).

3. Finally, the program centres filters one after another in
the region above 1 kHz until it encounters maxcf_afb (which is,
actually, the frequency above which no filters are centred). When
dencf_afb is increased, say to two, the program allocates two
filters per critical band and spaces them at half ERB steps.

Note: It is not the bandwidths of the filters that are
controlled by dencf_afb but rather the density of filters along
the frequency axis. Thus, doubling dencf_afb does not cause the
bandwidth of the filters to be halved; rather it results in more
overlap between adjacent filters. With regard to the images
produced by genbmm, dencf_afb determines the density of lines on
the surface rather than the shape of the features that appear on
the surface.
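The three steps above can be sketched in Python as follows. The sketch works on the ERB-rate scale obtained by integrating 1/erb from 0 Hz; where AIM places the origin of that scale is an assumption, since this entry does not say. It places a filter at 1 kHz, then steps down and up in increments of 1/dencf_afb ERB until it passes mincf_afb and maxcf_afb. This entry does not say whether a filter falling exactly on a limit is kept, or what happens when mincf_afb lies above 1 kHz. The sketch therefore keeps frequencies inside [mincf_afb, maxcf_afb] and assumes mincf_afb <= 1000 Hz <= maxcf_afb. The function names are hypothetical.

```python
import math

def erb_rate(f, bwmin=24.7, quality=9.265):
    """ERB-rate units between 0 Hz and f (integral of 1 / (bwmin + f/quality))."""
    return quality * math.log(1.0 + f / (quality * bwmin))

def erb_rate_inverse(e, bwmin=24.7, quality=9.265):
    """Frequency (Hz) at ERB-rate e."""
    return quality * bwmin * (math.exp(e / quality) - 1.0)

def centre_frequencies(mincf=100.0, maxcf=6000.0, dencf=1.0, bwmin=24.7, quality=9.265):
    """Filters spaced 1/dencf ERB apart, anchored at 1 kHz, kept within [mincf, maxcf]."""
    step = 1.0 / dencf                       # spacing in ERB-rate units
    anchor = erb_rate(1000.0, bwmin, quality)
    cfs = [1000.0]                           # step 1: a filter centred at 1.0 kHz
    e = anchor - step                        # step 2: work downwards from 1 kHz
    while True:
        f = erb_rate_inverse(e, bwmin, quality)
        if f < mincf:
            break
        cfs.insert(0, f)
        e -= step
    e = anchor + step                        # step 3: work upwards from 1 kHz
    while True:
        f = erb_rate_inverse(e, bwmin, quality)
        if f > maxcf:
            break
        cfs.append(f)
        e += step
    return cfs
```

Under these assumptions, the defaults (mincf_afb=100, maxcf_afb=6000, dencf_afb=1) yield 28 centre frequencies, and dencf_afb=2 yields 55. More filters cover the same range, but no bandwidth changes, as the Note above explains.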

MOTIVATION

The motivation for adopting the gammatone filter shape is
threefold:

1. It provides an excellent summary of physiological data
concerning the impulse response of primary auditory neurons in
small mammals such as cats (de Boer and de Jongh, 1978; Carney and
Yin, 1988)

2. The amplitude characteristic of the gammatone filter is very
similar to that of the roex filter commonly used to represent the
human auditory filter (Patterson, et al, 1982; Schofield, 1985;
Patterson and Moore, 1986; Patterson et al, 1988).

3. There are recursive gammatone filters that make the calculation
particularly fast both on general purpose computers and special
purpose DSP chips (Holdsworth et al, 1988; Cooke, 1993; Slaney, 1993).

In summary, the gammatone filter is designed to provide a reasonable
trade-off between accuracy in simulating basilar membrane motion, and
computational load.

The motivation for adopting the transmission line filter is
as follows:

1. The outer hair cell circuit of the transmission line filter is
level dependent and so this design produces level-dependent basilar
membrane tuning curves (Giguere and Woodland, 1994). There is now
ample evidence that the basilar membrane motion is indeed highly
nonlinear and a major source of level compression (eg. Johnstone et
al., 1986).

2. The internal structure of the transmission line filter model is
based on the physics of the auditory periphery and therefore provides
a more realistic cochlear simulation than parallel filterbanks. It
generates combination tones of the form 2f1-f2 as observed in the
auditory system and it has the potential to generate cochlear echoes.

3. The wave-digital-filter implementation of the transmission line
filterbank is only about twice as slow as the gammatone filterbank
for an equivalent number of channels.

Phase Alignment

There is no question that the output of the cochlea has a phase lag
corresponding to the strong rightward skew. However, perceptual
evidence indicates that this phase lag has to be enormous (> 4 ms) to
affect what we hear; indeed, reversing the phase lag with synthetic
stimuli does not change what we hear (Patterson, 1987). Phase
information that appears in the basilar membrane motion but which we
do not hear, is removed in the third module by the strobe mechanism of
the temporal integration process. As a result, the stabilised auditory
images are always phase aligned even though the basilar membrane
motion and the neural activity patterns are not.

Prior to discovering the integration mechanism, we wanted to find
a way of reducing the skew in the basilar membrane image, in
order to provide a visual representation that was more like what
we hear. The genbmm program provides the following options for
phase aligning the responses of successive filters, determined
by the value of the option phase_gtf:

Value   Effect

-1      Envelope alignment. Shift the channels of output horizontally so
        that the points of maximum response to an impulse (ie the
        envelope maxima) are aligned.

-2      Envelope plus fine-structure alignment. Perform envelope-peak
        alignment as in option -1 and then shift the fine-structure
        phase in each channel so that a fine-structure peak coincides
        with the envelope peak.

-4      Envelope plus peak alignment, left justified. Align the
        envelopes and fine structure of all of the impulse responses
        along the left edge of the image.

0       No phase compensation.

+n      Advance each channel by n cycles of the centre frequency of the
        channel. Approximate envelope alignment is achieved using
        phase_gtf = 3 or 4.

We experimented with a number of phase compensation schemes
(Patterson et al., 1989) and concluded that the best option was
envelope plus peak alignment which corresponds to a value of
phase_gtf = -4. Accordingly, we recommend the use of phase_gtf
values of 0 (ie no phase compensation) or -4 (envelope plus peak
alignment). The remaining options are occasionally useful and so
they have been left in the software.

Note that for any phase compensation option other than 0 the time
scale is strictly correct only for the lowest channel. For any
other channel, the origin of the abscissa is offset to the right
by an amount equal to the difference between the envelope peak
time of the lowest-frequency channel and the envelope peak time
of the given channel.
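For the envelope-alignment (-1) and cycle-advance (+n) options, the size of the shifts can be estimated from the gammatone envelope. For an order-n filter with bandwidth parameter b, the envelope peaks at (n-1)/(2 pi b). The Python sketch below assumes that b is set to erb/a(n), where a(n) = pi (2n-2)! / (2^(2n-2) ((n-1)!)^2) is the ratio of a gammatone's ERB to b. Following the Note above, it delays every channel so that its envelope peak lines up with that of the lowest-frequency channel, leaving the lowest channel's time scale unchanged. It does not model the fine-structure options (-2, -4). The function names are hypothetical; this illustrates the idea and is not AIM's code.

```python
import math

def envelope_peak_time(cf, order=4, bwmin=24.7, quality=9.265):
    """Envelope peak (s) of an order-n gammatone whose ERB follows bwmin + cf/quality."""
    a_n = math.pi * math.factorial(2 * order - 2) / (
        2 ** (2 * order - 2) * math.factorial(order - 1) ** 2)
    b = (bwmin + cf / quality) / a_n          # bandwidth parameter in Hz
    return (order - 1) / (2 * math.pi * b)   # where t^(n-1) exp(-2 pi b t) is maximal

def envelope_alignment_delays(cfs, order=4):
    """phase_gtf = -1 (sketch): delay each channel so its envelope peak coincides
    with the lowest-frequency channel's peak, which is the latest of all."""
    peaks = [envelope_peak_time(cf, order) for cf in cfs]
    latest = peaks[cfs.index(min(cfs))]
    return [latest - p for p in peaks]

def cycle_advances(cfs, n):
    """phase_gtf = +n: advance each channel by n cycles of its centre frequency (s)."""
    return [n / cf for cf in cfs]

if __name__ == "__main__":
    for cf in (100.0, 250.0, 1000.0, 4000.0):
        # envelope peak expressed in cycles of cf, for comparison with phase_gtf = 3 or 4
        print(f"{cf:6.0f} Hz: envelope peak after {envelope_peak_time(cf) * cf:.2f} cycles")
```

Under these assumptions, mid- and high-frequency channels reach their envelope peak roughly three to four cycles after onset, while low-frequency channels peak after fewer cycles. This is in line with the remark that phase_gtf = 3 or 4 gives only approximate envelope alignment.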

EXAMPLES

The following command generates basilar membrane motion using the
gammatone filter design (the default) for an input filename cegc:

> genbmm cegc

The following command generates basilar membrane motion using the
gammatone filter design (the default) for a filterbank with cf from
200 Hz to 5000 Hz at a density of 4 filters/critical band for the same
input filename:

> genbmm channels=0 mincf=200 maxcf=5000 dencf=4. cegc

The following command generates basilar membrane motion using the
gammatone filter design (the default) and the audiogram function
instead of the outer/middle ear filter:

> genbmm middle_ear=off audiogram=on cegc

The following command generates the basilar membrane motion using the
transmission line filter design instead of the default gammatone
filter:

> genbmm filter=tlf cegc

The following command generates the basilar membrane motion using the
transmission line filter design and the auditory scale of Greenwood
(1990):

> genbmm filter=tlf bwmin=22.85 quality=7.238 mmerb=1.0 cegc

The following command generates the basilar membrane motion using the
transmission line filter design, but with the nonlinear outer hair
cell feedback mechanism turned off:

> genbmm filter=tlf feedback=off cegc

REFERENCES

de Boer, E., and de Jongh, H.R. (1978).
"On cochlear encoding: potentialities and limitations of the
reverse-correlation technique," J. Acoust. Soc. Am., 63, 115-135.

Carney, L.H. and Yin, T.C.T. (1988).
"Temporal coding of resonances by low-frequency auditory nerve fibers:
Single fibre responses and a population model," J. Neurophysiology, 60,
1653-1677.

Cooke, M.P. (1993). Modelling Auditory Processing and
Organisation, Cambridge University Press.

Giguere, C. and Woodland, P.C. (1994).
"A computational model of the auditory periphery for speech and
hearing research: I. Ascending path," J. Acoust. Soc. Am. 95: 331-342.

Glasberg, B.R. and B.C.J. Moore (1990).
"Derivation of auditory filter shapes from notched-noise data,"
Hearing Research, 47, 103-138.

Greenwood, D.D. (1961). "Critical bandwidth
and the frequency coordinates of the basilar membrane,"
J. Acoust. Soc. Am. 33, 1344-1356.

Greenwood, D.D. (1990). "A cochlear frequency position function
for several species - 29 years
later," J. Acoust. Soc. Am., 87, 2592-2605.

Holdsworth, J., Nimmo-Smith, I., Patterson, R.D. and
Rice, P. (1988). Annex C of Spiral Vos Final Report, Part A: The
Auditory Filterbank, APU contract report 2341.

Johnstone, B.M. et al. (1986). Hear Res. 22: 147-153.

Moore, B.C.J. and Glasberg, B.R. (1983).
"Suggested formulae for calculating auditory filter bandwidths and
excitation patterns," J. Acoust. Soc. Am. 74, pp 750-753.

Patuzzi, R., and Robertson, D. (1988).
"Tuning in the mammalian cochlea," Physiological Reviews 68,
1009-1082.

Patterson, R.D. (1976). "Auditory filter shapes
derived with noise stimuli," J. Acoust. Soc. Am., 59, 640-654.

Patterson, R.D. (1987). "A pulse ribbon model of
monaural phase perception," J. Acoust. Soc. Am., 82, 1560-1586.

Patterson, R.D., Nimmo-Smith, I., Weber, D.L., and
Milroy, R. (1982). "The deterioration of hearing with age: Frequency
selectivity, the critical ratio, the audiogram, and speech threshold,"
J. Acoust. Soc. Am., 72, 1788-1803.

Patterson, R.D., Allerhand, M.H. and Holdsworth, J. (1992).
"Auditory representations of speech sounds," In Visual
representations of speech signals, Eds. Martin Cooke and Steve
Beet, John Wiley & Sons. 307-314.

Patterson, R. D., Holdsworth, J., Nimmo-Smith, I., and
Rice, P. (1988). SVOS Final Report: The Auditory Filterbank. APU
Contract report 2341.

Patterson, R.D. and B.C.J. Moore (1986).
"Auditory filters and excitation patterns as representations of
frequency resolution," In: Frequency Selectivity in Hearing (B. C. J.
Moore, ed.), pp. 123-177. Academic Press, London.

Schofield, D. (1985) "Visualisations of speech
based on a model of the peripheral auditory system," NPL Report DITC
62/85.

Slaney, M. (1993). An efficient implementation of
the Patterson Holdsworth auditory filter bank. Apple Computer
Technical Report No 35.

Zwicker, E. (1961). "Subdivision of the
audible frequency range into critical bands (frequenzgruppen),"
J. Acoust. Soc. Am. 33, 248.

FILES

.genbmmrc The options file for genbmm.

BUGS

There is a bug in the hidden-line plotting of genbmm. It shows up when
the surface has deep valleys and there is a large phase delay. The
negative peaks show through on surfaces where they should be hidden.

COPYRIGHT

Copyright (c) Applied Psychology Unit, Medical Research Council, 1995

Permission to use, copy, modify, and distribute this software without fee
is hereby granted for research purposes, provided that this copyright
notice appears in all copies and in all supporting documentation, and that
the software is not redistributed for any fee (except for a nominal
shipping charge). Anyone wanting to incorporate all or part of this
software in a commercial product must obtain a license from the Medical
Research Council.

The MRC makes no representations about the suitability of this
software for any purpose. It is provided "as is" without express or
implied warranty.

THE MRC DISCLAIMS ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING
ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT SHALL
THE A.P.U. BE LIABLE FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES
OR ANY DAMAGES WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS,
WHETHER IN AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION,
ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS
SOFTWARE.

ACKNOWLEDGEMENTS

The AIM software was developed for Unix workstations by John
Holdsworth and Mike Allerhand of the MRC APU, under the direction of
Roy Patterson. The physiological version of AIM was developed by
Christian Giguere. The options handler is by Paul Manson. The revised
SAI module is by Jay Datta. Michael Akeroyd extended the PostScript
facilities and developed the xreview routine for auditory image
cartoons.

The project was supported by the MRC and grants from the U.K. Defence
Research Agency, Farnborough (Research Contract 2239); the EEC Esprit
BR Programme, Project ACTS (3207); and the U.K. Hearing Research Trust.

SunOS 5.6

GENBMM (1)

5 Sept 1995

Generated by manServer 1.07 from /cbu/cnbh/aim/release/man/man1/genbmm.1 using man macros.
