genbmm
Manual Reference Pages - GENBMM (1)
genbmm - generate basilar membrane motion
CONTENTS
Synopsis
Description
Options
Further Description
Motivation
Examples
References
Files
See Also
Bugs
Copyright
Acknowledgements
SYNOPSIS
genbmm [ option=value | -option ] [ filename ]
DESCRIPTION
The genbmm module of the AIM software simulates the spectral analysis performed by the auditory system using a bank of auditory filters. Specifically, genbmm converts an input wave into an array of filtered waves, one for each channel of the filterbank. The surface of the array of filtered waves is AIM's representation of basilar membrane motion (BMM) as a function of time. AIM provides two alternative methods for generating the BMM: a linear, gammatone filterbank (Patterson et al., 1988; Slaney, 1993; Cooke, 1993), or a nonlinear, transmission-line filterbank (Giguere and Woodland, 1994). For convenience, they are referred to as the functional filterbank and the physiological filterbank, respectively.
OPTIONS
There are three sets of options for genbmm; they are grouped by function and identified by the suffixes _afb, _gtf and _tlf. The first set controls the distribution of the filtered waves across frequency (suffix _afb); the second specifies the shape of the gammatone filter (suffix _gtf); and the third specifies the shape of the transmission line filter (suffix _tlf). These three groups of options are the subject of this manual entry, together with an option that specifies the filter choice (gtf or tlf) and an option that specifies whether a middle ear function should be used with the gtf filterbank. There are also a number of Silent Options which control lesser-used functions in the transmission-line filterbank. They are listed in docs/aimSilentOptions.
The Outer/Middle Ear function: middle_ear
In the auditory system, the middle ear causes a progressive attenuation of sound energy in the region below about 500 Hz and a progressive attenuation in the region above about 4000 Hz. There is also a primary auditory canal resonance around 2700 Hz that boosts sound transmission. The resulting transfer function is a normal aspect of auditory processing and precedes spectral analysis. If the functional filterbank is chosen (gtf), the outer/middle ear filter acts directly on the input wave, and the stapes velocity wave it generates is the input to the spectral filtering stage. If the physiological filterbank is chosen (tlf), the outer/middle ear filtering and the cochlear filtering are performed simultaneously, as in the auditory system. The only parameter associated with this function is the middle_ear switch, which makes it possible to turn the outer/middle ear filtering off when the functional filterbank is chosen.
middle_ear
    Outer/middle ear switch. Switch. Default: on.
    It is also possible to specify a floating-point number, in which case the middle ear output is multiplied by that value.
Note: The middle_ear option is ignored if option filter (see below) is set to tlf. This is because the outer/middle ear stage and the cochlear stage are bidirectionally coupled in the transmission line filter implementation, and cannot be separated.
I. THE AUDITORY FILTERBANK OPTIONS: _afb
The distribution of the filters across frequency and the total number of output filters in the bank are determined by four parameters: channels_afb, mincf_afb, maxcf_afb and dencf_afb.
channels_afb
    The number of channels in the filterbank. Default unit: filters. Default value: 75.
mincf_afb
    The minimum centre frequency. Default unit: Hz. Default value: 100 Hz.
maxcf_afb
    The maximum centre frequency. Default unit: Hz. Default value: 6000 Hz.
dencf_afb
    The density of the filters in the filterbank. Default unit: filters/critical band. Default value: off.
    dencf_afb provides an alternative method of specifying the number of channels in terms of the density of filters along the frequency scale.
Note: channels_afb overrides dencf_afb whenever it has a non-zero value, in which case dencf_afb is ignored. To use dencf_afb, you must set channels_afb to off (or 0).
WARNING: When using the transmission line filter (filter=tlf), the channel density should be 3 or more filters/ERB. A lower density may lead to excessive spatial discretization errors (see Giguere and Woodland (1994) for a discussion). To view a small number of channels, use a reasonable density and reduce the number of displayed channels using option downchannel.
The ERB scale for the gammatone auditory filterbank is specified with three options: bwmin_afb, quality_afb and mmerb_afb.
bwmin_afb
    The minimum bandwidth for an auditory filter. Default unit: Hz. Default value: 24.7.
quality_afb
    The limiting quality factor for high-frequency auditory filters. Units: scalar. Default: 9.265.
mmerb_afb
    The length of one ERB-rate unit along the basilar membrane. Units: mm. Default: 0.89.
info_afb
    Print filterbank information to stderr. Switch. Default: off.
    A listing of the parameters for each filter in the bank can be directed to the terminal at run time by setting info_afb=on.
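The density warning for filter=tlf can be checked before a run. The sketch below is illustrative only: it assumes that channels_afb filters are spread evenly on the ERB-rate scale between mincf_afb and maxcf_afb (the scale obtained by integrating the inverse of the erb function given later in this entry, with the default bwmin_afb and quality_afb), and that filters/ERB and filters/critical band are interchangeable. The function names are hypothetical and are not part of AIM.

```python
import math

def erb_rate(f_hz, bwmin=24.7, quality=9.265):
    """Number of ERBs below f_hz: the integral of 1 / (bwmin + f/quality)."""
    return quality * math.log(1.0 + f_hz / (bwmin * quality))

def implied_density(channels, mincf=100.0, maxcf=6000.0):
    """Average filters per ERB if `channels` filters span mincf..maxcf evenly
    on the ERB-rate scale (an assumption, not taken from the genbmm source)."""
    span = erb_rate(maxcf) - erb_rate(mincf)
    return (channels - 1) / span

def check_tlf_density(channels, mincf=100.0, maxcf=6000.0, min_density=3.0):
    """Return (density, ok); ok is False when below the recommended minimum."""
    density = implied_density(channels, mincf, maxcf)
    return density, density >= min_density

def channels_for_density(density, mincf=100.0, maxcf=6000.0):
    """Smallest channel count that reaches `density` filters/ERB."""
    span = erb_rate(maxcf) - erb_rate(mincf)
    return math.ceil(density * span) + 1

if __name__ == "__main__":
    d, ok = check_tlf_density(75)
    print(f"75 channels, 100 to 6000 Hz: {d:.2f} filters/ERB, ok={ok}")
    print("channels needed for 3 filters/ERB:", channels_for_density(3.0))
```

If the channels are in fact spread evenly on this scale, the default 75 channels between 100 and 6000 Hz give roughly 2.7 filters/ERB, so with filter=tlf it may be worth raising channels_afb or using dencf_afb directly.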
The physiological data on the human cochlear frequency-position function (Greenwood, 1990) and the psychoacoustic data on auditory filter bandwidth (Patterson and Moore, 1986) indicate that the spectral analysis performed in the cochlea is like a constant-Q system (quality_afb) that asymptotes to a minimum filter bandwidth (bwmin_afb) at low centre frequencies. That is,
erb = bwmin_afb + centre-frequency/quality_afb.
If we assume, as Greenwood suggests, that each filter bandwidth corresponds to a constant distance (mmerb_afb) along the basilar membrane, it is possible to scale frequency in terms of erb units (or position along the basilar membrane) by integrating the inverse of the erb function above.
Glasberg and Moore (1990) have reviewed the available human filter shape data and concluded that the optimum values for bwmin_afb and quality_afb are 24.7 and 9.265, respectively, together with an mmerb_afb of 0.89. (As a rule of thumb for rapid estimation, erb = 25 + 10% of cf.) The auditory scale used by Greenwood (1990) can be specified by setting bwmin_afb=22.85, quality_afb=7.238 and mmerb_afb=1.0. A reasonable approximation to the Bark scale (Zwicker, 1961) is obtained by setting bwmin_afb=80, quality_afb=6.5 and mmerb_afb=0.89.
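The erb function and its integral can be written out directly. Integrating 1/erb from 0 to f gives erb-rate(f) = quality_afb * ln(1 + f / (bwmin_afb * quality_afb)), and multiplying by mmerb_afb converts erb-rate into distance along the basilar membrane. The sketch below assumes the origin (0 ERBs, 0 mm) lies at f = 0; the parameter sets are the ones quoted above, and the function names are illustrative rather than taken from the AIM sources.

```python
import math

# (bwmin_afb [Hz], quality_afb, mmerb_afb [mm]) as quoted in this entry.
SCALES = {
    "glasberg_moore_1990": (24.7, 9.265, 0.89),
    "greenwood_1990": (22.85, 7.238, 1.0),
    "bark_approx": (80.0, 6.5, 0.89),
}

def erb(cf, bwmin, quality):
    """erb = bwmin + cf / quality (Hz)."""
    return bwmin + cf / quality

def erb_rate(f, bwmin, quality):
    """Integral of 1/erb from 0 to f, i.e. the number of ERBs below f."""
    return quality * math.log(1.0 + f / (bwmin * quality))

def erb_rate_to_hz(e, bwmin, quality):
    """Inverse of erb_rate."""
    return bwmin * quality * (math.exp(e / quality) - 1.0)

def bm_position_mm(f, bwmin, quality, mmerb):
    """Distance (mm) from the f = 0 end, assuming each ERB spans mmerb mm."""
    return mmerb * erb_rate(f, bwmin, quality)

if __name__ == "__main__":
    bwmin, q, mm = SCALES["glasberg_moore_1990"]
    for cf in (100.0, 1000.0, 4000.0):
        # Compare the exact erb with the rule of thumb erb = 25 + 10% of cf.
        print(f"{cf:6.0f} Hz: erb={erb(cf, bwmin, q):6.1f} Hz, "
              f"rule of thumb={25 + 0.1 * cf:6.1f} Hz, "
              f"position={bm_position_mm(cf, bwmin, q, mm):5.2f} mm")
```

Swapping in the Greenwood or Bark entries of SCALES changes only the three constants, which mirrors how the same options reshape the frequency axis in genbmm.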
Auditory filter design: filter
The choice of filterbank (linear gammatone or nonlinear transmission line) is determined by option filter.
filter
    The auditory filter design. Default: gtf. Choices: gtf, tlf, off.
    When gtf is specified, the options below with suffix _gtf apply, and when tlf is specified, the options below with suffix _tlf apply. When off is specified, the input wave (or the stapes velocity) is passed on directly to the next stage. This provides for non-auditory use of the modules following the filterbank, with their associated displays. For example, the envelope of the input wave (or stapes velocity) can be extracted using the rectification and integration modules that follow genbmm. The entry point genasa has the most convenient default settings for this purpose.
II. THE GAMMATONE FILTER OPTIONS: _gtf
order_gtf
    The order of the gammatone filter. Units: none. Default: 4.
    The order of the filter, order_gtf, determines the number of filtering stages, and so it determines the slope of the skirts of the attenuation function and their extent. The default value is 4 and the range of useful values is from about 2 to 8. The processing time increases linearly with order above about order 2.
Note that the bandwidth calculation takes account of the fact that changes in order_gtf affect bandwidth. Thus, as long as bwmin_afb is fixed, changing the order will not affect the bandwidths of the resulting filters. Increasing the order of the filter increases the delay of the onset of the impulse response, but it has little effect on the shape of the envelope of the impulse response for orders greater than three. The human auditory system is not sensitive to small phase changes between filter channels (Patterson, 1987), and so filter order is not well constrained by human experimental data. The default value (4) is used because it provides the best match between the amplitude characteristics of the gammatone and roex filters for humans (Patterson et al., 1988).
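The compensation for order can be illustrated with the standard gammatone relations. For an order-n impulse response t**(n-1) * exp(-2*pi*b*t) * cos(2*pi*cf*t + phase), the ERB equals b times pi*(2n-2)!/(2**(2n-2)*((n-1)!)**2), so b can be chosen as erb divided by that factor; for n = 4 this gives the familiar b = 1.019 erb. The envelope peaks at (n-1)/(2*pi*b). This is a sketch of the textbook relations, not necessarily the exact expressions used inside genbmm, and the function names are illustrative.

```python
import math

def erb_factor(order):
    """ERB / b for an order-n gammatone: pi * (2n-2)! / (2**(2n-2) * ((n-1)!)**2)."""
    n = order
    return math.pi * math.factorial(2 * n - 2) / (
        2 ** (2 * n - 2) * math.factorial(n - 1) ** 2)

def decay_for_erb(target_erb, order):
    """Decay parameter b (Hz) that gives the requested ERB at this order."""
    return target_erb / erb_factor(order)

def envelope_peak_time(b, order):
    """The envelope t**(n-1) * exp(-2*pi*b*t) is largest at t = (n-1) / (2*pi*b)."""
    return (order - 1) / (2.0 * math.pi * b)

def gammatone(t, cf, b, order, phase=0.0):
    """Unnormalised gammatone impulse response at time t (seconds)."""
    if t < 0.0:
        return 0.0
    return (t ** (order - 1) * math.exp(-2.0 * math.pi * b * t)
            * math.cos(2.0 * math.pi * cf * t + phase))

if __name__ == "__main__":
    target = 24.7 + 1000.0 / 9.265   # erb at cf = 1 kHz
    for n in (2, 4, 8):
        b = decay_for_erb(target, n)
        print(f"order {n}: b = {b:6.1f} Hz, envelope peak at "
              f"{1e3 * envelope_peak_time(b, n):5.2f} ms")
```

With the ERB held fixed, raising the order moves the envelope peak later, in line with the remark above that higher orders delay the onset of the impulse response.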
gain_gtf
    Filter output amplification. Units: scalar. Default: 4.
    The ratio of output to input level across the auditory filter when the input is a sinusoid at the cf of the filter.
phase_gtf
    The phase of the impulse response. Units: none. Default: 0.
    In the absence of phase compensation, the surface of basilar membrane motion has a strong rightward skew in the low-frequency channels, because the filters get progressively narrower as centre frequency decreases and this narrowing is accompanied by a slower filter response. There are occasionally non-auditory reasons for wanting to align the channels across frequency in one way or another. The software provides four alignment systems, which are discussed later in this entry under the title Phase Alignment.
III. THE TRANSMISSION LINE FILTER OPTIONS: _tlf
NOTE: A linearized version of the transmission line filter with roughly the same bandwidth as the gammatone filter can be obtained by setting feedback_tlf=0 and qref_tlf to about 10. The main difference is that the low-frequency skirt of the transmission line filter is less steep than that of the gammatone.
motion_tlf
    The basilar membrane output motion variable. Default: vel. Choices: vel, disp.
    If vel (velocity) is specified, the output of genbmm is the basilar membrane velocity. If disp (displacement) is specified, the output of genbmm is the basilar membrane displacement.
outdencf_tlf
    The density of the filters outside the display range. Units: filters/critical band. Default: 4.
    In the transmission line filter implementation, it is necessary to simulate the basilar membrane over its entire length. The option outdencf_tlf specifies, as a density, how many additional channels must be computed at the basal and apical ends of the cochlea, i.e., outside the range specified by mincf_afb and maxcf_afb (see above). These additional channels are only computed for internal use and are not passed to the next stage of processing.
qref_tlf
    The local quality factor of each basilar membrane channel. Units: scalar. Default: 2.
    Note: With the transmission line filter, the bandwidth at high levels is determined not by options bwmin_afb and quality_afb but by option qref_tlf.
feedback_tlf
    The feedback gain of the outer hair cell circuit. Units: scalar. Default: 0.99.
    WARNING: A value for feedback_tlf greater than or equal to 1.0 can lead to unstable behaviour (i.e., oscillation) at low levels. However, the model output will not grow without bound. The growth of the oscillations will be limited by the saturating nonlinearity of the outer hair cell circuit, and the model output will go into a kind of limit cycle. These model oscillations have not yet been studied in detail and are likely to deviate substantially from real cochlear emissions.
dsat_tlf
    The basilar membrane displacement at the half-saturation point of the outer hair cell circuit. Units: cm. Default: 5.75e-6.
gain_tlf
    Filter output amplification. Units: scalar. Default: 4.
    Note: There is an internal gain of 4.0 within the software of the transmission line model itself. The total gain is therefore 4.0 times the value of gain_tlf.
NOTE: There are a number of Silent Options for the transmission-line filterbank. They are listed in docs/aimSilentOptions.
FURTHER DESCRIPTION
The distribution of filter centres along the ERB scale.
Given values for mincf_afb, maxcf_afb and channels_afb (or dencf_afb), the program creates an array of centre frequencies in three steps:
1. It centres a filter at 1.0 kHz.
2. Then it centres filters below 1.0 kHz, one after another, until it encounters mincf_afb. (Thus, mincf_afb is actually the frequency below which no filters are centred.) The step size, that is, the distance between centre frequencies, is determined by dencf_afb. When dencf_afb is equal to one, the centre frequencies are 1 ERB apart. The ERB is the Equivalent Rectangular Bandwidth of the filter (about 14% larger than the 3 dB bandwidth of the filter). The function relating the ERB to the centre frequency of the filter is taken from a critical band equation introduced by Greenwood (1961) and adapted to human auditory masking by Glasberg and Moore (1990).
3. Finally, the program centres filters one after another in the region above 1 kHz until it encounters maxcf_afb (which is, actually, the frequency above which no filters are centred). When dencf_afb is increased, say to two, the program allocates two filters per critical band and spaces them at half-ERB steps.
Note: It is not the bandwidths of the filters that are controlled by dencf_afb but rather the density of filters along the frequency axis. Thus, doubling dencf_afb does not halve the bandwidth of the filters; rather, it results in more overlap between adjacent filters. With regard to the images produced by genbmm, dencf_afb determines the density of lines on the surface rather than the shape of the features that appear on the surface.
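The three-step layout can be sketched as follows. The sketch assumes that "1 ERB apart" means equal steps of 1/dencf_afb on the erb-rate scale obtained by integrating the inverse of the erb function with the default bwmin_afb and quality_afb. It covers only the dencf_afb route (channels_afb off); how genbmm places a fixed channels_afb count is not described here, so it is not modelled. The function names are hypothetical.

```python
import math

def erb_rate(f, bwmin=24.7, quality=9.265):
    """Number of ERBs below f (integral of 1 / (bwmin + f/quality))."""
    return quality * math.log(1.0 + f / (bwmin * quality))

def erb_rate_to_hz(e, bwmin=24.7, quality=9.265):
    """Inverse of erb_rate."""
    return bwmin * quality * (math.exp(e / quality) - 1.0)

def centre_frequencies(mincf=100.0, maxcf=6000.0, dencf=1.0,
                       bwmin=24.7, quality=9.265):
    """Centre frequencies 1/dencf ERB apart, anchored at 1 kHz (steps 1 to 3)."""
    if dencf <= 0:
        raise ValueError("dencf must be positive")
    if not 0 < mincf <= maxcf:
        raise ValueError("need 0 < mincf <= maxcf")
    step = 1.0 / dencf
    anchor = erb_rate(1000.0, bwmin, quality)

    def walk(direction, keep):
        # Step away from 1 kHz until the next centre falls outside the range.
        cfs, k = [], 1
        while True:
            cf = erb_rate_to_hz(anchor + direction * k * step, bwmin, quality)
            if not keep(cf):
                return cfs
            cfs.append(cf)
            k += 1

    below = walk(-1, lambda cf: cf >= mincf)   # step 2
    above = walk(+1, lambda cf: cf <= maxcf)   # step 3
    return below[::-1] + [1000.0] + above      # step 1 supplies the 1 kHz filter

if __name__ == "__main__":
    cfs = centre_frequencies(mincf=200.0, maxcf=5000.0, dencf=4.0)
    print(len(cfs), "filters from", round(cfs[0], 1), "to", round(cfs[-1], 1), "Hz")
```

Doubling dencf roughly doubles the number of centres while each filter's bandwidth, which is set by the erb function, stays the same; that is the point of the Note above.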
MOTIVATION
The motivation for adopting the gammatone filter shape is threefold:
1. It provides an excellent summary of physiological data concerning the impulse response of primary auditory neurons in small mammals such as cats (de Boer and de Jongh, 1978; Carney and Yin, 1988).
2. The amplitude characteristic of the gammatone filter is very similar to that of the roex filter commonly used to represent the human auditory filter (Patterson et al., 1982; Schofield, 1985; Patterson and Moore, 1986; Patterson et al., 1988).
3. There are recursive gammatone filters that make the calculation particularly fast, both on general-purpose computers and on special-purpose DSP chips (Holdsworth et al., 1988; Cooke, 1993; Slaney, 1993).
In summary, the gammatone filter is designed to provide a reasonable trade-off between accuracy in simulating basilar membrane motion and computational load.
The motivation for adopting the transmission line filter is as follows:
1. The outer hair cell circuit of the transmission line filter is level dependent, and so this design produces level-dependent basilar membrane tuning curves (Giguere and Woodland, 1994). There is now ample evidence that basilar membrane motion is indeed highly nonlinear and a major source of level compression (e.g., Johnstone et al., 1986).
2. The internal structure of the transmission line filter model is based on the physics of the auditory periphery and therefore provides a more realistic cochlear simulation than parallel filterbanks. It generates combination tones of the form 2f1-f2, as observed in the auditory system, and it has the potential to generate cochlear echoes.
3. The wave-digital-filter implementation of the transmission line filterbank is only about twice as slow as the gammatone filterbank for an equivalent number of channels.
Phase Alignment
There is no question that the output of the cochlea has a phase lag corresponding to the strong rightward skew. However, perceptual evidence indicates that this phase lag has to be enormous (> 4 ms) to affect what we hear; indeed, reversing the phase lag with synthetic stimuli does not change what we hear (Patterson, 1987). Phase information that appears in the basilar membrane motion but that we do not hear is removed in the third module by the strobe mechanism of the temporal integration process. As a result, the stabilised auditory images are always phase aligned even though the basilar membrane motion and the neural activity patterns are not.
Prior to discovering the integration mechanism, we wanted to find a way of reducing the skew in the basilar membrane image, in order to provide a visual representation that was more like what we hear. The genbmm program provides the following options for phase aligning the responses of successive filters, determined by the value of the option phase_gtf:
| Value | Effect |
| -1 | Envelope alignment. Shift the channels of output horizontally so that the points of maximum response to an impulse (i.e., the envelope maxima) are aligned. |
| -2 | Envelope plus fine-structure alignment. Perform envelope-peak alignment as in option -1 and then shift the fine-structure phase in each channel so that a fine-structure peak coincides with the envelope peak. |
| -4 | Envelope plus peak alignment, left justified. Align the envelopes and fine structure of all of the impulse responses along the left edge of the image. |
| 0 | No phase compensation. |
| +n | Advance each channel by n cycles of the centre frequency of the channel. Approximate envelope alignment is achieved using phase_gtf = 3 or 4. |
We experimented with a number of phase compensation schemes (Patterson et al., 1989) and concluded that the best option was envelope plus peak alignment, which corresponds to a value of phase_gtf = -4. Accordingly, we recommend the use of phase_gtf values of 0 (i.e., no phase compensation) or -4 (envelope plus peak alignment). The remaining options are occasionally useful, and so they have been left in the software.
Note that for any phase compensation option other than 0, the time scale is strictly correct only for the lowest channel. For any other channel, the origin of the abscissa is offset to the right by an amount equal to the difference between the envelope peak time of the lowest-frequency channel and the envelope peak time of the given channel.
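The per-channel time shifts implied by the table and the note above can be estimated numerically. The sketch assumes order-4 gammatone filters whose decay parameter b is set from the erb function with the default bwmin_afb and quality_afb (b = erb / 0.98175 for order 4), so that the envelope peaks at (n-1)/(2*pi*b). It computes the delays that line every envelope peak up with that of the lowest-frequency channel (as the note describes) and the advances used by phase_gtf = +n. It is an illustration, not genbmm's code, and the function names are hypothetical.

```python
import math

def erb(cf, bwmin=24.7, quality=9.265):
    """erb = bwmin + cf / quality (Hz)."""
    return bwmin + cf / quality

def envelope_peak_time(cf, order=4):
    """Envelope peak (s) of an order-n gammatone whose ERB equals erb(cf)."""
    factor = math.pi * math.factorial(2 * order - 2) / (
        2 ** (2 * order - 2) * math.factorial(order - 1) ** 2)
    b = erb(cf) / factor
    return (order - 1) / (2.0 * math.pi * b)

def envelope_alignment_delays(cfs, order=4):
    """Delay (s) per channel so every envelope peak lands on the lowest channel's."""
    t_low = envelope_peak_time(min(cfs), order)
    return [t_low - envelope_peak_time(cf, order) for cf in cfs]

def cycle_advances(cfs, n):
    """phase_gtf = +n: advance each channel by n periods of its centre frequency."""
    return [n / cf for cf in cfs]

def peak_in_cycles(cf, order=4):
    """Envelope peak time expressed in periods of the centre frequency."""
    return envelope_peak_time(cf, order) * cf

if __name__ == "__main__":
    for cf in (100.0, 1000.0, 6000.0):
        print(f"{cf:6.0f} Hz: envelope peak after {peak_in_cycles(cf):4.2f} cycles")
```

Under these assumptions the envelope peak falls about 3.5 cycles after the impulse at 1 kHz and about 4.2 cycles at 6 kHz, but only about 1.3 cycles at 100 Hz, which is consistent with phase_gtf = 3 or 4 giving only approximate envelope alignment.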
EXAMPLES
The following command generates basilar membrane motion using the gammatone filter design (the default) for an input filename cegc:
> genbmm cegc
The following command generates basilar membrane motion using the gammatone filter design (the default) for a filterbank with cf from 200 Hz to 5000 Hz at a density of 4 filters/critical band for the same input filename:
> genbmm channels=0 mincf=200 maxcf=5000 dencf=4. cegc
The following command generates basilar membrane motion using the gammatone filter design (the default) and the audiogram function instead of the outer/middle ear filter:
> genbmm middle_ear=off audiogram=on cegc
The following command generates basilar membrane motion using the transmission line filter design instead of the default gammatone filter:
> genbmm filter=tlf cegc
The following command generates basilar membrane motion using the transmission line filter design and the auditory scale of Greenwood (1990):
> genbmm filter=tlf bwmin=22.85 quality=7.238 mmerb=1.0 cegc
The following command generates basilar membrane motion using the transmission line filter design, but with the nonlinear outer hair cell feedback mechanism turned off:
> genbmm filter=tlf feedback=off cegc
REFERENCES
de Boer, E. and de Jongh, H.R. (1978). "On cochlear encoding: Potentialities and limitations of the reverse-correlation technique," J. Acoust. Soc. Am. 63, 115-135.
Carney, L.H. and Yin, T.C.T. (1988). "Temporal coding of resonances by low-frequency auditory nerve fibers: Single-fiber responses and a population model," J. Neurophysiol. 60, 1653-1677.
Cooke, M.P. (1993). Modelling Auditory Processing and Organisation. Cambridge University Press.
Giguere, C. and Woodland, P.C. (1994). "A computational model of the auditory periphery for speech and hearing research: I. Ascending path," J. Acoust. Soc. Am. 95, 331-342.
Glasberg, B.R. and Moore, B.C.J. (1990). "Derivation of auditory filter shapes from notched-noise data," Hear. Res. 47, 103-138.
Greenwood, D.D. (1961). "Critical bandwidth and the frequency coordinates of the basilar membrane," J. Acoust. Soc. Am. 33, 1344-1356.
Greenwood, D.D. (1990). "A cochlear frequency-position function for several species: 29 years later," J. Acoust. Soc. Am. 87, 2592-2605.
Holdsworth, J., Nimmo-Smith, I., Patterson, R.D. and Rice, P. (1988). Annex C of Spiral Vos Final Report, Part A: The Auditory Filterbank. APU contract report 2341.
Johnstone, B.M. et al. (1986). Hear. Res. 22, 147-153.
Moore, B.C.J. and Glasberg, B.R. (1983). "Suggested formulae for calculating auditory filter bandwidths and excitation patterns," J. Acoust. Soc. Am. 74, 750-753.
Patuzzi, R. and Robertson, D. (1988). "Tuning in the mammalian cochlea," Physiol. Rev. 68, 1009-1082.
Patterson, R.D. (1976). "Auditory filter shapes derived with noise stimuli," J. Acoust. Soc. Am. 59, 640-654.
Patterson, R.D. (1987). "A pulse ribbon model of monaural phase perception," J. Acoust. Soc. Am. 82, 1560-1586.
Patterson, R.D., Nimmo-Smith, I., Weber, D.L. and Milroy, R. (1982). "The deterioration of hearing with age: Frequency selectivity, the critical ratio, the audiogram, and speech threshold," J. Acoust. Soc. Am. 72, 1788-1803.
Patterson, R.D., Allerhand, M.H. and Holdsworth, J. (1992). "Auditory representations of speech sounds," in Visual Representations of Speech Signals, edited by M. Cooke and S. Beet. John Wiley & Sons, 307-314.
Patterson, R.D., Holdsworth, J., Nimmo-Smith, I. and Rice, P. (1988). SVOS Final Report: The Auditory Filterbank. APU contract report 2341.
Patterson, R.D. and Moore, B.C.J. (1986). "Auditory filters and excitation patterns as representations of frequency resolution," in Frequency Selectivity in Hearing, edited by B.C.J. Moore. Academic Press, London, 123-177.
Schofield, D. (1985). "Visualisations of speech based on a model of the peripheral auditory system," NPL Report DITC 62/85.
Slaney, M. (1993). "An efficient implementation of the Patterson-Holdsworth auditory filter bank," Apple Computer Technical Report No. 35.
Zwicker, E. (1961). "Subdivision of the audible frequency range into critical bands (Frequenzgruppen)," J. Acoust. Soc. Am. 33, 248.
FILES
.genbmmrc
    The options file for genbmm.
SEE ALSO
genasa, gensgm
BUGS
There is a bug in the hidden-line plotting of genbmm. It shows up when the surface has deep valleys and there is a large phase delay. The negative peaks show through on surfaces where they should be hidden.
COPYRIGHT
Copyright (c) Applied Psychology Unit, Medical Research Council, 1995
Permission to use, copy, modify, and distribute this software without fee is hereby granted for research purposes, provided that this copyright notice appears in all copies and in all supporting documentation, and that the software is not redistributed for any fee (except for a nominal shipping charge). Anyone wanting to incorporate all or part of this software in a commercial product must obtain a license from the Medical Research Council.
The MRC makes no representations about the suitability of this software for any purpose. It is provided "as is" without express or implied warranty.
THE MRC DISCLAIMS ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT SHALL THE A.P.U. BE LIABLE FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
ACKNOWLEDGEMENTS
The AIM software was developed for Unix workstations by John Holdsworth and Mike Allerhand of the MRC APU, under the direction of Roy Patterson. The physiological version of AIM was developed by Christian Giguere. The options handler is by Paul Manson. The revised SAI module is by Jay Datta. Michael Akeroyd extended the PostScript facilities and developed the xreview routine for auditory image cartoons.
The project was supported by the MRC and grants from the U.K. Defence Research Agency, Farnborough (Research Contract 2239); the EEC Esprit BR Programme, Project ACTS (3207); and the U.K. Hearing Research Trust.
| SunOS 5.6 | GENBMM (1) | 5 Sept 1995 |