genasa
Manual Reference Pages - GENASA (1)
genasa - generate auditory spectral analysis
CONTENTS
Synopsis
Description
I. Display Defaults
II. Rectification and Compression
III. Leaky Integration
References
Files
See Also
Bugs
Copyright
Acknowledgements
SYNOPSIS
genasa [ option=value | -option ] [ filename ]
DESCRIPTION
The genasa module of the AIM software performs a time-domain spectral analysis on the input wave using a bank of auditory filters, and summarises the information in a sequence of auditory spectra. The spectral analysis converts the input wave into an array of filtered waves, one for each channel of the filterbank. The surface of the array of filtered waves is AIM's representation of basilar membrane motion (BMM) as a function of time (Patterson et al. 1995). The sequence of auditory spectra is produced by calculating the envelope of the BMM and extracting spectral slices from the envelope every frstep_epn ms. The envelope is calculated by rectifying, compressing, and lowpass filtering the individual BMM waves as they flow from the filterbank (Patterson et al. 1992a, 1993a; Patterson, 1994a).
The auditory spectrum produced by genasa is intended to simulate the spectral representation of a sound as it occurs in the peripheral auditory system just prior to neural transduction. As a result, the frequency resolution of the analysis varies with the centre frequency of the channel, and the distribution of channels across frequency is chosen to match that in the auditory system (Patterson and Moore, 1986; Glasberg and Moore, 1990). The auditory spectrum is a plot of the activity in each channel as a function of the centre frequency of the auditory filter (in ERBs). The representation is referred to as an auditory spectrum to distinguish it from the Fourier energy spectrum (Patterson, 1994a). The suffix asa is short for auditory spectral analysis; it is used to distinguish this spectral representation from three other spectral representations provided by the AIM software (epn, excitation pattern; sgm, auditory spectrogram; and cgm, cochleogram).
The spectral analysis performed by genasa is the same as that performed by genbmm. The primary differences are in the defaults for the Displays, the Compression and the Leaky Integration used to construct the spectral slices from the BMM. As a result, this manual entry is restricted to describing the options that differ from those in genbmm.
I. DISPLAY DEFAULTS
The default values for three of the display options are reset to produce a spectral format rather than a landscape; specifically, display=excitation, bottom=0 and top=2500. The number of channels is increased to 128 to ensure reasonable frequency resolution in the spectral display.
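The channel distribution described above can be illustrated with a minimal Python sketch that spaces 128 centre frequencies evenly on the ERB-rate scale of Glasberg and Moore (1990). It is not AIM's own code: the function names and the frequency limits (min_hz, max_hz) are assumptions chosen for illustration, since this page does not state the filterbank range.

```python
import math

def hz_to_erb_rate(f_hz):
    """ERB-rate (in ERBs) of a frequency in Hz (Glasberg and Moore, 1990)."""
    return 21.4 * math.log10(4.37 * f_hz / 1000.0 + 1.0)

def erb_rate_to_hz(erb_rate):
    """Inverse of hz_to_erb_rate."""
    return (10.0 ** (erb_rate / 21.4) - 1.0) * 1000.0 / 4.37

def centre_frequencies(n_channels=128, min_hz=100.0, max_hz=6000.0):
    """Centre frequencies equally spaced on the ERB-rate scale.

    min_hz and max_hz are illustrative assumptions, not values
    taken from this manual page.
    """
    lo = hz_to_erb_rate(min_hz)
    hi = hz_to_erb_rate(max_hz)
    step = (hi - lo) / (n_channels - 1)
    return [erb_rate_to_hz(lo + i * step) for i in range(n_channels)]

cfs = centre_frequencies()
print(len(cfs), round(cfs[0], 1), round(cfs[-1], 1))
```

Equal spacing in ERBs places channels more densely in Hz at low frequencies, which is why the spectrum is plotted against centre frequency in ERBs rather than in Hz.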
II. RECTIFICATION AND COMPRESSION
The adaptive thresholding process begins with rectification and compression of the BMM. The default form of compression is logarithmic; it has the advantage of transforming the exponential envelope of the ringing response of the gammatone filter into a linear decay with time. There is evidence, however, that auditory compression may be better represented by power compression with an exponent in the region of 0.5. It is also advisable to insert power compression before the Meddis haircell when driving it with a gammatone filter. For a discussion of these issues, see docs/aimMeddisHewitt. To accommodate power compression and the assembly of different configurations of AIM, the rectification and compression options are presented separately in the options list before the neural transduction section.
| rectify | Apply half-wave rectification to filtered waves. Switch. Default value: off. If rectify is on, the BMM is half-wave rectified. The log compressor also performs half-wave rectification to avoid negative logs. Since the compressor default is log, the rectify default is off. Note: Full-wave rectification is produced if rectify is set to 2. This is useful when calculating envelopes with genasa. |
| compress | Apply compression to filtered waves. The form of the compression can be either logarithmic (log), or a power function (with a value between 0 and 1). Switch. Choices: log, 0-1, off. Default value: log. The default compressor is logarithmic, not because it is a particularly good approximation to auditory compression, but rather because it is a good match for the gammatone auditory filter mathematically, and it makes the filterbank level independent. Note that the logarithmic compressor performs half-wave rectification to avoid negative logs. NOTE: When using the physiological version of AIM with the transmission-line filterbank and the Meddis haircell bank, set compress=off, as compression is an integral part of the feedback loop in the transmission-line filterbank module. |
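The behaviour of the rectify and compress options can be sketched in Python as follows. This is an illustration under stated assumptions, not AIM's implementation: the function names, the floor applied before taking the logarithm, and the sign-preserving form of power compression are all hypothetical choices.

```python
import math

def rectify(wave, mode=1):
    """mode=1: half-wave rectification; mode=2: full-wave rectification."""
    if mode == 2:
        return [abs(v) for v in wave]
    return [max(v, 0.0) for v in wave]

def compress(wave, kind="log", floor=1.0):
    """Compress a filtered wave.

    kind="log": log compression; values below `floor` (including all
    negative values) are clamped first, which also half-wave rectifies.
    kind=<exponent between 0 and 1>: power compression, sign preserved
    (an assumption). kind="off": no compression.
    """
    if kind == "off":
        return list(wave)
    if kind == "log":
        return [math.log10(max(v, floor)) for v in wave]
    exponent = float(kind)
    return [math.copysign(abs(v) ** exponent, v) for v in wave]

# An exponentially decaying envelope becomes a straight line after log compression.
ringing = [1000.0 * math.exp(-0.1 * n) for n in range(20)]
logged = compress(ringing, "log")
slopes = [b - a for a, b in zip(logged, logged[1:])]
print(min(slopes), max(slopes))
```

The two printed slopes agree to within rounding error, which is the linear decay described above for the ringing response of the gammatone filter.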
Transduction
transduction Neural transduction switch (at, meddis, off). Switch. Default: off.
III. LEAKY INTEGRATION
stages_idt Number of stages of lowpass filtering. Default unit: scalar. Default value: 2.
tup_idt The time constant for each filter stage. Default unit: ms. Default value: 8 ms. The Equivalent Rectangular Duration (ERD) of a two-stage lowpass filter is about 1.6 times the time constant of each stage, or 12.8 ms in the current case.
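As a rough illustration of leaky integration, the Python sketch below cascades identical one-pole lowpass filters, using the default number of stages and time constant given above. The one-pole recursion, the function name, and the 20 kHz sample rate are assumptions; this page does not specify the filter form AIM uses. The factor 1.6 is simply the value quoted in this entry.

```python
import math

def leaky_integrate(wave, sample_rate_hz, tup_ms=8.0, stages=2):
    """Cascade of `stages` one-pole lowpass filters with time constant tup_ms."""
    # Per-sample decay factor: exp(-dt / tau), dt = 1 / sample_rate, tau = tup_ms / 1000.
    decay = math.exp(-1000.0 / (sample_rate_hz * tup_ms))
    out = list(wave)
    for _ in range(stages):
        state = 0.0
        smoothed = []
        for v in out:
            state = decay * state + (1.0 - decay) * v
            smoothed.append(state)
        out = smoothed
    return out

sample_rate_hz = 20000.0          # assumed sample rate, for illustration only
step_input = [1.0] * 4000         # 200 ms of constant input
response = leaky_integrate(step_input, sample_rate_hz)

tup_ms = 8.0
erd_ms = 1.6 * tup_ms             # ERD factor as quoted in this manual entry
print(response[-1], erd_ms)
```

With unity gain at DC, the response to a constant input rises smoothly towards the input level, so the output tracks the slowly varying envelope while the fine structure is smoothed away.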
downsample The time between successive spectral frames. Default unit: ms. Default value: 10 ms. Downsample is simply another name for frstep_epn, provided to facilitate a different mode of thinking about time-series data.
frstep_epn The time between successive spectral frames. Default unit: ms. Default value: 10 ms. With a frstep_epn of 10 ms, genasa will produce spectral frames at a rate of 100 per second.
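The relation between frstep_epn and the frame rate, and the extraction of spectral slices from a smoothed envelope, can be sketched in Python as below. The function names, the 20 kHz sample rate, and the choice to sample at the end of each frame interval are assumptions made for illustration.

```python
def frame_rate_hz(frstep_ms=10.0):
    """Number of spectral frames per second for a given frame step."""
    return 1000.0 / frstep_ms

def spectral_frames(envelope, sample_rate_hz, frstep_ms=10.0):
    """Extract one spectral slice every frstep_ms.

    envelope: list of channels, each a list of envelope samples.
    Returns a list of frames; each frame holds one value per channel,
    taken at the last sample of each frame interval (an assumption).
    """
    step = int(round(sample_rate_hz * frstep_ms / 1000.0))
    n_samples = len(envelope[0])
    return [[channel[i] for channel in envelope]
            for i in range(step - 1, n_samples, step)]

# Two toy channels, 1000 samples each (50 ms at an assumed 20 kHz).
envelope = [[float(i) for i in range(1000)],
            [2.0 * i for i in range(1000)]]
frames = spectral_frames(envelope, 20000.0)
print(frame_rate_hz(), len(frames), frames[0])
```

Each frame is one auditory spectrum: a vector of activity values indexed by channel, which is what the spectral display plots against centre frequency.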
REFERENCES
Glasberg, B.R. and Moore, B.C.J. (1990). "Derivation of auditory filter shapes from notched-noise data," Hearing Research, 47, 103-138.
Patterson, R.D. and Moore, B.C.J. (1986). "Auditory filters and excitation patterns as representations of frequency resolution," In: Frequency Selectivity in Hearing, B.C.J. Moore (Ed.), Academic Press, London, 123-177.
Patterson, R.D., Holdsworth, J. and Allerhand, M. (1992a). "Auditory Models as preprocessors for speech recognition," In: The Auditory Processing of Speech: From the auditory periphery to words, M.E.H. Schouten (Ed.), Mouton de Gruyter, Berlin, 67-83.
Patterson, R.D., Allerhand, M.H. and Holdsworth, J. (1993a). "Auditory representations of speech sounds," In: Visual representations of speech signals, Martin Cooke, Steve Beet, and Malcolm Crawford (Eds.), John Wiley & Sons, Chichester, 307-314.
Patterson, R.D. (1994a). "The sound of a sinusoid: Spectral models," J. Acoust. Soc. Am. 96, 1409-1418.
Patterson, R.D., Anderson, T., and Allerhand, M. (1994). "The auditory image model as a preprocessor for spoken language," In: Proc. Third ICSLP, Yokohama, Japan, 1395-1398.
Patterson, R.D., Allerhand, M., and Giguere, C. (1995). "Time-domain modelling of peripheral auditory processing: A modular architecture and a software platform," J. Acoust. Soc. Am. 98-3 (in press).
FILES
.genasarc The options file for genasa.
SEE ALSO
genbmm, gensgm
BUGS
None currently known.
COPYRIGHT
Copyright (c) Applied Psychology Unit, Medical Research Council, 1995
Permission to use, copy, modify, and distribute this software without fee is hereby granted for research purposes, provided that this copyright notice appears in all copies and in all supporting documentation, and that the software is not redistributed for any fee (except for a nominal shipping charge). Anyone wanting to incorporate all or part of this software in a commercial product must obtain a license from the Medical Research Council.
The MRC makes no representations about the suitability of this software for any purpose. It is provided "as is" without express or implied warranty.
THE MRC DISCLAIMS ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT SHALL THE A.P.U. BE LIABLE FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
ACKNOWLEDGEMENTS
The AIM software was developed for Unix workstations by John Holdsworth and Mike Allerhand of the MRC APU, under the direction of Roy Patterson. The physiological version of AIM was developed by Christian Giguere. The options handler is by Paul Manson. The revised SAI module is by Jay Datta. Michael Akeroyd extended the postscript facilities and developed the xreview routine for auditory image cartoons.
The project was supported by the MRC and grants from the U.K. Defence Research Agency, Farnborough (Research Contract 2239); the EEC Esprit BR Programme, Project ACTS (3207); and the U.K. Hearing Research Trust.
| SunOS 5.6 | GENASA (1) | 4 September 1995 |