gensgm
Manual Reference Pages - GENSGM (1)
gensgm - generate auditory spectrogram
CONTENTS
Synopsis
Description
I. Display Defaults
References
Files
See Also
Bugs
Copyright
Acknowledgements
SYNOPSIS
gensgm [ option=value | -option ] [ filename ]
DESCRIPTION
The gensgm module of the AIM software performs a time-domain spectral analysis using a bank of auditory filters, and summarises the information in an auditory spectrogram, that is, a spectrogram with auditory frequency resolution and temporal resolution, rather than the fixed frequency and temporal resolution of traditional speech preprocessors (Patterson and Moore, 1986; Glasberg and Moore, 1990). The spectral analysis converts the input wave into an array of filtered waves, one for each channel of the auditory filterbank. The surface of the array of filtered waves is AIM's representation of basilar membrane motion (BMM) as a function of time (Patterson et al. 1995). The auditory spectrogram is a plot of a sequence of spectral slices extracted from the envelope of the BMM at regular intervals (every frstep_epn ms). The envelope is calculated continuously, by rectifying, compressing, and lowpass filtering the individual BMM waves as they flow from the filterbank.
The frequency resolution of the analysis varies with the centre frequency of the channel as in the auditory system, and the distribution of channels across frequency is chosen to match that in the auditory system (Patterson and Moore, 1986). Thus, the auditory spectrogram is a greyscale plot of the activity in each channel (shades of black) as a function of time (the abscissa) and the centre frequency of the auditory filter (the ordinate) in ERBs. The representation is referred to as an auditory spectrogram (SGM) to distinguish it from more traditional spectrograms based on Fourier, LPC or cepstral analysis. In AIM, the suffix sgm is used to distinguish this spectral representation from the other spectral representations provided by the software (asa: auditory spectral analysis, cgm: cochleogram, and epn: excitation pattern).
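As an illustration of channels distributed evenly along an ERB axis, the following Python sketch spaces centre frequencies uniformly on the ERB-rate scale. It assumes the ERB-rate formula of Glasberg and Moore (1990); the function names and the 100 to 6000 Hz range are hypothetical and are not taken from the AIM source, so this should not be read as the filterbank's actual channel placement.

```python
import math

def hz_to_erb_rate(f_hz):
    """ERB-rate of a frequency in Hz (Glasberg and Moore, 1990 formula)."""
    return 21.4 * math.log10(4.37 * f_hz / 1000.0 + 1.0)

def erb_rate_to_hz(erb_rate):
    """Inverse of hz_to_erb_rate."""
    return (10.0 ** (erb_rate / 21.4) - 1.0) * 1000.0 / 4.37

def centre_frequencies(min_cf_hz, max_cf_hz, n_channels):
    """Centre frequencies equally spaced on the ERB-rate scale."""
    lo = hz_to_erb_rate(min_cf_hz)
    hi = hz_to_erb_rate(max_cf_hz)
    step = (hi - lo) / (n_channels - 1)
    return [erb_rate_to_hz(lo + i * step) for i in range(n_channels)]

# Assumed frequency range (hypothetical); 128 is the gensgm default
# number of channels given in this manual entry.
cfs = centre_frequencies(100.0, 6000.0, 128)
```

Equal steps in ERB-rate give channels that are closely spaced in Hz at low frequencies and widely spaced at high frequencies.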
The spectral analysis performed by gensgm is the same as that performed by genbmm. The primary differences are in the display defaults and the way the Leaky Integration module is used to produce the spectral slices that form the spectrogram. As a result, this manual entry is restricted to describing the option values that differ from those in genbmm.
I. DISPLAY DEFAULTS
The default values for three of the display options are reset to produce a spectrographic format rather than a landscape. Specifically, display=greyscale, bottom=0 and top=2500. The number of channels is set to 128 for compatibility with the auditory spectrum modules, genasa and genepn. When using AIM as a preprocessor for speech recognition the number of channels would typically be reduced to between 24 and 32. Use option downsample if it is necessary to reduce the output to fewer than 24 channels across the speech range.
I. RECTIFICATION AND COMPRESSION
The default form of compression is logarithmic; it has the advantage of transforming the exponential envelope of the ringing response of the gammatone filter into a linear decay with time. It also makes the output close to level independent, which is useful when using AIM as a preprocessor for speech recognition. There is evidence, however, that auditory compression may be better represented by power compression with an exponent in the region of 0.5. For a discussion of this issue, see docs/aimMeddisHewitt. To accommodate power compression and the assembly of different configurations of AIM, the rectification and compression options are presented separately in the options list before the neural transduction section.
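The two properties claimed for logarithmic compression can be checked numerically. The sketch below is illustrative only: the 5 ms decay constant and the two amplitudes are arbitrary assumptions, not AIM parameters. It shows that the log of an exponentially decaying envelope falls by a constant amount per time step (a linear decay), and that a change in input level only adds a constant offset.

```python
import math

TAU_MS = 5.0  # hypothetical decay time constant of the ringing envelope

def envelope(t_ms, amplitude):
    """Exponentially decaying envelope, amplitude * exp(-t / tau)."""
    return amplitude * math.exp(-t_ms / TAU_MS)

times_ms = [0.0, 1.0, 2.0, 3.0, 4.0]
quiet = [math.log(envelope(t, 1.0)) for t in times_ms]
loud = [math.log(envelope(t, 100.0)) for t in times_ms]

# Linear decay: every 1 ms step lowers the log envelope by 1 / TAU_MS.
quiet_steps = [b - a for a, b in zip(quiet, quiet[1:])]

# Level change: the louder input differs only by the constant log(100).
offsets = [l - q for l, q in zip(loud, quiet)]
```

Because the offset is the same at every time point, the shape of the compressed envelope does not depend on the input level.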
rectify
    Apply half-wave rectification to filtered waves.
    Switch. Default value: off.
    If rectify is on, the BMM is half-wave rectified. The log compressor also performs half-wave rectification to avoid negative logs. Since the compressor default is log, the rectify default is off.
    Note: Full-wave rectification is produced if rectify is set to 2. This is useful when calculating envelopes with genasa or gensgm.
compress
    Apply compression to filtered waves. The form of the compression can be either logarithmic (log), or a power function (with a value between 0 and 1).
    Switch. Choices: log, 0-1, off. Default value: log.
    The default compressor is logarithmic, not because it is a particularly good approximation to auditory compression, but rather because it is a good match for the gammatone auditory filter mathematically, and it makes the filterbank level independent. Note that the logarithmic compressor performs half-wave rectification to avoid negative logs.
    NOTE: When using the physiological version of AIM with the transmission-line filterbank and the Meddis haircell bank, set compress=off, as compression is an integral part of the feedback loop in the transmission-line filterbank module.
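The behaviour of the two options can be sketched per sample in Python. This is a minimal illustration, not the AIM implementation: the function names, the floor of 1.0 used to keep the logarithm non-negative, and the sign-preserving treatment of negative samples under power compression are all assumptions.

```python
import math

def rectify(x, mode):
    """mode 0 (off): unchanged; 1 (on): half-wave; 2: full-wave."""
    if mode == 1:
        return max(x, 0.0)
    if mode == 2:
        return abs(x)
    return x

def compress(x, mode, floor=1.0):
    """mode is "log", "off", or a power exponent between 0 and 1."""
    if mode == "off":
        return x
    if mode == "log":
        # Clamping to the floor discards negative samples, so the log
        # stage half-wave rectifies as a side effect.
        # The floor value of 1.0 is an assumption of this sketch.
        return math.log(max(x, floor))
    # Power compression; keeping the sign of x is an assumption.
    return math.copysign(abs(x) ** mode, x)
```

For example, compress(4.0, 0.5) returns 2.0, while compress(-5.0, "log") returns 0.0 because the negative sample is clamped before the logarithm is taken.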
Transduction
transduction
    Neural transduction switch (at, meddis, off).
    Switch. Default: off.
II. LEAKY INTEGRATION
stages_idt
    Number of stages of lowpass filtering.
    Default unit: scalar. Default value: 2.
tup_idt
    The time constant for each filter stage.
    Default unit: ms. Default value: 8 ms.
    The Equivalent Rectangular Duration (ERD) of a two-stage lowpass filter is about 1.6 times the time constant of each stage, or 12.8 ms in the current case.
downsample
    The time between successive spectral frames.
    Default unit: ms. Default value: 10 ms.
    Downsample is simply another name for frstep_epn, provided to facilitate a different mode of thinking about time-series data.
frstep_epn
    The time between successive spectral frames.
    Default unit: ms. Default value: 10 ms.
    With a frstep_epn of 10 ms, genasa will produce spectral frames at a rate of 100 per second.
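The leaky integration and frame-stepping options can be pictured with the following Python sketch for a single channel. It assumes each stage is a first-order recursive lowpass filter with unity gain at DC, and it uses a hypothetical sample rate of 20 kHz; neither detail is confirmed by this manual entry, and the function names are illustrative only.

```python
import math

def lowpass_stage(signal, tau_ms, sample_rate_hz):
    """One first-order leaky integrator (assumed form), unity DC gain."""
    a = math.exp(-1000.0 / (sample_rate_hz * tau_ms))
    out, y = [], 0.0
    for x in signal:
        y = a * y + (1.0 - a) * x
        out.append(y)
    return out

def leaky_integrate(signal, sample_rate_hz, stages=2, tau_ms=8.0):
    """Cascade of identical stages; defaults mirror stages_idt and tup_idt."""
    for _ in range(stages):
        signal = lowpass_stage(signal, tau_ms, sample_rate_hz)
    return signal

def take_frames(signal, sample_rate_hz, frstep_ms=10.0):
    """Keep one sample every frstep_ms; default mirrors frstep_epn."""
    step = int(round(sample_rate_hz * frstep_ms / 1000.0))
    return signal[::step]

SAMPLE_RATE_HZ = 20000            # hypothetical sample rate
step_input = [1.0] * SAMPLE_RATE_HZ   # one second of a unit step
smoothed = leaky_integrate(step_input, SAMPLE_RATE_HZ)
frames = take_frames(smoothed, SAMPLE_RATE_HZ)
```

With a 10 ms frame step, one second of input yields 100 frames, matching the frame rate quoted for frstep_epn.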
REFERENCES
Glasberg, B.R. and Moore, B.C.J. (1990). "Derivation of auditory filter shapes from notched-noise data," Hearing Research, 47, 103-138.
Patterson, R.D. and Moore, B.C.J. (1986). "Auditory filters and excitation patterns as representations of frequency resolution," In: Frequency Selectivity in Hearing. B.C.J. Moore (Ed.), Academic Press, London. 123-177.
Patterson, R.D., Holdsworth, J. and Allerhand, M. (1992a). "Auditory models as preprocessors for speech recognition," In: The Auditory Processing of Speech: From the auditory periphery to words, M.E.H. Schouten (Ed.), Mouton de Gruyter, Berlin. 67-83.
Patterson, R.D., Allerhand, M.H. and Holdsworth, J. (1993a). "Auditory representations of speech sounds," In: Visual representations of speech signals, Martin Cooke, Steve Beet, and Malcolm Crawford (Eds.), John Wiley & Sons, Chichester. 307-314.
Patterson, R.D., Anderson, T. and Allerhand, M. (1994). "The auditory image model as a preprocessor for spoken language," In: Proc. Third ICSLP, Yokohama, Japan. 1395-1398.
Patterson, R.D., Allerhand, M. and Giguere, C. (1995). "Time-domain modelling of peripheral auditory processing: A modular architecture and a software platform," J. Acoust. Soc. Am. 98-3, (in press).
FILES
.gensgmrc The options file for gensgm.
SEE ALSO
genasa, genbmm, genepn, gencgm
BUGS
None currently known.
COPYRIGHT
Copyright (c) Applied Psychology Unit, Medical Research Council, 1995
Permission to use, copy, modify, and distribute this software without fee is hereby granted for research purposes, provided that this copyright notice appears in all copies and in all supporting documentation, and that the software is not redistributed for any fee (except for a nominal shipping charge). Anyone wanting to incorporate all or part of this software in a commercial product must obtain a license from the Medical Research Council.
The MRC makes no representations about the suitability of this software for any purpose. It is provided "as is" without express or implied warranty.
THE MRC DISCLAIMS ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT SHALL THE A.P.U. BE LIABLE FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
ACKNOWLEDGEMENTS
The AIM software was developed for Unix workstations by John Holdsworth and Mike Allerhand of the MRC APU, under the direction of Roy Patterson. The physiological version of AIM was developed by Christian Giguere. The options handler is by Paul Manson. The revised SAI module is by Jay Datta. Michael Akeroyd extended the PostScript facilities and developed the xreview routine for auditory image cartoons.
The project was supported by the MRC and grants from the U.K. Defense Research Agency, Farnborough (Research Contract 2239); the EEC Esprit BR Programme, Project ACTS (3207); and the U.K. Hearing Research Trust.
SunOS 5.6    GENSGM (1)    11 May 1995