Manual Page - gensgm(1)

Manual Reference Pages - GENSGM (1)

NAME

gensgm - generate auditory spectrogram

Synopsis
Description
I. Display Defaults
References
Files
See Also
Bugs
Copyright
Acknowledgements

SYNOPSIS

gensgm [ option=value | -option ] [ filename ]

DESCRIPTION

The gensgm module of the AIM software performs a time-domain spectral
analysis using a bank of auditory filters, and summarises the
information in an auditory spectrogram, that is, a spectrogram with
auditory frequency resolution and temporal resolution, rather than the
fixed frequency and temporal resolution of traditional speech
preprocessors (Patterson and Moore, 1986; Glasberg and Moore, 1990).
The spectral analysis converts the input wave into an array of
filtered waves, one for each channel of the auditory filterbank. The
surface of the array of filtered waves is AIM's representation of
basilar membrane motion (BMM) as a function of time (Patterson et
al., 1995). The auditory spectrogram is a plot of a sequence of
spectral slices extracted from the envelope of the BMM every frame
step (option frstep_epn, 10 ms by default). The envelope is
calculated continuously by rectifying, compressing, and lowpass
filtering the individual BMM waves as they flow from the filterbank.

The frequency resolution of the analysis varies with the centre
frequency of the channel as in the auditory system, and the
distribution of channels across frequency is chosen to match that in
the auditory system (Patterson and Moore, 1986). Thus, the auditory
spectrogram is a greyscale plot of the activity in each channel
(shades of black) as a function of time (the abscissa) and the centre
frequency of the auditory filter (the ordinate) in ERBs. The
representation is referred to as an auditory spectrogram (SGM) to
distinguish it from more traditional spectrograms based on Fourier,
LPC or cepstral analysis. In AIM, the suffix sgm is used to
distinguish this spectral representation from the other spectral
representations provided by the software (asa, auditory spectral
analysis; cgm, cochleogram; and epn, excitation pattern).
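
The spacing of channels on an ERB scale can be illustrated with a
minimal sketch. It assumes the ERB-rate formula of Glasberg and Moore
(1990); the function names and the 100-6000 Hz range are hypothetical
choices for illustration and are not taken from the AIM source.

```python
import math

def hz_to_erb_rate(f_hz):
    """ERB-rate at f_hz (formula of Glasberg and Moore, 1990)."""
    return 21.4 * math.log10(4.37 * f_hz / 1000.0 + 1.0)

def erb_rate_to_hz(erb_rate):
    """Inverse of hz_to_erb_rate."""
    return (10.0 ** (erb_rate / 21.4) - 1.0) * 1000.0 / 4.37

def centre_frequencies(f_min_hz, f_max_hz, n_channels):
    """Centre frequencies equally spaced on the ERB-rate scale."""
    e_lo = hz_to_erb_rate(f_min_hz)
    e_hi = hz_to_erb_rate(f_max_hz)
    step = (e_hi - e_lo) / (n_channels - 1)
    return [erb_rate_to_hz(e_lo + i * step) for i in range(n_channels)]

# 128 channels as in the gensgm default; the frequency range is an assumption.
cfs = centre_frequencies(100.0, 6000.0, 128)
```

Equal steps in ERB-rate give channels that are closely spaced in Hz at
low frequencies and widely spaced at high frequencies.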

The spectral analysis performed by gensgm is the same as that
performed by genbmm. The primary differences are in
the Display defaults and the way the Leaky
Integration module is used to produce the spectral slices that form the
spectrogram. As a result, this manual entry is restricted to
describing the option values that differ from those in genbmm.

I. DISPLAY DEFAULTS

The default values for three of the display options are reset to
produce a spectrographic format rather than a landscape. Specifically,
display=greyscale, bottom=0 and top=2500. The number of channels is
set to 128 for compatibility with the auditory spectrum modules,
genasa and genepn. When using AIM as a preprocessor for speech
recognition the number of channels would typically be reduced to
between 24 and 32. Use option downsample if it is necessary to
reduce the output to fewer than 24 channels across the speech range.

I. RECTIFICATION AND COMPRESSION

The default form of compression is logarithmic; it has the advantage
of transforming the exponential envelope of the ringing response of
the gammatone filter into a linear decay with time. It also makes the
output close to level independent, which is useful when using AIM as a
preprocessor for speech recognition. There is evidence, however, that
auditory compression may be better represented by power compression
with an exponent of about 0.5. For a discussion of this
issue, see docs/aimMeddisHewitt. To accommodate power compression
and the assembly of different configurations of AIM, the rectification
and compression options are presented separately in the options list
before the neural transduction section.
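
The effect of logarithmic compression on an exponential envelope can
be checked numerically. The sketch below is illustrative only: the
decay constant and the one-sample-per-ms grid are hypothetical, and
the envelope is an idealised exponential rather than the output of a
gammatone filter.

```python
import math

tau_ms = 5.0  # hypothetical decay time constant of the ringing envelope

# Idealised exponential envelope sampled once per millisecond.
envelope = [math.exp(-t / tau_ms) for t in range(20)]

# Logarithmic compression of the (strictly positive) envelope.
compressed = [math.log(v) for v in envelope]

# Successive differences of the compressed envelope: a constant
# difference means the decay is linear in time.
steps = [b - a for a, b in zip(compressed, compressed[1:])]
```

Every entry of steps equals -1/tau_ms to within rounding error, so the
compressed envelope falls along a straight line.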

rectify

Apply half-wave rectification to filtered waves

Switch. Default value: off.

If rectify is on, the BMM is half-wave rectified. The log compressor
also performs half-wave rectification to avoid negative logs. Since
the compressor default is log, the rectify default is off.

Note: Full-wave rectification is produced if rectify is set to 2.
This is useful when calculating envelopes with genasa or gensgm.
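
The two forms of rectification can be sketched as follows. The
function and its mode names are hypothetical and only mirror the
behaviour described above; they are not the AIM implementation.

```python
def rectify(samples, mode="off"):
    """Rectify a sequence of samples.

    mode "off"  : samples are returned unchanged
    mode "half" : negative samples are set to zero (rectify=on)
    mode "full" : samples are replaced by their magnitude (rectify=2)
    """
    if mode == "half":
        return [max(s, 0.0) for s in samples]
    if mode == "full":
        return [abs(s) for s in samples]
    if mode == "off":
        return list(samples)
    raise ValueError("unknown rectification mode: %r" % (mode,))

wave = [-1.0, 0.5, -0.25, 2.0]
half = rectify(wave, "half")
full = rectify(wave, "full")
```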

compress

Apply compression to filtered waves. The form of the compression can
be either logarithmic (log), or a power function (with a value between
0 and 1).

Switch. Choices: log, 0-1, off. Default value: log.

The default compressor is logarithmic, not because it is a
particularly good approximation to auditory compression, but rather
because it is a good match for the gammatone auditory filter
mathematically, and it makes the filterbank level independent. Note
that the logarithmic compressor performs half-wave rectification to
avoid negative logs.

NOTE: When using the physiological version of AIM with the
transmission-line filterbank and the Meddis haircell bank, set
compress=off, as compression is an integral part of the feedback loop
in the transmission-line filterbank module.
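
A minimal sketch of the three compress settings follows. It is an
illustration under stated assumptions, not the AIM code: the function
name, the floor used to avoid log(0), and the sign-preserving
treatment of negative samples under power compression are all
hypothetical.

```python
import math

def compress(samples, mode="log", floor=1e-6):
    """Compress samples with mode "log", "off", or a power exponent."""
    if mode == "off":
        return list(samples)
    if mode == "log":
        # Clamping to a small positive floor half-wave rectifies the
        # input, so the logarithm never sees a non-positive value.
        return [math.log(max(s, floor)) for s in samples]
    exponent = float(mode)
    if not 0.0 < exponent <= 1.0:
        raise ValueError("power exponent must lie between 0 and 1")
    # Assumption: the power law is applied to the magnitude and the
    # sign of each sample is kept.
    return [math.copysign(abs(s) ** exponent, s) for s in samples]

wave = [0.5, 1.0, 2.0]
quiet = compress(wave, "log")
loud = compress([10.0 * s for s in wave], "log")

# With log compression, a change of input level adds the same constant
# to every sample, leaving the shape of the output unchanged.
offsets = [l - q for l, q in zip(loud, quiet)]
```

Each entry of offsets equals log(10), which is the sense in which the
log compressor makes the output level independent.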

Transduction

transduction Neural transduction switch (at, meddis, off)
Switch. Default: off.

II. LEAKY INTEGRATION

stages_idt Number of stages of lowpass filtering
Default unit: scalar. Default value: 2

tup_idt The time constant for each filter stage
Default unit: ms. Default value: 8 ms.

The Equivalent Rectangular Duration (ERD) of a two-stage lowpass
filter is about 1.6 times the time constant of each stage, or
12.8 ms in the current case.

downsample The time between successive spectral frames.
Default unit: ms. Default value: 10 ms.

Downsample is simply another name for frstep_epn, provided to
facilitate a different mode of thinking about time-series data.

frstep_epn The time between successive spectral frames
Default unit: ms. Default value: 10 ms.

With a frstep_epn of 10 ms, gensgm will produce
spectral frames at a rate of 100 per second.
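
The leaky integration and frame-stepping options can be sketched for
a single channel as below. This is a hedged illustration: it assumes
each stage is a one-pole recursive lowpass filter, which is a common
form of leaky integrator but is not confirmed by this page, and the
function names and the 20 kHz sample rate are hypothetical.

```python
import math

def leaky_integrate(samples, sample_rate_hz, tup_ms=8.0, stages=2):
    """Pass samples through a cascade of identical one-pole lowpass stages."""
    # Per-sample smoothing coefficient for a time constant of tup_ms.
    alpha = 1.0 - math.exp(-1000.0 / (sample_rate_hz * tup_ms))
    out = list(samples)
    for _ in range(stages):
        state = 0.0
        smoothed = []
        for x in out:
            state += alpha * (x - state)  # leak towards the input
            smoothed.append(state)
        out = smoothed
    return out

def take_frames(samples, sample_rate_hz, frstep_ms=10.0):
    """Keep one sample at the end of every frstep_ms interval."""
    step = int(round(sample_rate_hz * frstep_ms / 1000.0))
    return samples[step - 1::step]

sample_rate_hz = 20000            # assumed sample rate
one_second = [1.0] * sample_rate_hz
frames = take_frames(leaky_integrate(one_second, sample_rate_hz),
                     sample_rate_hz)
```

With the default values, one second of input yields 100 frames, and
the integrator output has settled to the input level well before the
last frame.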

REFERENCES

Glasberg, B. R. and B. C. J. Moore (1990).
"Derivation of auditory filter shapes from notched-noise data,"
Hearing Research, 47, 103-138.

Patterson, R.D. and B.C.J. Moore (1986).
"Auditory filters and excitation patterns as representations of
frequency resolution," In: Frequency Selectivity in Hearing. B.C.J.
Moore (Ed.), Academic Press, London. 123-177.

Patterson, R.D., Holdsworth, J. and Allerhand, M. (1992a).
"Auditory Models as preprocessors for speech recognition," In: The
Auditory Processing of Speech: From the auditory periphery to words,
M.E.H. Schouten (ed), Mouton de Gruyter, Berlin, 67-83.

Patterson, R.D., Allerhand, M. H. and Holdsworth, J. (1993a).
"Auditory representations of speech sounds," In Visual
representations of speech signals, Eds. Martin Cooke, Steve Beet, and
Malcolm Crawford, John Wiley & Sons, Chichester. 307-314.

Patterson, R.D., Anderson, T., and Allerhand, M. (1994).
"The auditory image model as a preprocessor for spoken language," in
Proc. Third ICSLP, Yokohama, Japan, 1395-1398.

Patterson, R.D., Allerhand, M., and Giguere, C. (1995).
"Time-domain modelling of peripheral auditory processing: A modular
architecture and a software platform," J. Acoust. Soc. Am. 98-3, (in
press).

FILES

.gensgmrc The options file for gensgm.

BUGS

None currently known.

COPYRIGHT

Copyright (c) Applied Psychology Unit, Medical Research Council, 1995

Permission to use, copy, modify, and distribute this software without fee
is hereby granted for research purposes, provided that this copyright
notice appears in all copies and in all supporting documentation, and that
the software is not redistributed for any fee (except for a nominal
shipping charge). Anyone wanting to incorporate all or part of this
software in a commercial product must obtain a license from the Medical
Research Council.

The MRC makes no representations about the suitability of this
software for any purpose. It is provided "as is" without express or
implied warranty.

THE MRC DISCLAIMS ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING
ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT SHALL
THE A.P.U. BE LIABLE FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES
OR ANY DAMAGES WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS,
WHETHER IN AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION,
ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS
SOFTWARE.

ACKNOWLEDGEMENTS

The AIM software was developed for Unix workstations by John
Holdsworth and Mike Allerhand of the MRC APU, under the direction of
Roy Patterson. The physiological version of AIM was developed by
Christian Giguere. The options handler is by Paul Manson. The revised
SAI module is by Jay Datta. Michael Akeroyd extended the PostScript
facilities and developed the xreview routine for auditory image
cartoons.

The project was supported by the MRC and grants from the U.K. Defense
Research Agency, Farnborough (Research Contract 2239); the EEC Esprit
BR Programme, Project ACTS (3207); and the U.K. Hearing Research Trust.

SunOS 5.6

GENSGM (1)

11 May 1995

Generated by manServer 1.07 from /cbu/cnbh/aim/release/man/man1/gensgm.1 using man macros.
