Manual Reference Pages - GENCGM (1)
gencgm - generate a cochleogram
I. Display Defaults
II. Leaky Integration
gencgm [ option=value | -option ] [ filename ]
Gencgm converts the input wave into a simulated neural activity pattern (NAP) and summarises the NAP as a sequence of excitation patterns (EPNs) (Patterson et al. 1992a, 1993a). The operation takes place in three stages: spectral analysis, neural encoding, and temporal integration (Patterson et al. 1995).

In the spectral analysis stage, the input wave is converted into an array of filtered waves, one for each channel of a gammatone auditory filterbank. The surface of the array of filtered waves is AIM's representation of basilar membrane motion (BMM) as a function of time. In the neural encoding stage, compression, adaptation and suppression are used to convert each wave from the filterbank into a simulation of the aggregate neural response to that wave. The array of responses is AIM's simulation of the neural activity pattern (NAP) in the auditory nerve at about the level of the cochlear nucleus. Finally, the NAP is converted into a sequence of excitation patterns (EPNs) by calculating the envelope of the NAP and extracting spectral slices from the envelope every frstep_epn ms (Patterson et al. 1994). The envelope is calculated continuously by lowpass filtering the individual channels of the NAP as they flow from the cochlea simulation.
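As an illustration of the temporal integration stage only, the sketch below lowpass filters each NAP channel with a cascade of leaky integrators and samples one spectral frame every frstep_epn ms. It is not the AIM implementation; the function names and the (channels, samples) array layout are assumptions.

```python
# Hypothetical sketch of the temporal integration stage; not the AIM code.
import numpy as np

def leaky_lowpass(x, tau_ms, fs, stages=2):
    """Cascade of first-order leaky integrators (time constant tau_ms each)."""
    a = np.exp(-1.0 / (fs * tau_ms * 1e-3))    # per-sample decay factor
    y = np.asarray(x, dtype=float)
    for _ in range(stages):
        out = np.empty_like(y)
        acc = 0.0
        for n, v in enumerate(y):
            acc = a * acc + (1.0 - a) * v      # unity-gain leaky integration
            out[n] = acc
        y = out
    return y

def cochleogram(nap, fs, tau_ms=8.0, stages=2, frstep_ms=10.0):
    """nap: (channels, samples) array of NAP channels.
    Returns one spectral frame (column) every frstep_ms of envelope."""
    env = np.vstack([leaky_lowpass(ch, tau_ms, fs, stages) for ch in nap])
    step = int(round(fs * frstep_ms * 1e-3))
    return env[:, ::step]
```

The defaults chosen here (stages=2, tau_ms=8, frstep_ms=10) mirror the stages_idt, tup_idt and frstep_epn options described below.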
When the sequence of excitation patterns is presented in spectrographic format, it is referred to as a cochleogram (CGM). The spectrographic format has time on the abscissa (x-axis), filter centre-frequency on the ordinate (y-axis), and activity level as the degree of black in the display. In AIM, the suffix cgm is used to distinguish this spectral representation from the other spectral representations provided by the software (asa auditory spectral analysis, sgm auditory spectrogram, and epn excitation pattern).
The neural activity pattern produced by gencgm is the same as that produced by gennap. The primary differences are the Display defaults and the fact that Leaky Integration is used to construct spectral slices from the NAP rather than simulating loss of phase locking. As a result, this manual entry is restricted to describing the options that differ from those in gennap.
I. DISPLAY DEFAULTS
The default values for three of the display options are reset to produce a spectrographic format rather than a landscape. Specifically, display=greyscale, bottom=0 and top=2500. The number of channels is set to 128 for compatibility with the auditory spectrum modules, genasa and genepn. When using AIM as a preprocessor for speech recognition, the number of channels would typically be reduced to between 24 and 32. Use option downsample if it is necessary to reduce the output to fewer than 24 channels across the speech range.
NOTE: The cochlea simulations impose compression of one form or another on the NAP and the notes on compression in the man pages for gennap apply to gencgm as well.
transduction Neural transduction switch (at, meddis, off). Default unit: switch. Default value: at.
II. LEAKY INTEGRATION
stages_idt Number of stages of lowpass filtering. Default unit: scalar. Default value: 2.
tup_idt The time constant for each filter stage. Default unit: ms. Default value: 8 ms.
The Equivalent Rectangular Duration (ERD) of a two-stage lowpass filter is about 1.6 times the time constant of each stage, or 12.8 ms in the current case.
downsample The time between successive spectral frames. Default unit: ms. Default value: 10 ms.
Downsample is simply another name for frstep_epn, provided to facilitate a different mode of thinking about time-series data.
frstep_epn The time between successive spectral frames Default unit: ms. Default value: 10 ms.
With a frstep_epn of 10 ms, gencgm will produce spectral frames at a rate of 100 per second.
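The relation between frstep_epn and the frame rate can be checked with a trivial helper (hypothetical, not part of AIM):

```python
def frame_rate_hz(frstep_ms):
    """Spectral frames per second for an inter-frame step of frstep_ms."""
    return 1000.0 / frstep_ms

def n_frames(duration_ms, frstep_ms):
    """Number of spectral frames produced for duration_ms of signal,
    one frame per step (ignoring any partial frame at the end)."""
    return int(duration_ms // frstep_ms)
```

With the default frstep_epn of 10 ms, a 2.5-second utterance yields 250 frames at 100 frames per second.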
REFERENCES
Patterson, R.D., Holdsworth, J. and Allerhand, M. (1992a). "Auditory models as preprocessors for speech recognition," in: The Auditory Processing of Speech: From the Auditory Periphery to Words, M.E.H. Schouten (ed.), Mouton de Gruyter, Berlin, 67-83.
Patterson, R.D., Allerhand, M.H. and Holdsworth, J. (1993a). "Auditory representations of speech sounds," in: Visual Representations of Speech Signals, M. Cooke, S. Beet and M. Crawford (eds.), John Wiley & Sons, Chichester, 307-314.
Patterson, R.D., Anderson, T. and Allerhand, M. (1994). "The auditory image model as a preprocessor for spoken language," in: Proc. Third ICSLP, Yokohama, Japan, 1395-1398.
Patterson, R.D., Allerhand, M. and Giguere, C. (1995). "Time-domain modelling of peripheral auditory processing: A modular architecture and a software platform," J. Acoust. Soc. Am. 98 (3), (in press).
FILES
The options file for gencgm.
SEE ALSO
gensgm, genasa, genepn, gennap, genbmm
BUGS
None currently known.
Copyright (c) Applied Psychology Unit, Medical Research Council, 1995
Permission to use, copy, modify, and distribute this software without fee is hereby granted for research purposes, provided that this copyright notice appears in all copies and in all supporting documentation, and that the software is not redistributed for any fee (except for a nominal shipping charge). Anyone wanting to incorporate all or part of this software in a commercial product must obtain a license from the Medical Research Council.
The MRC makes no representations about the suitability of this software for any purpose. It is provided "as is" without express or implied warranty.
THE MRC DISCLAIMS ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE A.P.U. BE LIABLE FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
The AIM software was developed for Unix workstations by John Holdsworth and Mike Allerhand of the MRC APU, under the direction of Roy Patterson. The physiological version of AIM was developed by Christian Giguere. The options handler is by Paul Manson. The revised SAI module is by Jay Datta. Michael Akeroyd extended the PostScript facilities and developed the xreview routine for auditory image cartoons.
The project was supported by the MRC and grants from the U.K. Defence Research Agency, Farnborough (Research Contract 2239); the EEC Esprit BR Programme, Project ACTS (3207); and the U.K. Hearing Research Trust.
SunOS 5.6                         GENCGM (1)                  4 September 1995