Manual Page - genwav(1)

NAME

genwav - display the wave in filename.

SYNOPSIS

genwav [ option=value | -option ] [ filename ]

DESCRIPTION

Genwav sets up an Xwindow and displays a segment of the input wave in
the window. The size of the window and the size of the wave are
determined by options, as are a number of other input/output functions
and printing functions. These options have no direct bearing on the
auditory processing performed by AIM. For convenience, then, these
Non-Auditory options are associated with the instruction genwav (the
one non-auditory instruction), and they are listed at the top of the
options tables prior to the auditory options. There are also a large
number of Silent Options which control lesser used functions, both
auditory and non-auditory. They are listed in docs/aimSilentOptions
and there is documentation for some of them at the end of the listing.

There are three classes of Non-Auditory options:

I) DISPLAY OPTIONS that determine the format of the auditory representations
of sound on the screen, or on paper when printed.

II) OUTPUT OPTIONS that determine the format and content of files used
to store the auditory representations of sounds.

III) INPUT OPTIONS that determine how the wave in the input file should
be interpreted.

The output options are presented before the input options so that the
input options will be adjacent to the filterbank options in the
options tables produced by genbmm and subsequent instructions.

I. DISPLAY OPTIONS

The AIM modules produce output in the form of a set of functions, one
for each channel of the auditory filterbank. For example, the output
of genbmm is a set of functions that simulate basilar membrane motion
produced in response to the input wave. By default, the AIM software
puts an Xwindow up on the computer screen and displays the output in
the window. This section describes the options that control these
displays. There are also a number of Silent Options associated with
displays (see docs/aimSilentOptions).

The display options are: title, x0_win, y0_win, width_win,
height_win, display, view, top, bottom, magnification, pensize, hiddenline.

A. The Display Window Title, Position, and Size

title Title of output display.

Character string. Default: input file name.
The title of the output being displayed. If no title is given, the
display bears the name of the file of the input wave.

x0_win Left edge of window

Unit: pixels. Default: centre.
The left edge of the window into which the display will be drawn,
relative to the left edge of the screen (i.e. the x-coordinate of the
window within the screen). A value of centre will cause centring in
the horizontal dimension (provided the window manager does not
override).

y0_win Lower edge of window

Unit: pixels. Default: centre.
The lower edge of the window into which the display will be drawn,
relative to the lower edge of the screen (i.e. the y-coordinate of the
window within the screen). A value of centre will cause centring in
the vertical dimension (provided the window manager does not
override).

Taken as a pair, x0_win and y0_win determine the origin of the window
relative to the screen origin, which is assumed to be the lower left
corner of the screen.

width_win Window width

Unit: pixels. Default: 640.
The width of the window into which the display will be drawn.

height_win Window height

Unit: pixels. Default: 480.
The height of the window into which the display will be drawn.

B. Display Controls

display Display output on screen

Switch. Default: on.
Normally this switch is on and a bitmap of the output is displayed in
a graphical window on the computer screen. The switch is provided
because the time taken to create the displays is considerable, and it
is useful to turn the display off when using AIM as a preprocessor for
speech recognition.

top The largest positive value visible in the display

Scalar. Default value: 1024 (for genwav)
Each of the functions in the multi-channel output of a module is
displayed in a transparent window. Provided the channel density is not
too low, the functions are related and the set of functions produces a
display that looks like a complex landscape. Top determines the
largest positive value that will appear in the transparent windows of
the individual functions, so top must be as large as the largest value
in the full set of functions. Increasing top has the effect of moving
the viewer farther up above the landscape.

bottom The largest negative value visible in the display

Scalar. Default value: -1024 (for genwav)
Bottom determines the largest negative value that will appear in the
transparent windows of the individual functions, so bottom must be as
large in the negative direction as the largest negative value in the
full set of functions. Increasing bottom in the negative direction has
the effect of deepening the valleys in the landscape.

magnification Display magnification

Scalar. Default: 1.0.
The degree to which the amplitude of the functions in the display
should be magnified before being displayed. This parameter is merely
for adjusting the visual contrast of the display. The magnification
option is a multiplier, so a value of 1 implies drawing to scale,
while a value of 10 implies ten times (10x) the size of values in the
module output and 0.1 implies one tenth of the output size.
Magnification is related to, but separate from, the gain options which
affect the values of the output functions and the values stored in any
output files. Magnification is an alternative means of controlling the
size of the functions in the display -- alternative to top and bottom.

pensize The size of the lines in the displays and the dots on the spiral

Unit: pixels. Default: 1.
This option allows the user to specify the thickness of the lines in
the display and the size of the dots on spiral auditory images. It
also affects the lines and dots in postscript plots. It is provided
primarily for use with printers which have much more resolution than
computer screens. On laser printers a value of 3-5 gives reasonable
line thickness. On the screen, a line width greater than 1 produces
slow drawing and a jagged, blurred display.

hiddenline Draw with overlapping parts of functions hidden

Switch. Default: on.
This switch specifies whether or not a hidden line algorithm should
be used when drawing the display. It also affects printed displays.
In almost all cases, hiddenline results in more attractive displays of
waveforms, and it often makes complex displays easier to understand,
so the default is on. Note: hiddenline almost doubles the drawing
time so it is sometimes useful to switch it off on slower machines.

II. OUTPUT OPTIONS

The output options are listed and described before the input options
so that the input options will be adjacent to the filterbank options
in the listings produced by genbmm and subsequent modules. The output
options are downchannel, erase_ctn, animate_ctn, bitmap_ctn,
postscript, output, and header. There are also a number of Silent
Options associated with output (see docs/aimSilentOptions).

downchannel Average adjacent channels of multichannel representations

Units: Number of averagings. Default value: 0.
There is interaction between channels in the transmission-line
filterbank of the physiological version of AIM, and in the neural
encoding of the functional version of AIM. The minimum channel
density for these processes to operate properly is 4 channels per
ERB and 2 channels per ERB, respectively. For broadband signals like
speech this means that the minimum number of channels is on the order
of 128 and 64, respectively. This channel density can produce
cluttered displays, and more importantly, it is far too many channels
for current speech recognition systems which typically use 12-24
channels. This is not just a computer power problem; the recognition
systems actually perform less well with extra channels. Accordingly,
the downchannel option provides a means of reducing the channel
density at output, so that AIM can operate with the appropriate
channel density and still provide output that is compatible with
displays and speech recognition systems.

Downchannel averages pairs of adjacent channels and the option value
specifies how many times it should execute the averaging process. Each
averaging reduces the number of channels by a factor of 2, so for
proper transmission-line filtering and an output file with 16
channels, set channels_afb=128 and downchannel=3 (three successive
halvings of the number of channels).
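
The arithmetic of repeated pairwise averaging can be sketched in Python
as follows. This is an illustration only, not AIM's implementation: the
function name downchannel, the list-of-lists layout, and the requirement
of an even channel count at every pass are all assumptions made for the
example.

```python
def downchannel(channels, passes):
    """Average adjacent pairs of channels, `passes` times.

    `channels` is a list of equal-length lists of samples, one list per
    filterbank channel (a hypothetical in-memory layout, not AIM's).
    """
    for _ in range(passes):
        if len(channels) % 2:
            raise ValueError("channel count must be even at every pass")
        # Pair channel 0 with 1, 2 with 3, ... and average sample by sample.
        channels = [
            [(a + b) / 2.0 for a, b in zip(lower, upper)]
            for lower, upper in zip(channels[0::2], channels[1::2])
        ]
    return channels


# 128 channels of 4 samples each; channel k holds the constant value k.
bank = [[float(k)] * 4 for k in range(128)]
reduced = downchannel(bank, 3)
print(len(reduced))    # 16
print(reduced[0][0])   # 3.5, the mean of original channels 0..7
```

Three passes take 128 channels to 64, 32 and then 16, which matches the
channels_afb=128, downchannel=3 example above.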

A. Animated Cartoons

Four of the AIM instructions produce output in the form of sequences
of spectral frames (gensgm, gencgm, genasa and genepn). Bitmap
versions of the displays of the frames can be stored by AIM and
replayed by review and xreview. When the sequence of frames is played
rapidly, it appears as an animated cartoon that shows the dynamic
behaviour of the spectrum of the sound.

Similarly, the AIM instructions for auditory images (gensai and
genspl) produce sequences of landscape frames, and bitmap versions of
the landscape displays can also be stored by AIM and replayed by
review and xreview. Indeed, it was the desire to produce auditory
image cartoons that led to the development of much of the AIM software
package. The animated cartoons of auditory images show the dynamic
behaviour of features in the images, like the motion of formants in
diphthongs and the motion of notes in a melody.

This section describes the options that control the construction and
storage of sequences of bitmaps; there is a separate manual entry for
the xreview routine that replays the bitmaps (man xreview).

erase_ctn Erase the current frame before presenting the next frame

Switch. Default value: on.
Normally, when presenting a sequence of frames as an animated cartoon,
one wants to erase the current frame before presenting the next. When
the frames are spectra, however, the set of frames can together form a
meaningful display; for example, the set of rising spectra produced at
the onset of a sound produces a contour map of the onset. Switching
erase_ctn off enables the user to observe the full set of spectra
simultaneously. (See aimdemo_gtf_spectra or aimdemo_tlf_spectra.)

animate_ctn Store frames in memory and replay all of them as a cartoon

Switch. Default value: off.
When this option is on, AIM stores the bitmaps of the frames it
produces in the memory of the machine and replays them rapidly when
the instruction is complete. Type RETURN to animate the cartoon again;
type q RETURN to exit the instruction. (This option was important
when machines were slower and before the availability of review and
xreview. It is now largely obsolete.)

bitmap_ctn Store bitmaps of frames in a file for replay as a cartoon

Switch. Default value: off.
When this option is on, bitmaps of the frames produced for the input
in file_name will be stored in file_name.ctn. The sequence of frames
can later be replayed using either

> review file_name or,

> xreview file_name

Both of these programs enable the user to vary the rate of animation,
the section of the sequence to be viewed, etc. The xreview version has a
window interface with useful information and is the preferred version
in most cases.

B. Output Files for Printing and Postprocessing

postscript Produce printer-ready output

Switch. Default value: off.
This switch causes AIM to produce a printer-ready version of the
displays it presents on the computer screen. For example, the NAP of
a 32-ms section of cegc can be printed using

> gennap length=32 postscript=on cegc | lpr -Plw

where lpr is the Unix printer-driver and the lw of -Plw specifies
the destination printer. You may need to check the names of your
system's printer driver and laser printer.

Alternatively, the postscript version of the display may be directed to a
file using an instruction like

> gennap length=32 postscript=on cegc > cegc_nap.ps

and printed later at the user's convenience. In this example, the file
name cegc_nap.ps is not generated by AIM; the _nap.ps suffix is
added by the user following standard conventions to indicate that the file
contains a NAP in postscript form.

NOTE: There are a very large number of Silent Options associated with
postscript printing that greatly facilitate preparation of
displays for publication (see docs/aimSilentOptions).

THREE POSTSCRIPT CAUTIONS:

Postscript files of landscape displays from AIM are very large. As a
result, we recommend

a) that you NOT switch postscript on without redirecting the output to
a file, as it will cause the output to be displayed on the screen in a
seemingly endless display,

b) that you be careful NOT to print postscript files on a printer
which does not understand the Postscript language, as it can cause the
printer to put out an extremely long file, one column per page!

c) that you NOT set postscript=on in an options file as it will
generate large files in the directory without your noticing.

output Generate an output file

Switch. Default value: off.
This switch causes the array of functions that defines AIM's
simulation of basilar membrane motion, or a neural activity pattern,
or an auditory image, to be stored in a file for subsequent processing
by the aimtools or other user-defined operators. By convention, the
file is given the same name as the input file, but with a suffix
reflecting the entry point, to distinguish it from the input file on
the one hand and from other output files on the other hand. The naming
system enables the user to construct and store a set of output files
for one input file without the need to specify a sequence of file
names. The suffixes are those used to identify the modules in the
listing produced by gen -help. So, for example, the following
command line:

> gennap output=on length=32 cegc

will produce an output file named cegc.nap containing a multiplexed
version of the functions that define the NAP of the first 32 ms of
cegc.

The spectrographic representations produced by gensgm and gencgm can
be stored in the same way, as can the sequences of spectra produced by
genasa and genepn. It is the output files of genasa and gencgm that
are used to interface AIM with speech recognition systems (Robinson et
al., 1990; Patterson et al., 1995; Giguere and Woodland, 1994a).
Details of the file formats are presented in docs/aimFileFormat.

header Put a header on the output file

Flag. Default value: on.
By default, a header is prepended to each output file so that
subsequent processors have access to the history of the file. Details
of the header structure are presented in docs/aimFileFormat.

Note: There is an AIM tool hdr which will remove the header from an
AIM output file (man hdr).

III. INPUT OPTIONS

The input options enable the user to process a subsection of the input
wave, and to specify characteristics of the wave.

The input options are: input_wave, start_wave, length_wave,
samplerate, swap_wave, dB_wave.

input_wave Default input wave name

Filename. Default value: none.
The name of the wave file to process. This option permits simple
repetitive processing of the same input file without repetitive typing. It
also enables one to circumvent the Unix convention of having the filename
last on the command line. This option is overridden if the user supplies a
wave file name at the end of the command line.

start_wave Start point in wave

Default unit: ms. Default value: 0.
The point in the input wave at which processing should begin. The
start_wave option is expressed in milliseconds and its default value is the
beginning of the file (i.e. 0 ms into the file).

length_wave Length of wave

Default unit: ms. Default value: remainder.
The number of milliseconds of the wave that ought to be processed,
beyond the start point. The special value remainder indicates that
the entire length of the wave from the start point to the end of the
file should be processed.

samplerate Input wave sample rate

Default unit: Hertz. Default value: 20,000 Hz.
The rate at which the input wave was sampled.
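
The relation between start_wave, length_wave and samplerate can be
sketched in Python as follows. This is a hedged illustration, not AIM's
code: the function names, the rounding to the nearest sample, the
clamping to the end of the file, and the use of None to stand for the
special value remainder are all assumptions.

```python
def ms_to_samples(ms, samplerate=20000):
    """Convert a time in milliseconds to a whole number of samples."""
    return int(round(ms * samplerate / 1000.0))


def segment_bounds(start_wave_ms, length_wave_ms, total_samples,
                   samplerate=20000):
    """Return (first, last_exclusive) sample indices of the segment.

    length_wave_ms=None stands in for the special value `remainder`.
    Both bounds are clamped to the end of the wave (an assumption).
    """
    first = min(ms_to_samples(start_wave_ms, samplerate), total_samples)
    if length_wave_ms is None:
        return first, total_samples
    last = min(first + ms_to_samples(length_wave_ms, samplerate),
               total_samples)
    return first, last


# A one-second wave at the default 20,000 Hz sample rate.
print(segment_bounds(0, 32, 20000))       # (0, 640)
print(segment_bounds(100, None, 20000))   # (2000, 20000)
```

At the default rate, 32 ms corresponds to 640 samples, so an incorrect
samplerate value changes which stretch of the file is selected.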

swap_wave Swap the bytes in each binary pair of the input file

Switch. Default: off.
The order of the bytes in short integers varies between manufacturers.
Specifically, the order for Sun, SGI and HP is opposite to that for DEC and
IBM. The default setting (off) is for the latter byte order.
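
What swapping the bytes of each pair means can be sketched in Python as
follows. This only illustrates the operation on a buffer of 16-bit
samples using the standard array module; the function name swap_shorts
is hypothetical and nothing here reflects how AIM reads its input.

```python
import array


def swap_shorts(raw):
    """Swap the two bytes of every 16-bit sample in `raw` (a bytes object)."""
    samples = array.array("h")   # signed short, 2 bytes on common platforms
    samples.frombytes(raw)       # len(raw) must be a multiple of 2
    samples.byteswap()           # reverse the byte order of each item
    return samples.tobytes()


print(swap_shorts(b"\x01\x02\x03\x04"))   # b'\x02\x01\x04\x03'
```

Applying the swap twice returns the original data, so a wave that looks
like noise with one setting of swap_wave can be retried with the other.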

dB_wave Input wave level for physiological AIM

Units: dB. Default: 60 dB.
This option sets the level of the input wave for the physiological version
of AIM, that is, the route with the transmission line filterbank.
(The functional version of AIM is level-independent and dB_wave is
ignored when the gammatone filterbank is used.)

dB_wave is a scaling parameter that tells AIM the level of the wave in
your input file relative to AIM's internal standard. It is used for
calibration and investigation of the effects of level in the auditory
system.

Calibration:

To calibrate AIM for a given recording set up, put a sinusoid of known
level (dBSPL) into the recording system and store a sample of it as
short integers in a headerless wave file. Calculate the rms amplitude
of the sinusoid (RMS) (see note below) and then use the following
equation to calculate the appropriate value of dB_wave.

dB_wave = dBSPL - 20*log10(RMS/200)

For example, if the sinusoid has a known level of 60 dB SPL, and the
recorded version produces a wave with an RMS amplitude of 467.3, then
dB_wave should be set to 52.6.
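
The calibration arithmetic can be checked with a short Python sketch.
The function names rms and db_wave are hypothetical helpers written for
this example. The logarithm is taken to base 10, which is the reading
that reproduces the worked figure of 52.6.

```python
import math


def rms(samples):
    """Root-mean-square amplitude of a sequence of sample values."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def db_wave(db_spl, rms_amplitude):
    """dB_wave = dBSPL - 20*log10(RMS/200), as in the equation above."""
    return db_spl - 20.0 * math.log10(rms_amplitude / 200.0)


# Worked example from the text: 60 dB SPL recorded with an RMS of 467.3.
print(round(db_wave(60.0, 467.3), 1))   # 52.6

# A wave whose RMS amplitude is exactly 200 needs no correction.
print(db_wave(60.0, 200.0))             # 60.0
```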

Note: The RMS value of a stored input wave can be calculated using the
AIM tool stats as follows:

> stats stat=rms line=on <input-wave>

Investigation of Level Effects:

If you change the value of dB_wave from 60 to 80 dB, the SAME input
file is assumed to represent a stimulus that is 20 dB HIGHER in level.
This enables you to investigate the effects of level with a fixed
input file.

Output scaling:

You can scale the genbmm output into absolute units of basilar
membrane velocity in cm/s by multiplying the genbmm output numbers by:

antilog[(dB_wave-60)/20] / (4000*gain_tlf)

where dB_wave and gain_tlf are the values specified at run time, and
antilog[x] denotes 10 raised to the power x.
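
The scaling factor can be computed with a short Python sketch. The
function name bmm_scale_factor is hypothetical, and the gain_tlf value
of 1.0 used below is chosen purely for illustration and is not claimed
to be AIM's default.

```python
def bmm_scale_factor(db_wave, gain_tlf):
    """Multiplier that converts genbmm output numbers to cm/s.

    Implements antilog[(dB_wave-60)/20] / (4000*gain_tlf), with the
    antilog taken as a power of 10.
    """
    return 10.0 ** ((db_wave - 60.0) / 20.0) / (4000.0 * gain_tlf)


print(bmm_scale_factor(60.0, 1.0))   # 0.00025
print(bmm_scale_factor(80.0, 1.0))   # 0.0025
```

Raising dB_wave by 20 dB multiplies the factor by 10, in line with the
description of level effects above.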

FILES

.genwavrc The options file for genwav.

BUGS

COPYRIGHT

Copyright (c) Applied Psychology Unit, Medical Research Council, 1995

Permission to use, copy, modify, and distribute this software without fee
is hereby granted for research purposes, provided that this copyright
notice appears in all copies and in all supporting documentation, and that
the software is not redistributed for any fee (except for a nominal
shipping charge). Anyone wanting to incorporate all or part of this
software in a commercial product must obtain a license from the Medical
Research Council.

The MRC makes no representations about the suitability of this
software for any purpose. It is provided "as is" without express or
implied warranty.

THE MRC DISCLAIMS ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING
ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT SHALL
THE A.P.U. BE LIABLE FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES
OR ANY DAMAGES WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS,
WHETHER IN AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION,
ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS
SOFTWARE.

ACKNOWLEDGEMENTS

The AIM software was developed for Unix workstations by John
Holdsworth and Mike Allerhand of the MRC APU, under the direction of
Roy Patterson. The physiological version of AIM was developed by
Christian Giguere. The options handler is by Paul Manson. The revised
SAI module is by Jay Datta. Michael Akeroyd extended the postscript
facilities and developed the xreview routine for auditory image
cartoons.

The project was supported by the MRC and grants from the U.K. Defence
Research Agency, Farnborough (Research Contract 2239); the EEC Esprit
BR Programme, Project ACTS (3207); and the U.K. Hearing Research Trust.

SunOS 5.6

GENWAV (1)

16 April 1997

Generated by manServer 1.07 from /cbu/cnbh/aim/release/man/man1/genwav.1 using man macros.
