Manual Reference Pages - GENWAV (1)
genwav - display the wave in filename.
genwav [ option=value | -option ] [ filename ]
Genwav sets up an Xwindow and displays a segment of the input wave in the window. The size of the window and the size of the wave are determined by options, as are a number of other input/output functions and printing functions. These options have no direct bearing on the auditory processing performed by AIM. For convenience, these Non-Auditory options are associated with the instruction genwav (the one non-auditory instruction), and they are listed at the top of the options tables, before the auditory options. There are also a large number of Silent Options which control less frequently used functions, both auditory and non-auditory. They are listed in docs/aimSilentOptions, and there is documentation for some of them at the end of that listing.
There are three classes of Non-Auditory options:
I) DISPLAY OPTIONS that determine the format of the auditory representations of sound on the screen, or on paper when printed.
II) OUTPUT OPTIONS that determine the format and content of files used to store the auditory representations of sounds.
III) INPUT OPTIONS that determine how the wave in the input file should be interpreted.
The output options are presented before the input options so that the input options will be adjacent to the filterbank options in the options tables produced by genbmm and subsequent instructions.
I. DISPLAY OPTIONS
The AIM modules produce output in the form of a set of functions, one for each channel of the auditory filterbank. For example, the output of genbmm is a set of functions that simulate basilar membrane motion produced in response to the input wave. By default, the AIM software puts an Xwindow up on the computer screen and displays the output in the window. This section describes the options that control these displays. There are also a number of Silent Options associated with displays (see docs/aimSilentOptions).
The display options are: title, x0_win, y0_win, width_win, height_win, display, view, top, bottom, magnification, pensize, hiddenline.
A. The Display Window Title, Position, and Size
title Title of output display.
Character string. Default: input file name.
The title of the output being displayed. If no title is given, the display bears the name of the file of the input wave.
x0_win Left edge of window
Unit: pixels. Default: centre.
The left edge of the window into which the display will be drawn, relative to the left edge of the screen (i.e. the x-coordinate of the window within the screen). A value of centre will cause centring in the horizontal dimension (provided the window manager does not override).
y0_win Lower edge of window
Unit: pixels. Default: centre.
The lower edge of the window into which the display will be drawn, relative to the lower edge of the screen (i.e. the y-coordinate of the window within the screen). A value of centre will cause centring in the vertical dimension (provided the window manager does not override).
Taken as a pair, x0_win and y0_win determine the origin of the window relative to the screen origin, which is assumed to be the lower left corner of the screen.
width_win Window width
Unit: pixels. Default: 640.
The width of the window into which the display will be drawn.
height_win Window height
Unit: pixels. Default: 480.
The height of the window into which the display will be drawn.
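The placement rules in this section can be sketched in a few lines of Python. This is an illustration only, not AIM's own code: the function name and the screen dimensions are hypothetical, centring is assumed to split the spare screen space evenly, and the result is converted from the lower-left origin described here to the top-left origin used by X11 itself.

```python
def window_origin(screen_w, screen_h, width_win=640, height_win=480,
                  x0_win="centre", y0_win="centre"):
    """Return (x, y) of the window's top-left corner, measured from the
    top-left corner of the screen.

    x0_win and y0_win are pixels from the left and lower edges of the
    screen, or the string "centre".
    """
    if x0_win == "centre":
        x_left = (screen_w - width_win) // 2    # assumed centring rule
    else:
        x_left = int(x0_win)

    if y0_win == "centre":
        y_lower = (screen_h - height_win) // 2  # assumed centring rule
    else:
        y_lower = int(y0_win)

    # Flip the vertical axis: distance of the window's top edge
    # from the top edge of the screen.
    y_top = screen_h - (y_lower + height_win)
    return x_left, y_top


print(window_origin(1280, 1024))                      # (320, 272)
print(window_origin(1280, 1024, x0_win=0, y0_win=0))  # (0, 544)
```

With x0_win=0 and y0_win=0 the window sits in the lower left corner of the screen, which in top-left coordinates is 544 pixels down on a hypothetical 1280 by 1024 screen.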
B. Display Controls
display Display output on screen
Switch. Default: on.
Normally this switch is on and a bitmap of the output is displayed in a graphical window on the computer screen. The switch is provided because the time taken to create the displays is considerable, and it is useful to turn the display off when using AIM as a preprocessor for speech recognition.
top The largest positive value visible in the display
Scalar. Default value: 1024 (for genwav)
Each of the functions in the multi-channel output of a module is displayed in a transparent window. Provided the channel density is not too low, the functions are related and the set of functions produces a display that looks like a complex landscape. Top determines the largest positive value that will appear in the transparent windows of the individual functions, so top must be as large as the largest value in the full set of functions. Increasing top has the effect of moving the viewer farther up above the landscape.
bottom The largest negative value visible in the display
Scalar. Default value: -1024 (for genwav)
Bottom determines the largest negative value that will appear in the transparent windows of the individual functions, so bottom must be as large in the negative direction as the largest negative value in the full set of functions. Increasing bottom in the negative direction has the effect of deepening the valleys in the landscape.
magnification Display magnification
Scalar. Default: 1.0.
The degree to which the amplitude of the functions in the display should be magnified before being displayed. This parameter is merely for adjusting the visual contrast of the display. The magnification option is a multiplier, so a value of 1 implies drawing to scale, while a value of 10 implies ten times (10x) the size of values in the module output and 0.1 implies one tenth of the output size. Magnification is related to, but separate from, the gain options which affect the values of the output functions and the values stored in any output files. Magnification is an alternative means of controlling the size of the functions in the display -- alternative to top and bottom.
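The relationship between magnification, top and bottom can be illustrated with a simple model. The linear mapping and clipping below are assumptions made for this sketch, and display_fraction is a hypothetical name; AIM's actual drawing code may differ in detail.

```python
def display_fraction(value, top=1024.0, bottom=-1024.0, magnification=1.0):
    """Vertical position of a value within its window: 0.0 at bottom, 1.0 at top.

    Assumed model: the value is multiplied by magnification, clipped to
    the range [bottom, top], and mapped linearly onto that range.
    """
    scaled = value * magnification
    clipped = max(bottom, min(top, scaled))
    return (clipped - bottom) / (top - bottom)


# Doubling the magnification moves a sample as far from the zero line
# as halving both top and bottom does.
a = display_fraction(256.0, magnification=2.0)
b = display_fraction(256.0, top=512.0, bottom=-512.0)
print(a, b)   # 0.75 0.75
```

Under this model, magnification changes only what is drawn; the values in the module output and in any output files are untouched, which matches the distinction from the gain options made above.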
pensize The size of the lines in the displays and the
dots on the spiral
Unit: pixels. Default: 1.
This option allows the user to specify the thickness of the lines in the display and the size of the dots on spiral auditory images. It also affects the lines and dots in postscript plots. It is provided primarily for use with printers, which have much more resolution than computer screens. On laser printers a value of 3-5 gives reasonable line thickness. On the screen, a linewidth greater than 1 produces slow drawing and a jagged, blurred display.
hiddenline Draw with overlapping parts of functions hidden
Switch. Default: on.
This switch specifies whether or not a hidden line algorithm should be used when drawing the display. It also affects printed displays. In almost all cases, hiddenline results in more attractive displays of waveforms, and it often makes complex displays easier to understand, so the default is on. Note: hiddenline almost doubles the drawing time so it is sometimes useful to switch it off on slower machines.
II. OUTPUT OPTIONS
The output options are listed and described before the input options so that the input options will be adjacent to the filterbank options in the listings produced by genbmm and subsequent modules. The output options are downchannel, erase_ctn, animate_ctn, bitmap_ctn, postscript, output, and header. There are also a number of Silent Options associated with output (see docs/aimSilentOptions).
downchannel Average adjacent channels of multichannel
Units: Number of averagings. Default value: 0.
There is interaction between channels in the transmission-line filterbank of the physiological version of AIM, and in the neural encoding of the functional version of AIM. The minimum channel density for these processes to operate properly is four channels per ERB and two channels per ERB, respectively. For broadband signals like speech this means that the minimum number of channels is on the order of 128 and 64, respectively. This channel density can produce cluttered displays and, more importantly, it is far too many channels for current speech recognition systems, which typically use 12-24 channels. This is not just a computer power problem; the recognition systems actually perform less well with extra channels. Accordingly, downchannel makes it possible to reduce the channel density at output, so that AIM can operate with the appropriate channel density and still provide output that is compatible with displays and speech recognition systems.
Downchannel averages pairs of adjacent channels and the option value specifies how many times it should execute the averaging process. Each averaging reduces the number of channels by a factor of 2, so for proper transmission-line filtering and an output file with 16 channels, set channels_afb=128 and downchannel=3 (three successive halvings of the number of channels).
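The repeated pairwise averaging can be sketched as follows. This is a minimal illustration of the arithmetic described above, not the AIM implementation; the function name is hypothetical and the sketch assumes the channel count is even at every step.

```python
def downchannel(frame, times):
    """Average adjacent pairs of channels, repeated `times` times.

    `frame` holds one value per filterbank channel.
    """
    for _ in range(times):
        if len(frame) % 2:
            raise ValueError("channel count must be even at each averaging step")
        frame = [(frame[i] + frame[i + 1]) / 2.0
                 for i in range(0, len(frame), 2)]
    return frame


channels = [float(i) for i in range(128)]   # stand-in for channels_afb=128
reduced = downchannel(channels, 3)          # downchannel=3
print(len(reduced))                         # 16
print(reduced[0])                           # 3.5, the mean of channels 0 to 7
```

Three halvings take 128 channels to 64, 32 and then 16, and each output channel is the mean of eight adjacent input channels.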
A. Animated Cartoons
Four of the AIM instructions produce output in the form of sequences of spectral frames (gensgm, gencgm, genasa and genepn). Bitmap versions of the displays of the frames can be stored by AIM and replayed by review and xreview. When the sequence of frames is played rapidly, it appears as an animated cartoon that shows the dynamic behaviour of the spectrum of the sound.
Similarly, the AIM instructions for auditory images (gensai and genspl) produce sequences of landscape frames, and bitmap versions of the landscape displays can also be stored by AIM and replayed by review and xreview. Indeed, it was the desire to produce auditory image cartoons that led to the development of much of the AIM software package. The animated cartoons of auditory images show the dynamic behaviour of features in the images, like the motion of formants in diphthongs and the motion of notes in a melody.
This section describes the options that control the construction and storage of sequences of bitmaps; there is a separate manual entry for the xreview routine that replays the bitmaps (man xreview).
erase_ctn Erase the current frame before presenting
the next frame
Switch. Default value: on.
Normally, when presenting a sequence of frames as an animated cartoon, one wants to erase the current frame before presenting the next. When the frames are spectra, however, the set of frames can together form a meaningful display; for example, the set of rising spectra produced at the onset of a sound produces a contour map of the onset. Switching erase_ctn off enables the user to observe the full set of spectra simultaneously. (See aimdemo_gtf_spectra or aimdemo_tlf_spectra.)
animate_ctn Store frames in memory and replay all of
them as a cartoon
Switch. Default value: off.
When this option is on, AIM stores the bitmaps of the frames it produces in the memory of the machine and replays them rapidly when the instruction is complete. Type RETURN to animate the cartoon again; type q RETURN to exit the instruction. (This option was important when machines were slower and before the availability of review and xreview. It is now largely obsolete.)
bitmap_ctn Store bitmaps of frames in a file for
replay as a cartoon
Switch. Default value: off.
When this option is on, bitmaps of the frames produced for the input in file_name will be stored in file_name.ctn. The sequence of frames can later be replayed using either
> review file_name or,
> xreview file_name
Both of these programs enable the user to vary the rate of animation, the section of the sequence to be viewed, etc. The xreview version has a window interface with useful information and is the preferred version in most cases.
B. Output Files for Printing and Postprocessing
postscript Produce printer-ready output
Switch. Default value: off.
This switch causes AIM to produce a printer-ready version of the displays it presents on the computer screen. For example, the NAP of a 32-ms section of cegc can be printed using
> gennap length=32 postscript=on cegc | lpr -Plw
where lpr is the Unix printer driver and the lw of -Plw specifies the destination printer. You may need to check the names of your system's printer driver and laser printer.
Alternatively, the postscript version of the display may be directed to a file using an instruction like
> gennap length=32 postscript=on cegc > cegc_nap.ps
and printed later at the user's convenience. In this example, the file name cegc_nap.ps is not generated by AIM; the _nap.ps suffix is added by the user, following standard conventions, to indicate that the file contains a NAP in postscript form.
NOTE: There are a very large number of Silent Options associated with
postscript printing that greatly facilitate preparation of
displays for publication (see docs/aimSilentOptions).
THREE POSTSCRIPT CAUTIONS:
Postscript files of landscape displays from AIM are very large. As a result, we recommend
a) that you NOT switch postscript on without redirecting the output to a file, as it will cause the output to be displayed on the screen in a seemingly endless listing,
b) that you be careful NOT to print postscript files on a printer which does not understand the Postscript language, as it can cause the printer to put out an extremely long file, one column per page!
c) that you NOT set postscript=on in an options file as it will generate large files in the directory without your noticing.
output Generate an output file
Switch. Default value: off.
This switch causes the array of functions that defines AIM's simulation of basilar membrane motion, or a neural activity pattern, or an auditory image, to be stored in a file for subsequent processing by the aimtools or other, user-defined, operators. By convention, the file is given the same name as the input file, but with a suffix reflecting the entry point, to distinguish it from the input file on the one hand and from other output files on the other. The naming system enables the user to construct and store a set of output files for one input file without the need to specify a sequence of file names. The suffixes are those used to identify the modules in the listing produced by gen -help. So, for example, the following command line:
> gennap output=on length=32 cegc
will produce an output file named cegc.nap containing a multiplexed version of the functions that define the NAP of the first 32 ms of cegc.
The spectrographic representations produced by gensgm and gencgm can be stored in the same way, as can the sequences of spectra produced by genasa and genepn. It is the output files of genasa and gencgm that are used to interface AIM with speech recognition systems (Robinson et al., 1990; Patterson et al., 1995; Giguere and Woodland, 1994a). Details of the file formats are presented in docs/aimFileFormat.
header Put a header on the output file
Flag. Default value: on.
By default, a header is prepended to each output file so that subsequent processors have access to the history of the file. Details of the header structure are presented in docs/aimFileFormat.
Note: There is an AIM tool hdr which will remove the header from an AIM output file (man hdr).
III. INPUT OPTIONS
The input options enable the user to process a subsection of the input wave, and to specify characteristics of the wave.
The input options are: input_wave, start_wave, length_wave, samplerate, swap_wave, dB_wave.
input_wave Default input wave name
Filename. Default value: none.
The name of the wave file to process. This option permits simple repetitive processing of the same input file without repetitive typing. It also enables one to circumvent the Unix convention of having the filename last on the command line. This option is overridden if the user supplies a wave file name at the end of the command line.
start_wave Start point in wave
Default unit: ms. Default value: 0.
The point in the input wave at which processing should begin. The start_wave option is expressed in milliseconds and its default value is the beginning of the file (i.e. 0 ms into the file).
length_wave Length of wave
Default unit: ms. Default value: remainder.
The number of milliseconds of the wave that ought to be processed, beyond the start point. The special value remainder indicates that the entire length of the wave from the start point to the end of the file should be processed.
samplerate Input wave sample rate
Default unit: Hertz. Default value: 20,000 Hz.
The rate at which the input wave was sampled.
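The way start_wave, length_wave and samplerate combine can be sketched as a conversion from milliseconds to sample positions. The function name is hypothetical, None stands in for the special value remainder, and rounding to the nearest sample is an assumption; AIM may truncate instead.

```python
def wave_segment(start_wave_ms=0.0, length_wave_ms=None, samplerate=20000,
                 total_samples=None):
    """Return (first_sample, sample_count) for a segment given in milliseconds.

    length_wave_ms=None means "remainder"; in that case total_samples,
    the length of the file in samples, must be supplied.
    """
    first = round(start_wave_ms * samplerate / 1000.0)
    if length_wave_ms is None:
        if total_samples is None:
            raise ValueError("total_samples is required for 'remainder'")
        return first, total_samples - first
    return first, round(length_wave_ms * samplerate / 1000.0)


print(wave_segment(start_wave_ms=10, length_wave_ms=32))   # (200, 640)
print(wave_segment(start_wave_ms=10, total_samples=20000)) # (200, 19800)
```

At the default sample rate of 20,000 Hz each millisecond corresponds to 20 samples, so a 32-ms segment starting 10 ms into the file covers 640 samples beginning at sample 200.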
swap_wave Swap the bytes in each binary pair of the
Switch. Default: off.
The order of the bytes in short integers varies between manufacturers. Specifically, the order for Sun, SGI and HP is opposite to that for DEC and IBM. The default setting (off) is for the latter byte order.
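What swapping the bytes of each short integer means can be shown in a few lines. This sketch is independent of AIM; swap_bytes is a hypothetical helper and the sample values are made up for the illustration.

```python
import struct


def swap_bytes(raw):
    """Swap the two bytes of every 16-bit sample in a bytes object."""
    if len(raw) % 2:
        raise ValueError("expected an even number of bytes")
    swapped = bytearray(raw)
    swapped[0::2] = raw[1::2]   # second byte of each pair moves first
    swapped[1::2] = raw[0::2]   # first byte of each pair moves second
    return bytes(swapped)


# Three samples written most-significant byte first ...
written = struct.pack(">3h", 1, -2, 300)
# ... read correctly as least-significant-byte-first data after swapping.
print(struct.unpack("<3h", swap_bytes(written)))   # (1, -2, 300)
```

Reading a file with the wrong byte order does not fail; it silently produces very different sample values, which is why a wave that looks like noise is often a sign that swap_wave needs changing.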
dB_wave Input wave level for physiological AIM
Units: dB. Default: 60 dB
This option sets the level of the input wave for the physiological version of AIM, that is, the route with the transmission line filterbank. (The functional version of AIM is level-independent and dB_wave is ignored when the gammatone filterbank is used.)
dB_wave is a scaling parameter that tells AIM the level of the wave in your input file relative to AIM's internal standard. It is used for calibration and investigation of the effects of level in the auditory system.
To calibrate AIM for a given recording setup, put a sinusoid of known level (dBSPL) into the recording system and store a sample of it as short integers in a headerless wave file. Calculate the rms amplitude of the sinusoid (RMS) (see note below) and then use the following equation to calculate the appropriate value of dB_wave.
dB_wave = dBSPL - 20*log10(RMS/200)
For example, if the sinusoid has a known level of 60 dB SPL, and the recorded version produces a wave with an RMS amplitude of 467.3, then dB_wave should be set to 52.6.
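The calibration arithmetic can be checked with a short Python sketch. The function name and the reference_rms parameter are hypothetical; the logarithm is taken to base 10, which is the reading that reproduces the worked example above.

```python
import math


def db_wave(db_spl, rms, reference_rms=200.0):
    """dB_wave for a recording whose calibration tone has the given
    level in dB SPL and the given RMS amplitude in sample units."""
    return db_spl - 20.0 * math.log10(rms / reference_rms)


print(round(db_wave(60.0, 467.3), 1))   # 52.6
```

A recording whose 60 dB SPL calibration tone has an RMS amplitude of exactly 200 would need dB_wave=60, the default value.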
Note: The RMS value of a stored input wave can be calculated using the AIM tool stats as follows:
> stats stat=rms line=on <input-wave>
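For readers without the AIM tools to hand, the same quantity can be computed directly from a headerless file of short integers. This sketch is not the stats tool and does not claim to match its output format; both function names are hypothetical, and the file is assumed to hold 16-bit samples in the machine's own byte order unless swap is set.

```python
import array
import math


def rms_of_shorts(samples):
    """Root-mean-square amplitude of a sequence of integer samples."""
    if len(samples) == 0:
        raise ValueError("no samples")
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def read_headerless_shorts(path, swap=False):
    """Read a headerless file of 16-bit integers.

    swap=True reverses the bytes of each sample, in the spirit of the
    swap_wave option.
    """
    samples = array.array("h")
    with open(path, "rb") as f:
        samples.frombytes(f.read())
    if swap:
        samples.byteswap()
    return samples


print(rms_of_shorts([300, -300, 300, -300]))   # 300.0
```

The value returned by rms_of_shorts is the RMS term in the dB_wave equation.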
Investigation of Level Effects:
If you change the value of dB_wave from 60 to 80 dB, the SAME input file is assumed to represent a stimulus that is 20 dB HIGHER in level. This enables you to investigate the effects of level with a fixed input file.
You can scale the genbmm output into absolute units of basilar membrane velocity in cm/s by multiplying the genbmm output numbers by:
antilog[(dB_wave-60)/20] / (4000*gain_tlf)
where dB_wave and gain_tlf are the values specified at run time.
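The scaling factor can be written out as a small function. This is a sketch of the formula above, reading antilog as a power of ten; the function name is hypothetical, and the gain_tlf value of 1.0 used in the example is chosen purely for illustration rather than taken from AIM's defaults.

```python
def bmm_velocity_scale(db_wave, gain_tlf):
    """Factor that converts genbmm output numbers to basilar membrane
    velocity in cm/s, following antilog[(dB_wave-60)/20] / (4000*gain_tlf)."""
    return 10.0 ** ((db_wave - 60.0) / 20.0) / (4000.0 * gain_tlf)


print(bmm_velocity_scale(60.0, 1.0))   # 0.00025
print(bmm_velocity_scale(80.0, 1.0))   # 0.0025
```

Raising dB_wave by 20 dB multiplies the factor by ten, which is consistent with the same input file being treated as a stimulus 20 dB higher in level.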
.genwavrc The options file for genwav.
Copyright (c) Applied Psychology Unit, Medical Research Council, 1995
Permission to use, copy, modify, and distribute this software without fee is hereby granted for research purposes, provided that this copyright notice appears in all copies and in all supporting documentation, and that the software is not redistributed for any fee (except for a nominal shipping charge). Anyone wanting to incorporate all or part of this software in a commercial product must obtain a license from the Medical Research Council.
The MRC makes no representations about the suitability of this software for any purpose. It is provided "as is" without express or implied warranty.
THE MRC DISCLAIMS ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT SHALL THE A.P.U. BE LIABLE FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
The AIM software was developed for Unix workstations by John Holdsworth and Mike Allerhand of the MRC APU, under the direction of Roy Patterson. The physiological version of AIM was developed by Christian Giguere. The options handler is by Paul Manson. The revised SAI module is by Jay Datta. Michael Akeroyd extended the postscript facilities and developed the xreview routine for auditory image cartoons.
The project was supported by the MRC and grants from the U.K. Defence Research Agency, Farnborough (Research Contract 2239); the EEC Esprit BR Programme, Project ACTS (3207); and the U.K. Hearing Research Trust.
SunOS 5.6    GENWAV (1)    16 April 1997