

Manual Reference Pages  - GENSPL (1)


genspl - generate a spiral auditory image




genspl [ option=value | -option ] filename


The spiral auditory image is an alternative display of the auditory image which emphasises the pitch of a sound and de-emphasises the timbre, or sound quality. That is, it emphasises the global temporal structure that arises in the NAPs of periodic sounds and focuses attention on the scale of the structure. It de-emphasises the fine-structure of the time-interval patterns in the auditory image and the shape of all but the largest features (Patterson, 1986, 1987b). Since the underlying information is the same for the spiral and rectangular auditory images, the options that control the auditory image itself are the same for genspl as they are for gensai. The options are those with the suffix _ai and they have the same defaults in both cases. The options are napdecay_ai, stdecay_ai, stcrit_ai, stlag_ai, decay_ai, and stinfo_ai; they are described in the manual entry for gensai.

With respect to displays, it is difficult to present the functions in the individual channels of the auditory image in a spiral form; the fine detail of the functions gets lost in the spiral perspective. Accordingly, in the spiral perspective each of the separate SAI pulses is replaced by a dot positioned at the time of the peak of the SAI pulse. [Previously, this representation was referred to as a pulse ribbon (Patterson, 1986, 1987b).] Conceptually, the spiral auditory image is a set of concentric spirals, one for each channel of the auditory image. The channel with the highest centre frequency is on the inside with the smallest radius; the channel with the lowest centre frequency is on the outside with the largest radius. The spiral lines are omitted for clarity, leaving just the dots. Sets of dots in adjacent channels often form bars aligned along spokes emanating from the centre of the spiral. These bars show that the same period exists in a range of adjacent channels in the NAP. Furthermore, this information about correlation across channels appears on the same spoke as the information indicating that the pattern repeats in time. Thus the multi-channel spiral maps both spectral and temporal information concerning the periodicity of the sound onto a single spatial vector -- a spoke of the spiral. It is this property that enables the spiral representation to explain octave perception (Patterson, 1986, 1990; Patterson et al., 1993b; Robinson and Patterson, 1995a,b).


The input and output options for the spiral auditory image are the same as for the rectangular auditory image. Furthermore, the spiral auditory image of an extended sound is a cartoon just like the rectangular auditory image, so spiral images of dynamic sounds can be generated, stored, animated, and reviewed in the same way as with the rectangular auditory image (gensai).




The options that control the position and size of the spiral image window on the screen are the same as for all previous windows. The options that control the timing of the frames and the portion of the auditory image presented in the display (frstep_aid, pwidth_aid, and nwidth_aid) are the same as for gensai. Note, however, that nwidth_aid is disabled (ignored) because logarithms are not defined for non-positive numbers. This means that the spiral auditory image suppresses most of the information in transients unless they occur periodically with a period less than 35 ms. The minimum observable time-interval is controlled by a new option, orig_spd.

There are three new display options for the spiral view of the auditory image: form_spd, zeroline_spd, and orig_spd. The mathematics of spirals as they pertain to the spiral auditory image are set out in Patterson and Nimmo-Smith (1986), Patterson et al. (1990) and Allerhand et al. (1991).

form_spd The form of the spiral time line (Archimedean or logarithmic) Switch: Default, archimedian.

The software offers two visual representations of the logarithmic spiral, both of which have the base 2. Both representations gather doublings in time onto a single spoke of the spiral, and so both have the general property that

        a = log2(t/T)

where a is the angle between the horizontal axis and the radius drawn to a point on the spiral, T is the sampling period (the reciprocal of the sampling rate), and t is 'integration time' -- the horizontal axis of the rectangular auditory image. Both T and t are in seconds; the angle a is measured in revolutions, or circuits, of the spiral. Every time t doubles, a increases by 1, and so the integer part of a (the characteristic of the logarithm) specifies the circuit of the spiral. The fractional part of a (the mantissa of the logarithm) specifies the angle within the circuit.
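As a small numerical illustration of the relation above, the following Python sketch splits a into its circuit and its angle within the circuit. The function name spiral_angle and the example values (a 20 kHz sampling rate and intervals of 8 and 16 ms) are chosen here for illustration only; they are not part of genspl.

```python
import math

def spiral_angle(t, T):
    """Split a = log2(t/T) into (circuit, fraction); t and T are in seconds."""
    a = math.log2(t / T)        # angle in revolutions
    circuit = math.floor(a)     # integer part: which circuit of the spiral
    fraction = a - circuit      # fractional part: angle within that circuit
    return circuit, fraction

T = 1.0 / 20000                 # sampling period for an assumed 20 kHz rate
c1, f1 = spiral_angle(0.008, T)
c2, f2 = spiral_angle(0.016, T)
print(c1, round(f1, 3))
print(c2, round(f2, 3))
```

Doubling the interval from 8 to 16 ms moves the point out by exactly one circuit and leaves the angle within the circuit unchanged, which is why doublings in time gather onto one spoke.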

The Archimedean spiral is like a coil of rope; that is, the radius increases by the thickness of the rope on each successive circuit. The form of the Archimedean spiral is

        r = k*a = k*log2(t/T)

where r is the radius from the centre of the spiral to a point on the spiral and k is a constant for scaling the display (the thickness of the rope).

The logarithmic spiral has the form

        r = k*[2**a] = k*[2**log2(t/T)] or

        r = k*[t/T]

where ** means exponentiation, and r and k are the radius and a different scaling constant. The logarithmic version of the spiral has the advantage that image time is linear along the path of the spiral. However, it has the disadvantage that it expands exponentially and gives unwarranted emphasis to the longer integration intervals. As a result, the default form of the spiral is Archimedean.
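The difference between the two forms can be seen by evaluating both radius formulas at successive doublings of t. This is only a sketch of the equations as written above; the function names, the scaling constant k = 1, and the 20 kHz sampling rate are assumptions made for the example.

```python
import math

def radius_archimedean(t, T, k=1.0):
    """r = k * log2(t/T): the radius grows by k per doubling of t."""
    return k * math.log2(t / T)

def radius_logarithmic(t, T, k=1.0):
    """r = k * (t/T): the radius doubles per doubling of t."""
    return k * (t / T)

T = 1.0 / 20000                      # assumed sampling period (20 kHz)
for t in (0.004, 0.008, 0.016):      # intervals of 4, 8 and 16 ms
    print(t, radius_archimedean(t, T), radius_logarithmic(t, T))
```

In the first form each circuit adds a constant k to the radius (the "thickness of the rope"); in the second the radius doubles on each circuit, which is the exponential expansion described above.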

zeroline_spd Spiral axis, or time line Switch: Default, off

When the switch is set to "on", a spiral axis, or time line, is plotted. It is presented on the outside of the circuit, one channel below the lowest filter channel, just as in the rectangular auditory image. The default value is "off" because the spiral axis contains a large number of points and it is slow to calculate and plot.

Note: The size of the spiral display is scaled so that the radius associated with the current value of pwidth fits inside the rectangle specified for the window (width_win and height_win). By default, width_win and height_win are set equal, which produces a square display window. The spiral window does not have to be square, however, and rectangular windows sometimes give a useful sense of depth.

orig_spd Spiral start point and spiral orientation Default units: revolutions. Default value 4.072 revolutions. This option determines the minimum integration interval that appears on the spiral. As a result, it determines the starting angle for the spiral and the angular orientation of structures/features that appear in the image. The option enables the user to set the orientation of the main spoke of the spiral for a given combination of sampling rate and stimulus period. Periods that are an exact power-of-2 times the base period, T, appear on the spoke proceeding horizontally from the centre of the spiral towards the right. By removing a portion of a circuit, the orientation of the spiral can be set to suit the user. A reduction in orig_spd of 0.25 will rotate the main spoke from horizontal to vertical.

When a sound has a long period, like 16 ms, the structure that appears in the spiral image (that is, the activity that is aligned on spokes) falls mainly in the outer circuits of the spiral. orig_spd enables the user to omit the higher octaves that occupy the central section of the spiral, and so focus in on the relevant octave of the sound. The integer part of orig_spd determines the number of octaves omitted.

Note: orig_spd can scale and rotate the spiral simultaneously because integer changes in the parameter cause a scaling without rotation and fractional changes cause rotations that are large relative to the accompanying expansion or contraction. The default value, 4.072, assigns a vertical spoke to a period of 8 ms (and its base-2 relatives) when the sampling rate is 20 kHz (or a base-2 relative).
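The default value can be checked numerically. The sketch below assumes that the displayed angle is log2(t/T) minus orig_spd, measured in revolutions from the horizontal spoke, so that a fractional part of 0.25 corresponds to the vertical spoke; this reading is inferred from the description above rather than taken from the genspl source, and the function name spoke_fraction is hypothetical.

```python
import math

def spoke_fraction(period, sample_rate, orig_spd=4.072):
    """Angle within a circuit (in revolutions) for a given stimulus period.

    Assumes displayed angle = log2(t/T) - orig_spd, with T = 1/sample_rate.
    """
    a = math.log2(period * sample_rate)   # log2(t/T)
    return (a - orig_spd) % 1.0

# 8 ms period at a 20 kHz sampling rate, default orig_spd
print(round(spoke_fraction(0.008, 20000), 3))
# Base-2 relatives of the period land on the same spoke
print(round(spoke_fraction(0.016, 20000), 3))
print(round(spoke_fraction(0.004, 20000), 3))
```

Under this assumption all three periods give a fraction very close to 0.25, that is, a quarter of a revolution from horizontal, which agrees with the statement that the default assigns a vertical spoke to a period of 8 ms at 20 kHz.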

There are also two existing display options which have related but somewhat different functions that are worth specific mention. They are bottom and pensize.

bottom Threshold value for the production of a dot in the spiral auditory image. Units: auditory image strength. Default value, 25 image units. This threshold specifies the value that a pulse in the auditory image must reach, or exceed, in order for it to be presented as a dot in the spiral image.
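The thresholding rule can be sketched in a few lines of Python. The data layout (a list of time and strength pairs for the SAI pulse peaks in one channel) and the function name dots_from_peaks are invented for the example; only the "reach, or exceed" rule and the default of 25 image units come from the description above.

```python
def dots_from_peaks(peaks, bottom=25.0):
    """Return the times of the pulse peaks that are drawn as dots.

    peaks  -- list of (time_ms, strength) pairs for one channel (assumed layout)
    bottom -- threshold in image units; a peak must reach or exceed it
    """
    return [time_ms for time_ms, strength in peaks if strength >= bottom]

channel = [(8.0, 140.0), (16.0, 25.0), (24.0, 24.9), (32.0, 60.0)]
print(dots_from_peaks(channel))
```

Here the peak at 16 ms just reaches the threshold and is kept, while the peak at 24 ms falls just below it and produces no dot.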

pensize The size of the dots on the spiral Default units, pixels: Default value, 2 pixels.

The dots on the spiral are actually small squares and the value pensize determines the number of pixels along the side of the square. This same option controls line width in the rectangular displays.

Spiral plots are printed in the same way as other displays, that is, by setting postscript=on and routing them to a printer. Note, however, that printers have greater resolution than computer screens and the default mapping of a 2-pixel square from screen to printer may be a bit large on a small spiral plot. The full resolution of the printer can be accessed by setting pensize=1 and varying the SilentOption figlinewidth. A good combination for most printers is pensize=1 figlinewidth=0.25. (This value, 0.25, is the default for figlinewidth.)




In order to understand the spiral mapping, generate a rectangular auditory image of the first note in cegc; that is,

> gensai input=cegc leng=96 top=4000 x0_win=10 or

> gensai input=cegc_br leng=96 top=4000 x0_win=10

For convenience, this note is referred to as C3 although, with a period of 8 ms, it is actually closer to B2. Now imagine the rectangular pulse ribbon that would be formed by replacing each SAI pulse in this rectangular image with a dot. The spiral auditory image is a mapped version of this rectangular pulse ribbon produced by compressing the pulse ribbon vertically by a factor of about 10, stretching it horizontally by a factor of about 2, and then wrapping it counterclockwise into a spiral, with the right-hand edge of the rectangular pulse ribbon at the centre of the spiral and the left-hand edge of the pulse ribbon at the end of the outer circuit. The spiral version of this same auditory image is generated by

> genspl input=cegc leng=96 x0_win=560 or

> genspl input=cegc_br leng=96 x0_win=560

If the spiral version is generated from a separate xterm, the rectangular and spiral images can be presented side by side for comparison.
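The remark that the first note, with its period of 8 ms, is closer to B2 than to C3 can be verified with a short calculation. The sketch assumes equal temperament with A4 = 440 Hz, which is not stated in this manual entry; the function names are illustrative.

```python
import math

A4 = 440.0  # assumed reference pitch in Hz

def note_freq(semitones_from_a4):
    """Equal-tempered frequency a given number of semitones from A4."""
    return A4 * 2.0 ** (semitones_from_a4 / 12.0)

def cents_between(f, ref):
    """Interval from ref to f in cents (positive if f is higher)."""
    return 1200.0 * math.log2(f / ref)

freq = 1.0 / 0.008          # a period of 8 ms is 125 Hz
b2 = note_freq(-22)         # B2 lies 22 semitones below A4
c3 = note_freq(-21)         # C3 lies 21 semitones below A4
print(round(cents_between(freq, b2), 1), round(cents_between(freq, c3), 1))
```

With these assumptions the note lies roughly 20 cents above B2 and nearly 80 cents below C3.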

The dots from vertical columns of pulses in the rectangular auditory image merge into short bars in the spiral version of the image because of the vertical compression. The bars fall along spokes radiating from the centre of the spiral. The dots from the arches of pulses on either side of the vertical column in the rectangular auditory image appear in a stretched form like "wings" in the spiral auditory image. In the case of C3, three of the bars are aligned on one spoke of the spiral (the vertical spoke); they represent the strong correlations that occur in the auditory image for sections of the NAP separated by 1, 2, and 4 cycles. So, much of the information about periodicity that is distributed across the temporal dimension in the rectangular auditory image is gathered together onto a single spatial vector in the spiral image. The bar associated with the third cycle in the auditory image appears on a secondary spoke that proceeds downwards and a little to the right of vertical in the image.

It is convenient to describe the angle of the spokes on the spiral auditory image in terms of minutes on an analogue clock, since the orientation and scale of the analogue clock are more widely used than degrees, radians, or musical cents, and the twelve numbers on the clock occur at intervals that correspond on the spiral to the notes of the chromatic, equal-temperament scale. For example, the two spokes delineated by C3 are at 0 and 25 minutes past the hour. A musical interval of 100 cents is 5 minutes on the clock and so, if the vertical spoke is C, the 25-minute spoke corresponds to F.
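The clock convention amounts to a simple conversion: one octave (1200 cents) is one revolution (60 minutes). The sketch below applies it to the spokes of C3. It assumes that a rise in pitch rotates a spoke clockwise, so that a longer interval (a lower pitch) counts as negative cents; the function names are invented for the example.

```python
import math

def cents_to_minutes(cents):
    """Clock position for a pitch interval: 100 cents = 5 minutes, modulo one hour."""
    return (cents / 20.0) % 60.0

def period_ratio_to_minutes(ratio):
    """Clock position of the spoke for an interval of `ratio` periods,
    relative to the spoke for one period (longer interval = lower pitch)."""
    return cents_to_minutes(-1200.0 * math.log2(ratio))

for cycles in (1, 2, 3, 4):
    print(cycles, round(period_ratio_to_minutes(cycles), 1))
```

Intervals of 1, 2, and 4 cycles all fall at 0 minutes, while 3 cycles falls within a fraction of a minute of 25 minutes, the position of F when the vertical spoke is C.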


    A pitch glide in the spiral auditory image


The spiral auditory image, like its rectangular counterpart, is not limited to periodic sounds. When the pitch of a sound glides smoothly from one note to another the pattern on the spiral auditory image rotates smoothly from one position to another, and when the pitch changes abruptly from one note to another, the spiral pattern dissolves at the end of the first note and forms again in a different orientation at the start of the next note. There are glides between the notes of cegc and they illustrate the dynamic behaviour of the spiral auditory image. A spiral cartoon of cegc can be generated with

> genspl input=cegc bitmap=on   or

> genspl input=cegc_br bitmap=on

and it can be reviewed using

> review cegc   or      > review cegc_br

> xreview cegc  or      > xreview cegc_br

As the sound in cegc proceeds from C3 to E3, G3 and C4, the primary spoke of the pattern on the spiral rotates clockwise from 0 to 20, 35, and 60 minutes, completing one revolution as the pitch rises an octave. Note, however, that at C4 each of the spokes has been extended by one circuit towards the centre of the spiral and there are now eight bars rather than four, occupying four spokes rather than two. Thus, in AIM, octaves are perceived to be similar because they produce spoke patterns with the same orientation on the spiral auditory image. Note also that when the note is G3, the secondary spoke of the spiral pattern is in the vertical position (C). The interaction of spokes on the spiral is extended into a theory of musical consonance in Patterson (1986, 1987a). It is shown that the notes of the major triad are those that have a spoke that coincides with the main spoke of the tonic, and the notes of the minor triad are mirror images of those of the major triad.
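The spoke positions quoted for the four notes can be reproduced with the clock conversion (100 cents = 5 minutes). The sketch assumes equal-tempered intervals of 0, 400, 700, and 1200 cents above the first note, and takes the secondary spoke to be the one for an interval of three periods; the names used are illustrative only.

```python
import math

# Intervals above the first note, in cents (assumed equal temperament)
NOTE_CENTS = {"C3": 0.0, "E3": 400.0, "G3": 700.0, "C4": 1200.0}

def minutes(cents):
    """Clock position, with one octave equal to 60 minutes."""
    return (cents / 20.0) % 60.0

THREE_PERIODS = 1200.0 * math.log2(3)   # three periods is this many cents lower

for note, cents in NOTE_CENTS.items():
    primary = minutes(cents)
    secondary = minutes(cents - THREE_PERIODS)
    print(note, round(primary, 1), round(secondary, 1))
```

The primary spoke comes out at 0, 20, 35, and 0 minutes (0 and 60 being the same position), and for G3 the secondary spoke lies within a fraction of a minute of the vertical, as noted above.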



Allerhand, M., Patterson, R.D., Robinson, K., and Rice, P. (1991). Spiral VOS Interim Report V: Application of the SVOS algorithm. APU Contract Report. (Appendix contains mathematics of spirals).

Patterson, R.D. (1986). "Spiral detection of periodicity and the spiral form of musical scales," Psychology of Music 14, 44-61.

Patterson, R.D. and Nimmo-Smith, I. (1986). "Thinning periodicity detectors for modulated pulse streams," In B.C.J. Moore and R.D. Patterson (Eds.) Auditory Frequency Selectivity (NATO ASI Series A: Life Sciences, Vol. 19), New York: Plenum, 299-307.

Patterson, R.D. (1987a). "A pulse ribbon model of peripheral auditory processing," In William A. Yost and Charles S. Watson (Eds.) Auditory Processing of Complex Sounds. Hillsdale, N.J., Erlbaum, 167-169.

Patterson, R.D. (1987b). "A pulse ribbon model of monaural phase perception," J. Acoust. Soc. Am. 82, 1560-1586.

Patterson, R.D. (1988). "Timbre cues in monaural phase perception: Distinguishing within-channel cues and between-channel cues," In H. Duifhuis, J.W. Horst and H.P. (Eds.), Basic Issues in Hearing. Proceedings of the 8th International Symposium on Hearing. London: Academic Press, 351-358.

Patterson, R.D. (1990). "The tone height of multi-harmonic sounds," Music Perception 8, 203-214.

Patterson, R.D., Allerhand, M., Holdsworth, J., and Rice, P. (1990). Spiral VOS Interim Report IV: Optimisation of the SVOS algorithm. APU Contract Report. (Appendices contain mathematics of spirals).

Patterson, R.D., Milroy, R. and Allerhand, M. (1993b). "What is the octave of a harmonically rich note?" Contemporary Music Review Vol. 9, Harwood, Switzerland, 69-81.

Robinson, K.L. and Patterson, R.D. (1995a). "The duration required to identify the instrument, the octave, or the pitch-chroma of a musical note," Music Perception 13, (in press).

Robinson, K.L. and Patterson, R.D. (1995b). "The stimulus duration required to identify vowels, their octave, and their pitch-chroma," J. Acoust. Soc. Am. 98, (in press).





.gensplrc The options file for genspl.




Copyright (c) Applied Psychology Unit, Medical Research Council, 1995

Permission to use, copy, modify, and distribute this software without fee is hereby granted for research purposes, provided that this copyright notice appears in all copies and in all supporting documentation, and that the software is not redistributed for any fee (except for a nominal shipping charge). Anyone wanting to incorporate all or part of this software in a commercial product must obtain a license from the Medical Research Council.

The MRC makes no representations about the suitability of this software for any purpose. It is provided "as is" without express or implied warranty.



The AIM software was developed for Unix workstations by John Holdsworth and Mike Allerhand of the MRC APU, under the direction of Roy Patterson. The physiological version of AIM was developed by Christian Giguere. The options handler is by Paul Manson. The revised SAI module is by Jay Datta. Michael Akeroyd extended the postscript facilities and developed the xreview routine for auditory image cartoons.

The project was supported by the MRC and grants from the U.K. Defense Research Agency, Farnborough (Research Contract 2239); the EEC Esprit BR Programme, Project ACTS (3207); and the U.K. Hearing Research Trust.

SunOS 5.6 GENSPL (1) 29 August 1995