Speech analysis and synthesis using an auditory model

Carnegie, Dale A.

dc.contributor.advisor	Holmes, Geoffrey
dc.contributor.advisor	Smith, Lloyd A.
dc.contributor.author	Carnegie, Dale A.
dc.date.accessioned	2022-03-23T20:48:39Z
dc.date.available	2022-03-23T20:48:39Z
dc.date.issued	2000
dc.date.updated	2022-03-23T20:45:39Z
dc.description.abstract	Many traditional speech analysis/synthesis techniques are designed to produce speech with a spectrum that is as close as possible to the original. This may not be necessary, because the auditory nerve is the only link from the auditory periphery to the brain, and all information that is processed by the higher auditory system must exist in the auditory nerve firing patterns. Rather than matching the synthesised speech spectra to the original representation, it should be sufficient that the representations of the synthetic and original speech be similar at the auditory nerve level. This thesis develops a speech analysis system that incorporates a computationally efficient model of the auditory periphery. Timing-synchrony information is employed to exploit the in-synchrony phenomena observed in neuron firing patterns to form a nonlinear relative spectrum intensity measure. This measure is used to select specific dominant frequencies to reproduce the speech based on a synthesis-by-sinusoid approach. The resulting speech is found to be intelligible even when only a fraction of the original frequencies are selected for synthesis. Additionally, the synthesised speech is highly noise immune and exhibits noise reduction due to the coherence property of the frequency transform algorithm and the dominance effect of the spectrum intensity measure. The noise reduction and low-bit-rate potential of the speech analysis system are exploited to produce a highly noise-immune synthesis that outperforms similar representations formed both by a more physiologically accurate model and by a classical non-biological speech processing algorithm. Such a representation has potential application in low-bit-rate systems, particularly as a front end to an automatic speech recogniser.
dc.format.mimetype	application/pdf
dc.identifier.uri	https://hdl.handle.net/10289/14791
dc.language.iso	en
dc.publisher	The University of Waikato
dc.rights	All items in Research Commons are provided for private study and research purposes and are protected by copyright with all rights reserved unless otherwise indicated.
dc.title	Speech analysis and synthesis using an auditory model
dc.type	Thesis
pubs.place-of-publication	Hamilton, New Zealand	en_NZ
thesis.degree.grantor	The University of Waikato
thesis.degree.level	Doctoral
thesis.degree.name	Doctor of Philosophy (PhD)
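
Illustrative note (not part of the repository record): the abstract describes selecting a few dominant frequencies and resynthesising speech as a sum of sinusoids. The sketch below shows only that general idea for a single frame. It is an assumption-laden stand-in: it ranks components by plain DFT magnitude, whereas the thesis uses a timing-synchrony intensity measure derived from an auditory model. The function names `dominant_sinusoids` and `synthesise`, and the parameter `k`, are hypothetical and do not come from the thesis.

```python
import cmath
import math


def dominant_sinusoids(frame, sample_rate, k):
    """Return the k strongest (frequency_hz, amplitude, phase) components.

    Assumption: "dominance" is approximated here by DFT magnitude, not by
    the auditory-model intensity measure described in the abstract.
    """
    n = len(frame)
    components = []
    # Plain DFT over positive-frequency bins, skipping DC and Nyquist.
    for b in range(1, n // 2):
        coeff = sum(
            frame[t] * cmath.exp(-2j * math.pi * b * t / n) for t in range(n)
        )
        amplitude = 2.0 * abs(coeff) / n  # amplitude of the real sinusoid
        components.append((b * sample_rate / n, amplitude, cmath.phase(coeff)))
    # Keep only the k largest components by amplitude.
    components.sort(key=lambda c: c[1], reverse=True)
    return components[:k]


def synthesise(components, sample_rate, n):
    """Rebuild n samples as a sum of cosines from the selected components."""
    return [
        sum(
            a * math.cos(2 * math.pi * f * t / sample_rate + p)
            for f, a, p in components
        )
        for t in range(n)
    ]
```

Calling `dominant_sinusoids(frame, sample_rate, k)` with a small `k` and passing the result to `synthesise` gives a frame built from only the strongest components, which is the sense in which a fraction of the original frequencies can stand in for the whole spectrum. A real system would process overlapping windowed frames and interpolate parameters between them; this sketch does neither.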