Enhancing Speech Perception In Cochlear Implants: Novel Approaches In Encoding Temporal Fine Structures and Noise Reduction

Venkateswarlu, Poluboina

Please use this identifier to cite or link to this item: https://idr.l4.nitk.ac.in/jspui/handle/123456789/17731

Title:	Enhancing Speech Perception In Cochlear Implants: Novel Approaches In Encoding Temporal Fine Structures and Noise Reduction
Authors:	Venkateswarlu, Poluboina
Supervisors:	P, Aparna
Keywords:	Cochlear implants;Pitch shifting;Speech enhancement;Speech recognition
Issue Date:	2023
Publisher:	National Institute Of Technology Karnataka Surathkal
Abstract:	Cochlear implants (CIs) significantly enhance audibility and speech intelligibility in quiet environments. Nevertheless, speech recognition in noisy conditions remains a notable challenge. Efforts to enhance speech perception in cochlear implants typically follow two approaches: preprocessing, which improves the signal-to-noise ratio (SNR), and speech coding, which aims to encode the cues most important for speech recognition in noise. This thesis addresses both approaches. The first part examines how to encode temporal fine structure (TFS) meaningfully, focusing on the effect of representing TFS through proportional frequency compression. In the second part, two denoising techniques are proposed as preprocessing to improve the SNR: a modified Wiener filter method and a deep-learning denoising method for speech enhancement.
The research investigates the significance of TFS cut-off frequencies in CI speech coding for speech perception in noise. Based on these observations, an algorithm is introduced that represents TFS through proportionally frequency-compressed cues. Additionally, a scheme based on the pitch-synchronous overlap-add (PSOLA) algorithm is proposed to encode TFS within the neurophysiological limitations of CI users. Speech recognition scores (SRS) are measured under several signal processing conditions: a sinewave vocoder without TFS, four unshifted TFS conditions with different cut-off frequencies, and three PSOLA conditions that shift TFS frequencies. The original envelope is left unchanged in all conditions. The results indicate that the condition in which TFS with a 600 Hz cut-off is shifted to 300 Hz through PSOLA yields higher SRS than the no-TFS condition (sinewave vocoder), suggesting that encoding TFS through proportional frequency compression improves speech perception in noise compared with omitting TFS.
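The no-TFS reference condition above is a sinewave vocoder: each analysis band keeps its temporal envelope, but its fine structure is replaced by a fixed sine carrier. The following numpy sketch illustrates that idea only; the channel count (8 log-spaced bands from 200 to 7000 Hz), the 50 Hz envelope cut-off, the brick-wall FFT filters, the geometric-centre carriers, and the function names are all assumptions for illustration, not the configuration used in the thesis.

```python
import numpy as np


def analytic_signal(x):
    """Analytic signal via the FFT (the same construction scipy.signal.hilbert uses)."""
    n = len(x)
    h = np.zeros(n)
    if n % 2 == 0:
        h[0] = h[n // 2] = 1.0
        h[1:n // 2] = 2.0
    else:
        h[0] = 1.0
        h[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(np.fft.fft(x) * h)


def fft_bandpass(x, fs, lo, hi):
    """Zero-phase brick-wall filter keeping lo <= f < hi (illustrative, not a CI filterbank)."""
    spec = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), 1.0 / fs)
    spec[(freqs < lo) | (freqs >= hi)] = 0.0
    return np.fft.irfft(spec, n=len(x))


def sinewave_vocoder(x, fs, edges, env_cutoff=50.0):
    """No-TFS vocoder (hypothetical helper): per band, keep the smoothed Hilbert
    envelope and replace the fine structure with a sine at the band's geometric centre."""
    t = np.arange(len(x)) / fs
    out = np.zeros(len(x))
    for lo, hi in zip(edges[:-1], edges[1:]):
        band = fft_bandpass(x, fs, lo, hi)
        env = fft_bandpass(np.abs(analytic_signal(band)), fs, 0.0, env_cutoff)
        env = np.maximum(env, 0.0)  # low-pass ripple can dip below zero
        fc = np.sqrt(lo * hi)       # assumed carrier: geometric band centre
        out += env * np.sin(2.0 * np.pi * fc * t)
    return out


if __name__ == "__main__":
    fs = 16000
    t = np.arange(fs) / fs
    tone = np.sin(2 * np.pi * 1000 * t)
    edges = np.geomspace(200.0, 7000.0, 9)  # 8 channels (assumed)
    y = sinewave_vocoder(tone, fs, edges)
    freqs = np.fft.rfftfreq(len(y), 1.0 / fs)
    print("dominant output frequency:", freqs[np.argmax(np.abs(np.fft.rfft(y)))])
```

Passing a 1 kHz tone through this sketch yields a sine at roughly 950 Hz, the centre of the band that contains 1 kHz: the envelope survives, but the original TFS does not, which is what distinguishes the no-TFS condition from the TFS conditions.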
Furthermore, a modified Wiener filter method is proposed to improve speech intelligibility in noisy environments, specifically for cochlear implants. This noise reduction technique minimizes the mean square error (MSE) between the temporal envelopes of the enhanced speech and the clean speech, which makes it suitable for CI applications because CI processing conveys primarily temporal-envelope cues. The study provides a theoretical analysis of the noise suppression function and evaluates its performance with objective and subjective tests. The objective measures are the speech-to-reverberation modulation energy ratio adapted for cochlear implants (SRMR-CI) and the extended short-time objective intelligibility (ESTOI) measure; the subjective evaluation measures speech recognition through acoustic simulations of cochlear implant hearing. The proposed method is compared with the Wiener filter (WF) and with sigmoidal suppression functions, using a sinewave vocoder to simulate cochlear implant perception.
Finally, a new deep-learning method is proposed for speech enhancement. A mathematical derivation supports the effectiveness of the proposed Noisy2Noisy_avg (N2N_avg) training strategy over the Noise2Noise (N2N) strategy. A deep complex U-Net (DCU-Net) is trained using only noisy speech samples as both its input and its target, eliminating the need for a large number of clean speech samples. The proposed method is compared with state-of-the-art speech-denoising techniques. Experimental results show that the proposed approach reduces not only the reliance on clean targets but also the dependence on the large training sets typically required by speech-denoising techniques.
In summary, this research addresses limitations of current cochlear implant algorithms by proposing novel approaches for TFS encoding, noise reduction, and deep-learning-based speech enhancement.
The findings contribute to improving speech perception and intelligibility for individuals with cochlear implants, providing insights for further advancements in the field.
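The modified Wiener filter is benchmarked against the conventional Wiener filter (WF). As a reference point only, the numpy sketch below implements that classical baseline: the gain G = ξ/(1 + ξ) is applied to each STFT bin, with the a priori SNR ξ estimated by power subtraction from a noise PSD measured on a noise-only recording. The frame length, hop, gain floor, availability of a noise-only segment, and the function names are assumptions for illustration. The thesis's envelope-MSE suppression function is not reproduced, since its form is not given here.

```python
import numpy as np


def wiener_gain(noisy_psd, noise_psd, xi_floor=1e-3):
    """Classical Wiener gain G = xi / (1 + xi).

    The a priori SNR xi is estimated by power subtraction
    (|Y|^2 / lambda_N - 1) and floored to keep the gain positive."""
    xi = np.maximum(noisy_psd / noise_psd - 1.0, xi_floor)
    return xi / (1.0 + xi)


def stft_wiener(noisy, noise_only, n_fft=512, hop=256):
    """Frame-wise Wiener filtering (hypothetical helper) with sqrt-Hann analysis and
    synthesis windows; their product is a periodic Hann window, which overlap-adds
    to 1 at 50 % overlap, so a unit gain reconstructs the input."""
    win = np.sqrt(np.hanning(n_fft + 1)[:-1])
    # Noise PSD averaged over a noise-only recording (assumed to be available).
    noise_frames = np.array([noise_only[i:i + n_fft] * win
                             for i in range(0, len(noise_only) - n_fft + 1, hop)])
    noise_psd = np.mean(np.abs(np.fft.rfft(noise_frames, axis=1)) ** 2, axis=0)
    out = np.zeros(len(noisy))
    for i in range(0, len(noisy) - n_fft + 1, hop):
        spec = np.fft.rfft(noisy[i:i + n_fft] * win)
        gain = wiener_gain(np.abs(spec) ** 2, noise_psd)
        out[i:i + n_fft] += np.fft.irfft(gain * spec, n=n_fft) * win
    return out


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    fs = 16000
    t = np.arange(2 * fs) / fs
    clean = np.sin(2 * np.pi * 1000 * t)
    noisy = clean + rng.normal(0.0, 1.0, len(t))
    enhanced = stft_wiener(noisy, rng.normal(0.0, 1.0, fs))
    sl = slice(512, len(t) - 512)  # skip edges that lack full window overlap
    for name, sig in (("noisy", noisy), ("enhanced", enhanced)):
        err = np.sum((clean[sl] - sig[sl]) ** 2)
        print(name, "SNR (dB):", 10 * np.log10(np.sum(clean[sl] ** 2) / err))
```

Because the gain depends only on the per-bin SNR estimate, the `wiener_gain` rule can be swapped for another suppression function (for example a sigmoidal one) without changing the framing code, which is how such suppression rules are typically compared.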
URI:	http://idr.nitk.ac.in/jspui/handle/123456789/17731
Appears in Collections:	1. Ph.D Theses

Files in This Item:

There are no files associated with this item.
