Please use this identifier to cite or link to this item:
https://idr.l4.nitk.ac.in/jspui/handle/123456789/16838
Full metadata record
DC Field | Value | Language |
---|---|---|
dc.contributor.advisor | Koolagudi, Shashidhar G. | - |
dc.contributor.author | Mulimani, Manjunath. | - |
dc.date.accessioned | 2021-08-17T10:33:23Z | - |
dc.date.available | 2021-08-17T10:33:23Z | - |
dc.date.issued | 2020 | - |
dc.identifier.uri | http://idr.nitk.ac.in/jspui/handle/123456789/16838 | - |
dc.description.abstract | Currently, smart devices such as smartphones, laptops, and tablets require human intervention to deliver their services effectively. They can recognize speech, music, images, characters, and so on. To make smart systems behave intelligently, we need to build into them the capacity to understand and respond to the surrounding situation without human intervention. Enabling devices to sense the environment in which they are present through the analysis of sound is the main objective of Acoustic Scene Classification (ASC). The initial step in analyzing the surroundings is recognizing the acoustic events present in day-to-day environments. Such acoustic events are broadly categorized into two types: monophonic and polyphonic. Monophonic acoustic events are non-overlapping; in other words, at most one acoustic event is active at any given time. Polyphonic acoustic events are overlapping; in other words, multiple acoustic events occur at the same time instant. In this work, we aim to develop systems for the automatic recognition of monophonic and polyphonic acoustic events along with the corresponding acoustic scene. Applications of this research include context-aware mobile devices, robots, intelligent monitoring systems, assistive technologies such as hearing aids, and so on. Important issues in this research area include identifying event-specific features for acoustic event characterization and recognition; optimizing existing algorithms; developing robust mechanisms for acoustic event recognition in noisy environments; scaling state-of-the-art methods to big data; and developing a joint model that recognizes acoustic events followed by their corresponding scenes. Existing approaches have major limitations: they use traditional speech features, which are sensitive to noise; they use features from two-dimensional Time-Frequency Representations (TFRs), which demand high computational time; and they use deep learning models, which require substantially large amounts of training data. This thesis presents several novel approaches for the recognition of monophonic acoustic events, polyphonic acoustic events, and scenes. Two main challenges associated with real-time Acoustic Event Classification (AEC) are addressed: the first is the effective recognition of acoustic events in noisy environments, and the second is the use of the MapReduce programming model on a Hadoop distributed environment to reduce computational complexity. In this thesis, features are extracted from spectrograms, which are robust compared to traditional speech features (a sketch of such a front-end is given after this metadata table). Further, an improved Convolutional Recurrent Neural Network (CRNN) and a Deep Neural Network-driven feature-learning model are proposed for polyphonic Acoustic Event Detection (AED) in real-life recordings (see the CRNN sketch below the table). Finally, binaural features are explored to train a Kervolutional Recurrent Neural Network (KRNN), which recognizes both the acoustic events and the respective scene of an audio signal (see the kervolution sketch below the table). A detailed experimental evaluation compares the performance of each proposed approach against baseline and state-of-the-art systems. | en_US |
dc.language.iso | en | en_US |
dc.publisher | National Institute of Technology Karnataka, Surathkal | en_US |
dc.subject | Department of Computer Science & Engineering | en_US |
dc.subject | Monophonic Acoustic Event Classification (AEC) | en_US |
dc.subject | Polyphonic | en_US |
dc.subject | Acoustic Event Detection (AED) | en_US |
dc.subject | Acoustic Scene Classification (ASC) | en_US |
dc.subject | Time-Frequency Representations (TFRs) | en_US |
dc.subject | MapReduce programming model | en_US |
dc.subject | Convolutional Recurrent Neural Network (CRNN) | en_US |
dc.subject | Kervolutional Recurrent Neural Network (KRNN) | en_US |
dc.title | Acoustic Scene Classification Using Speech Features | en_US |
dc.type | Thesis | en_US |
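The abstract names spectrogram-derived features as the noise-robust alternative to traditional speech features. As a point of reference only, here is a minimal sketch of one common spectrogram front-end, a log-scaled mel spectrogram computed with `librosa`; the exact features extracted in the thesis may differ, and the parameter values (`sr`, `n_fft`, `hop_length`, `n_mels`) are illustrative assumptions.

```python
# Minimal sketch of a spectrogram front-end for acoustic event
# classification: a log-scaled mel spectrogram via librosa.
# All parameter values below are illustrative, not the thesis settings.
import numpy as np
import librosa

def log_mel_spectrogram(path, sr=44100, n_fft=2048, hop_length=1024, n_mels=40):
    """Return a (n_mels x frames) log-mel spectrogram for one audio clip."""
    y, sr = librosa.load(path, sr=sr, mono=True)
    mel = librosa.feature.melspectrogram(y=y, sr=sr, n_fft=n_fft,
                                         hop_length=hop_length, n_mels=n_mels)
    return librosa.power_to_db(mel, ref=np.max)
```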
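The thesis proposes an improved CRNN for polyphonic AED. The sketch below shows the generic pattern the term usually denotes: a CNN front-end over the mel spectrogram, a recurrent layer over time, and per-frame sigmoid outputs so that several event classes can be active simultaneously, which is the defining property of polyphonic detection. Layer sizes and the class count are assumptions, not the thesis architecture.

```python
# Generic CRNN sketch for polyphonic AED: CNN over the mel spectrogram,
# GRU over time, per-frame multi-label sigmoid outputs.
import torch
import torch.nn as nn

class CRNN(nn.Module):
    def __init__(self, n_mels=40, n_classes=6, rnn_units=64):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d((2, 1)),            # pool frequency, keep time resolution
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d((2, 1)),
        )
        self.rnn = nn.GRU(64 * (n_mels // 4), rnn_units,
                          batch_first=True, bidirectional=True)
        self.fc = nn.Linear(2 * rnn_units, n_classes)

    def forward(self, x):                     # x: (batch, 1, n_mels, frames)
        z = self.cnn(x)                       # (batch, 64, n_mels // 4, frames)
        z = z.permute(0, 3, 1, 2).flatten(2)  # (batch, frames, 64 * n_mels // 4)
        z, _ = self.rnn(z)
        return torch.sigmoid(self.fc(z))      # (batch, frames, n_classes)
```

A sigmoid per class, rather than a softmax over classes, is what lets overlapping events be predicted in the same frame.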
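The KRNN mentioned in the abstract builds on kervolution (Wang et al., CVPR 2019), which replaces the patch-wise inner product inside convolution with a kernel function. Below is a minimal PyTorch sketch of a polynomial kervolution layer; the kernel type, the hyperparameters `degree` and `c`, and how the layer is wired to the recurrent part are assumptions here, as the abstract does not specify them.

```python
# Hedged sketch of a 2-D kervolution layer with a polynomial kernel,
# after Kervolutional Neural Networks (Wang et al., CVPR 2019).
import torch.nn as nn

class PolyKerv2d(nn.Module):
    """Convolution whose patch-wise inner product x.w is replaced by
    the polynomial kernel (x.w + c) ** degree."""
    def __init__(self, in_ch, out_ch, kernel_size, degree=2, c=1.0,
                 stride=1, padding=0):
        super().__init__()
        # The linear part is an ordinary bias-free convolution ...
        self.conv = nn.Conv2d(in_ch, out_ch, kernel_size,
                              stride=stride, padding=padding, bias=False)
        self.degree, self.c = degree, c

    def forward(self, x):
        # ... and the kernel non-linearity is applied to its response.
        return (self.conv(x) + self.c) ** self.degree
```

Replacing the `Conv2d` layers in the CRNN sketch above with `PolyKerv2d` layers gives one plausible shape for a kervolutional recurrent network; per the abstract, the thesis trains such a network on binaural features for joint event and scene recognition.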
Appears in Collections: | 1. Ph.D Theses |
Files in This Item:
File | Description | Size | Format | |
---|---|---|---|---|
155098CS15FV06.pdf | | 2.07 MB | Adobe PDF | View/Open |