Robust Automatic Speech Recognition

Robust Automatic Speech Recognition

Author : Jinyu Li, Li Deng, Reinhold Haeb-Umbach, Yifan Gong
Publisher : Academic Press
Release : 2015-10-30
ISBN : 0128026162
Language : En, Es, Fr & De

Book Description :

Robust Automatic Speech Recognition: A Bridge to Practical Applications establishes a solid foundation for automatic speech recognition that is robust against acoustic environmental distortion. It provides a thorough overview of classical and modern noise- and reverberation-robust techniques that have been developed over the past thirty years, with an emphasis on practical methods that have been proven to be successful and which are likely to be further developed for future applications. The strengths and weaknesses of robustness-enhancing speech recognition techniques are carefully analyzed. The book covers noise-robust techniques designed for acoustic models which are based on both Gaussian mixture models and deep neural networks. In addition, a guide to selecting the best methods for practical applications is provided. The reader will:
Gain a unified, deep and systematic understanding of the state-of-the-art technologies for robust speech recognition
Learn the links and relationship between alternative technologies for robust speech recognition
Be able to use the technology analysis and categorization detailed in the book to guide future technology development
Be able to develop new noise-robust methods in the current era of deep learning for acoustic modeling in speech recognition

The first book that provides a comprehensive review on noise and reverberation robust speech recognition methods in the era of deep neural networks
Connects robust speech recognition techniques to machine learning paradigms with rigorous mathematical treatment
Provides elegant and structural ways to categorize and analyze noise-robust speech recognition techniques
Written by leading researchers who have been actively working on the subject matter in both industrial and academic organizations for many years

Robust Automatic Speech Recognition Employing Phoneme dependent Multi environment Enhanced Models Based Linear Normalization

Author : Igmar Hernández Ochoa
Publisher : Unknown
Release : 2006
ISBN : 0987650XXX
Language : En, Es, Fr & De

Book Description :

This work presents a robust normalization technique that cascades a speech enhancement method with a feature vector normalization algorithm. An efficient scheme for speech enhancement is the Spectral Subtraction (SS) algorithm, which reduces the effect of additive noise by subtracting a noise spectrum estimate from the complete speech spectrum. On the other hand, a new and promising technique known as PD-MEMLIN (Phoneme-Dependent Multi-Environment Models based Linear Normalization) has also been shown to be effective. PD-MEMLIN is an empirical feature vector normalization which models the clean and noisy spaces with Gaussian Mixture Models (GMMs) and estimates the different compensation linear transformations to be performed to clean the signal. In this work the integration of both approaches is proposed. The final design is called PD-MEEMLIN (Phoneme-Dependent Multi-Environment Enhanced Models based Linear Normalization), which confirms and improves the effectiveness of both approaches. The results obtained show that for very highly degraded speech (between -5 dB and 0 dB) PD-MEEMLIN outperforms SS by between 11.4% and 34.5%, PD-MEMLIN by between 11.7% and 24.84%, and SPLICE by between 6.04% and 22.23%. Furthermore, at moderate SNR, i.e. 15 or 20 dB, PD-MEEMLIN is as good as the PD-MEMLIN and SS techniques.
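
The spectral subtraction step described above can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the function names, the spectral floor parameter, and the assumption that the first few frames contain noise only are all hypothetical choices made for illustration.

```python
import numpy as np

def estimate_noise(noisy_mag, n_noise_frames):
    """Average the first frames of a magnitude spectrogram as a noise estimate.

    Assumption (not from the source): the leading frames contain noise only.
    """
    return np.asarray(noisy_mag, dtype=float)[:n_noise_frames].mean(axis=0)

def spectral_subtraction(noisy_mag, noise_mag, floor=0.01):
    """Subtract a per-bin noise magnitude estimate from every frame.

    noisy_mag: array of shape (frames, bins); noise_mag: array of shape (bins,).
    floor: hypothetical spectral floor, a fraction of the noisy magnitude that
    is kept so that the result never becomes negative.
    """
    noisy_mag = np.asarray(noisy_mag, dtype=float)
    cleaned = noisy_mag - np.asarray(noise_mag, dtype=float)
    return np.maximum(cleaned, floor * noisy_mag)
```

In practice the magnitudes would come from a short-time Fourier transform, and the noisy phase is typically reused when resynthesizing the signal; both steps are left out here.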

Robust Automatic Speech Recognition and Modeling of Auditory Discrimination with Auditory Experiments Spectro temporal Features

Author : Marc René Schädler
Publisher : Unknown
Release : 2016
ISBN : 9783814223339
Language : En, Es, Fr & De

Book Description :

Automatic speech recognition (ASR) systems still do not perform as well as human listeners under realistic conditions. The unmatched ability of humans to understand speech in the most difficult acoustic conditions originates from the superior properties of their auditory system. The aim of this thesis is to improve the recognition performance of ASR systems in difficult acoustic conditions by carefully integrating auditory signal processing strategies. To this end, the physiologically inspired extraction of spectro-temporal modulation patterns was successfully integrated into the front-end of a standard ASR system. Further, the joint spectro-temporal processing could be separated into independent temporal and spectral processes. To investigate the reason for the remaining "man-machine gap" in recognition performance, a range of critical auditory discrimination tasks were performed using ASR systems. The comparison with empirical data showed that the separate spectro-temporal modulation front-end provides a suitable auditory model and revealed the importance of across-frequency processing in speech recognition.
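
The idea of separating joint spectro-temporal processing into independent spectral and temporal steps can be sketched as follows. This is a generic illustration, not the thesis's front-end: the kernels are arbitrary placeholders rather than the modulation filters used there, and the function names are hypothetical. The sketch only shows that filtering a spectrogram with a 2-D kernel formed as an outer product equals filtering along time and then along frequency.

```python
import numpy as np

def joint_filter(spectrogram, kernel_2d):
    """Full 2-D convolution of a (frequency, time) spectrogram with a 2-D kernel."""
    n_freq, n_time = spectrogram.shape
    k_freq, k_time = kernel_2d.shape
    out = np.zeros((n_freq + k_freq - 1, n_time + k_time - 1))
    for i in range(k_freq):
        for j in range(k_time):
            out[i:i + n_freq, j:j + n_time] += kernel_2d[i, j] * spectrogram
    return out

def separable_filter(spectrogram, spectral_kernel, temporal_kernel):
    """Apply a temporal 1-D filter to every band, then a spectral 1-D filter to every frame."""
    along_time = np.apply_along_axis(np.convolve, 1, spectrogram, temporal_kernel)
    return np.apply_along_axis(np.convolve, 0, along_time, spectral_kernel)
```

For a kernel `np.outer(spectral_kernel, temporal_kernel)`, both functions return the same array, while the separable version needs far fewer multiplications for large kernels.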

Techniques for Noise Robustness in Automatic Speech Recognition

Author : Tuomas Virtanen, Rita Singh, Bhiksha Raj
Publisher : John Wiley & Sons
Release : 2012-11-28
ISBN : 1119970881
Language : En, Es, Fr & De

Book Description :

Automatic speech recognition (ASR) systems are finding increasing use in everyday life. Many of the commonplace environments where the systems are used are noisy, for example users calling up a voice search system from a busy cafeteria or a street. This can result in degraded speech recordings and adversely affect the performance of speech recognition systems. As the use of ASR systems increases, knowledge of the state-of-the-art in techniques to deal with such problems becomes critical to system and application engineers and researchers who work with or on ASR technologies. This book presents a comprehensive survey of the state-of-the-art in techniques used to improve the robustness of speech recognition systems to these degrading external influences. Key features:
Reviews all the main noise robust ASR approaches, including signal separation, voice activity detection, robust feature extraction, model compensation and adaptation, missing data techniques and recognition of reverberant speech.
Acts as a timely exposition of the topic in light of more widespread use in the future of ASR technology in challenging environments.
Addresses robustness issues and signal degradation which are both key requirements for practitioners of ASR.
Includes contributions from top ASR researchers from leading research units in the field.

Robust Automatic Recognition of Birdsongs and Human Speech a Template Based Approach

Author : Kantapon Kaewtip
Publisher : Unknown
Release : 2017
ISBN : 0987650XXX
Language : En, Es, Fr & De

Book Description :

This dissertation focuses on robust signal processing algorithms for birdsongs and speech signals. Automatic phrase or syllable detection systems of bird sounds are useful in several applications. However, bird-phrase detection is challenging due to segmentation error, duration variability, limited training data, and background noise. Two spectrograms with identical class labels may look different due to time misalignment and frequency variation. In real recording environments such as in a forest, the data can be corrupted by background interference, such as rain, wind, other animals or even other birds vocalizing. A noise-robust classifier needs to handle such conditions. Similarly, Automatic Speech Recognition (ASR) works well in quiet environments, but a large degradation in performance is observed when the speech signal is corrupted by background noise. The ASR performance would benefit from robust representations of speech signals and from robust recognition systems. The first topic of this dissertation focuses on an automatic birdsong-phrase recognition system that is robust to limited training data, class variability, and noise. The algorithm comprises a noise-robust Dynamic-Time-Warping (DTW)-based segmentation and a discriminative classifier for outlier rejection. The algorithm utilizes DTW and prominent (high energy) time-frequency regions of training spectrograms to derive a reliable noise-robust template for each phrase class. The resulting template is then used for segmenting continuous recordings to obtain segment candidates whose spectrogram amplitudes in the prominent regions are used as features to a Support Vector Machine (SVM). In addition, we present a novel approach to training HMMs with extremely limited data. First, the algorithm learns the global Gaussian Mixture Models (GMMs) for all training phrases available. GMM parameters are then used to initialize state parameters of each individual model. The number of states and the mixture components for each state are determined by the acoustic variation of each phrase type. The (high-energy) time-frequency prominent regions are used to compute the state emission probability to increase noise-robustness.
The second topic of the dissertation deals with noise-robust processing for automatic speech recognition. We also propose a new pitch-based spectral enhancement algorithm based on voiced frames for speech analysis and noise-robust speech processing. The proposed algorithm determines a time-warping function (TWF) and the speaker's pitch with high precision, simultaneously. This technique reduces the smearing effect in between harmonics when the fundamental frequency is not constant within the analysis window. To do so, we propose a metric called the harmonic residual which measures the difference between the actual spectrum and the resynthesized spectrum derived from the linear model of speech production with various combinations of TWF and high-precision pitch values as parameters. The TWF and pitch pair that yields the minimum harmonic residual is selected and the enhanced spectrum is obtained accordingly. We show how this new representation can also be used for automatic speech recognition by proposing a robust spectral representation derived from harmonic amplitude interpolation.
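
Dynamic time warping, which the birdsong segmentation above builds on, can be sketched in a few lines of Python. This is the textbook algorithm with a Euclidean frame distance, not the noise-robust variant developed in the dissertation; the function name and the choice of local cost are assumptions for illustration.

```python
import math

def dtw_distance(a, b):
    """Classic dynamic time warping cost between two sequences of feature vectors.

    a, b: lists of equal-dimension vectors (for example spectrogram frames).
    The local cost is the Euclidean distance between frames; each step may
    advance one sequence, the other, or both.
    """
    n, m = len(a), len(b)
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = math.dist(a[i - 1], b[j - 1])
            cost[i][j] = local + min(cost[i - 1][j],      # a advances alone
                                     cost[i][j - 1],      # b advances alone
                                     cost[i - 1][j - 1])  # both advance
    return cost[n][m]
```

Two renditions of the same phrase that differ only in duration get a low cost, because repeated frames can be absorbed by the warping path.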

Robustness in Automatic Speech Recognition

Author : Jean-Claude Junqua, Jean-Paul Haton
Publisher : Springer Science & Business Media
Release : 2012-12-06
ISBN : 1461312973
Language : En, Es, Fr & De

Book Description :

Foreword: Looking back over the past 30 years, we have seen steady progress made in the area of speech science and technology. I still remember the excitement in the late seventies when Texas Instruments came up with a toy named "Speak-and-Spell" which was based on a VLSI chip containing the state-of-the-art linear prediction synthesizer. This caused a speech technology fever among the electronics industry. Particularly, applications of automatic speech recognition were rigorously attempted by many companies, some of which were start-ups founded just for this purpose. Unfortunately, it did not take long before they realized that automatic speech recognition technology was not mature enough to satisfy the need of customers. The fever gradually faded away. In the meantime, constant efforts have been made by many researchers and engineers to improve the automatic speech recognition technology. Hardware capabilities have advanced impressively since that time. In the past few years, we have been witnessing and experiencing the advent of the "Information Revolution." What might be called the second surge of interest to commercialize speech technology as a natural interface for man-machine communication began in much better shape than the first one. With computers much more powerful and faster, many applications look realistic this time. However, there are still tremendous practical issues to be overcome in order for speech to be truly the most natural interface between humans and machines.

Robust Speech Recognition and Understanding

Author : Danel Jaso
Publisher : Unknown
Release : 2016-04-01
ISBN : 9781681174662
Language : En, Es, Fr & De

Book Description :

Speech recognition systems have become much more robust in recent years with respect to both speaker variability and acoustical variability. Automatic speech recognition (ASR) systems are finding increasing use in everyday life. Many of the commonplace environments where the systems are used are noisy, for example users calling up a voice search system from a busy cafeteria or a street. This can result in degraded speech recordings and adversely affect the performance of speech recognition systems. In addition to achieving speaker independence, many current systems can also automatically compensate for modest amounts of acoustical degradation caused by the effects of unknown noise and unknown linear filtering. As speech recognition and spoken language technologies are being transferred to real applications, the need for greater robustness in recognition technology is becoming increasingly apparent. Substantial progress has also been made over the last decade in the dynamic adaptation of speech recognition systems to new speakers, with techniques that modify or warp the systems' phonetic representations to reflect the acoustical characteristics of individual speakers. Speech recognition systems have also become more robust in recent years, particularly with regard to slowly-varying acoustical sources of degradation. As the use of ASR systems increases, knowledge of the state-of-the-art in techniques to deal with such problems becomes critical to system and application engineers and researchers who work with or on ASR technologies. Robust Speech Recognition and Understanding brings together many different aspects of the current research on automatic speech recognition and language understanding. Additionally, it presents a comprehensive survey of the state-of-the-art in techniques used to improve the robustness of speech recognition systems to these degrading external influences.

Robust Speech Recognition of Uncertain or Missing Data

Author : Dorothea Kolossa, Reinhold Haeb-Umbach
Publisher : Springer Science & Business Media
Release : 2011-07-14
ISBN : 9783642213175
Language : En, Es, Fr & De

Book Description :

Automatic speech recognition suffers from a lack of robustness with respect to noise, reverberation and interfering speech. The growing field of speech recognition in the presence of missing or uncertain input data seeks to ameliorate those problems by using not only a preprocessed speech signal but also an estimate of its reliability to selectively focus on those segments and features that are most reliable for recognition. This book presents the state of the art in recognition in the presence of uncertainty, offering examples that utilize uncertainty information for noise robustness, reverberation robustness, simultaneous recognition of multiple speech signals, and audiovisual speech recognition. The book is appropriate for scientists and researchers in the field of speech recognition who will find an overview of the state of the art in robust speech recognition, professionals working in speech recognition who will find strategies for improving recognition results in various conditions of mismatch, and lecturers of advanced courses on speech processing or speech recognition who will find a reference and a comprehensive introduction to the field. The book assumes an understanding of the fundamentals of speech recognition using Hidden Markov Models.
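
One common way to use a reliability estimate during recognition is to widen the acoustic model's variance by the uncertainty of each enhanced feature. The Python sketch below shows that rule for a single diagonal Gaussian. It is a generic illustration under that assumption; the blurb does not say which formulations the book covers, and the function name is hypothetical.

```python
import numpy as np

def log_likelihood_with_uncertainty(x_hat, x_var, mean, var):
    """Diagonal-Gaussian log-likelihood of an uncertain observation.

    x_hat: enhanced feature vector (point estimate).
    x_var: per-dimension variance expressing how unreliable x_hat is.
    mean, var: parameters of one diagonal Gaussian of the acoustic model.
    The feature uncertainty is added to the model variance, so unreliable
    dimensions influence the score less.
    """
    x_hat, x_var, mean, var = (np.asarray(v, dtype=float)
                               for v in (x_hat, x_var, mean, var))
    total_var = var + x_var
    return float(-0.5 * np.sum(np.log(2.0 * np.pi * total_var)
                               + (x_hat - mean) ** 2 / total_var))
```

With zero uncertainty the function reduces to the ordinary Gaussian log-likelihood; with large uncertainty an outlying feature is penalized far less.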

Recent Advances in Robust Speech Recognition Technology

Author : Javier Ramírez, Juan Manuel Górriz
Publisher : Bentham Science
Release : 2011-01-01
ISBN : 1608051722
Language : En, Es, Fr & De

Book Description :

This E-book is a collection of articles that describe advances in speech recognition technology. Robustness in speech recognition refers to the need to maintain high speech recognition accuracy even when the quality of the input speech is degraded, or when the acoustical, articulatory, or phonetic characteristics of speech in the training and testing environments differ. Obstacles to robust recognition include acoustical degradations produced by additive noise, the effects of linear filtering, nonlinearities in transduction or transmission, as well as impulsive interfering sources, and diminished accuracy caused by changes in articulation produced by the presence of high-intensity noise sources. Although progress over the past decade has been impressive, there are significant obstacles to overcome before speech recognition systems can reach their full potential. Automatic speech recognition (ASR) systems must be robust at all levels, so that they can handle background or channel noise, the occurrence of unfamiliar words, new accents, new users, or unanticipated inputs. They must exhibit more 'intelligence' and integrate speech with other modalities, deriving the user's intent by combining speech with facial expressions, eye movements, gestures, and other input features, and communicating back to the user through multimedia responses. Therefore, as speech recognition technology is transferred from the laboratory to the marketplace, robustness in recognition becomes increasingly significant. This E-book should be useful to computer engineers interested in recent developments in speech recognition technology.

Reconstructing Incomplete and Unreliable Speech Spectrogram for Robust Automatic Speech Recognition

Author : Shirin Badiezadegan
Publisher : Unknown
Release : 2015
ISBN : 0987650XXX
Language : En, Es, Fr & De

Book Description :

The performance of an automatic speech recognition (ASR) system degrades dramatically when speech is corrupted by background noise. In many ASR applications, however, the presence of the background noise is unavoidable. Feature representations in ASR are usually derived from the short-time spectral magnitude of the speech signal, known as the speech spectrogram. The goal of the work in this thesis is to develop noise robust ASR systems by reconstructing noise corrupted speech spectrograms. This is addressed as a data imputation problem within the framework of missing feature theory in computational auditory scene analysis. This thesis presents a number of data imputation techniques which can add noise robustness to an ASR system while making minimum assumptions about the characteristics of the background noise. There are three major contributions in this thesis work. The first relates to the spectrographic mask estimation which is performed to identify noise corrupted features. Having identified the noise corrupted speech features, a spectrogram reconstruction technique is employed to estimate the underlying clean features and reconstruct the noise corrupted features. A mask estimation method, based on speech enhancement techniques presented previously in the literature, is incorporated in a spectrogram reconstruction approach for noise robust ASR. The presented mask estimation technique is shown to perform well both in stationary and non-stationary noisy environments. More importantly, this technique does not require any prior knowledge of the background noise type or the SNR level. The second contribution of this thesis is a filterbank based approach to spectrogram reconstruction based on discrete wavelet transform (DWT) de-noising. In these techniques, speech spectrogram coefficients are input to a DWT filterbank.
Most of the spectrogram reconstruction approaches presented in the literature are model-based techniques that can only provide accurate estimates of the underlying clean speech when the characteristics of the noise corrupted features do not deviate from those of the model. Discrete wavelet transform (DWT) based de-noising methods have been used for signal reconstruction, but often require that the background noise is stationary and modeled by a Gaussian distribution. A novel approach is presented in this thesis for incorporating the information derived from spectrographic masks in a DWT-based de-noising method. It will be shown that the proposed approach reduces the impact of model mismatch associated with parametric approaches and exploits the robustness of non-parametric wavelet de-noising approach. This technique, however, can perform at its best only if some parameters are tuned to the noise conditions. The third contribution of this thesis is a procedure which combines multiple DWT-based reconstructed spectral features using a closed loop optimization algorithm which is related to the overall performance of the ASR system. The feature channels are formed from an ensemble of reconstructed spectrograms generated by applying DWT-based spectrogram reconstruction with multiple parameter settings. The spectrograms associated with these feature channels differ in the degree to which spectral information is suppressed across multiple scales and frequency bands. A consistent increase in word accuracy is reported for this multi-channel performance monitoring approach with respect to an implementation of a more well known minimum mean squared error approach for missing feature based spectrogram reconstruction.
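
For readers unfamiliar with wavelet de-noising, the following Python sketch shows the basic mechanism on a 1-D sequence: one level of a Haar DWT, soft thresholding of the detail coefficients, and the inverse transform. It is a deliberately simplified stand-in, not the mask-informed method of the thesis; the single decomposition level, the Haar wavelet, and the function name are assumptions.

```python
import numpy as np

def haar_denoise(signal, threshold):
    """One-level Haar wavelet de-noising with soft thresholding.

    signal: 1-D sequence of even length (for example one spectrogram band).
    threshold: detail coefficients with magnitude below this value are zeroed,
    larger ones are shrunk toward zero by the same amount.
    """
    x = np.asarray(signal, dtype=float)
    if x.size % 2:
        raise ValueError("signal length must be even")
    root2 = np.sqrt(2.0)
    approx = (x[0::2] + x[1::2]) / root2   # low-pass (smooth) coefficients
    detail = (x[0::2] - x[1::2]) / root2   # high-pass (detail) coefficients
    detail = np.sign(detail) * np.maximum(np.abs(detail) - threshold, 0.0)
    out = np.empty_like(x)
    out[0::2] = (approx + detail) / root2  # inverse Haar transform
    out[1::2] = (approx - detail) / root2
    return out
```

A threshold of zero reproduces the input exactly, while a very large threshold replaces each pair of samples by its average. Applying different thresholds is one simple way to obtain the kind of multiple parameter settings the description mentions.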

Robust Adaptation to Non Native Accents in Automatic Speech Recognition

Author : Silke Goronzy
Publisher : Springer Science & Business Media
Release : 2002-12-19
ISBN : 9783540003250
Language : En, Es, Fr & De

Book Description :

Speech recognition technology is being increasingly employed in human-machine interfaces. A remaining problem, however, is the robustness of this technology to non-native accents, which still cause considerable difficulties for current systems. In this book, methods to overcome this problem are described. A speaker adaptation algorithm that is capable of adapting to the current speaker with just a few words of speaker-specific data based on the MLLR principle is developed and combined with confidence measures that focus on phone durations as well as on acoustic features. Furthermore, a specific pronunciation modelling technique that allows the automatic derivation of non-native pronunciations without using non-native data is described and combined with the previous techniques to produce a robust adaptation to non-native accents in an automatic speech recognition system.
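
MLLR adapts a recognizer by applying a shared affine transform to the Gaussian mean vectors. The Python sketch below shows how such a transform is applied, together with a simplified least-squares estimate. That estimate is an assumption made for illustration: real MLLR maximizes the likelihood of the adaptation data using covariances and occupation counts, and the book's algorithm is not reproduced here. All function names are hypothetical.

```python
import numpy as np

def extend(means):
    """Prepend a constant 1 to each mean vector so the bias is part of the transform."""
    means = np.atleast_2d(np.asarray(means, dtype=float))
    return np.hstack([np.ones((means.shape[0], 1)), means])

def apply_mean_transform(means, W):
    """Adapt means with a shared affine transform W of shape (dim, dim + 1)."""
    return extend(means) @ W.T

def estimate_mean_transform(means, targets):
    """Least-squares stand-in for MLLR estimation (ignores covariances and counts).

    means: speaker-independent Gaussian means, shape (n, dim).
    targets: hypothetical per-Gaussian averages of the adaptation data, shape (n, dim).
    """
    solution, *_ = np.linalg.lstsq(extend(means),
                                   np.asarray(targets, dtype=float), rcond=None)
    return solution.T
```

Because one transform is shared by many Gaussians, a few words of speech can move all means at once, which is the property the description relies on.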

Acoustical and Environmental Robustness in Automatic Speech Recognition

Author : Alex Acero
Publisher : Springer Science & Business Media
Release : 1992-11-30
ISBN : 9780792392842
Language : En, Es, Fr & De

Book Description :

The need for automatic speech recognition systems to be robust with respect to changes in their acoustical environment has become more widely appreciated in recent years, as more systems are finding their way into practical applications. Although the issue of environmental robustness has received only a small fraction of the attention devoted to speaker independence, even speech recognition systems that are designed to be speaker independent frequently perform very poorly when they are tested using a different type of microphone or acoustical environment from the one with which they were trained. The use of microphones other than a "close talking" headset also tends to severely degrade speech recognition performance. Even in relatively quiet office environments, speech is degraded by additive noise from fans, slamming doors, and other conversations, as well as by the effects of unknown linear filtering arising from reverberation from surface reflections in a room, or spectral shaping by microphones or the vocal tracts of individual speakers. Speech-recognition systems designed for long-distance telephone lines, or applications deployed in more adverse acoustical environments such as motor vehicles, factory floors, or outdoors demand far greater degrees of environmental robustness. There are several different ways of building acoustical robustness into speech recognition systems. Arrays of microphones can be used to develop a directionally-sensitive system that resists interference from competing talkers and other noise sources that are spatially separated from the source of the desired speech signal.
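
The simplest directionally sensitive use of a microphone array is a delay-and-sum beamformer. The Python sketch below is a minimal version with integer sample delays that are assumed to be known; it is a generic illustration and not a method taken from the book, and the function name is hypothetical.

```python
import numpy as np

def delay_and_sum(channels, delays):
    """Align microphone signals on the target source and average them.

    channels: array of shape (n_mics, n_samples).
    delays: non-negative integer delays in samples with which the target
    reaches each microphone (assumed known, for example from array geometry).
    Signals from the target direction add coherently, while sounds from
    other directions are misaligned and partly cancel.
    """
    channels = np.asarray(channels, dtype=float)
    n_mics, n_samples = channels.shape
    out = np.zeros(n_samples)
    for signal, delay in zip(channels, delays):
        aligned = np.zeros(n_samples)
        aligned[:n_samples - delay] = signal[delay:]  # advance by the delay
        out += aligned
    return out / n_mics
```

Fractional delays, which real arrays usually require, would need interpolation or a frequency-domain phase shift.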

Advances in Audiovisual Speech Processing for Robust Voice Activity Detection and Automatic Speech Recognition

Author : Fei Tao (Electrical engineer)
Publisher : Unknown
Release : 2018
ISBN : 0987650XXX
Language : En, Es, Fr & De

Book Description :

Speech processing systems are widely used in existing commercial applications, including virtual assistants in smartphones and home assistant devices. Speech-based commands provide convenient hands-free functionality for users. Two key speech processing systems in practical applications are voice activity detection (VAD), which aims to detect when a user is speaking to a system, and automatic speech recognition (ASR), which aims to recognize what the user is saying. A limitation in these speech tasks is the drop in performance observed in noisy environments or when the speech mode differs from neutral speech (e.g., whisper speech). Emerging audiovisual solutions provide principled frameworks to increase the robustness of the systems by incorporating features describing lip motion. This study proposes novel audiovisual solutions for VAD and ASR tasks. The dissertation introduces unsupervised and supervised audiovisual voice activity detection (AV-VAD). The unsupervised approach combines visual features that are characteristic of the semi-periodic nature of the articulatory production around the orofacial area. The visual features are combined using principal component analysis (PCA) to obtain a single feature. The threshold between speech and non-speech activity is automatically estimated with the expectation-maximization (EM) algorithm. The decision boundary is improved by using the Bayesian information criterion (BIC) algorithm, resolving temporal ambiguities caused by different sampling rates and anticipatory movements. The supervised framework corresponds to the bimodal recurrent neural network (BRNN), which captures the task-related characteristics in the audio and visual inputs, and models the temporal information within and across modalities. The approach relies on three subnetworks implemented with long short-term memory (LSTM) networks.
This framework is implemented with either hand-crafted features or feature representations directly derived from the data (i.e., end-to-end system). The study also extends this framework by increasing the temporal modeling by using advanced LSTMs (A-LSTMs). For audiovisual automatic speech recognition (AV-ASR), the study explores the use of visual features to compensate for the mismatch observed when the system is evaluated with whisper speech. We propose supervised adaptation schemes which significantly reduce the mismatch between normal and whisper speech across speakers. The study also introduces the Gating neural network (GNN). The GNN aims to attenuate the effect of unreliable features, creating AV-ASR systems that improve, or at least maintain, the performance of an ASR system implemented only with speech. Finally, the dissertation introduces the front-end alignment neural network (AliNN) to address the temporal alignment problem between audio and visual features. This front-end system is important as the lip motion often precedes speech (e.g., anticipatory movements). The framework relies on an RNN with an attention model. The resulting aligned features are concatenated and fed to conventional back-end ASR systems, obtaining performance improvements. The proposed approaches for AV-VAD and AV-ASR systems are evaluated on large audiovisual corpora, achieving competitive performance under real world scenarios, outperforming conventional audio-based VAD and ASR systems or alternative audiovisual systems proposed by previous studies. Taken collectively, this dissertation has made algorithmic advancements for audiovisual systems, representing novel contributions to the field of multimodal processing.
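
The unsupervised AV-VAD pipeline described above (PCA to a single feature, then EM to separate speech from non-speech) can be sketched in Python as follows. This is a minimal illustration under several assumptions: a two-component 1-D Gaussian mixture stands in for the EM step, the BIC refinement is omitted, and all function names and the initialization are hypothetical.

```python
import numpy as np

def first_principal_component(features):
    """Project (frames, dims) visual features onto their first principal axis."""
    X = np.asarray(features, dtype=float)
    X = X - X.mean(axis=0)
    _, _, vt = np.linalg.svd(X, full_matrices=False)
    return X @ vt[0]  # note: the sign of a principal axis is arbitrary

def _responsibilities(x, w, mu, var):
    dens = w * np.exp(-0.5 * (x[:, None] - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)
    return dens / (dens.sum(axis=1, keepdims=True) + 1e-300)

def fit_two_gaussians(x, n_iter=50):
    """EM for a two-component 1-D Gaussian mixture; returns weights, means, variances."""
    x = np.asarray(x, dtype=float)
    mu = np.array([x.min(), x.max()])        # start the components at the extremes
    var = np.full(2, x.var() + 1e-6)
    w = np.array([0.5, 0.5])
    for _ in range(n_iter):
        resp = _responsibilities(x, w, mu, var)           # E-step
        nk = resp.sum(axis=0)                             # M-step
        w = nk / x.size
        mu = (resp * x[:, None]).sum(axis=0) / nk
        var = (resp * (x[:, None] - mu) ** 2).sum(axis=0) / nk + 1e-6
    return w, mu, var

def assign_components(x, w, mu, var):
    """Label each frame with its more probable component (0 or 1)."""
    return _responsibilities(np.asarray(x, dtype=float), w, mu, var).argmax(axis=1)
```

The boundary between the two labels plays the role of the automatically estimated threshold; deciding which component corresponds to speech requires extra knowledge, for example that speech frames show larger lip motion.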

Speech Recognition Over Digital Channels

Author : Antonio Peinado, Jose Segura
Publisher : John Wiley & Sons
Release : 2006-08-04
ISBN : 0470024011
Language : En, Es, Fr & De

Book Description :

Automatic speech recognition (ASR) is a very attractive means for human-machine interaction. The degree of maturity reached by speech recognition technologies during recent years allows the development of applications that use them. In particular, ASR shows an enormous potential in mobile environments, where devices such as mobile phones or PDAs are used, and for Internet Protocol (IP) applications. Speech Recognition Over Digital Channels is the first book of its kind to offer a complete system comprehension, addressing the topics of distributed and network-based speech recognition issues and standards, the concepts of speech processing and transmission, and system architectures and robustness.
Describes the different client/server architectures for remote speech recognition systems, by means of which the client transmits speech parameters through a digital channel to a remote recognition server
Focuses on robustness against both adverse acoustic environments (in the front-end) and bit errors/packet loss
Discusses four ETSI standards for distributed speech recognition; the understanding of the standards and the technologies behind them
Provides the necessary background for the comprehension of remote speech recognition technologies
This book will appeal to a wide-ranging audience: engineers using speech recognition systems, researchers involved in ASR systems and those interested in processing and transmitting speech such as signal processing and communications communities. It will also be of interest to technical experts requiring an understanding of recognition over mobile and IP networks, and postgraduate students working on robust speech processing.
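
When a client sends feature frames over a lossy channel, the server has to fill in frames that never arrive. The Python sketch below shows one simple concealment strategy, linear interpolation between the nearest received frames. The blurb does not name a specific method and the ETSI standards define their own procedures, so this is only a generic illustration with a hypothetical function name.

```python
def conceal_lost_frames(frames):
    """Fill lost feature frames by interpolating between received neighbours.

    frames: list in which each entry is a feature vector (list of floats)
    or None for a frame lost in transmission. Gaps at the start or end are
    filled by repeating the nearest received frame.
    """
    received = [i for i, frame in enumerate(frames) if frame is not None]
    if not received:
        raise ValueError("no frames were received")
    restored = []
    for i, frame in enumerate(frames):
        if frame is not None:
            restored.append(list(frame))
            continue
        prev = max((r for r in received if r < i), default=None)
        nxt = min((r for r in received if r > i), default=None)
        if prev is None:
            restored.append(list(frames[nxt]))
        elif nxt is None:
            restored.append(list(frames[prev]))
        else:
            t = (i - prev) / (nxt - prev)  # position inside the gap, 0..1
            restored.append([(1 - t) * a + t * b
                             for a, b in zip(frames[prev], frames[nxt])])
    return restored
```

Interpolation suits short gaps; for long bursts of loss, the interpolated frames become unreliable and are better treated as uncertain by the recognizer.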

Robust Speech and Bird Song Processing Using Multi band Correlograms and Sparse Representations

Author : Lee Ngee Tan
Publisher : Unknown
Release : 2014
ISBN : 0987650XXX
Language : En, Es, Fr & De

Book Description :

This dissertation focuses on algorithms for robust speech and bird song processing. Many applications perform well under ideal signal conditions, e.g., noise-free, full-bandwidth input and sufficient training data. However, performance generally degrades sharply when the input signal deviates from these ideal conditions. This dissertation describes robust algorithms for three applications, namely human pitch detection, automatic speech recognition, and birdsong phrase classification. In the first application, a noise-robust, multi-band summary correlogram (MBSC)-based pitch detector is proposed. Novel signal processing schemes, which include comb-filter channel selection and subband reliability weighting, are designed to enhance the MBSC's peak at the most likely pitch period. In the second application, a feature enhancement scheme using jointly sparse reference and estimated soft-mask representations is developed for noise-robust automatic speech recognition (ASR). Reference and estimated soft-mask exemplar-pairs are extracted from clean and noisy utterance-pairs in the training data. Using a sparsity-based dictionary learning algorithm, dictionary representations are trained from the exemplar-pairs. The sparse linear combination of estimated soft-mask dictionary representations that best approximates the test utterance's estimated soft-mask is applied to the reference soft-mask dictionary to produce an enhanced soft-mask. This enhanced soft-mask is then used to perform noise suppression on the spectrogram from which features for ASR are extracted. In the third application, a simple exemplar-based sparse representation (SR) classifier is evaluated on limited data for birdsong phrase classification and verification. Song recordings of the Cassin's Vireo are used for performance evaluation. This study of the SR classifier for bird phrase classification is inspired by a paper that proposed the SR classifier for face recognition and outlier face detection, and reported good performance with only 7 training images per subject. Algorithmic enhancements are subsequently added to the original SR classification framework to improve the classification accuracy of automatically detected and segmented phrases, and of phrases sung by individual birds that are not found in the training set. These algorithmic enhancements include dynamic time warping (DTW) and frame-based feature normalization prior to SR classification. When the class decisions from DTW and first-pass SR classification differ, SR classification is repeated with frequency-bin-normalized spectrographic features to resolve the two conflicting decisions.
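As a rough illustration of the dynamic time warping step mentioned in the description above, the following is a minimal sketch of textbook DTW used as a nearest-template classifier. It is not the dissertation's implementation: the function names `dtw_distance` and `classify_by_dtw`, the Euclidean frame distance, and the toy one-dimensional features are all assumptions made for this example, and the description does not say how DTW is combined with the SR classifier beyond comparing their class decisions.

```python
import math


def dtw_distance(a, b):
    """Textbook DTW cost between two sequences of feature frames.

    Each frame is a tuple of floats; frames are compared with the
    Euclidean distance (an assumption made for this sketch).
    """
    n, m = len(a), len(b)
    # cost[i][j] = minimal accumulated cost of aligning a[:i] with b[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(a[i - 1], b[j - 1])
            # Allow a match, an insertion, or a deletion step.
            cost[i][j] = d + min(cost[i - 1][j - 1],
                                 cost[i - 1][j],
                                 cost[i][j - 1])
    return cost[n][m]


def classify_by_dtw(query, templates):
    """Return the label of the template closest to `query` under DTW.

    `templates` maps a class label to a list of example sequences
    (a hypothetical data layout chosen for this example).
    """
    best_label, best_cost = None, math.inf
    for label, examples in templates.items():
        for example in examples:
            c = dtw_distance(query, example)
            if c < best_cost:
                best_label, best_cost = label, c
    return best_label


# Toy usage with one-dimensional "frames".
templates = {
    "phrase_a": [[(0.0,), (1.0,), (2.0,)]],
    "phrase_b": [[(5.0,), (5.0,)]],
}
query = [(0.0,), (0.9,), (0.9,), (2.1,)]
label = classify_by_dtw(query, templates)
```

Because DTW lets one frame align with several frames of the other sequence, a phrase produced more slowly or more quickly can still match its template, which is why it is a common choice for variable-length sound segments.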

New Era for Robust Speech Recognition

New Era for Robust Speech Recognition Book
Author : Shinji Watanabe,Marc Delcroix,Florian Metze,John R. Hershey
Publisher : Springer
Release : 2017-10-30
ISBN : 331964680X
Language : En, Es, Fr & De

Book Description :

This book covers the state of the art in deep neural-network-based methods for noise robustness in distant speech recognition applications. It provides insights and detailed descriptions of some of the new concepts and key technologies in the field, including novel architectures for speech enhancement, microphone arrays, robust features, acoustic model adaptation, training data augmentation, and training criteria. The contributed chapters also describe real-world applications, benchmark tools, and datasets widely used in the field. This book is intended for researchers and practitioners working in speech processing and recognition who are interested in the latest deep learning techniques for noise robustness. It will also be of interest to graduate students in electrical engineering or computer science, who will find it a useful guide to this field of research.

MFI '96: 1996 IEEE/SICE/RSJ International Conference on Multisensor Fusion and Integration for Intelligent Systems, December 8-11, 1996, Washington, D.C., U.S.A.

MFI '96: 1996 IEEE/SICE/RSJ International Conference on Multisensor Fusion and Integration for Intelligent Systems, December 8-11, 1996, Washington, D.C., U.S.A. Book
Author : Anonymous
Publisher : Institute of Electrical & Electronics Engineers(IEEE)
Release : 1996
ISBN : 9780780337008
Language : En, Es, Fr & De

Book Description :

This work covers Multisensor Fusion and Integration (MFI) technology, which has developed and expanded into various applications. Information processing architectures for intelligent systems play an important role in realizing high-performance intelligent behaviour.

Ensemble Feature Selection for Multi stream Automatic Speech Recognition

Ensemble Feature Selection for Multi stream Automatic Speech Recognition Book
Author : David Gelbart
Publisher : Unknown
Release : 2008
ISBN : 0987650XXX
Language : En, Es, Fr & De


Bayesian Estimation Employing a Phase sensitive Observation Model for Noise and Reverberation Robust Automatic Speech Recognition

Bayesian Estimation Employing a Phase sensitive Observation Model for Noise and Reverberation Robust Automatic Speech Recognition Book
Author : Volker Leutnant
Publisher : Unknown
Release : 2016
ISBN : 0987650XXX
Language : En, Es, Fr & De

Book Description :

Speech recognition technology has been entering everyday life. The acceptance of speech recognition systems, however, still suffers from their lack of robustness with respect to acoustic environmental noise and reverberation. This problem is probably most severe when hands-free systems are employed to capture human speech. While they allow the user to move freely without needing to wear a headset or hold a microphone, hands-free systems are particularly sensitive to the acoustic conditions of the environment in which they are used. The reason is the increased distance between the speaker and the microphone compared to the use of a headset, which degrades the acoustic signal. Since the acoustic model of a speech recognizer is often trained on clean speech signals, the signal modification by reverberation and noise results in a mismatch between the statistics of the observed feature vectors at the training and testing stages, and thus in an increased word error rate. But even with matched noisy reverberant training the performance deteriorates, since the temporal feature correlations introduced by reverberation violate the conditional independence assumption inherent to hidden Markov model-based speech recognition. To address these issues, this thesis carries out a detailed statistical analysis of how reverberation and noise affect the speech signal and, eventually, the feature vectors passed to the recognizer. The findings lead to the derivation of a novel statistical observation model which relates the features of the noisy reverberant speech signal to those of the underlying clean speech signal and the noise. It is eventually employed in the context of model-based Bayesian feature enhancement with subsequent speech recognition. The derived observation model thereby generalizes both the observation model for noisy speech ...

Robust Speech Recognition Using Microphone Arrays

Robust Speech Recognition Using Microphone Arrays Book
Author : Iain A. McCowan,Queensland University of Technology
Publisher : Unknown
Release : 2001
ISBN : 0987650XXX
Language : En, Es, Fr & De
