![]() |
![]() |
Your cart is empty |
||
Books > Computing & IT > Applications of computing > Audio processing > Speech recognition & synthesis
This work addresses the evaluation of the human and the automatic speaker recognition performances under different channel distortions caused by bandwidth limitation, codecs, and electro-acoustic user interfaces, among other impairments. Its main contribution is the demonstration of the benefits of communication channels of extended bandwidth, together with an insight into how speaker-specific characteristics of speech are preserved through different transmissions. It provides sufficient motivation for considering speaker recognition as a criterion for the migration from narrowband to enhanced bandwidths, such as wideband and super-wideband.
Proactive Spoken Dialogue Interaction in Multi-Party Environments describes spoken dialogue systems that act as independent dialogue partners in the conversation with and between users. The resulting novel characteristics such as proactiveness and multi-party capabilities pose new challenges on the dialogue management component of such a system and require the use and administration of an extensive dialogue history. In order to assist the proactive spoken dialogue systems development, a comprehensive data collection seems mandatory and may be performed in a Wizard-of-Oz environment. Such an environment builds also the appropriate basis for an extensive usability and acceptance evaluation. Proactive Spoken Dialogue Interaction in Multi-Party Environments is a useful reference for students and researchers in speech processing.
Spoken dialog systems have the potential to offer highly intuitive user interfaces, as they allow systems to be controlled using natural language. However, the complexity inherent in natural language dialogs means that careful testing of the system must be carried out from the very beginning of the design process. This book examines how user models can be used to support such early evaluations in two ways: by running simulations of dialogs, and by estimating the quality judgments of users. First, a design environment supporting the creation of dialog flows, the simulation of dialogs, and the analysis of the simulated data is proposed. How the quality of user simulations may be quantified with respect to their suitability for both formative and summative evaluation is then discussed. The remainder of the book is dedicated to the problem of predicting quality judgments of users based on interaction data. New modeling approaches are presented, which process the dialogs as sequences, and which allow knowledge about the judgment behavior of users to be incorporated into predictions. All proposed methods are validated with example evaluation studies.
In Monitoring Adaptive Spoken Dialog Systems, authors Alexander Schmitt and Wolfgang Minker investigate statistical approaches that allow for recognition of negative dialog patterns in Spoken Dialog Systems (SDS). The presented stochastic methods allow a flexible, portable and accurate use. Beginning with the foundations of machine learning and pattern recognition, this monograph examines how frequently users show negative emotions in spoken dialog systems and develop novel approaches to speech-based emotion recognition using hybrid approach to model emotions. The authors make use of statistical methods based on acoustic, linguistic and contextual features to examine the relationship between the interaction flow and the occurrence of emotions using non-acted recordings several thousand real users from commercial and non-commercial SDS. Additionally, the authors present novel statistical methods that spot problems within a dialog based on interaction patterns. The approaches enable future SDS to offer more natural and robust interactions. This work provides insights, lessons and inspiration for future research and development, not only for spoken dialog systems, but for data-driven approaches to human-machine interaction in general.
This book provides a comprehensive overview of the recent advancement in the field of automatic speech recognition with a focus on deep learning models including deep neural networks and many of their variants. This is the first automatic speech recognition book dedicated to the deep learning approach. In addition to the rigorous mathematical treatment of the subject, the book also presents insights and theoretical foundation of a series of highly successful deep learning models.
This book focuses on speech processing in the presence of low-bit rate coding and varying background environments. The methods presented in the book exploit the speech events which are robust in noisy environments. Accurate estimation of these crucial events will be useful for carrying out various speech tasks such as speech recognition, speaker recognition and speech rate modification in mobile environments. The authors provide insights into designing and developing robust methods to process the speech in mobile environments. Covering temporal and spectral enhancement methods to minimize the effect of noise and examining methods and models on speech and speaker recognition applications in mobile environments.
This book describes the basic principles underlying the generation, coding and transmission of speech and audio signals and reveals the latest advances in this area. Waveform coding and parametric coding of speech are described and the fundamental principles behind these methods are delineated. Examples of speech coding standards in use today and their practical implementation are discussed. The principles underlying speech enhancement and speech recognition are also presented, along with the latest recent advances in these areas.
Current speech recognition systems are based on speaker independent speech models and suffer from inter-speaker variations in speech signal characteristics. This work develops an integrated approach for speech and speaker recognition in order to gain space for self-learning opportunities of the system. This work introduces a reliable speaker identification which enables the speech recognizer to create robust speaker dependent models In addition, this book gives a new approach to solve the reverse problem, how to improve speech recognition if speakers can be recognized. The speaker identification enables the speaker adaptation to adapt to different speakers which results in an optimal long-term adaptation.
The accurate determination of the speech spectrum, particularly for short frames, is commonly pursued in diverse areas including speech processing, recognition, and acoustic phonetics. With this book the author makes the subject of spectrum analysis understandable to a wide audience, including those with a solid background in general signal processing and those without such background. In keeping with these goals, this is not a book that replaces or attempts to cover the material found in a general signal processing textbook. Some essential signal processing concepts are presented in the first chapter, but even there the concepts are presented in a generally understandable fashion as far as is possible. Throughout the book, the focus is on applications to speech analysis; mathematical theory is provided for completeness, but these developments are set off in boxes for the benefit of those readers with sufficient background. Other readers may proceed through the main text, where the key results and applications will be presented in general heuristic terms, and illustrated with software routines and practical "show-and-tell" discussions of the results. At some points, the book refers to and uses the implementations in the Praat speech analysis software package, which has the advantages that it is used by many scientists around the world, and it is free and open source software. At other points, special software routines have been developed and made available to complement the book, and these are provided in the Matlab programming language. If the reader has the basic Matlab package, he/she will be able to immediately implement the programs in that platform---no extra "toolboxes" are required.
In this work, the authors present a fully statistical approach to model non--native speakers' pronunciation. Second-language speakers pronounce words in multiple different ways compared to the native speakers. Those deviations, may it be phoneme substitutions, deletions or insertions, can be modelled automatically with the new method presented here. The methods is based on a discrete hidden Markov model as a word pronunciation model, initialized on a standard pronunciation dictionary. The implementation and functionality of the methodology has been proven and verified with a test set of non-native English in the regarding accent. The book is written for researchers with a professional interest in phonetics and automatic speech and speaker recognition.
This book constitutes the refereed proceedings of the 16th International Conference on Text, Speech and Dialogue, TSD 2013, held in Pilsen, Czech Republic, in September 2013. The 65 papers presented together with 5 invited talks were carefully reviewed and selected from 148 submissions. The main topics of this year's conference was corpora, texts and transcription, speech analysis, recognition and synthesis, and their intertwining within NL dialogue systems. The topics also included speech recognition, corpora and language resources, speech and spoken language generation, tagging, classification and parsing of text and speech, semantic processing of text and speech, integrating applications of text and speech processing, as well as automatic dialogue systems, and multimodal techniques and modelling.
Digital Speech Processing Using Matlab deals with digital speech pattern recognition, speech production model, speech feature extraction, and speech compression. The book is written in a manner that is suitable for beginners pursuing basic research in digital speech processing. Matlab illustrations are provided for most topics to enable better understanding of concepts. This book also deals with the basic pattern recognition techniques (illustrated with speech signals using Matlab) such as PCA, LDA, ICA, SVM, HMM, GMM, BPN, and KSOM.
"Emotion Recognition Using Speech Features" provides coverage of emotion-specific features present in speech. The author also discusses suitable models for capturing emotion-specific information for distinguishing different emotions. The content of this book is important for designing and developing natural and sophisticated speech systems. In this Brief, Drs. Rao and Koolagudi lead a discussion of how emotion-specific information is embedded in speech and how to acquire emotion-specific knowledge using appropriate statistical models. Additionally, the authors provide information about exploiting multiple evidences derived from various features and models. The acquired emotion-specific knowledge is useful for synthesizing emotions. Features includes discussion of: * Global and local prosodic features at syllable, word and phrase levels, helpful for capturing emotion-discriminative information; * Exploiting complementary evidences obtained from excitation sources, vocal tract systems and prosodic features in order to enhance the emotion recognition performance; * Proposed multi-stage and hybrid models for improving the emotion recognition performance. This brief is for researchers working in areas related to speech-based products such as mobile phone manufacturing companies, automobile companies, and entertainment products as well as researchers involved in basic and applied speech processing research.
This volume constitutes the refereed proceedings of the Spanish Conference, IberSPEECH 2012: Joint VII "Jornadas en Tecnologia del Habla" and III Iberian SLTech Workshop, held in Madrid, Spain, in November 21-23, 2012. The 29 revised papers were carefully reviewed and selected from 80 submissions. The papers are organized in topical sections on speaker characterization and recognition; audio and speech segmentation; pathology detection and speech characterization; dialogue and multimodal systems; robustness in automatic speech recognition; applications of speech and language technologies.
"Adaptive Digital Filters" presents an important discipline applied
to the domain of speech processing. The book first makes the reader
acquainted with the basic terms of filtering and adaptive
filtering, before introducing the field of advanced modern
algorithms, some of which are contributed by the authors
themselves. Working in the field of adaptive signal processing
requires the use of complex mathematical tools. The book offers a
detailed presentation of the mathematical models that is clear and
consistent, an approach that allows everyone with a college level
of mathematics knowledge to successfully follow the mathematical
derivations and descriptions of algorithms.
"Automatic Speech Signal Analysis for Clinical Diagnosis and
Assessment of Speech Disorders "provides a survey of methods
designed to aid clinicians in the diagnosis and monitoring of
speech disorders such as dysarthria and dyspraxia, with an emphasis
on the signal processing techniques, statistical validity of the
results presented in the literature, and the appropriateness of
methodsthat do not requirespecialized equipment, rigorously
controlled recording procedures or highly skilled personnel to
interpret results.
Modern communication devices, such as mobile phones, teleconferencing systems, VoIP, etc., are often used in noisy and reverberant environments. Therefore, signals picked up by the microphones from telecommunication devices contain not only the desired near-end speech signal, but also interferences such as the background noise, far-end echoes produced by the loudspeaker, and reverberations of the desired source. These interferences degrade the fidelity and intelligibility of the near-end speech in human-to-human telecommunications and decrease the performance of human-to-machine interfaces (i.e., automatic speech recognition systems). The proposed book deals with the fundamental challenges of speech processing in modern communication, including speech enhancement, interference suppression, acoustic echo cancellation, relative transfer function identification, source localization, dereverberation, and beamforming in reverberant environments. Enhancement of speech signals is necessary whenever the source signal is corrupted by noise. In highly non-stationary noise environments, noise transients, and interferences may be extremely annoying. Acoustic echo cancellation is used to eliminate the acoustic coupling between the loudspeaker and the microphone of a communication device. Identification of the relative transfer function between sensors in response to a desired speech signal enables to derive a reference noise signal for suppressing directional or coherent noise sources. Source localization, dereverberation, and beamforming in reverberant environments further enable to increase the intelligibility of the near-end speech signal.
This thesis discusses the privacy issues in speech-based applications such as biometric authentication, surveillance, and external speech processing services. Author Manas A. Pathak presents solutions for privacy-preserving speech processing applications such as speaker verification, speaker identification and speech recognition. The author also introduces some of the tools from cryptography and machine learning and current techniques for improving the efficiency and scalability of the presented solutions. Experiments with prototype implementations of the solutions for execution time and accuracy on standardized speech datasets are also included in the text. Using the framework proposed may now make it possible for a surveillance agency to listen for a known terrorist without being able to hear conversation from non-targeted, innocent civilians."
Data driven methods have long been used in Automatic Speech Recognition (ASR) and Text-To-Speech (TTS) synthesis and have more recently been introduced for dialogue management, spoken language understanding, and Natural Language Generation. Machine learning is now present "end-to-end" in Spoken Dialogue Systems (SDS). However, these techniques require data collection and annotation campaigns, which can be time-consuming and expensive, as well as dataset expansion by simulation. In this book, we provide an overview of the current state of the field and of recent advances, with a specific focus on adaptivity.
This book constitutes the proceedings of the First Indo-Japanese conference on Perception and Machine Intelligence, PerMIn 2012, held in Kolkata, India, in January 2012. The 41 papers, presented together with 1 keynote paper and 3 plenary papers, were carefully reviewed and selected for inclusion in the book. The papers are organized in topical sections named perception; human-computer interaction; e-nose and e-tongue; machine intelligence and application; image and video processing; and speech and signal processing.
Spoken dialog systems have the potential to offer highly intuitive user interfaces, as they allow systems to be controlled using natural language. However, the complexity inherent in natural language dialogs means that careful testing of the system must be carried out from the very beginning of the design process. This book examines how user models can be used to support such early evaluations in two ways: by running simulations of dialogs, and by estimating the quality judgments of users. First, a design environment supporting the creation of dialog flows, the simulation of dialogs, and the analysis of the simulated data is proposed. How the quality of user simulations may be quantified with respect to their suitability for both formative and summative evaluation is then discussed. The remainder of the book is dedicated to the problem of predicting quality judgments of users based on interaction data. New modeling approaches are presented, which process the dialogs as sequences, and which allow knowledge about the judgment behavior of users to be incorporated into predictions. All proposed methods are validated with example evaluation studies.
Dialect Accent Features for Establishing Speaker Identity: A Case Study discusses the subject of forensic voice identification and speaker profiling. Specifically focusing on speaker profiling and using dialects of the Hindi language, widely used in India, the authors have contributed to the body of research on speaker identification by using accent feature as the discriminating factor. This case study contributes to the understanding of the speaker identification process in a situation where unknown speech samples are in different language/dialect than the recording of a suspect. The authors' data establishes that vowel quality, quantity, intonation and tone of a speaker as compared to Khariboli (standard Hindi) could be the potential features for identification of dialect accent.
The author covers the fundamentals of both information and communication security including current developments in some of the most critical areas of automatic speech recognition. Included are topics on speech watermarking, speech encryption, steganography, multilevel security systems comprising speaker identification, real transmission of watermarked or encrypted speech signals, and more. The book is especially useful for information security specialist, government security analysts, speech development professionals, and for individuals involved in the study and research of speech recognition at advanced levels.
Novel Techniques for Dialectal Arabic Speech describes approaches to improve automatic speech recognition for dialectal Arabic. Since speech resources for dialectal Arabic speech recognition are very sparse, the authors describe how existing Modern Standard Arabic (MSA) speech data can be applied to dialectal Arabic speech recognition, while assuming that MSA is always a second language for all Arabic speakers. In this book, Egyptian Colloquial Arabic (ECA) has been chosen as a typical Arabic dialect. ECA is the first ranked Arabic dialect in terms of number of speakers, and a high quality ECA speech corpus with accurate phonetic transcription has been collected. MSA acoustic models were trained using news broadcast speech. In order to cross-lingually use MSA in dialectal Arabic speech recognition, the authors have normalized the phoneme sets for MSA and ECA. After this normalization, they have applied state-of-the-art acoustic model adaptation techniques like Maximum Likelihood Linear Regression (MLLR) and Maximum A-Posteriori (MAP) to adapt existing phonemic MSA acoustic models with a small amount of dialectal ECA speech data. Speech recognition results indicate a significant increase in recognition accuracy compared to a baseline model trained with only ECA data.
The advances in computing and networking have sparked an enormous interest in deploying automatic speech recognition on mobile devices and over communication networks. This book brings together academic researchers and industrial practitioners to address the issues in this emerging realm and presents the reader with a comprehensive introduction to the subject of speech recognition in devices and networks. It covers network, distributed and embedded speech recognition systems. |
![]() ![]() You may like...
The Art of Biblical Interpretation…
Heidi J. Hornik, Ian Boxall, …
Hardcover
R1,711
Discovery Miles 17 110
Sermons - For Beginners, Lay Speakers…
Melissa Weeks-Richardson
Hardcover
|