Meaning is a fundamental concept in Natural Language Processing (NLP), in the tasks of both Natural Language Understanding (NLU) and Natural Language Generation (NLG). This is because the aims of these fields are to build systems that understand what people mean when they speak or write, and that can produce linguistic strings that successfully express to people the intended content. In order for NLP to scale beyond partial, task-specific solutions, researchers in these fields must be informed by what is known about how humans use language to express and understand communicative intents. The purpose of this book is to present a selection of useful information about semantics and pragmatics, as understood in linguistics, in a way that's accessible to and useful for NLP practitioners with minimal (or even no) prior training in linguistics.
First published a decade ago, "CJKV Information Processing" quickly became the unsurpassed source of information on processing text in Chinese, Japanese, Korean, and Vietnamese. The book has now been thoroughly updated to provide web and application developers with the latest techniques and tools for disseminating information directly to audiences in East Asia. This new second edition reflects the considerable impact that Unicode, XML, OpenType, and newer operating systems such as Windows XP, Vista, Mac OS X, and Linux have had on East Asian text processing in recent years. Written by its original author, Ken Lunde, a Senior Computer Scientist in CJKV Type Development at Adobe Systems, this book will help you: learn about CJKV writing systems and scripts, and their transliteration methods; explore recent trends and developments in character sets and encodings, particularly Unicode; examine the world of typography, specifically how CJKV text is laid out on a page; learn information processing techniques, such as code conversion algorithms and how to apply them using different programming languages; process CJKV text using different platforms, text editors, and word processors; become more informed about CJKV dictionaries, dictionary software, and machine translation software and services; and manage CJKV content and presentation when publishing in print or for the Web. And much more. Internationalizing and localizing applications is paramount in today's global market - especially for audiences in East Asia, the fastest-growing segment of the computing world. "CJKV Information Processing" will help you understand how to develop web and other applications effectively in a field that many find difficult to master.
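To make the code-conversion idea mentioned above concrete, here is a minimal sketch using only Python's built-in codec machinery; the sample string is an arbitrary stand-in, and real CJKV pipelines involve many more encodings and edge cases than this round trip shows.

```python
# Legacy CJKV encodings such as Shift-JIS are decoded to Unicode,
# which then serves as the pivot for any further conversion
# (here, re-encoding as UTF-8 for the web).
text = "日本語のテキスト"  # arbitrary Japanese sample text

legacy_bytes = text.encode("shift_jis")        # bytes as found in older data
unicode_text = legacy_bytes.decode("shift_jis")
utf8_bytes = unicode_text.encode("utf-8")

print(legacy_bytes)  # Shift-JIS byte sequence
print(utf8_bytes)    # UTF-8 byte sequence of the same characters
```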
The majority of natural language processing (NLP) is English language processing, and while there is good language technology support for (standard varieties of) English, support for Albanian, Burmese, or Cebuano--and most other languages--remains limited. Being able to bridge this digital divide is important for scientific and democratic reasons but also represents enormous growth potential. A key challenge for this to happen is learning to align basic meaning-bearing units of different languages. In this book, the authors survey and discuss recent and historical work on supervised and unsupervised learning of such alignments. Specifically, the book focuses on so-called cross-lingual word embeddings. The survey is intended to be systematic, using consistent notation and putting the available methods in a comparable form, making it easy to compare widely different approaches. In so doing, the authors establish previously unreported relations between these methods and are able to present a fast-growing literature in a very compact way. Furthermore, the authors discuss how best to evaluate cross-lingual word embedding methods and survey the resources available for students and researchers interested in this topic.
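One representative supervised method in this literature learns an orthogonal map between two monolingual embedding spaces from a seed dictionary (the orthogonal Procrustes solution). The sketch below uses random matrices as stand-ins for real embeddings, so it illustrates only the mechanics, not the book's full taxonomy of methods.

```python
import numpy as np

# Toy stand-ins for monolingual embeddings of seed-dictionary pairs:
# row i of X (source language) translates to row i of Y (target).
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 50))                    # source-language vectors
W_true = np.linalg.qr(rng.normal(size=(50, 50)))[0]
Y = X @ W_true                                     # target vectors (toy: an exact rotation)

# Orthogonal Procrustes: W = U V^T, where U S V^T = SVD(X^T Y).
U, _, Vt = np.linalg.svd(X.T @ Y)
W = U @ Vt

# Map source vectors into the target space and check the alignment.
print(np.allclose(X @ W, Y))  # True for this idealized toy setup
```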
Language, apart from its cultural and social dimension, has a scientific side that is connected not only to the study of 'grammar' in a more or less traditional sense, but also to disciplines like mathematics, physics, chemistry and biology. This book explores developments in linguistic theory, looking in particular at the theory of generative grammar from the perspective of the natural sciences. It highlights the complex and dynamic nature of language, suggesting that a comprehensive and full understanding of such a species-specific property will only be achieved through interdisciplinary work.
This book introduces audio watermarking methods in the transform domain based on matrix decomposition for copyright protection. Chapter 1 discusses the application and properties of digital watermarking. Chapter 2 proposes a blind lifting wavelet transform (LWT) based watermarking method using fast Walsh Hadamard transform (FWHT) and singular value decomposition (SVD) for audio copyright protection. Chapter 3 presents a blind audio watermarking method based on LWT and QR decomposition (QRD) for audio copyright protection. Chapter 4 introduces an audio watermarking algorithm based on FWHT and LU decomposition (LUD). Chapter 5 proposes an audio watermarking method based on LWT and Schur decomposition (SD). Chapter 6 explains in detail the challenges and future trends of audio watermarking in various application areas. Introduces audio watermarking methods for copyright protection and ownership protection; describes watermarking methods with encryption and decryption that provide excellent performance in terms of imperceptibility, robustness, and data payload; discusses in detail the challenges and future research directions of audio watermarking in various application areas.
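As a hedged illustration of the matrix-decomposition idea (not the book's exact algorithms), a common generic scheme embeds one watermark bit per block by quantization-index modulation of the block's largest singular value; the block size and quantization step below are arbitrary choices.

```python
import numpy as np

def embed_bit(block, bit, step=0.5):
    """Embed one watermark bit by quantizing the largest singular
    value of a square block of (transform-domain) audio samples."""
    U, s, Vt = np.linalg.svd(block)
    # Quantization-index modulation: snap s[0] to a multiple of step,
    # then offset by step/2 when the bit is 1.
    q = np.round(s[0] / step) * step
    s[0] = q + (step / 2 if bit else 0.0)
    return U @ np.diag(s) @ Vt

def extract_bit(block, step=0.5):
    s = np.linalg.svd(block, compute_uv=False)
    # Offset near step/2 decodes as 1; near 0 (or step) decodes as 0.
    return int(round((s[0] % step) / (step / 2))) % 2

rng = np.random.default_rng(1)
audio_block = rng.normal(size=(8, 8))   # stand-in for transformed audio
marked = embed_bit(audio_block, bit=1)
print(extract_bit(marked))              # 1
```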
The question of what types of data and evidence can be used is one of the most important topics in linguistics. This book is the first to comprehensively present the methodological problems associated with linguistic data and evidence. Its originality is twofold. First, the authors' approach accounts for a series of unexplained characteristics of linguistic theorising: the uncertainty and diversity of data, the role of evidence in the evaluation of hypotheses, problem-solving strategies, and the emergence and resolution of inconsistencies. Second, the findings are obtained by the application of a new model of plausible argumentation which is also of relevance from a general argumentation-theoretical point of view. All concepts and theses are systematically introduced and illustrated by a number of examples from different linguistic theories, and a detailed case-study section shows how the proposed model can be applied to specific linguistic problems.
This book presents a statistical parametric speech synthesis (SPSS) framework for developing a speech synthesis system in which the desired speech is generated from the parameters of the vocal tract and excitation source. Throughout the book, the authors discuss novel source modeling techniques to enhance the naturalness and overall intelligibility of the SPSS system. The book provides several important methods and models for generating the excitation source parameters that enhance the overall quality of synthesized speech. The contents are useful for both researchers and system developers: for researchers, the book surveys the current state-of-the-art excitation source models for SPSS and discusses how to further refine them to incorporate the realistic semantics present in the text; for system developers, it shows how to integrate these sophisticated excitation source models into the latest generations of mobile/smart phones.
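The source-filter view underlying SPSS can be illustrated with a toy example: an impulse-train excitation (the voiced source) passed through an all-pole filter standing in for the vocal tract. The pitch and resonance frequencies below are arbitrary; in a real SPSS system both sets of parameters would come from trained statistical models.

```python
import numpy as np
from scipy.signal import lfilter

fs = 16000                     # sample rate (Hz)
f0 = 120                       # toy pitch for the excitation source
n = fs // 2                    # half a second of samples

# Excitation source: an impulse train at the pitch period.
excitation = np.zeros(n)
excitation[::fs // f0] = 1.0

# Vocal tract: a toy all-pole filter with two resonances (500 and
# 1500 Hz); the pole radius 0.97 keeps the filter stable.
poles = [0.97 * np.exp(2j * np.pi * f / fs) for f in (500, 1500)]
a = np.poly(poles + [np.conj(p) for p in poles]).real

# Synthesized waveform: source filtered by the vocal-tract model.
speech = lfilter([1.0], a, excitation)
```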
This book applies formal language and automata theory in the context of Tibetan computational linguistics; further, it constructs a Tibetan-spelling formal grammar system that generates a Tibetan-spelling formal language group, and an automata group that can recognize the language group. In addition, it investigates the application technologies of Tibetan-spelling formal language and automata. Given its creative and original approach, the book offers a valuable reference guide for researchers, teachers and graduate students in the field of computational linguistics.
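The flavor of the automata-theoretic machinery can be conveyed with a toy deterministic finite automaton; the alphabet and the syllable pattern below are hypothetical stand-ins, not the book's actual Tibetan-spelling grammar.

```python
# Toy DFA recognizing a hypothetical syllable pattern
# (optional prefix, root, vowel, optional suffix), a stand-in for
# the kind of spelling grammar the book formalizes.
# Alphabet: 'P' prefix, 'R' root, 'V' vowel, 'S' suffix.
TRANSITIONS = {
    ("q0", "P"): "q1", ("q0", "R"): "q2",
    ("q1", "R"): "q2",
    ("q2", "V"): "q3",
    ("q3", "S"): "q4",
}
ACCEPTING = {"q3", "q4"}

def accepts(word):
    state = "q0"
    for symbol in word:
        state = TRANSITIONS.get((state, symbol))
        if state is None:        # no transition: reject
            return False
    return state in ACCEPTING

print(accepts("PRV"))   # True
print(accepts("RVS"))   # True
print(accepts("VR"))    # False
```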
Argumentation mining is an application of natural language processing (NLP) that emerged a few years ago and has recently enjoyed considerable popularity, as demonstrated by a series of international workshops and by a rising number of publications at the major conferences and journals of the field. Its goals are to identify argumentation in text or dialogue; to construct representations of the constellation of claims, supporting and attacking moves (in different levels of detail); and to characterize the patterns of reasoning that appear to license the argumentation. Furthermore, recent work also addresses the difficult tasks of evaluating the persuasiveness and quality of arguments. Some of the linguistic genres that are being studied include legal text, student essays, political discourse and debate, newspaper editorials, scientific writing, and others. The book starts with a discussion of the linguistic perspective, characteristics of argumentative language, and their relationship to certain other notions such as subjectivity. Besides the connection to linguistics, argumentation has for a long time been a topic in Artificial Intelligence, where the focus is on devising adequate representations and reasoning formalisms that capture the properties of argumentative exchange. It is generally very difficult to connect the two realms of reasoning and text analysis, but we are convinced that it should be attempted in the long term, and therefore we also touch upon some fundamentals of reasoning approaches. Then the book turns to its focus, the computational side of mining argumentation in text. We first introduce a number of annotated corpora that have been used in the research. From the NLP perspective, argumentation mining shares subtasks with research fields such as subjectivity and sentiment analysis, semantic relation extraction, and discourse parsing. Therefore, many technical approaches are being borrowed from those (and other) fields. We break argumentation mining into a series of subtasks, starting with the preparatory steps of classifying text as argumentative (or not) and segmenting it into elementary units. Then, central steps are the automatic identification of claims, and finding statements that support or oppose the claim. For certain applications, it is also of interest to compute a full structure of an argumentative constellation of statements. Next, we discuss a few steps that try to 'dig deeper': to infer the underlying reasoning pattern for a textual argument, to reconstruct unstated premises (so-called 'enthymemes'), and to evaluate the quality of the argumentation. We also take a brief look at 'the other side' of mining, i.e., the generation or synthesis of argumentative text. The book finishes with a summary of the argumentation mining tasks, a sketch of potential applications, and a--necessarily subjective--outlook for the field.
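To ground one of the central subtasks named above, claim identification, here is a minimal sketch that frames it as supervised sentence classification with scikit-learn; the training sentences and labels are invented for illustration, and real systems use the annotated corpora and much richer features the book describes.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Tiny hand-made (hypothetical) training set for claim detection:
# label each sentence as claim (1) or non-claim (0).
sentences = [
    "School uniforms should be mandatory.",        # claim
    "The survey was conducted in 2015.",           # non-claim
    "Governments must invest more in rail.",       # claim
    "The data were collected from 300 students.",  # non-claim
    "Homework ought to be abolished.",             # claim
    "Participants answered ten questions.",        # non-claim
]
labels = [1, 0, 1, 0, 1, 0]

# Bag-of-words features feeding a linear classifier.
model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(sentences, labels)

print(model.predict(["Smoking should be banned in parks."]))
```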
Natural language processing (NLP) went through a profound transformation in the mid-1980s when it shifted to make heavy use of corpora and data-driven techniques to analyze language. Since then, the use of statistical techniques in NLP has evolved in several ways. One such example of evolution took place in the late 1990s or early 2000s, when full-fledged Bayesian machinery was introduced to NLP. This Bayesian approach to NLP has come to accommodate various shortcomings in the frequentist approach and to enrich it, especially in the unsupervised setting, where statistical learning is done without target prediction examples. In this book, we cover the methods and algorithms that are needed to fluently read Bayesian learning papers in NLP and to do research in the area. These methods and algorithms are partially borrowed from both machine learning and statistics and are partially developed "in-house" in NLP. We cover inference techniques such as Markov chain Monte Carlo sampling and variational inference, Bayesian estimation, and nonparametric modeling. In response to rapid changes in the field, this second edition of the book includes a new chapter on representation learning and neural networks in the Bayesian context. We also cover fundamental concepts in Bayesian statistics such as prior distributions, conjugacy, and generative modeling. Finally, we review some of the fundamental modeling techniques in NLP, such as grammar modeling, neural networks and representation learning, and their use with Bayesian analysis.
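A worked example of one fundamental the book covers, conjugacy: a Dirichlet prior over a unigram distribution is conjugate to the multinomial likelihood, so the posterior is obtained simply by adding observed counts to the prior pseudo-counts. The toy corpus below is invented.

```python
import numpy as np

# Dirichlet-multinomial conjugacy for a toy unigram language model.
vocab = ["the", "cat", "sat"]
alpha = np.array([1.0, 1.0, 1.0])   # symmetric Dirichlet prior

# Observed counts in a tiny corpus: "the the cat sat the cat".
counts = np.array([3, 2, 1])

# Conjugacy: the posterior is Dirichlet(alpha + counts).
posterior = alpha + counts

# Posterior predictive word probabilities (mean of the Dirichlet).
pred = posterior / posterior.sum()
for w, p in zip(vocab, pred):
    print(f"p({w}) = {p:.3f}")      # 4/9, 3/9, 2/9
```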
This updated book expands upon prosody for recognition applications of speech processing. It covers the importance of prosody for speech processing applications; explains why prosody needs to be incorporated into such applications; and presents methods for the extraction and representation of prosody for applications such as speaker recognition, language recognition, and speech recognition. The updated edition also includes information on the significance of prosody for emotion recognition and various prosody-based approaches for automatic emotion recognition from speech.
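A minimal sketch of the extraction step for two basic prosodic cues, short-term energy and a crude autocorrelation-based F0 estimate, computed on a synthetic voiced frame; real systems use far more robust pitch trackers than this.

```python
import numpy as np

fs = 16000
t = np.arange(fs // 50) / fs            # one 20 ms frame
frame = np.sin(2 * np.pi * 200 * t)     # synthetic voiced frame at 200 Hz

# Short-term energy, a basic prosodic (intensity) feature.
energy = np.sum(frame ** 2) / len(frame)

# Crude F0 estimate via autocorrelation: the strongest peak beyond
# lag 0 corresponds to the pitch period.
ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
min_lag = fs // 400                      # search from 400 Hz ...
max_lag = fs // 80                       # ... down to 80 Hz
lag = min_lag + int(np.argmax(ac[min_lag:max_lag]))
f0 = fs / lag
print(f"energy={energy:.3f}, f0~{f0:.0f} Hz")   # ~200 Hz
```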
What is the lexicon, what does it contain, and how is it structured? What principles determine the functioning of the lexicon as a component of natural language grammar? What role does lexical information play in linguistic theory? This accessible introduction aims to answer these questions, and explores the relation of the lexicon to grammar as a whole. It includes a critical overview of major theoretical frameworks, and puts forward a unified treatment of lexical structure and design. The text can be used for introductory and advanced courses, and for courses that touch upon different aspects of the lexicon, such as lexical semantics, lexicography, syntax, general linguistics, computational lexicology and ontology design. The book provides students with a set of tools for working with lexical data for all kinds of purposes, including an abundance of exercises and in-class activities designed to ensure that students actively engage with the content and acquire the knowledge and skills they need.
This book constitutes the refereed proceedings of the 14th China Workshop on Machine Translation, CWMT 2018, held in Wuyishan, China, in October 2018. The 9 papers presented in this volume were carefully reviewed and selected from 17 submissions and focus on all aspects of machine translation, including preprocessing, neural machine translation models, hybrid models, evaluation methods, and post-editing.
The two-volume set LNCS 9623 + 9624 constitutes revised selected papers from the CICLing 2016 conference which took place in Konya, Turkey, in April 2016. The total of 89 papers presented in the two volumes was carefully reviewed and selected from 298 submissions. The book also contains 4 invited papers and a memorial paper on Adam Kilgarriff's Legacy to Computational Linguistics. The papers are organized in the following topical sections: Part I: In memoriam of Adam Kilgarriff; general formalisms; embeddings, language modeling, and sequence labeling; lexical resources and terminology extraction; morphology and part-of-speech tagging; syntax and chunking; named entity recognition; word sense disambiguation and anaphora resolution; semantics, discourse, and dialog. Part II: machine translation and multilingualism; sentiment analysis, opinion mining, subjectivity, and social media; text classification and categorization; information extraction; and applications.
This book constitutes the thoroughly refereed proceedings of the Eleventh International Symposium on Natural Language Processing (SNLP-2016), held in Phranakhon Si Ayutthaya, Thailand on February 10-12, 2016. The SNLP promotes research in natural language processing and related fields, and provides a unique opportunity for researchers, professionals and practitioners to discuss various current and advanced issues of interest in NLP. The 2016 symposium was expanded to include the First Workshop in Intelligent Informatics and Smart Technology. Of the 66 high-quality papers accepted, this book presents twelve from the Symposium on Natural Language Processing track and ten from the Workshop in Intelligent Informatics and Smart Technology track (SSAI: Special Session on Artificial Intelligence).
This book presents consolidated acoustic data for all phones in Standard Colloquial Bengali (SCB), commonly known as Bangla, a language used by 350 million people in India, Bangladesh, and the Bengali diaspora. The book analyzes the real speech of selected native speakers of the Bangla dialect to ensure that a proper acoustical database is available for the development of speech technologies. The acoustic data presented consist of averages and their normal spread, represented by the standard deviations of the necessary acoustic parameters, including e.g. formant information for multiple native speakers of both sexes. The study considers two important speech technologies: (1) text-to-speech synthesis (TTS) and (2) automatic speech recognition (ASR). The procedures, particularly those related to the use of the technologies, are described in sufficient detail to enable researchers to use them to create technical acoustic databases for any other Indian dialect. The book offers a unique resource for scientists and industrial practitioners who are interested in the acoustic analysis and processing of Indian dialects and want to develop similar dialect databases of their own.
Many applications within natural language processing involve performing text-to-text transformations, i.e., given a text in natural language as input, systems are required to produce a version of this text (e.g., a translation), also in natural language, as output. Automatically evaluating the output of such systems is an important component in developing text-to-text applications. Two approaches have been proposed for this problem: (i) to compare the system outputs against one or more reference outputs using string matching-based evaluation metrics and (ii) to build models based on human feedback to predict the quality of system outputs without reference texts. Despite their popularity, reference-based evaluation metrics are faced with the challenge that multiple good (and bad) quality outputs can be produced by text-to-text approaches for the same input. This variation is very hard to capture, even with multiple reference texts. In addition, reference-based metrics cannot be used in production (e.g., online machine translation systems), when systems are expected to produce outputs for any unseen input. In this book, we focus on the second set of metrics, so-called Quality Estimation (QE) metrics, where the goal is to provide an estimate on how good or reliable the texts produced by an application are without access to gold-standard outputs. QE enables different types of evaluation that can target different types of users and applications. Machine learning techniques are used to build QE models with various types of quality labels and explicit features or learnt representations, which can then predict the quality of unseen system outputs. This book describes the topic of QE for text-to-text applications, covering quality labels, features, algorithms, evaluation, uses, and state-of-the-art approaches. It focuses on machine translation as application, since this represents most of the QE work done to date. It also briefly describes QE for several other applications, including text simplification, text summarization, grammatical error correction, and natural language generation.
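A minimal sketch of QE as supervised regression, assuming toy feature vectors (source length, output length, their ratio, and an out-of-vocabulary rate, all hypothetical) and invented quality labels; actual QE systems use the richer features, learnt representations, and label types the book surveys.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Toy QE setup: each system output is described by explicit features
# and labeled with a human quality score; no reference text is used.
X_train = np.array([
    [12, 11, 0.92, 0.00],
    [25, 13, 0.52, 0.20],
    [ 8,  9, 1.12, 0.00],
    [30, 14, 0.47, 0.35],
    [15, 15, 1.00, 0.05],
])
y_train = np.array([0.9, 0.4, 0.85, 0.25, 0.8])  # quality labels in [0, 1]

qe_model = RandomForestRegressor(n_estimators=100, random_state=0)
qe_model.fit(X_train, y_train)

# Estimate the quality of an unseen output, reference-free.
print(qe_model.predict([[20, 19, 0.95, 0.02]]))
```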
The two-volume set LNCS 13451 and 13452 constitutes revised selected papers from the CICLing 2019 conference which took place in La Rochelle, France, in April 2019. The total of 95 papers presented in the two volumes was carefully reviewed and selected from 335 submissions. The book also contains 3 invited papers. The papers are organized in the following topical sections: general; information extraction; information retrieval; language modeling; lexical resources; machine translation; morphology, syntax, and parsing; named entity recognition; semantics and text similarity; sentiment analysis; speech processing; text categorization; text generation; and text mining.
This book constitutes the proceedings of the 21st International Conference on Developments in Language Theory, DLT 2017, held in Liege, Belgium, in August 2017. The 24 full papers and 6 invited papers (in abstract form) were carefully reviewed and selected from 47 submissions. The papers cover the following topics and areas: combinatorial and algebraic properties of words and languages; grammars, acceptors, and transducers for strings, trees, graphs, and arrays; algebraic theories for automata and languages; codes; efficient text algorithms; symbolic dynamics; decision problems; relationships to complexity theory and logic; picture description and analysis; polyominoes and bidimensional patterns; cryptography; concurrency; cellular automata; bio-inspired computing; and quantum computing.
Sentiment analysis and opinion mining is the field of study that analyzes people's opinions, sentiments, evaluations, attitudes, and emotions from written language. It is one of the most active research areas in natural language processing and is also widely studied in data mining, Web mining, and text mining. In fact, this research has spread outside of computer science to the management sciences and social sciences due to its importance to business and society as a whole. The growing importance of sentiment analysis coincides with the growth of social media such as reviews, forum discussions, blogs, micro-blogs, Twitter, and social networks. For the first time in human history, we now have a huge volume of opinionated data recorded in digital form for analysis. Sentiment analysis systems are being applied in almost every business and social domain because opinions are central to almost all human activities and are key influencers of our behaviors. Our beliefs and perceptions of reality, and the choices we make, are largely conditioned on how others see and evaluate the world. For this reason, when we need to make a decision we often seek out the opinions of others. This is true not only for individuals but also for organizations. This book is a comprehensive introductory and survey text. It covers all important topics and the latest developments in the field with over 400 references. It is suitable for students, researchers and practitioners who are interested in social media analysis in general and sentiment analysis in particular. Lecturers can readily use it in class for courses on natural language processing, social media analysis, text mining, and data mining. Lecture slides are also available online. Table of Contents: Preface / Sentiment Analysis: A Fascinating Problem / The Problem of Sentiment Analysis / Document Sentiment Classification / Sentence Subjectivity and Sentiment Classification / Aspect-Based Sentiment Analysis / Sentiment Lexicon Generation / Opinion Summarization / Analysis of Comparative Opinions / Opinion Search and Retrieval / Opinion Spam Detection / Quality of Reviews / Concluding Remarks / Bibliography / Author Biography
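As a concrete, deliberately simple instance of document sentiment classification (one of the tasks listed above), here is a toy lexicon-based scorer; the lexicon and the one-token negation handling are illustrative only, far short of the supervised and aspect-based methods the book covers.

```python
# Toy lexicon-based document sentiment scorer.
LEXICON = {"good": 1, "great": 2, "love": 2,
           "bad": -1, "awful": -2, "hate": -2}
NEGATORS = {"not", "never", "no"}

def score(document):
    words = document.lower().split()
    total = 0
    for i, word in enumerate(words):
        value = LEXICON.get(word.strip(".,!?"), 0)
        # Flip polarity if the previous token is a negator.
        if i > 0 and words[i - 1] in NEGATORS:
            value = -value
        total += value
    return total  # positive: favorable; negative: unfavorable

print(score("The camera is great but the battery is not good."))  # 2 - 1 = 1
```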
This book is an advanced introduction to semantics that presents this crucial component of human language through the lens of the 'Meaning-Text' theory - an approach that treats linguistic knowledge as a huge inventory of correspondences between thought and speech. Formally, semantics is viewed as an organized set of rules that connect a representation of meaning (Semantic Representation) to a representation of the sentence (Deep-Syntactic Representation). The approach is particularly interesting for computer assisted language learning, natural language processing and computational lexicography, as our linguistic rules easily lend themselves to formalization and computer applications. The model combines abstract theoretical constructions with numerous linguistic descriptions, as well as multiple practice exercises that provide a solid hands-on approach to learning how to describe natural language semantics.
This is the latest addition to a group of handbooks covering the field of morphology, alongside The Oxford Handbook of Case (2008), The Oxford Handbook of Compounding (2009), and The Oxford Handbook of Derivational Morphology (2014). It provides a comprehensive state-of-the-art overview of work on inflection - the expression of grammatical information through changes in word forms. The volume's 24 chapters are written by experts in the field from a variety of theoretical backgrounds, with examples drawn from a wide range of languages. The first part of the handbook covers the fundamental building blocks of inflectional form and content: morphemes, features, and means of exponence. Part 2 focuses on what is arguably the most characteristic property of inflectional systems, paradigmatic structure, and the non-trivial nature of the mapping between function and form. The third part deals with change and variation over time, and the fourth part covers computational issues from a theoretical and practical standpoint. Part 5 addresses psycholinguistic questions relating to language acquisition and neurocognitive disorders. The final part is devoted to sketches of individual inflectional systems, illustrating a range of typological possibilities across a genetically diverse set of languages from Africa, Asia and the Pacific, Australia, Europe, and South America.
This book constitutes the refereed proceedings of the 7th International Conference of the CLEF Initiative, CLEF 2016, held in Toulouse, France, in September 2016. The 10 full papers and 8 short papers presented together with 5 best-of-the-labs papers were carefully reviewed and selected from 36 submissions. In addition to these talks, this volume contains the results of 7 benchmarking labs reporting their year-long activities in overview talks and lab sessions. The papers address all aspects of information access in any modality and language and cover a broad range of topics in the fields of multilingual and multimodal information access evaluation.
This book presents an overview of speaker recognition technologies with an emphasis on dealing with robustness issues. First, the book gives an overview of speaker recognition: the basic system framework, categories under different criteria, performance evaluation, and its development history. Second, with regard to robustness issues, the book presents three categories: environment-related issues, speaker-related issues, and application-oriented issues. For each category, the book describes the current hot topics, existing technologies, and potential research focuses for the future. The book is a useful reference and self-learning guide for early-career researchers working in the field of robust speaker recognition.
Thanks to the availability of texts on the Web in recent years, increased knowledge and information have been made available to broader audiences. However, the way in which a text is written (its vocabulary, its syntax) can be difficult to read and understand for many people, especially those with poor literacy, cognitive or linguistic impairment, or limited knowledge of the language of the text. Texts containing uncommon words or long and complicated sentences can be difficult for people to read and understand, as well as difficult for machines to analyze. Automatic text simplification is the process of transforming a text into another text which, while ideally conveying the same message, is easier to read and understand for a broader audience. The process usually involves the replacement of difficult or unknown phrases with simpler equivalents and the transformation of long and syntactically complex sentences into shorter and less complex ones. Automatic text simplification, a research topic which started 20 years ago, has now taken on a central role in natural language processing research, not only because of the interesting challenges it poses but also because of its social implications. This book presents past and current research in text simplification, exploring key issues including automatic readability assessment, lexical simplification, and syntactic simplification. It also provides a detailed account of machine learning techniques currently used in simplification, describes full systems designed for specific languages and target audiences, and offers available resources for research and development, together with text simplification evaluation techniques.
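A toy sketch of lexical simplification, one of the key issues named above: replace low-frequency words with a more frequent synonym. The frequency list and synonym table are invented stand-ins for the corpus-derived resources real systems use.

```python
# Toy lexical simplifier: substitute infrequent words with their
# most frequent known synonym, the core move in lexical simplification.
FREQUENCY = {"use": 900, "utilize": 30, "buy": 800, "purchase": 120,
             "help": 850, "facilitate": 25}
SYNONYMS = {"utilize": ["use"], "purchase": ["buy"], "facilitate": ["help"]}

def simplify(sentence, threshold=100):
    out = []
    for word in sentence.lower().split():
        if FREQUENCY.get(word, threshold) < threshold and word in SYNONYMS:
            # Pick the most frequent known synonym as the replacement.
            word = max(SYNONYMS[word], key=lambda w: FREQUENCY.get(w, 0))
        out.append(word)
    return " ".join(out)

print(simplify("We utilize software to facilitate learning"))
# -> "we use software to help learning"
```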