This book provides an overview of various techniques for the alignment of bitexts. It describes general concepts and strategies that can be applied to map corresponding parts in parallel documents at various levels of granularity. Bitexts are valuable linguistic resources for many different research fields and practical applications. The predominant application is machine translation, in particular statistical machine translation, but many other lines of research can draw on the rich linguistic knowledge implicitly stored in parallel resources. Bitexts have been explored in lexicography, word sense disambiguation, terminology extraction, computer-aided language learning and translation studies, to name just a few. The book covers the essential tasks that have to be carried out when building parallel corpora, from the collection of translated documents up to sub-sentential alignments. In particular, it describes various approaches to document alignment, sentence alignment, word alignment and tree structure alignment. It also includes a list of resources and a comprehensive review of the literature on alignment techniques. Table of Contents: Introduction / Basic Concepts and Terminology / Building Parallel Corpora / Sentence Alignment / Word Alignment / Phrase and Tree Alignment / Concluding Remarks
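Since the survey of sentence alignment is anchored in length-based methods, a minimal sketch may help fix ideas. The following Python toy is my illustration in the spirit of Gale and Church; the bead set, penalties and cost function are invented assumptions, not the book's own code. It aligns two sentence lists by dynamic programming over 1-1, 1-0, 0-1, 2-1 and 1-2 beads:

```python
# Length-based sentence alignment in the Gale-Church spirit: dynamic
# programming over alignment "beads" (1-1, 1-0, 0-1, 2-1, 1-2), where a
# bead is cheap when the character lengths of the two sides are similar.
import math

def bead_cost(src_len, tgt_len):
    # Assumed cost: squared log-ratio of (smoothed) character lengths.
    return math.log((src_len + 1) / (tgt_len + 1)) ** 2

def align(src_sents, tgt_sents):
    """Return beads as (src_index_list, tgt_index_list) pairs."""
    SKIP = 3.0  # assumed fixed penalty for leaving a sentence unaligned
    beads = [(1, 1, 0.0), (1, 0, SKIP), (0, 1, SKIP), (2, 1, 0.5), (1, 2, 0.5)]
    n, m = len(src_sents), len(tgt_sents)
    INF = float("inf")
    cost = [[INF] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            if cost[i][j] == INF:
                continue
            for di, dj, penalty in beads:
                ni, nj = i + di, j + dj
                if ni > n or nj > m:
                    continue
                s = sum(len(x) for x in src_sents[i:ni])
                t = sum(len(x) for x in tgt_sents[j:nj])
                c = cost[i][j] + penalty + bead_cost(s, t)
                if c < cost[ni][nj]:
                    cost[ni][nj] = c
                    back[ni][nj] = (i, j)
    # Recover the bead sequence by walking the backpointers.
    result, i, j = [], n, m
    while (i, j) != (0, 0):
        pi, pj = back[i][j]
        result.append((list(range(pi, i)), list(range(pj, j))))
        i, j = pi, pj
    return list(reversed(result))

print(align(["Der Hund bellt.", "Er rennt weg."],
            ["The dog barks.", "It runs away."]))
# [([0], [0]), ([1], [1])]
```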
A major part of natural language processing now depends on the use of text data to build linguistic analyzers. We consider statistical, computational approaches to modeling linguistic structure, seeking to unify across many approaches and many kinds of linguistic structures. Assuming a basic understanding of natural language processing and/or machine learning, we seek to bridge the gap between the two fields. The focus is on approaches to decoding (i.e., carrying out linguistic structure prediction) and on supervised and unsupervised learning of models that predict discrete structures as outputs. We also survey natural language processing problems to which these methods are being applied, and we address related topics in probabilistic inference, optimization, and experimental methodology. Table of Contents: Representations and Linguistic Data / Decoding: Making Predictions / Learning Structure from Annotated Data / Learning Structure from Incomplete Data / Beyond Decoding: Inference
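To make "decoding" concrete: for models that factor into local transition and emission scores, the highest-scoring structure can be found by dynamic programming. A hedged Python sketch of Viterbi decoding for sequence labeling follows; the toy scores are invented for the example, and real structured models are richer than this factorization:

```python
# Viterbi decoding: find the highest-scoring tag sequence under a model
# that factors into start, transition and emission log-scores.
import math

def viterbi(words, tags, emit, trans, start):
    # -10.0 is an assumed log-score floor for unseen (tag, word) pairs.
    V = [{t: start[t] + emit[t].get(words[0], -10.0) for t in tags}]
    back = [{}]
    for w in words[1:]:
        col, bp = {}, {}
        for t in tags:
            prev = max(tags, key=lambda p: V[-1][p] + trans[p][t])
            col[t] = V[-1][prev] + trans[prev][t] + emit[t].get(w, -10.0)
            bp[t] = prev
        V.append(col)
        back.append(bp)
    last = max(tags, key=lambda t: V[-1][t])   # best final tag
    path = [last]
    for bp in reversed(back[1:]):              # follow backpointers home
        path.append(bp[path[-1]])
    return list(reversed(path))

log = math.log  # toy two-tag model, numbers invented for the example
start = {"N": log(0.8), "V": log(0.2)}
trans = {"N": {"N": log(0.3), "V": log(0.7)},
         "V": {"N": log(0.6), "V": log(0.4)}}
emit = {"N": {"dogs": log(0.6), "bark": log(0.1)},
        "V": {"dogs": log(0.1), "bark": log(0.7)}}
print(viterbi(["dogs", "bark"], ("N", "V"), emit, trans, start))  # ['N', 'V']
```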
Search for information is no longer limited to the native language of the user but is more and more extended to other languages. This gives rise to the problem of cross-language information retrieval (CLIR), whose goal is to find information that is relevant to a query but written in a different language. In addition to the problems of monolingual information retrieval (IR), translation is the key problem in CLIR: one should translate either the query or the documents from one language to another. However, this translation problem is not identical to full-text machine translation (MT): the goal is not to produce a human-readable translation, but a translation suitable for finding relevant documents. Specific translation methods are thus required. The goal of this book is to provide a comprehensive description of the specific problems arising in CLIR, the solutions proposed in this area, as well as the remaining problems. The book starts with a general description of the monolingual IR and CLIR problems. Different classes of approaches to translation are then presented: approaches using an MT system, dictionary-based translation, and approaches based on parallel and comparable corpora. In addition, the typical retrieval effectiveness of the different approaches is compared, and it is shown that translation approaches specifically designed for CLIR can rival and outperform high-quality MT systems. Finally, the book offers a look into the future that draws a strong parallel between query expansion in monolingual IR and query translation in CLIR, suggesting that many approaches developed in monolingual IR can be adapted to CLIR. The book can be used as an introduction to CLIR; advanced readers will also find more technical details and discussions of the remaining research challenges. It is suitable for new researchers who intend to carry out research on CLIR. Table of Contents: Preface / Introduction / Using Manually Constructed Translation Systems and Resources for CLIR / Translation Based on Parallel and Comparable Corpora / Other Methods to Improve CLIR / A Look into the Future: Toward a Unified View of Monolingual IR and CLIR? / References / Author Biography
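As a minimal illustration of the dictionary-based translation approach the book discusses, here is a toy sketch; the dictionary entries and the uniform weighting are assumptions for the example, not the book's method:

```python
# Dictionary-based query translation for CLIR: expand each source-language
# query term into its target-language translation candidates, splitting the
# term's weight uniformly among them (a common simple baseline).

def translate_query(query_terms, bilingual_dict):
    weighted = {}
    for term in query_terms:
        candidates = bilingual_dict.get(term, [term])  # keep untranslatable terms
        share = 1.0 / len(candidates)
        for c in candidates:
            weighted[c] = weighted.get(c, 0.0) + share
    return weighted

# Hypothetical English->French dictionary entries for illustration.
en_fr = {"drug": ["médicament", "drogue"], "traffic": ["trafic", "circulation"]}
print(translate_query(["drug", "traffic"], en_fr))
# {'médicament': 0.5, 'drogue': 0.5, 'trafic': 0.5, 'circulation': 0.5}
```

The weighted result is handed to an ordinary retrieval engine; ambiguity is left to the ranking function rather than resolved into a single human-readable translation, which is exactly the CLIR-specific design point the blurb makes.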
This book describes the framework of inductive dependency parsing, a methodology for robust and efficient syntactic analysis of unrestricted natural language text. Coverage includes a theoretical analysis of central models and algorithms, and an empirical evaluation of memory-based dependency parsing using data from Swedish and English. A one-stop reference to dependency-based parsing of natural language, it will interest researchers and system developers in language technology, and is suitable for graduate or advanced undergraduate courses.
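A sketch may help picture the deterministic, classifier-guided parsing style this framework is known for. The transition system below is a standard arc-standard sketch in Python; the scripted guide is a stand-in for the memory-based classifier that inductive dependency parsing would learn from treebank data:

```python
# A minimal arc-standard transition system for dependency parsing: the
# parser reads the sentence left to right, and a guide (here a scripted
# stub; the framework learns one from a treebank) chooses among SHIFT,
# LEFT-ARC and RIGHT-ARC at each step.

def parse(words, guide):
    stack, buffer, arcs = [], list(range(len(words))), []
    while buffer or len(stack) > 1:
        action = guide(stack, buffer, words)
        if action == "SHIFT" and buffer:
            stack.append(buffer.pop(0))
        elif action == "LEFT-ARC" and len(stack) >= 2:
            dep = stack.pop(-2)          # second-topmost depends on top
            arcs.append((stack[-1], dep))
        elif action == "RIGHT-ARC" and len(stack) >= 2:
            dep = stack.pop()            # topmost depends on second-topmost
            arcs.append((stack[-1], dep))
        else:
            break
    return arcs  # list of (head_index, dependent_index)

# Scripted actions for "the dog barks" (a learned guide replaces this).
script = iter(["SHIFT", "SHIFT", "LEFT-ARC", "SHIFT", "LEFT-ARC"])
print(parse(["the", "dog", "barks"], lambda s, b, w: next(script)))
# [(1, 0), (2, 1)]  i.e. "the" <- "dog", "dog" <- "barks"
```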
This two-volume set, consisting of LNCS 6608 and LNCS 6609, constitutes the thoroughly refereed proceedings of the 12th International Conference on Computational Linguistics and Intelligent Text Processing, held in Tokyo, Japan, in February 2011. The 74 full papers, presented together with 4 invited papers, were carefully reviewed and selected from 298 submissions. The contents have been ordered according to the following topical sections: lexical resources; syntax and parsing; part-of-speech tagging and morphology; word sense disambiguation; semantics and discourse; opinion mining and sentiment detection; text generation; machine translation and multilingualism; information extraction and information retrieval; text categorization and classification; summarization and recognizing textual entailment; authoring aid, error correction, and style analysis; and speech recognition and generation.
Our world is being revolutionized by data-driven methods: access to large amounts of data has generated new insights and opened exciting new opportunities in commerce, science, and computing applications. Processing the enormous quantities of data necessary for these advances requires large clusters, making distributed computing paradigms more crucial than ever. MapReduce is a programming model for expressing distributed computations on massive datasets and an execution framework for large-scale data processing on clusters of commodity servers. The programming model provides an easy-to-understand abstraction for designing scalable algorithms, while the execution framework transparently handles many system-level details, ranging from scheduling to synchronization to fault tolerance. This book focuses on MapReduce algorithm design, with an emphasis on text processing algorithms common in natural language processing, information retrieval, and machine learning. We introduce the notion of MapReduce design patterns, which represent general reusable solutions to commonly occurring problems across a variety of problem domains. This book intends not only to help the reader "think in MapReduce", but also to discuss the limitations of the programming model itself. Table of Contents: Introduction / MapReduce Basics / MapReduce Algorithm Design / Inverted Indexing for Text Retrieval / Graph Algorithms / EM Algorithms for Text Processing / Closing Remarks
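Word count is the canonical first MapReduce algorithm; here is a minimal in-process simulation of the map-shuffle-reduce flow in Python, an illustration of the programming model rather than of the distributed execution framework the book targets:

```python
# Word count expressed as MapReduce: map emits (word, 1) pairs, the
# framework groups pairs by key (the "shuffle"), and reduce sums counts.
from collections import defaultdict

def mapper(doc_id, text):
    for word in text.split():
        yield (word.lower(), 1)

def reducer(word, counts):
    yield (word, sum(counts))

def run_mapreduce(docs):
    groups = defaultdict(list)          # simulated shuffle phase
    for doc_id, text in docs.items():
        for key, value in mapper(doc_id, text):
            groups[key].append(value)
    return dict(kv for key, values in groups.items()
                   for kv in reducer(key, values))

print(run_mapreduce({1: "to be or not to be", 2: "to see or not"}))
# {'to': 3, 'be': 2, 'or': 2, 'not': 2, 'see': 1}
```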
The subject of the present inquiry is approach-to-the-truth research, which started with the publication of Sir Karl Popper's Conjectures and Refutations. In the decade before this publication, Popper fiercely attacked the ideas of Rudolf Carnap about confirmation and induction; and ten years later, in the famous tenth chapter of Conjectures, he introduced his own ideas about scientific progress and verisimilitude (cf. the quotation on page 6). Abhorring inductivism for its appreciation of logical weakness rather than strength, Popper tried to show that fallibilism could serve the purpose of approach to the truth. To substantiate this idea he formalized the common-sense intuition about preferences, that is: B is to be preferred to A if B has more advantages and fewer drawbacks than A. In 1974, however, David Miller and Pavel Tichý proved that Popper's formal explication could not be used to compare false theories. Subsequently, many researchers proposed alternatives or tried to improve Popper's original definition.
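For reference, Popper's definition can be stated compactly (a standard reconstruction from the literature, not a quotation from this book). Writing $Cn(X)$ for the consequences of theory $X$, $X_T = Cn(X) \cap T$ for its truth content and $X_F = Cn(X) \cap F$ for its falsity content, theory $B$ is closer to the truth than theory $A$ iff

$$ A_T \subseteq B_T \quad\text{and}\quad B_F \subseteq A_F, $$

with at least one of the two inclusions proper. The Miller-Tichý result is that when $A$ and $B$ are both false these conditions can never hold, which is exactly the defect that prompted the later alternatives.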
Many approaches have already been proposed for classification and modeling in the literature. These approaches are usually based on mathematical models. Computer systems can easily handle mathematical models even when they are complicated and nonlinear (e.g., neural networks). On the other hand, it is not always easy for human users to intuitively understand mathematical models even when they are simple and linear. This is because human information processing is based mainly on linguistic knowledge while computer systems are designed to handle symbolic and numerical information. A large part of our daily communication is based on words. We learn from various media such as books, newspapers, magazines, TV, and the Internet through words. We also communicate with others through words. While words play a central role in human information processing, linguistic models are not often used in the fields of classification and modeling. If there is no goal other than the maximization of accuracy in classification and modeling, mathematical models may always be preferred to linguistic models. On the other hand, linguistic models may be chosen if emphasis is placed on interpretability.
From a linguistic perspective, it is quantification which makes all the difference between "having no dollars" and "having a lot of dollars." And it is the meaning of the quantifier "most" which eventually decides if "Most Americans voted Kerry" or "Most Americans voted Bush" (as it stands). Natural language (NL) quantifiers like "all," "almost all," "many" etc. serve an important purpose because they permit us to speak about properties of collections, as opposed to describing specific individuals only; in technical terms, quantifiers are a 'second-order' construct. Thus the quantifying statement "Most Americans voted Bush" asserts that the set of voters of George W. Bush comprises the majority of Americans, while "Bush sneezes" only tells us something about a specific individual. By describing collections rather than individuals, quantifiers extend the expressive power of natural languages far beyond that of propositional logic and make them a universal communication medium. Hence language heavily depends on quantifying constructions. These often involve fuzzy concepts like "tall," and they frequently refer to fuzzy quantities in agreement like "about ten," "almost all," "many" etc. In order to exploit this expressive power and make fuzzy quantification available to technical applications, a number of proposals have been made for how to model fuzzy quantifiers in the framework of fuzzy set theory. These approaches usually reduce fuzzy quantification to a comparison of scalar or fuzzy cardinalities [197, 132].
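The reduction to cardinality comparison can be sketched concretely. Assuming Zadeh's sigma-count for fuzzy cardinality and an invented piecewise-linear membership function for "most" (both are illustrative choices, not the book's own model):

```python
# Fuzzy quantification via scalar cardinality (Zadeh's sigma-count):
# truth("most A are B") = most( |A and B| / |A| ), where |.| sums memberships.

def sigma_count(memberships):
    return sum(memberships)

def most(proportion):
    # Assumed piecewise-linear membership: 0 below 0.5, 1 above 0.8.
    if proportion <= 0.5:
        return 0.0
    if proportion >= 0.8:
        return 1.0
    return (proportion - 0.5) / 0.3

def truth_most(mu_A, mu_B):
    inter = [min(a, b) for a, b in zip(mu_A, mu_B)]  # fuzzy intersection
    return most(sigma_count(inter) / sigma_count(mu_A))

# Toy degrees to which five people are "American voters" and "Bush voters".
print(truth_most([1.0, 1.0, 0.8, 1.0, 0.9], [1.0, 0.9, 0.8, 0.7, 0.0]))
# ~0.74: "most" is true to a fairly high degree in this toy model
```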
The present volume contributes to the growing body of work on sentence processing. The goal of work in this area is to construct a theory of human sentence processing in general, i.e., given a grammar of some particular language and a general characterization of the human sentence processing mechanisms, the particular processing system for that language should follow automatically. At least that's the goal. What is needed in order to pursue this goal is systematic in-depth analysis of the sentence processing routines of individual languages. With respect to German, that is precisely what the present volume delivers. In sharp contrast to a decade ago, the study of German sentence processing is flourishing today. Four lively and active centers have emerged. The University of Freiburg is one prominent center, represented in the present volume by the editors Barbara Hemforth and Lars Konieczny (who was at Freiburg for many years) as well as by Christoph Scheepers (who is now in Glasgow) and Christoph Hölscher. The University of Potsdam has recently begun an interdisciplinary collaboration on sentence processing involving Matthias Schlesewsky, Gisbert Fanselow, Reinhold Kliegl and Josef Krems. The University of Jena has several investigators trained in linguistics and interested in language processing. That group is represented here by Markus Bader and also includes his colleagues Michael Meng and Josef Bayer.
"In case you are considering to adopt this book for courses with over 50 students, please contact ""[email protected]"" for more information. "
The last three chapters of the book provide an introduction to type theory (higher-order logic). It is shown how various mathematical concepts can be formalized in this very expressive formal language. This expressive notation facilitates proofs of the classical incompleteness and undecidability theorems which are very elegant and easy to understand. The discussion of semantics makes clear the important distinction between standard and nonstandard models, which is so important in understanding puzzling phenomena such as the incompleteness theorems and Skolem's Paradox about countable models of set theory. Some of the numerous exercises require giving formal proofs. A computer program called ETPS, which is available from the web, facilitates doing and checking such exercises. Audience: This volume will be of interest to mathematicians, computer scientists, and philosophers in universities, as well as to computer scientists in industry who wish to use higher-order logic for hardware and software specification and verification.
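As a tiny taste of the expressiveness at issue (an illustration, not an excerpt from the book): in simple type theory, equality between individuals need not even be taken as primitive, since Leibniz's definition can be written directly as a higher-order formula:

$$ x =_\iota y \;\equiv\; \forall P_{\iota \to o}.\, P\,x \supset P\,y $$

where $\iota$ is the type of individuals and $o$ the type of truth values; quantifying over the predicate variable $P$ is precisely what first-order logic cannot do.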
One of the aims of Natural Language Processing is to facilitate the use of computers by allowing their users to communicate in natural language. There are two important aspects to person-machine communication: understanding and generating. While natural language understanding has been a major focus of research, natural language generation is a relatively new and increasingly active field of research. This book presents an overview of the state of the art in natural language generation, describing both new results and directions for new research. The principal emphasis of natural language generation is not only to facilitate the use of computers but also to develop a computational theory of human language ability. In doing so, it is a tool for extending, clarifying and verifying theories that have been put forth in linguistics, psychology and sociology about how people communicate. A natural language generator will typically have access to a large body of knowledge from which to select information to present to users, as well as numerous ways of expressing it. Generating a text can thus be seen as a problem of decision-making under multiple constraints: constraints from the propositional knowledge at hand, from the linguistic tools available, from the communicative goals and intentions to be achieved, from the audience the text is aimed at and from the situation and past discourse. Researchers in generation try to identify the factors involved in this process and determine how best to represent the factors and their dependencies.
Computational Models of Mixed-Initiative Interaction brings together research that spans several disciplines related to artificial intelligence, including natural language processing, information retrieval, machine learning, planning, and computer-aided instruction, to account for the role that mixed initiative plays in the design of intelligent systems. The ten contributions address the single issue of how control of an interaction should be managed when abilities needed to solve a problem are distributed among collaborating agents. Managing control of an interaction among humans and computers to gather and assemble knowledge and expertise is a major challenge that must be met to develop machines that effectively collaborate with humans. This is the first collection to specifically address this issue.
In this book we address robustness issues at the speech recognition and natural language parsing levels, with a focus on feature extraction and noise robust recognition, adaptive systems, language modeling, parsing, and natural language understanding. This book attempts to give a clear overview of the main technologies used in language and speech processing, along with an extensive bibliography to enable topics of interest to be pursued further. It also brings together speech and language technologies that are often considered separately. Robustness in Language and Speech Technology serves as a valuable reference and, although not intended as a formal university textbook, contains some material that can be used for a course at the graduate or undergraduate level.
This book offers a state-of-the-art survey of methods and techniques for structuring, acquiring and maintaining lexical resources for speech and language processing. The first chapter provides a broad survey of the field of computational lexicography, introducing most of the issues, terms and topics which are addressed in more detail in the rest of the book. The next two chapters focus on the structure and the content of man-made lexicons, concentrating respectively on (morpho-)syntactic and (morpho-)phonological information. Both chapters adopt a declarative constraint-based methodology and pay ample attention to the various ways in which lexical generalizations can be formalized and exploited to enhance the consistency and to reduce the redundancy of lexicons. A complementary perspective is offered in the next two chapters, which present techniques for automatically deriving lexical resources from text corpora. These chapters adopt an inductive data-oriented methodology and focus also on methods for tokenization, lemmatization and shallow parsing. The next three chapters focus on speech applications, more specifically on the organization of speech data bases, and on the use of lexica in speech synthesis and speech recognition. The last chapter takes a psycholinguistic perspective and addresses the relation between storage and computation in the mental lexicon. The relevance of these topics for speech and language processing is obvious: since NLP systems need large lexica in order to achieve reasonable coverage, and since the construction and maintenance of large-size lexical resources is a complex and costly task, it is of crucial importance for those who design or build such systems to be aware of the latest developments in this fast-moving field. The intended audience for this book includes advanced students and professional scientists working in the areas of computational linguistics and language and speech technology.
Corpus-based methods are found at the heart of many language and speech processing systems. This book provides an in-depth introduction to these technologies through chapters describing basic statistical modeling techniques for language and speech, the use of Hidden Markov Models in continuous speech recognition, the development of dialogue systems, part-of-speech tagging and partial parsing, data-oriented parsing and n-gram language modeling. The book attempts to give a clear overview of the main technologies used in language and speech processing, along with sufficient mathematics to understand the underlying principles. There is also an extensive bibliography to enable topics of interest to be pursued further. Overall, we believe that the book will give newcomers a solid introduction to the field and give existing practitioners a concise review of the principal technologies used in state-of-the-art language and speech processing systems. Corpus-Based Methods in Language and Speech Processing is an initiative of ELSNET, the European Network in Language and Speech. In its activities, ELSNET attaches great importance to the integration of language and speech, both in research and in education. The need for and the potential of this integration are well demonstrated by this publication.
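As a taste of the statistical modeling techniques introduced here, an n-gram language model fits in a few lines of Python; the add-one smoothing below is the simplest possible choice and is assumed purely for illustration:

```python
# Bigram language model with add-one (Laplace) smoothing.
from collections import Counter

def train_bigram(sentences):
    unigrams, bigrams = Counter(), Counter()
    for sent in sentences:
        tokens = ["<s>"] + sent.split() + ["</s>"]
        unigrams.update(tokens[:-1])            # contexts
        bigrams.update(zip(tokens, tokens[1:])) # adjacent pairs
    vocab_size = len(set(unigrams) | {"</s>"})
    def prob(prev, word):
        return (bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab_size)
    return prob

prob = train_bigram(["the cat sat", "the cat ran"])
print(prob("the", "cat"))   # high: "cat" always follows "the" here
print(prob("the", "ran"))   # low but nonzero: unseen bigram, smoothed
```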
This book is aimed at providing an overview of several aspects of semantic role labeling. Chapter 1 begins with linguistic background on the definition of semantic roles and the controversies surrounding them. Chapter 2 describes how the theories have led to structured lexicons such as FrameNet, VerbNet and the PropBank Frame Files that in turn provide the basis for large scale semantic annotation of corpora. This data has facilitated the development of automatic semantic role labeling systems based on supervised machine learning techniques. Chapter 3 presents the general principles of applying both supervised and unsupervised machine learning to this task, with a description of the standard stages and feature choices, as well as giving details of several specific systems. Recent advances include the use of joint inference to take advantage of context sensitivities, and attempts to improve performance by closer integration of the syntactic parsing task with semantic role labeling. Chapter 3 also discusses the impact the granularity of the semantic roles has on system performance. Having outlined the basic approach with respect to English, Chapter 4 goes on to discuss applying the same techniques to other languages, using Chinese as the primary example. Although substantial training data is available for Chinese, this is not the case for many other languages, and techniques for projecting English role labels onto parallel corpora are also presented. Table of Contents: Preface / Semantic Roles / Available Lexical Resources / Machine Learning for Semantic Role Labeling / A Cross-Lingual Perspective / Summary
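A hedged sketch of the "standard stages and feature choices" mentioned above: given a parsed sentence, a system extracts features such as predicate lemma, phrase type, linear position and voice for each candidate argument, then feeds them to a classifier. The feature set and data structures below are my simplified assumptions; real systems add the syntactic path, head word and much more:

```python
# Classic SRL features for one candidate argument (in the style of the
# Gildea & Jurafsky feature set): predicate lemma, constituent phrase
# type, position relative to the predicate, and predicate voice.

def srl_features(predicate, pred_index, constituent):
    """constituent: dict with 'phrase_type', 'start', 'end' (token offsets)."""
    return {
        "predicate": predicate["lemma"],
        "phrase_type": constituent["phrase_type"],
        "position": "before" if constituent["end"] < pred_index else "after",
        "voice": predicate["voice"],
    }

# "The window was broken by the ball": candidate argument "the ball".
pred = {"lemma": "break", "voice": "passive"}
cand = {"phrase_type": "NP", "start": 5, "end": 6}
print(srl_features(pred, 3, cand))
# {'predicate': 'break', 'phrase_type': 'NP', 'position': 'after',
#  'voice': 'passive'}
```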
Yorick Wilks is a central figure in the fields of Natural Language Processing and Artificial Intelligence. This book celebrates Wilks's career from the perspective of his peers, in original chapters each of which analyses an aspect of his work and links it to current thinking in that area. This volume forms a two-part set together with Words and Intelligence I: Selected Works by Yorick Wilks, by the same editors.
This is the first comprehensive overview of computational approaches to Arabic morphology. The subtitle aims to reflect that widely different computational approaches to the Arabic morphological system have been proposed. The book provides a showcase of the most advanced language technologies applied to one of the most vexing problems in linguistics. It covers knowledge-based and empirical-based approaches.
This book teaches the principles of natural language processing and covers the linguistic issues involved. It also details the language-processing functions required, including part-of-speech tagging using rules and stochastic techniques. A key feature of the book is the author's hands-on approach throughout, with extensive exercises, sample code in Prolog and Perl, and a detailed introduction to Prolog. The book is suitable for researchers and students of natural language processing and computational linguistics.
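The book's own hands-on examples are in Prolog and Perl; as a language-neutral illustration of the stochastic side of tagging, here is a most-frequent-tag baseline in Python (a toy of my own, not the author's code):

```python
# Most-frequent-tag baseline for POS tagging: tag each word with the tag
# it received most often in training; back off to the globally most
# frequent tag for unknown words.
from collections import Counter, defaultdict

def train(tagged_sentences):
    per_word, overall = defaultdict(Counter), Counter()
    for sent in tagged_sentences:
        for word, tag in sent:
            per_word[word][tag] += 1
            overall[tag] += 1
    default = overall.most_common(1)[0][0]
    return {w: c.most_common(1)[0][0] for w, c in per_word.items()}, default

def tag(words, lexicon, default):
    return [(w, lexicon.get(w, default)) for w in words]

lexicon, default = train([[("the", "DT"), ("dog", "NN"), ("barks", "VBZ")]])
print(tag(["the", "cat", "barks"], lexicon, default))
# unknown "cat" falls back to the overall most frequent tag
```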
Human language acquisition has been studied for centuries, but using computational modeling for such studies is a relatively recent trend. However, computational approaches to language learning have become increasingly popular, mainly due to advances in developing machine learning techniques, and the availability of vast collections of experimental data on child language learning and child-adult interaction. Many of the existing computational models attempt to study the complex task of learning a language under cognitive plausibility criteria (such as memory and processing limitations that humans face), and to explain the developmental stages observed in children. By simulating the process of child language learning, computational models can show us which linguistic representations are learnable from the input that children have access to, and which mechanisms yield the same patterns of behaviour that children exhibit during this process. In doing so, computational modeling provides insight into the plausible mechanisms involved in human language acquisition, and inspires the development of better language models and techniques. This book provides an overview of the main research questions in the field of human language acquisition. It reviews the most commonly used computational frameworks, methodologies and resources for modeling child language learning, and the evaluation techniques used for assessing these computational models. The book is aimed at cognitive scientists who want to become familiar with the available computational methods for investigating problems related to human language acquisition, as well as computational linguists who are interested in applying their skills to the study of child language acquisition. Different aspects of language learning are discussed in separate chapters, including the acquisition of the individual words, the general regularities which govern word and sentence form, and the associations between form and meaning. For each of these aspects, the challenges of the task are discussed and the relevant empirical findings on children are summarized. Furthermore, the existing computational models that attempt to simulate the task under study are reviewed, and a number of case studies are presented. Table of Contents: Overview / Computational Models of Language Learning / Learning Words / Putting Words Together / Form--Meaning Associations / Final Thoughts
Automatic Text Categorization and Clustering are becoming more and more important as the amount of text in electronic format grows and the access to it becomes more necessary and widespread. Well-known applications are spam filtering and web search, but a large number of everyday uses exist (intelligent web search, data mining, law enforcement, etc.). Currently, researchers are employing many intelligent techniques for text categorization and clustering, ranging from support vector machines and neural networks to Bayesian inference and algebraic methods, such as Latent Semantic Indexing. This volume offers a wide spectrum of research work developed for intelligent text categorization and clustering. In the following, we give a brief introduction to the chapters that are included in this book.
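As one concrete instance of the family of techniques this volume surveys, a multinomial naive Bayes categorizer is compact enough to show whole; the toy data and add-one smoothing are assumptions for illustration:

```python
# Multinomial naive Bayes text categorizer with add-one smoothing.
import math
from collections import Counter, defaultdict

def train_nb(labeled_docs):
    class_counts, word_counts, vocab = Counter(), defaultdict(Counter), set()
    for text, label in labeled_docs:
        class_counts[label] += 1
        for w in text.lower().split():
            word_counts[label][w] += 1
            vocab.add(w)
    return class_counts, word_counts, vocab

def classify(text, class_counts, word_counts, vocab):
    total_docs = sum(class_counts.values())
    best, best_score = None, -math.inf
    for label in class_counts:
        score = math.log(class_counts[label] / total_docs)   # prior
        denom = sum(word_counts[label].values()) + len(vocab)
        for w in text.lower().split():                       # likelihood
            score += math.log((word_counts[label][w] + 1) / denom)
        if score > best_score:
            best, best_score = label, score
    return best

data = [("cheap pills buy now", "spam"), ("meeting agenda attached", "ham")]
model = train_nb(data)
print(classify("buy cheap pills", *model))  # -> 'spam'
```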
This book discusses how Type Logical Grammar can be modified in such a way that a systematic treatment of anaphora phenomena becomes possible without giving up the general architecture of this framework. By Type Logical Grammar, I mean the version of Categorial Grammar that arose out of the work of Lambek, 1958 and Lambek, 1961. There, Categorial types are analyzed as formulae of a logical calculus. In particular, the Categorial slashes are interpreted as forms of constructive implication in the sense of Intuitionistic Logic. Such a theory of grammar is per se attractive for a formal linguist who is interested in the interplay between formal logic and the structure of language. What makes Lambek-style Categorial Grammar even more exciting is the fact that (as van Benthem, 1983 points out) the Curry-Howard correspondence, a central part of mathematical proof theory which establishes a deep connection between constructive logics and the λ-calculus, supplies the type logical syntax with an extremely elegant and independently motivated interface to model-theoretic semantics. Prima facie, anaphora does not fit very well into the Categorial picture of the syntax-semantics interface. The Curry-Howard based composition of meaning operates in a local way, and meaning assembly is linear, i.e., every piece of lexical meaning is used exactly once. Anaphora, on the other hand, is in principle unbounded, and it involves by definition the multiple use of certain semantic resources. The latter problem has been tackled by several Categorial grammarians by assuming sufficiently complex lexical meanings for anaphoric expressions, but the locality problem is not easy to solve in a purely lexical way.
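To make the Curry-Howard point concrete (my illustration, not the author's example): the slash elimination step that combines a subject with a verb phrase corresponds, on the semantic side, to a single function application in the λ-calculus:

$$ \frac{\textit{John} : np \qquad \textit{sleeps} : np\backslash s}{\textit{John sleeps} : s} \qquad\leadsto\qquad \mathbf{sleep}'(\mathbf{john}') $$

Because each premise is consumed exactly once, meaning assembly is linear in precisely the sense that makes anaphora, which reuses a semantic resource, problematic.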
Computational semantics is the art and science of computing meaning in natural language. The meaning of a sentence is derived from the meanings of the individual words in it, and this process can be made so precise that it can be implemented on a computer. Designed for students of linguistics, computer science, logic and philosophy, this comprehensive text shows how to compute meaning using the functional programming language Haskell. It deals with both denotational meaning (where meaning comes from knowing the conditions of truth in situations), and operational meaning (where meaning is an instruction for performing cognitive action). Including a discussion of recent developments in logic, it will be invaluable to linguistics students wanting to apply logic to their studies, logic students wishing to learn how their subject can be applied to linguistics, and functional programmers interested in natural language processing as a new application area.
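The book builds its semantics in Haskell; purely as a cross-language illustration of the compositional idea (word meanings as functions, sentence meanings obtained by application over a toy model), here is a Python sketch:

```python
# Montague-style composition in miniature: word meanings are functions,
# sentence meaning is computed by function application, and a sentence
# denotes a truth value in a fixed model.

domain = {"alice", "bob"}
runs = {"alice"}                      # the set of runners in the model

# Lexicon: proper names are type-raised to take a predicate; "every"
# denotes a relation between a restrictor and a scope predicate.
lex = {
    "alice": lambda p: p("alice"),
    "bob":   lambda p: p("bob"),
    "runs":  lambda x: x in runs,
    "person": lambda x: x in domain,
    "every": lambda restr: lambda scope: all(
        scope(x) for x in domain if restr(x)),
}

# "alice runs"  =>  [[alice]]([[runs]])
print(lex["alice"](lex["runs"]))                   # True
# "every person runs"  =>  [[every]]([[person]])([[runs]])
print(lex["every"](lex["person"])(lex["runs"]))    # False (bob doesn't run)
```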
Considerable progress has been made in recent years in the development of dialogue systems that support robust and efficient human-machine interaction using spoken language. Spoken dialogue technology allows various interactive applications to be built and used for practical purposes, and research focuses on issues that aim to increase the system's communicative competence by including aspects of error correction, cooperation, multimodality, and adaptation in context. This book gives a comprehensive view of state-of-the-art techniques that are used to build spoken dialogue systems. It provides an overview of the basic issues such as system architectures, various dialogue management methods, system evaluation, and also surveys advanced topics concerning extensions of the basic model to more conversational setups. The goal of the book is to provide an introduction to the methods, problems, and solutions that are used in dialogue system development and evaluation. It presents dialogue modelling and system development issues relevant in both academic and industrial environments and also discusses requirements and challenges for advanced interaction management and future research. Table of Contents: Preface / Introduction to Spoken Dialogue Systems / Dialogue Management / Error Handling / Case Studies: Advanced Approaches to Dialogue Management / Advanced Issues / Methodologies and Practices of Evaluation / Future Directions / References / Author Biographies
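To make the simplest of the dialogue management methods concrete, here is a toy finite-state, slot-filling manager in Python; the task, slots and action names are invented for illustration, and the approaches surveyed in the book go far beyond this:

```python
# Finite-state, slot-filling dialogue manager: ask for each missing slot
# in turn, then confirm. A toy illustration of the simplest management style.

SLOTS = ["origin", "destination", "date"]  # hypothetical flight-booking task

def next_action(filled):
    """Given the slots filled so far, decide what the system does next."""
    for slot in SLOTS:
        if slot not in filled:
            return f"ask({slot})"
    return "confirm(" + ", ".join(f"{s}={filled[s]}" for s in SLOTS) + ")"

state = {}
print(next_action(state))                          # ask(origin)
state["origin"] = "Helsinki"
print(next_action(state))                          # ask(destination)
state.update(destination="Tokyo", date="May 3")
print(next_action(state))                          # confirm(origin=Helsinki, ...)
```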