Automatic Text Categorization and Clustering are becoming more and more important as the amount of text in electronic format grows and access to it becomes more necessary and widespread. Well-known applications are spam filtering and web search, but many everyday uses exist (intelligent web search, data mining, law enforcement, etc.). Currently, researchers are employing many intelligent techniques for text categorization and clustering, ranging from support vector machines and neural networks to Bayesian inference and algebraic methods such as Latent Semantic Indexing. This volume offers a wide spectrum of research work developed for intelligent text categorization and clustering. In the following, we give a brief introduction to the chapters included in this book.
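As a hedged illustration of the kind of technique surveyed here (not an example from the volume itself), the sketch below trains a toy bag-of-words naive Bayes spam filter with scikit-learn; the training texts and labels are invented for illustration.

```python
# Minimal text categorization sketch: bag-of-words features plus naive Bayes.
# Toy spam-filtering data, invented for illustration only.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

texts = [
    "win a free prize now", "cheap meds limited offer",
    "meeting agenda for monday", "please review the attached report",
]
labels = ["spam", "spam", "ham", "ham"]

model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(texts, labels)

print(model.predict(["free offer win now", "agenda for the review meeting"]))
# expected: ['spam' 'ham']
```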
Human language acquisition has been studied for centuries, but using computational modeling for such studies is a relatively recent trend. However, computational approaches to language learning have become increasingly popular, mainly due to advances in developing machine learning techniques, and the availability of vast collections of experimental data on child language learning and child-adult interaction. Many of the existing computational models attempt to study the complex task of learning a language under cognitive plausibility criteria (such as memory and processing limitations that humans face), and to explain the developmental stages observed in children. By simulating the process of child language learning, computational models can show us which linguistic representations are learnable from the input that children have access to, and which mechanisms yield the same patterns of behaviour that children exhibit during this process. In doing so, computational modeling provides insight into the plausible mechanisms involved in human language acquisition, and inspires the development of better language models and techniques. This book provides an overview of the main research questions in the field of human language acquisition. It reviews the most commonly used computational frameworks, methodologies and resources for modeling child language learning, and the evaluation techniques used for assessing these computational models. The book is aimed at cognitive scientists who want to become familiar with the available computational methods for investigating problems related to human language acquisition, as well as computational linguists who are interested in applying their skills to the study of child language acquisition. Different aspects of language learning are discussed in separate chapters, including the acquisition of the individual words, the general regularities which govern word and sentence form, and the associations between form and meaning. For each of these aspects, the challenges of the task are discussed and the relevant empirical findings on children are summarized. Furthermore, the existing computational models that attempt to simulate the task under study are reviewed, and a number of case studies are presented. Table of Contents: Overview / Computational Models of Language Learning / Learning Words / Putting Words Together / Form--Meaning Associations / Final Thoughts
In the late 1990s, AI witnessed an increasing use of the term 'argumentation' within its bounds: in natural language processing, in user interface design, in logic programming and nonmonotonic reasoning, in AI's interface with the legal community, and in the newly emerging field of multi-agent systems. It seemed to me that many of these uses of argumentation were inspired by (often uninspired) guesswork, and that a great majority of the AI community were unaware that there was a maturing, rich field of research in Argumentation Theory (and Critical Thinking and Informal Logic) that had been steadily rebuilding a scholarly approach to the area over the previous twenty years or so. Argumentation Theory, on its side, was developing theories and approaches that many in the field felt could have a role more widely in research and society, but were for the most part unaware that AI was one of the best candidates for such application.
This book discusses how Type Logical Grammar can be modified in such a way that a systematic treatment of anaphora phenomena becomes possible without giving up the general architecture of this framework. By Type Logical Grammar, I mean the version of Categorial Grammar that arose out of the work of Lambek, 1958 and Lambek, 1961. There Categorial types are analyzed as formulae of a logical calculus. In particular, the Categorial slashes are interpreted as forms of constructive implication in the sense of Intuitionistic Logic. Such a theory of grammar is per se attractive for a formal linguist who is interested in the interplay between formal logic and the structure of language. What makes Lambek-style Categorial Grammar even more exciting is the fact that (as van Benthem, 1983 points out) the Curry-Howard correspondence, a central part of mathematical proof theory which establishes a deep connection between constructive logics and the λ-calculus, supplies the type logical syntax with an extremely elegant and independently motivated interface to model-theoretic semantics. Prima facie, anaphora does not fit very well into the Categorial picture of the syntax-semantics interface. The Curry-Howard based composition of meaning operates in a local way, and meaning assembly is linear, i.e., every piece of lexical meaning is used exactly once. Anaphora, on the other hand, is in principle unbounded, and it involves by definition the multiple use of certain semantic resources. The latter problem has been tackled by several Categorial grammarians by assuming sufficiently complex lexical meanings for anaphoric expressions, but the locality problem is not easy to solve in a purely lexical way.
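As an illustration (not taken from the book), the derivation below shows the Curry-Howard interface in the simplest case: slash elimination, which combines a subject with an intransitive verb in the syntax, corresponds to function application in the semantics. The lexical entries are toy assumptions.

```latex
% Slash elimination in Lambek-style Categorial Grammar; via Curry-Howard,
% the syntactic step is paired with function application in the semantics.
\[
  \frac{\textit{john} : np \qquad \textit{sleeps} : np\backslash s}
       {\textit{sleeps}(\textit{john}) : s}
  \;[\backslash\mathrm{E}]
\]
```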
The ideal of using human language to control machines requires a practical theory of natural language communication that includes grammatical analysis of language signs, plus a model of the cognitive agent, with interfaces for recognition and action, an internal database, and an algorithm for reading content in and out. This book offers a functional framework for theoretical analysis of natural language communication and for practical applications of natural language processing.
The Language of Design articulates the theory that there is a language of design. Drawing upon insights from computational language processing, the language of design is modeled computationally through latent semantic analysis (LSA), lexical chain analysis (LCA), and sentiment analysis (SA). The statistical co-occurrence of semantics (LSA), semantic relations (LCA), and semantic modifiers (SA) in design text is used to illustrate how the reality-producing effect of language is itself an enactment of design, allowing a new understanding of the connections between creative behaviors. Computing the language of design makes it possible to take direct measurements of creative behaviors which are distributed across social spaces and mediated through language. The book demonstrates how machine understanding of design texts, based on computation over the language of design, yields practical applications for design management.
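To make the LSA component concrete, here is a minimal sketch (not drawn from the book) that builds a toy term-document matrix, truncates its SVD, and compares documents in the resulting latent space; the corpus and the rank k are arbitrary assumptions.

```python
# Minimal LSA sketch: toy term-document matrix, truncated SVD, document similarity.
import numpy as np

docs = [
    "sketch concept form model",
    "concept model drawing form",
    "user interview requirement need",
    "requirement need user study",
]
vocab = sorted({w for d in docs for w in d.split()})
A = np.array([[d.split().count(w) for d in docs] for w in vocab], dtype=float)  # terms x docs

U, S, Vt = np.linalg.svd(A, full_matrices=False)
k = 2                                     # keep the two strongest latent dimensions
doc_vecs = (np.diag(S[:k]) @ Vt[:k]).T    # documents in latent space

def cos(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

print(cos(doc_vecs[0], doc_vecs[1]))  # high: both about form/concept
print(cos(doc_vecs[0], doc_vecs[2]))  # low: different topics
```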
This book provides system developers and researchers in natural language processing and computational linguistics with the necessary background information for working with the Arabic language. The goal is to introduce Arabic linguistic phenomena and review the state-of-the-art in Arabic processing. The book discusses Arabic script, phonology, orthography, morphology, syntax and semantics, with a final chapter on machine translation issues. The chapter sizes correspond more or less to what is linguistically distinctive about Arabic, with morphology getting the lion's share, followed by Arabic script. No previous knowledge of Arabic is needed. This book is designed for computer scientists and linguists alike. The focus of the book is on Modern Standard Arabic; however, notes on practical issues related to Arabic dialects and languages written in the Arabic script are presented in different chapters. Table of Contents: What is "Arabic"? / Arabic Script / Arabic Phonology and Orthography / Arabic Morphology / Computational Morphology Tasks / Arabic Syntax / A Note on Arabic Semantics / A Note on Arabic and Machine Translation
Search for information is no longer exclusively limited to the native language of the user, but is more and more often extended to other languages. This gives rise to the problem of cross-language information retrieval (CLIR), whose goal is to find information that is relevant to a query but written in a different language. In addition to the problems of monolingual information retrieval (IR), translation is the key problem in CLIR: one has to translate either the query or the documents from one language into another. However, this translation problem is not identical to full-text machine translation (MT): the goal is not to produce a human-readable translation, but a translation suitable for finding relevant documents. Specific translation methods are thus required. The goal of this book is to provide a comprehensive description of the specific problems arising in CLIR, the solutions proposed in this area, and the remaining open problems. The book starts with a general description of the monolingual IR and CLIR problems. Different classes of approaches to translation are then presented: approaches using an MT system, dictionary-based translation, and approaches based on parallel and comparable corpora. In addition, the typical retrieval effectiveness of the different approaches is compared, and it is shown that translation approaches specifically designed for CLIR can rival and outperform high-quality MT systems. Finally, the book offers a look into the future that draws a strong parallel between query expansion in monolingual IR and query translation in CLIR, suggesting that many approaches developed in monolingual IR can be adapted to CLIR. The book can be used as an introduction to CLIR; advanced readers will also find more technical details and discussions of the remaining research challenges. It is suitable for new researchers who intend to carry out research on CLIR. Table of Contents: Preface / Introduction / Using Manually Constructed Translation Systems and Resources for CLIR / Translation Based on Parallel and Comparable Corpora / Other Methods to Improve CLIR / A Look into the Future: Toward a Unified View of Monolingual IR and CLIR? / References / Author Biography
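The dictionary-based approach can be sketched in a few lines: translate each query term through a bilingual lexicon, then retrieve documents by how many translated terms they contain. The lexicon, documents, and scoring below are toy assumptions, not a system described in the book.

```python
# Minimal dictionary-based query translation for CLIR, with toy data.
from collections import Counter

lexicon = {                      # source (French) -> candidate target (English) terms
    "traduction": ["translation"],
    "automatique": ["automatic", "machine"],
    "recherche": ["search", "retrieval"],
}

docs = {
    "d1": "statistical machine translation of web text",
    "d2": "image retrieval with deep features",
}

def translate_query(query):
    """Expand each source-language term into all its dictionary translations."""
    return [t for w in query.split() for t in lexicon.get(w, [])]

def score(doc_text, target_terms):
    """Score a document by how many translated query terms it contains."""
    tf = Counter(doc_text.split())
    return sum(tf[t] for t in target_terms)

query = "traduction automatique"
targets = translate_query(query)
ranked = sorted(docs, key=lambda d: score(docs[d], targets), reverse=True)
print(targets, ranked)   # ['translation', 'automatic', 'machine'] ['d1', 'd2']
```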
How far can you take fuzzy logic, the brilliant conceptual framework made famous by George Klir? With this book, you can find out. The authors of this updated edition have extended Klir's work by taking fuzzy logic into even more areas of application. It serves a number of functions, from an introductory text on the concept of fuzzy logic to a treatment of cutting-edge research problems suitable for a fully paid-up member of the fuzzy logic community.
In both the linguistic and the language engineering community, the creation and use of annotated text collections (or annotated corpora) is currently a hot topic. Annotated texts are of interest for research as well as for the development of natural language processing (NLP) applications. Unfortunately, the annotation of text material, especially more interesting linguistic annotation, is as yet a difficult task and can entail a substantial amount of human involvement. All over the world, work is being done to replace as much as possible of this human effort by computer processing. At the frontier of what can already be done (mostly) automatically we find syntactic wordclass tagging, the annotation of the individual words in a text with an indication of their morphosyntactic classification. This book describes the state of the art in syntactic wordclass tagging. As an attempt to give an overall view of the field, this book is of interest to (at least) two, possibly very different, types of reader. The first type consists of those people who are using, or are planning to use, tagged material and taggers. They will want to know what the possibilities and impossibilities of tagging are, but are not necessarily interested in the internal working of automatic taggers. This, on the other hand, is the main interest of our second type of reader, the builders of automatic taggers and other natural language processing software.
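For readers of the first type, a minimal sketch of what a statistical tagger does may help: the unigram "most frequent tag" baseline below assigns each word the tag it received most often in a training corpus. The corpus and tagset are invented for illustration.

```python
# Unigram "most frequent tag" baseline for wordclass tagging, on toy data.
from collections import Counter, defaultdict

tagged_corpus = [
    [("the", "DET"), ("dog", "NOUN"), ("barks", "VERB")],
    [("the", "DET"), ("cat", "NOUN"), ("sleeps", "VERB")],
]

counts = defaultdict(Counter)
for sentence in tagged_corpus:
    for word, tag in sentence:
        counts[word][tag] += 1

def tag(sentence, default="NOUN"):
    """Assign each word its most frequent training tag; unseen words get a default."""
    return [(w, counts[w].most_common(1)[0][0] if w in counts else default)
            for w in sentence]

print(tag(["the", "dog", "sleeps"]))
```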
From a linguistic perspective, it is quantification which makes all the difference between "having no dollars" and "having a lot of dollars." And it is the meaning of the quantifier "most" which eventually decides if "Most Americans voted Kerry" or "Most Americans voted Bush" (as it stands). Natural language (NL) quantifiers like "all," "almost all," "many" etc. serve an important purpose because they permit us to speak about properties of collections, as opposed to describing specific individuals only; in technical terms, quantifiers are a 'second-order' construct. Thus the quantifying statement "Most Americans voted Bush" asserts that the set of voters of George W. Bush comprises the majority of Americans, while "Bush sneezes" only tells us something about a specific individual. By describing collections rather than individuals, quantifiers extend the expressive power of natural languages far beyond that of propositional logic and make them a universal communication medium. Hence language heavily depends on quantifying constructions. These often involve fuzzy concepts like "tall," and they frequently refer to fuzzy quantities in agreement like "about ten," "almost all," "many" etc. In order to exploit this expressive power and make fuzzy quantification available to technical applications, a number of proposals have been made how to model fuzzy quantifiers in the framework of fuzzy set theory. These approaches usually reduce fuzzy quantification to a comparison of scalar or fuzzy cardinalities [197, 132].
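A minimal sketch of the cardinality-comparison idea, with invented membership degrees: the truth degree of "most students are tall" is obtained by comparing the sigma-count of the intersection with that of the restrictor, and passing the ratio through a membership function for "most".

```python
# Cardinality-comparison approach to fuzzy quantification, with toy memberships.

students = {"ann": 1.0, "bob": 1.0, "eve": 1.0, "joe": 1.0}      # crisp restrictor
tall     = {"ann": 0.9, "bob": 0.8, "eve": 0.3, "joe": 0.7}      # fuzzy scope

def sigma_count(fuzzy_set):
    return sum(fuzzy_set.values())

def most(ratio):
    """Piecewise-linear membership for the quantifier 'most' (an assumption)."""
    return min(1.0, max(0.0, (ratio - 0.5) / 0.3))

intersection = {x: min(students[x], tall.get(x, 0.0)) for x in students}
ratio = sigma_count(intersection) / sigma_count(students)
print(ratio, most(ratio))   # proportion of tall students and its degree of 'most'
```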
Computational semantics is the art and science of computing meaning in natural language. The meaning of a sentence is derived from the meanings of the individual words in it, and this process can be made so precise that it can be implemented on a computer. Designed for students of linguistics, computer science, logic and philosophy, this comprehensive text shows how to compute meaning using the functional programming language Haskell. It deals with both denotational meaning (where meaning comes from knowing the conditions of truth in situations), and operational meaning (where meaning is an instruction for performing cognitive action). Including a discussion of recent developments in logic, it will be invaluable to linguistics students wanting to apply logic to their studies, logic students wishing to learn how their subject can be applied to linguistics, and functional programmers interested in natural language processing as a new application area.
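The book develops this in Haskell; as a language-neutral sketch of the denotational side, the fragment below treats quantifiers as relations between sets and evaluates them against a small invented model.

```python
# Denotational semantics in miniature: quantifiers as relations between sets,
# evaluated against a toy model (the model and predicates are assumptions).

domain = {"alice", "bob", "carol"}
runs   = {"alice", "bob"}
smiles = {"alice", "bob", "carol"}

def every(restrictor, scope):
    return restrictor <= scope          # subset relation

def some(restrictor, scope):
    return bool(restrictor & scope)     # non-empty intersection

print(every(runs, smiles))           # True:  everyone who runs smiles
print(every(smiles, runs))           # False: carol smiles but does not run
print(some(domain - runs, smiles))   # True:  someone who does not run smiles
```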
This book presents a critical overview of current work on linguistic features and establishes new bases for their use in the study and understanding of language.
This book is aimed at providing an overview of several aspects of semantic role labeling. Chapter 1 begins with linguistic background on the definition of semantic roles and the controversies surrounding them. Chapter 2 describes how the theories have led to structured lexicons such as FrameNet, VerbNet and the PropBank Frame Files that in turn provide the basis for large scale semantic annotation of corpora. This data has facilitated the development of automatic semantic role labeling systems based on supervised machine learning techniques. Chapter 3 presents the general principles of applying both supervised and unsupervised machine learning to this task, with a description of the standard stages and feature choices, as well as giving details of several specific systems. Recent advances include the use of joint inference to take advantage of context sensitivities, and attempts to improve performance by closer integration of the syntactic parsing task with semantic role labeling. Chapter 3 also discusses the impact the granularity of the semantic roles has on system performance. Having outlined the basic approach with respect to English, Chapter 4 goes on to discuss applying the same techniques to other languages, using Chinese as the primary example. Although substantial training data is available for Chinese, this is not the case for many other languages, and techniques for projecting English role labels onto parallel corpora are also presented. Table of Contents: Preface / Semantic Roles / Available Lexical Resources / Machine Learning for Semantic Role Labeling / A Cross-Lingual Perspective / Summary
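As a rough illustration of the supervised approach (not a system from the book), the sketch below turns a (predicate, candidate argument) pair into the kind of feature dictionary a role classifier would consume; the sentence, spans, and feature set are simplified assumptions.

```python
# Feature extraction step of supervised semantic role labeling, heavily simplified.

def srl_features(tokens, pred_idx, arg_span, arg_phrase_type):
    start, end = arg_span
    head = tokens[end - 1]                       # crude head choice for this sketch
    return {
        "predicate": tokens[pred_idx],
        "phrase_type": arg_phrase_type,          # e.g. NP, PP
        "position": "before" if end <= pred_idx else "after",
        "head_word": head,
        "distance": abs(pred_idx - start),
    }

tokens = ["The", "committee", "approved", "the", "budget"]
print(srl_features(tokens, pred_idx=2, arg_span=(0, 2), arg_phrase_type="NP"))
# A classifier (e.g. logistic regression) would map such features to ARG0, ARG1, ...
```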
Many approaches have already been proposed for classification and modeling in the literature. These approaches are usually based on mathematical models. Computer systems can easily handle mathematical models even when they are complicated and nonlinear (e.g., neural networks). On the other hand, it is not always easy for human users to intuitively understand mathematical models even when they are simple and linear. This is because human information processing is based mainly on linguistic knowledge while computer systems are designed to handle symbolic and numerical information. A large part of our daily communication is based on words. We learn from various media such as books, newspapers, magazines, TV, and the Internet through words. We also communicate with others through words. While words play a central role in human information processing, linguistic models are not often used in the fields of classification and modeling. If there is no goal other than the maximization of accuracy in classification and modeling, mathematical models may always be preferred to linguistic models. On the other hand, linguistic models may be chosen if emphasis is placed on interpretability.
This book teaches the principles of natural language processing and covers linguistics issues. It also details the language-processing functions involved, including part-of-speech tagging using rules and stochastic techniques. A key feature of the book is the author's hands-on approach throughout, with extensive exercises, sample code in Prolog and Perl, and a detailed introduction to Prolog. The book is suitable for researchers and students of natural language processing and computational linguistics.
Considerable progress has been made in recent years in the development of dialogue systems that support robust and efficient human-machine interaction using spoken language. Spoken dialogue technology allows various interactive applications to be built and used for practical purposes, and research focuses on issues that aim to increase the system's communicative competence by including aspects of error correction, cooperation, multimodality, and adaptation in context. This book gives a comprehensive view of state-of-the-art techniques that are used to build spoken dialogue systems. It provides an overview of the basic issues such as system architectures, various dialogue management methods, system evaluation, and also surveys advanced topics concerning extensions of the basic model to more conversational setups. The goal of the book is to provide an introduction to the methods, problems, and solutions that are used in dialogue system development and evaluation. It presents dialogue modelling and system development issues relevant in both academic and industrial environments and also discusses requirements and challenges for advanced interaction management and future research. Table of Contents: Preface / Introduction to Spoken Dialogue Systems / Dialogue Management / Error Handling / Case Studies: Advanced Approaches to Dialogue Management / Advanced Issues / Methodologies and Practices of Evaluation / Future Directions / References / Author Biographies
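One of the simplest dialogue management methods covered in such overviews is frame-based (slot-filling) control; the sketch below, with invented slots and prompts, keeps prompting until the frame is complete. It stands in for the much richer management strategies the book discusses.

```python
# Minimal frame-based dialogue management: ask for missing slots, then confirm.
# Slots, prompts, and the (omitted) understanding step are toy assumptions.

SLOTS = ["origin", "destination", "date"]
PROMPTS = {
    "origin": "Where are you travelling from?",
    "destination": "Where would you like to go?",
    "date": "On which date?",
}

def next_action(frame):
    """Return the next system prompt, or a confirmation once all slots are filled."""
    for slot in SLOTS:
        if slot not in frame:
            return PROMPTS[slot]
    return f"Booking a trip from {frame['origin']} to {frame['destination']} on {frame['date']}."

frame = {}
print(next_action(frame))                 # asks for origin
frame["origin"] = "Helsinki"              # pretend the understanding module filled this
frame["destination"] = "Tampere"
print(next_action(frame))                 # asks for date
frame["date"] = "Friday"
print(next_action(frame))                 # confirmation
```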
This book introduces Chinese language-processing issues and techniques to readers who already have a basic background in natural language processing (NLP). Since the major difference between Chinese and Western languages is at the word level, the book primarily focuses on Chinese morphological analysis and introduces the concept, structure, and interword semantics of Chinese words. The following topics are covered: a general introduction to Chinese NLP; Chinese characters, morphemes, and words and the characteristics of Chinese words that have to be considered in NLP applications; Chinese word segmentation; unknown word detection; word meaning and Chinese linguistic resources; interword semantics based on word collocation and NLP techniques for collocation extraction. Table of Contents: Introduction / Words in Chinese / Challenges in Chinese Morphological Processing / Chinese Word Segmentation / Unknown Word Identification / Word Meaning / Chinese Collocations / Automatic Chinese Collocation Extraction / Appendix / References / Author Biographies
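As a concrete (and deliberately naive) baseline for the segmentation problem, the sketch below implements forward maximum matching over an invented dictionary; its greedy choice also illustrates the ambiguity that better segmenters must resolve.

```python
# Forward maximum matching for Chinese word segmentation, with a toy dictionary.

DICTIONARY = {"北京", "大学", "北京大学", "生", "学生", "是"}
MAX_LEN = max(len(w) for w in DICTIONARY)

def fmm_segment(text):
    words, i = [], 0
    while i < len(text):
        for length in range(min(MAX_LEN, len(text) - i), 0, -1):
            candidate = text[i:i + length]
            if candidate in DICTIONARY or length == 1:
                words.append(candidate)   # dictionary match, or single-char fallback
                i += length
                break
    return words

# Greedy matching yields ['北京大学', '生'], although '北京 / 大学生' may be intended.
print(fmm_segment("北京大学生"))
```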
Our world is being revolutionized by data-driven methods: access to large amounts of data has generated new insights and opened exciting new opportunities in commerce, science, and computing applications. Processing the enormous quantities of data necessary for these advances requires large clusters, making distributed computing paradigms more crucial than ever. MapReduce is a programming model for expressing distributed computations on massive datasets and an execution framework for large-scale data processing on clusters of commodity servers. The programming model provides an easy-to-understand abstraction for designing scalable algorithms, while the execution framework transparently handles many system-level details, ranging from scheduling to synchronization to fault tolerance. This book focuses on MapReduce algorithm design, with an emphasis on text processing algorithms common in natural language processing, information retrieval, and machine learning. We introduce the notion of MapReduce design patterns, which represent general reusable solutions to commonly occurring problems across a variety of problem domains. This book not only intends to help the reader "think in MapReduce", but also discusses limitations of the programming model as well. Table of Contents: Introduction / MapReduce Basics / MapReduce Algorithm Design / Inverted Indexing for Text Retrieval / Graph Algorithms / EM Algorithms for Text Processing / Closing Remarks
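The programming model itself fits in a few lines. The sketch below runs the classic word-count example with the shuffle phase simulated in memory; this only mimics what a cluster framework such as Hadoop would do, and the documents are invented.

```python
# Word count, the "hello world" of MapReduce, with an in-memory shuffle.
from collections import defaultdict

def mapper(doc_id, text):
    for word in text.split():
        yield word, 1                    # emit (key, value) pairs

def reducer(word, counts):
    yield word, sum(counts)              # aggregate all values for one key

documents = {"d1": "to be or not to be", "d2": "to think"}

# Simulated shuffle: group intermediate values by key.
groups = defaultdict(list)
for doc_id, text in documents.items():
    for key, value in mapper(doc_id, text):
        groups[key].append(value)

results = dict(kv for key in groups for kv in reducer(key, groups[key]))
print(results)   # {'to': 3, 'be': 2, 'or': 1, 'not': 1, 'think': 1}
```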
The ninth campaign of the Cross-Language Evaluation Forum (CLEF) for European languages was held from January to September 2008. There were seven main evaluation tracks in CLEF 2008 plus two pilot tasks. The aim, as usual, was to test the performance of a wide range of multilingual information access (MLIA) systems or system components. This year, 100 groups, mainly but not only from academia, participated in the campaign. Most of the groups were from Europe but there was also a good contingent from North America and Asia plus a few participants from South America and Africa. Full details regarding the design of the tracks, the methodologies used for evaluation, and the results obtained by the participants can be found in the different sections of these proceedings. The results of the CLEF 2008 campaign were presented at a two-and-a-half day workshop held in Aarhus, Denmark, September 17-19, and attended by 150 researchers and system developers. The annual workshop, held in conjunction with the European Conference on Digital Libraries, plays an important role by providing the opportunity for all the groups that have participated in the evaluation campaign to get together, comparing approaches and exchanging ideas. The schedule of the workshop was divided between plenary track overviews and parallel, poster and breakout sessions presenting this year's experiments and discussing ideas for the future. There were several invited talks.
TSD 2009 was the 12th event in the series of International Conferences on Text, Speech and Dialogue, supported by the International Speech Communication Association (ISCA) and the Czech Society for Cybernetics and Informatics (CSKI). This year, TSD was held in Plzeň (Pilsen), in the Primavera Conference Center, during September 13-17, 2009, and it was organized by the University of West Bohemia in Plzeň in cooperation with Masaryk University of Brno, Czech Republic. Like its predecessors, TSD 2009 highlighted to both the academic and scientific world the importance of text and speech processing and its most recent breakthroughs in current applications. Both experienced researchers and professionals as well as newcomers to the text and speech processing field, interested in designing or evaluating interactive software, developing new interaction technologies, or investigating overarching theories of text and speech processing, found in the TSD conference a forum to communicate with people sharing similar interests. The conference is an interdisciplinary forum, intertwining research in speech and language processing with its applications in everyday practice. We feel that the mixture of different approaches and applications offered a great opportunity to get acquainted with current activities in all aspects of language communication and to witness the amazing vitality of researchers from developing countries too. This year's conference was partially oriented toward semantic processing, which was chosen as the main topic of the conference. All invited speakers (Frederick Jelinek, Louise Guthrie, Roberto Pieraccini, Tilman Becker, and Elmar Nöth) gave lectures on the newest results in the relatively broad and still unexplored area of semantic processing.
Linguistic annotation and text analytics are active areas of research and development, with academic conferences and industry events such as the Linguistic Annotation Workshops and the annual Text Analytics Summits. This book provides a basic introduction to both fields, and aims to show that good linguistic annotations are the essential foundation for good text analytics. After briefly reviewing the basics of XML, with practical exercises illustrating in-line and stand-off annotations, a chapter is devoted to explaining the different levels of linguistic annotation. The reader is encouraged to create example annotations using the WordFreak linguistic annotation tool. The next chapter shows how annotations can be created automatically using statistical NLP tools, and compares two sets of tools, the OpenNLP and Stanford NLP tools. The second half of the book describes different annotation formats and gives practical examples of how to interchange annotations between formats using XSLT transformations. The two main text analytics architectures, GATE and UIMA, are then described and compared, with practical exercises showing how to configure and customize them. The final chapter is an introduction to text analytics, describing the main applications and functions, including named entity recognition, coreference resolution, and information extraction, with practical examples using both open source and commercial tools. Copies of the example files, scripts, and stylesheets used in the book are available from the book's companion website. Table of Contents: Working with XML / Linguistic Annotation / Using Statistical NLP Tools / Annotation Interchange / Annotation Architectures / Text Analytics
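To make the in-line versus stand-off distinction concrete, the sketch below (with an invented sentence and tags) encodes the same part-of-speech information once inside the text and once as character offsets that point back into it.

```python
# In-line versus stand-off annotation of the same tokens, with toy data.
import xml.etree.ElementTree as ET

text = "Dogs bark"

# In-line: annotations wrap the text itself.
inline = ET.fromstring('<s><w pos="NNS">Dogs</w> <w pos="VBP">bark</w></s>')
print(inline[0].text, inline[0].get("pos"))   # Dogs NNS

# Stand-off: the source text stays untouched; annotations point into it.
standoff = ET.Element("annotations")
for start, end, pos in [(0, 4, "NNS"), (5, 9, "VBP")]:
    ET.SubElement(standoff, "w", start=str(start), end=str(end), pos=pos)

for w in standoff:
    s, e = int(w.get("start")), int(w.get("end"))
    print(text[s:e], w.get("pos"))            # Dogs NNS / bark VBP
```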
This volume constitutes the thoroughly refereed post-conference proceedings of the First and Second International Symposia on Sanskrit Computational Linguistics, held in Rocquencourt, France, in October 2007 and in Providence, RI, USA, in May 2008, respectively. The 11 revised full papers of the first and the 12 revised papers of the second symposium, presented together with an introduction and a keynote talk, were carefully reviewed and selected from the lectures given at both events. The papers address topics such as the structure of the Paninian grammatical system, computational linguistics, lexicography, lexical databases, formal description of Sanskrit grammar, phonology and morphology, machine translation, philology, and OCR.
Dependency-based methods for syntactic parsing have become increasingly popular in natural language processing in recent years. This book gives a thorough introduction to the methods that are most widely used today. After an introduction to dependency grammar and dependency parsing, followed by a formal characterization of the dependency parsing problem, the book surveys the three major classes of parsing models that are in current use: transition-based, graph-based, and grammar-based models. It continues with a chapter on evaluation and one on the comparison of different methods, and it closes with a few words on current trends and future prospects of dependency parsing. The book presupposes a knowledge of basic concepts in linguistics and computer science, as well as some knowledge of parsing methods for constituency-based representations. Table of Contents: Introduction / Dependency Parsing / Transition-Based Parsing / Graph-Based Parsing / Grammar-Based Parsing / Evaluation / Comparison / Final Thoughts
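A minimal sketch of the transition-based family: the arc-standard system below applies a hand-picked action sequence to a toy sentence, standing in for the classifier a real parser would learn in order to choose actions.

```python
# Arc-standard transition system for dependency parsing, driven by a fixed
# action sequence (in a real parser, a learned classifier picks the actions).

def parse(tokens, actions):
    stack, buffer, arcs = [0], list(range(1, len(tokens) + 1)), []
    for action in actions:
        if action == "SHIFT":
            stack.append(buffer.pop(0))
        elif action == "LEFT-ARC":                 # top becomes head of second-top
            dep = stack.pop(-2)
            arcs.append((stack[-1], dep))
        elif action == "RIGHT-ARC":                # second-top becomes head of top
            dep = stack.pop()
            arcs.append((stack[-1], dep))
    return arcs                                    # list of (head, dependent) indices

tokens = ["Economic", "news", "had", "effects"]    # 1-based ids; 0 is the root
actions = ["SHIFT", "SHIFT", "LEFT-ARC",           # Economic <- news
           "SHIFT", "LEFT-ARC",                    # news <- had
           "SHIFT", "RIGHT-ARC",                   # had -> effects
           "RIGHT-ARC"]                            # root -> had
print(parse(tokens, actions))   # [(2, 1), (3, 2), (3, 4), (0, 3)]
```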
This volume presents the proceedings of the Third International Sanskrit Computational Linguistics Symposium, hosted by the University of Hyderabad, Hyderabad, India during January 15-17, 2009. The series of symposia on Sanskrit Computational Linguistics began in 2007. The first symposium was hosted by INRIA at Rocquencourt, France in October 2007 as a part of the joint collaboration between INRIA and the University of Hyderabad. This joint collaboration expanded both geographically and academically, covering more facets of Sanskrit Computational Linguistics, when the second symposium was hosted by Brown University, USA in May 2008. We received 16 submissions, which were reviewed by the members of the Program Committee. After discussion, nine of them were selected for presentation. These nine papers fall under four broad categories: four papers deal with the structure of Pāṇini's Aṣṭādhyāyī, two deal with parsing issues, two with various aspects of machine translation, and the last one with the Web concordance of an important Sanskrit text. If we look retrospectively over the last two years, the three symposia in succession have seen not only continuity of some of the themes, but also steady growth of the community. As is evident, researchers from diverse disciplines such as linguistics, computer science, philology, and vyākaraṇa are collaborating with scholars from other disciplines, witnessing the growth of Sanskrit computational linguistics as an emergent discipline. We are grateful to S.D. Joshi, Jan Houben, and K.V.R. Krishnamacharyulu for accepting our invitation to deliver the invited speeches.