This book presents methods and approaches used to identify the true author of a doubtful document or text excerpt. It provides a broad introduction to text categorization problems grounded in stylistic features, such as authorship attribution, profiling the psychological traits of an author, and detecting fake news. In particular, it presents machine learning models in detail as valuable tools for verifying hypotheses or revealing significant patterns hidden in datasets. Stylometry is a multi-disciplinary field combining linguistics with statistics and computer science. The content is divided into three parts. The first, consisting of the first three chapters, offers a general introduction to stylometry, its potential applications and limitations; it also introduces the running example used to illustrate the concepts discussed throughout the remainder of the book. The four chapters of the second part are devoted to computer science, with a focus on machine learning models: their main aim is to explain how such models can solve stylometric problems. Several general strategies used to identify, extract, select, and represent stylistic markers are explained. As deep learning is an active field of research, information on neural network models and word embeddings applied to stylometry is provided, along with a general introduction to the deep learning approach to stylometric questions. In turn, the third part illustrates the application of the previously discussed approaches to real cases: an authorship attribution problem seeking to discover the secret hand behind the nom de plume Elena Ferrante, an Italian writer known worldwide for the My Brilliant Friend saga; author profiling to determine whether a set of tweets was generated by a bot or a human being and, in the latter case, whether by a man or a woman; and an exploration of stylistic variation over time using US political speeches covering a period of ca. 230 years. A solutions-based approach is adopted throughout the book, and explanations are supported by examples written in R. To complement the main content and the discussions of stylometric models and techniques, examples and datasets are freely available on the author's GitHub site.
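As a taste of the stylometric approach such work takes (the book's own examples are in R), here is a minimal Python sketch of authorship attribution from function-word frequencies, one of the classic families of stylistic markers; the word list, toy texts, and distance measure are illustrative assumptions, not the book's method:

```python
# Minimal authorship-attribution sketch: compare relative frequencies of
# common function words, a classic stylometric marker set.
from collections import Counter

# A tiny illustrative marker set; real studies use hundreds of markers.
FUNCTION_WORDS = ["the", "of", "and", "to", "in", "a", "that", "is"]

def profile(text):
    """Relative frequency of each function word in a text."""
    tokens = text.lower().split()
    counts = Counter(tokens)
    total = len(tokens) or 1
    return [counts[w] / total for w in FUNCTION_WORDS]

def distance(p, q):
    """Mean absolute difference between two stylistic profiles."""
    return sum(abs(a - b) for a, b in zip(p, q)) / len(p)

def attribute(unknown, candidates):
    """Return the candidate author whose profile is closest to the unknown text."""
    u = profile(unknown)
    return min(candidates, key=lambda name: distance(u, profile(candidates[name])))
```

On real data each candidate's profile would be computed from a sizeable corpus of their known writing, and the distance would typically be a z-scored measure such as Burrows's Delta rather than this raw mean difference.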
This book re-examines the notion of word associations, more precisely collocations. It works towards a more generally applicable definition of collocation and asks how best to extract, identify and measure collocations. The book highlights the role played by (i) automatic linguistic annotation (part-of-speech tagging, syntactic parsing, etc.); (ii) semantic criteria that facilitate the identification of collocations; (iii) multi-word structures, instead of the widespread assumption of bipartite collocational structures, for capturing the intricacies of syntagmatic attraction; (iv) treating collocation and valency as near neighbours in the lexis-grammar continuum; and (v) the mathematical properties of statistical association measures in the automatic extraction of collocations from corpora. The book is an ideal guide to the use of statistics in collocation analysis and lexicography, as well as a practical guide to developing skills in computational lexicography. Lexical Collocation Analysis: Advances and Applications begins with a proposal for integrating both collocational and valency phenomena within the overarching theoretical framework of construction grammar. Next, the book makes the case for integrating advances in syntactic parsing with advances in collocational analysis. Chapter 3 offers an innovative look at combining corpus data and dictionaries in the identification of specific types of collocations consisting of restricted predicate-argument combinations; this strategy complements corpus collocational data with network analysis techniques applied to dictionary entries. Chapter 4 explains the potential of collocational graphs and networks, both as a visualization tool and as an analytical technique. Chapter 5 introduces MERGE (Multi-word Expressions from the Recursive Grouping of Elements), a data-driven approach to the identification and extraction of multi-word expressions from corpora. Finally, the book concludes with an analysis and evaluation of factors influencing the performance of collocation extraction methods in parsed corpora.
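One of the statistical association measures whose mathematical properties such work examines is pointwise mutual information (PMI); a minimal sketch over adjacent word pairs, purely for illustration:

```python
# Score adjacent word pairs by pointwise mutual information:
# PMI(x, y) = log2( P(x, y) / (P(x) * P(y)) ).
# High PMI means the pair co-occurs more often than chance predicts.
import math
from collections import Counter

def pmi_bigrams(tokens):
    """Return a dict mapping each adjacent (x, y) pair to its PMI score."""
    n = len(tokens)
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_bi = n - 1
    scores = {}
    for (x, y), c in bigrams.items():
        p_xy = c / n_bi
        p_x = unigrams[x] / n
        p_y = unigrams[y] / n
        scores[(x, y)] = math.log2(p_xy / (p_x * p_y))
    return scores
```

Real collocation extraction would operate over a large corpus, often on syntactically parsed pairs rather than raw adjacency, and PMI is only one of many association measures (t-score, log-likelihood, etc.) with different mathematical behaviour at low frequencies.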
A research monograph presenting a new approach to computational linguistics. The ultimate goal of computational linguistics is to teach the computer to understand natural language. This research monograph presents a description of English according to algorithms which can be programmed into a computer to analyse natural language texts. The algorithmic approach uses a series of instructions, written in natural language and organised in flow charts, with the aim of analysing certain aspects of the grammar of a sentence. One problem with text processing is the difficulty of deciding which part of speech a word form belongs to when it is taken out of context. To solve this problem, Hristo Georgiev starts with the assumption that every word is either a verb or a non-verb. From here he presents an algorithm which allows the computer to recognise parts of speech which would be obvious to a human through the meaning of the words. For a computer, the emphasis is placed on verbs, nouns, participles and adjectives. English Algorithmic Grammar presents information that allows computers to recognise tenses, syntax, parsing, reference, and clauses. The final chapters of the book examine further applications of an algorithmic approach to English grammar, and suggest ways in which the computer can be programmed to recognise meaning. This is an innovative, cutting-edge approach to computational linguistics that will be essential reading for academics researching computational linguistics, machine translation and natural language processing.
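A verb/non-verb decision of the kind described above could, purely as an illustration, be sketched as a flow-chart-style test; the suffix and context heuristics below are hypothetical assumptions for the sketch, not Georgiev's actual algorithm:

```python
# Toy flow-chart-style test: does this word look like a verb?
# Each branch corresponds to one decision box in an imagined flow chart.
def is_verb_candidate(word, next_word=None):
    w = word.lower()
    # Branch 1: closed class of auxiliary/copular verbs.
    if w in {"is", "are", "was", "were", "be", "been", "being",
             "has", "have", "had", "do", "does", "did"}:
        return True
    # Branch 2: typical verbal suffixes (very rough; "red" would misfire).
    if w.endswith(("ing", "ed")):
        return True
    # Branch 3: a word directly preceding a determiner is often a verb
    # ("opened the door"), another deliberately crude heuristic.
    if next_word is not None and next_word.lower() in {"the", "a", "an"}:
        return True
    return False
```

A real algorithmic grammar chains many such tests, using surrounding words to resolve the ambiguous cases this toy version gets wrong.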
Corpus linguistics is often regarded as a methodology in its own right, but little attention has been given to the theoretical perspectives from which the subject can be approached. The present book contributes to filling this gap. Bringing together original contributions by internationally renowned authors, the chapters include coverage of the lexical priming theory, parole-linguistics, a four-part model of language system and language use, and the concept of local textual functions. The theoretical arguments are illustrated and complemented by case studies using data from large corpora such as the BNC, smaller purpose-built corpora, and Google searches. By presenting theoretical positions in corpus linguistics, "Text, Discourse, and Corpora" provides an essential overview for advanced undergraduate, postgraduate and academic readers. "Corpus and Discourse Series" editors are: Wolfgang Teubert, University of Birmingham, and Michaela Mahlberg, Liverpool Hope University College. Editorial Board: Frantisek Cermak (Prague), Susan Conrad (Portland), Geoffrey Leech (Lancaster), Elena Tognini-Bonelli (Lecce and TWC), Ruth Wodak (Lancaster and Vienna), and Feng Zhiwei (Beijing). Corpus linguistics provides the methodology to extract meaning from texts. Taking as its starting point the fact that language is not a mirror of reality but lets us share what we know, believe and think about reality, it focuses on language as a social phenomenon, and makes visible the attitudes and beliefs expressed by the members of a discourse community. Consisting of both spoken and written language, discourse always has historical, social, functional, and regional dimensions. Discourse can be monolingual or multilingual, interconnected by translations. Discourse is where language and social studies meet. "The Corpus and Discourse" series consists of two strands. 
The first, "Research in Corpus and Discourse", features innovative contributions to various aspects of corpus linguistics and a wide range of applications, from language technology via the teaching of a second language to a history of mentalities. The second strand, "Studies in Corpus and Discourse", comprises key texts bridging the gap between social studies and linguistics. Although equally academically rigorous, this strand is aimed at a wider audience of academics and postgraduate students working in both disciplines.
This book presents multibiometric watermarking techniques for the security of biometric data. It also covers transform-domain multibiometric watermarking techniques, together with their advantages and limitations. The authors have developed novel watermarking techniques that draw on Compressive Sensing (CS) theory to secure biometric data in the system database of a biometric system. They show how these techniques offer higher robustness, authenticity, better imperceptibility, increased payload capacity, and secure biometric watermarks, and how CS theory can be used to secure biometric watermarks before they are embedded into the host biometric data. The suggested methods may find applications in securing biometric data in banking, access control for laboratories, nuclear power stations, military bases, and airports.
This book brings together selected revised papers representing a multidisciplinary approach to language, music, and gesture, as well as their interaction. Among the many multidisciplinary and comparative studies of the structure and organization of language and music, this book broadens the scope by including gesture in the analyzed spectrum. A unique feature of the collection is that its papers, compiled in one volume, allow readers to see similarities and differences between gesture as an element of non-verbal communication and gesture as the main element of dance. In addition, the data on the perception and comprehension of speech, music, and dance, both as they function in natural situations and as they are reflected in various forms of the performing arts, make this collection extremely useful for those who are interested in human cognitive abilities and performing skills. The book begins with a philosophical overview of recent neurophysiological studies reflecting the complexity of higher cognitive functions, which references the idea of the baroque style in art as neither linear nor stable. The following papers are divided into five sections. The papers of the section "Language-Music-Gesture as Semiotic Systems" discuss symbolic and semiotic aspects of language, music, and gesture, including from the perspective of their notation. These are followed by the sections "Language-Music-Gesture Onstage" and interaction within the idea of the "World as a Text". The papers of "Teaching Language and Music" present new teaching methods that take into account the interaction of all the cognitive systems examined. The papers of the last two sections focus primarily on language: the section "Verbalization of Music and Gesture" considers the problem of describing musical text and non-verbal behavior with language, and papers in the final section, "Emotions in Linguistics and AI Communication Systems", analyze the ways of expressing emotions in speech and the problems of organizing emotional communication with computer agents.
This book reports on an outstanding thesis that has significantly advanced the state-of-the-art in the automated analysis and classification of speech and music. It defines several standard acoustic parameter sets and describes their implementation in a novel, open-source, audio analysis framework called openSMILE, which has been accepted and intensively used worldwide. The book offers extensive descriptions of key methods for the automatic classification of speech and music signals in real-life conditions and reports on the evaluation of the framework developed and the acoustic parameter sets that were selected. It is not only intended as a manual for openSMILE users, but also and primarily as a guide and source of inspiration for students and scientists involved in the design of speech and music analysis methods that can robustly handle real-life conditions.
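For a sense of what an acoustic parameter (low-level descriptor) is, here is a toy sketch of two simple descriptors of the kind such frameworks extract, frame energy and zero-crossing rate; openSMILE itself computes far richer, standardized parameter sets over windowed audio:

```python
# Two toy low-level acoustic descriptors computed over one analysis frame
# (a short list of audio samples). Energy tracks loudness; zero-crossing
# rate is a crude correlate of noisiness/frequency content.
def frame_energy(frame):
    """Mean squared amplitude of one analysis frame."""
    return sum(s * s for s in frame) / len(frame)

def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a < 0) != (b < 0))
    return crossings / (len(frame) - 1)
```

In a real pipeline these descriptors are computed per overlapping frame and then summarized over an utterance by functionals (mean, standard deviation, percentiles) to form a fixed-length feature vector for a classifier.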
This book explains advanced theoretical and application-related issues in grammatical inference, a research area inside the inductive inference paradigm for machine learning. The first three chapters of the book deal with issues regarding theoretical learning frameworks; the next four chapters focus on the main classes of formal languages according to Chomsky's hierarchy, in particular regular and context-free languages; and the final chapter addresses the processing of biosequences. The topics chosen are of foundational interest with relatively mature and established results, algorithms and conclusions. The book will be of value to researchers and graduate students in areas such as theoretical computer science, machine learning, computational linguistics, bioinformatics, and cognitive psychology who are engaged with the study of learning, especially of the structure underlying the concept to be learned. Some knowledge of mathematics and theoretical computer science, including formal language theory, automata theory, formal grammars, and algorithmics, is a prerequisite for reading this book.
Throughout his career, Professor Halliday has continued to address the application of linguistic scholarship to computational and quantitative studies. The sixth volume in the collected works of Professor M.A.K. Halliday includes works spanning the last five decades, covering such topics as the early years of machine translation and probabilistic grammar. The last section of this volume discusses recent collaborative efforts bringing together those working in systemic functional grammar, fuzzy logic and "intelligent computing", engaging in what Halliday refers to as computing with meaning. The Collected Works of M.A.K. Halliday is a series that brings together Halliday's publications in many branches of linguistics, both theoretical and applied (a distinction which he himself rejects), including grammar and semantics, discourse analysis and stylistics, phonology, sociolinguistics, computational linguistics, language education and child language development.
The relation between ontologies and language is currently at the forefront of natural language processing (NLP). Ontologies, as widely used models in semantic technologies, have much in common with the lexicon. A lexicon organizes words as a conventional inventory of concepts, while an ontology formalizes concepts and their logical relations. A shared lexicon is the prerequisite for knowledge-sharing through language, and a shared ontology is the prerequisite for knowledge-sharing through information technology. In building models of language, computational linguists must be able to accurately map the relations between words and the concepts that they can be linked to. This book focuses on the technology involved in enabling integration between lexical resources and semantic technologies. It will be of interest to researchers and graduate students in NLP, computational linguistics, and knowledge engineering, as well as in semantics, psycholinguistics, lexicology and morphology/syntax.
This book explores novel aspects of social robotics, spoken dialogue systems, human-robot interaction, spoken language understanding, multimodal communication, and system evaluation. It offers a variety of perspectives on and solutions to the most important questions about advanced techniques for social robots and chat systems. Chapters by leading researchers address key research and development topics in the field of spoken dialogue systems, focusing in particular on three special themes: dialogue state tracking, evaluation of human-robot dialogue in social robotics, and socio-cognitive language processing. The book offers a valuable resource for researchers and practitioners in both academia and industry whose work involves advanced interaction technology and who are seeking an up-to-date overview of the key topics. It also provides supplementary educational material for courses on state-of-the-art dialogue system technologies, social robotics, and related research fields.
This book introduces Meaningful Purposive Interaction Analysis (MPIA) theory, which combines social network analysis (SNA) with latent semantic analysis (LSA) to help create and analyse a meaningful learning landscape from the digital traces left by a learning community in the co-construction of knowledge. The hybrid algorithm is implemented in the statistical programming language and environment R, introducing packages which capture - through matrix algebra - elements of learners' work with more knowledgeable others and resourceful content artefacts. The book provides comprehensive package-by-package application examples, and code samples that guide the reader through the MPIA model to show how the MPIA landscape can be constructed and the learner's journey mapped and analysed. This building block application will allow the reader to progress to using and building analytics to guide students and support decision-making in learning.
In this thesis, the author makes several contributions to the study of the design of graphical materials. The thesis begins with a review of the relationship between design and aesthetics, and of the use of mathematical models to capture this relationship. Then, a novel method for linking linguistic concepts to colors using the Latent Dirichlet Allocation Dual Topic Model is proposed. Next, the thesis studies the relationship between aesthetics and spatial layout by formalizing the notion of visual balance. Applying principles of salience and Gaussian mixture models to a body of about 120,000 aesthetically rated professional photographs, the author provides confirmation of Arnheim's theory of spatial layout. The thesis concludes with a description of tools to support the automatic generation of personalized designs.
This book presents a method of linking the ordered structure of the cosmos with human thoughts: the theory of language holography. In the view presented here, the cosmos is in harmony with the human body and language, and human thoughts are holographic with the cosmos at the level of language. In a word, the holographic relation is nothing more than the bridge by means of which Guanlian Qian connects the cosmos, human, and language. This is a vitally important contribution to linguistic and philosophical studies that cannot be ignored. The book has two main focus areas: outer language holography and inner language holography. These two areas constitute the core of the dynamic and holistic view put forward in the theory of language holography. The book's main properties can be summarized into the following points: First and foremost, it is a book created in toto by a Chinese scholar devoted to pragmatics, theoretical linguistics, and philosophy of language. Secondly, the book was accepted by a top Chinese publisher and was republished the second year, which reflected its value and appeal. Thirdly, in terms of writing style, the book is characterized by succinctness and logic. As a result, it reads fluidly and smoothly without redundancies, which is not that common in linguistic or even philosophical works. Lastly, as stated by the author in the introduction, "Creation is the development of previous capacities, but it is also the generation of new ones"; this book can be said to put this concept into practice. Overall, the book offers a unique resource to readers around the world who want to know more about the truly original and innovative studies of language in Chinese academia.
This textbook examines empirical linguistics from a theoretical linguist's perspective. It provides both a theoretical discussion of what quantitative corpus linguistics entails and detailed, hands-on, step-by-step instructions for implementing the techniques in the field. The statistical methodology and R-based coding in this book teach readers the basic and then more advanced skills needed to work with large data sets in their linguistics research and studies. Massive data sets are now more than ever the basis for work that ranges from usage-based linguistics to the far reaches of applied linguistics. This book presents much of the methodology in a corpus-based approach. However, the corpus-based methods in this book are also essential components of recent developments in sociolinguistics, historical linguistics, computational linguistics, and psycholinguistics. Material from the book will also appeal to researchers in the digital humanities and the many non-linguistic fields that use textual data analysis and text-based sensorimetrics. Chapters cover topics including corpus processing, frequency data, and clustering methods. Case studies illustrate each chapter, with accompanying data sets, R code, and exercises for use by readers. This book may be used in advanced undergraduate courses, graduate courses, and self-study.
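A first corpus-processing step of the kind such chapters cover is building a frequency list; a minimal sketch (the book itself works in R; this is Python purely for illustration):

```python
# Build a frequency list from a tokenized corpus: the usual starting
# point before normalization, keyword analysis, or clustering.
from collections import Counter

def frequency_list(corpus_tokens, top_n=3):
    """Return the top_n (word, count) pairs, most frequent first."""
    return Counter(corpus_tokens).most_common(top_n)
```

In practice the counts are then normalized (e.g. per million words) so that frequencies from corpora of different sizes can be compared.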
This contributed volume explores the achievements gained, and the puzzling questions that remain, in applying dynamical systems theory to linguistic inquiry. The book is divided into three parts, each addressing one of the following topics: 1) facing complexity in the right way: mathematics and complexity; 2) complexity and the theory of language; 3) from empirical observation to formal models: the investigation of specific linguistic phenomena, such as enunciation, deixis, or the meaning of metaphorical phrases. The application of complexity theory to describe cognitive phenomena is a recent and very promising trend in cognitive science. When dynamical approaches triggered a paradigm shift in cognitive science some decades ago, the major topics of research were the challenges posed to classical computational approaches by the explanation of cognitive phenomena like consciousness, decision making and language. The target audience primarily comprises researchers and experts in the field, but the book may also be beneficial for graduate and post-graduate students who want to enter the field.
This book brings together work on Turkish natural language and speech processing over the last 25 years, covering numerous fundamental tasks ranging from morphological processing and language modeling, to full-fledged deep parsing and machine translation, as well as computational resources developed along the way to enable most of this work. Owing to its complex morphology and free constituent order, Turkish has proved to be a fascinating language for natural language and speech processing research and applications. After an overview of the aspects of Turkish that make it challenging for natural language and speech processing tasks, this book discusses in detail the main tasks and applications of Turkish natural language and speech processing. A compendium of the work on Turkish natural language and speech processing, it is a valuable reference for new researchers considering computational work on Turkish, as well as a one-stop resource for commercial and research institutions planning to develop applications for Turkish. It also serves as a blueprint for similar work on other Turkic languages such as Azeri, Turkmen and Uzbek.
All human speech has expression. It is part of the 'humanness' of speech, and a quality listeners expect to find. Without expression, speech sounds lifeless and artificial. Remove expression, and what is left is the bare bones of the intended message, with none of the feelings which surround it. The purpose of this book is to present research examining expressive content in speech with a view to simulating expression in computer speech. Human beings communicate expressively with each other in conversation: now, in the computer age, there is a perceived need for machines to communicate expressively with humans in dialogue.
This book offers an introduction to modern natural language processing using machine learning, focusing on how neural networks create a machine-interpretable representation of the meaning of natural language. Language is crucially linked to ideas - as Webster's 1923 "English Composition and Literature" puts it: "A sentence is a group of words expressing a complete thought". Thus the representation of sentences, and of the words that make them up, is vital in advancing artificial intelligence and other "smart" systems currently being developed. Providing an overview of the research in the area, from Bengio et al.'s seminal work on a "Neural Probabilistic Language Model" in 2003 to the latest techniques, this book enables readers to gain an understanding of how the techniques are related and what is best for their purposes. As well as an introduction to neural networks in general and recurrent neural networks in particular, this book details the methods used for representing words, senses of words, and larger structures such as sentences or documents. The book highlights practical implementations and discusses many aspects that are often overlooked or misunderstood. It includes thorough instruction on challenging areas such as hierarchical softmax and negative sampling, to ensure the reader fully and easily understands the details of how the algorithms function. Combining practical aspects with a more traditional review of the literature, it is directly applicable to a broad readership. It is an invaluable introduction for early graduate students working in natural language processing; a trustworthy guide for industry developers wishing to make use of recent innovations; and a sturdy bridge for researchers already familiar with linguistics or machine learning wishing to understand the other.
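The core idea behind such word representations can be illustrated with dense vectors compared by cosine similarity; the toy vectors below are made-up values for the sketch, not trained embeddings:

```python
# Words as dense vectors: semantically similar words end up with
# similar vectors, measured here by cosine similarity.
import math

def cosine(u, v):
    """Cosine similarity: dot(u, v) / (|u| * |v|)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

# Made-up 3-dimensional vectors; real embeddings are learned from text
# (e.g. by skip-gram with negative sampling) and have hundreds of dimensions.
toy_vectors = {
    "king":  [0.9, 0.8, 0.1],
    "queen": [0.9, 0.7, 0.2],
    "apple": [0.1, 0.2, 0.9],
}
```

Training methods such as negative sampling adjust these vectors so that words appearing in similar contexts, like "king" and "queen", score higher with each other than with unrelated words.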
This book integrates current advances in biology, economics of information and linguistics research through applications using agent-based modeling and social network analysis to develop scenarios of communication and language emergence in the social aspects of biological communications. The book presents a model of communication emergence that can be applied both to human and non-human living organism networks. The model is based on economic concepts and individual behavior fundamental for the study of trust and reputation networks in social science, particularly in economics; it is also based on the theory of the emergence of norms and historical path dependence that has been influential in institutional economics. Also included are mathematical models and code for agent-based models to explore various scenarios of language evolution, as well as a computer application that explores language and communication in biological versus social organisms, and the emergence of various meanings and grammars in human networks. Emergence of Communication in Socio-Biological Networks offers both a completely novel approach to communication emergence and language evolution and provides a path for the reader to explore various scenarios of language and communication that are not constrained to the human networks alone. By illustrating how computational social science and the complex systems approach can incorporate multiple disciplines and offer an integrated theory-model approach to the evolution of language, the book will be of interest to researchers working with computational linguistics, mathematical linguistics, and complex systems.
In this book, leading researchers in morphology, syntax, language acquisition, psycholinguistics, and computational linguistics address central questions about the form and acquisition of analogy in grammar. What kinds of patterns do speakers select as the basis for analogical extension? What types of items are particularly susceptible or resistant to analogical pressures? At what levels do analogical processes operate and how do processes interact? What formal mechanisms are appropriate for modelling analogy? The novel synthesis of typological, theoretical, computational, and developmental paradigms in this volume brings us closer to answering these questions than ever before.
The last decades have witnessed a renewed interest in near-synonymy. In particular, recent distributional corpus-based approaches used for semantic analysis have successfully uncovered subtle distinctions in meaning between near-synonyms. However, most studies have dealt with the semantic structure of sets of near-synonyms from a synchronic perspective, while their diachronic evolution generally has been neglected. Against this backdrop, the aim of this book is to examine five adjectival near-synonyms in the history of American English from the understudied semantic domain of SMELL: fragrant, perfumed, scented, sweet-scented, and sweet-smelling. Their distribution is analyzed across a wide range of contexts, including semantic, morphosyntactic, and stylistic ones, since distributional patterns of this type serve as a proxy for semantic (dis)similarity. The data is submitted to various univariate and multivariate statistical techniques, making it possible to uncover fine-grained (dis)similarities among the near-synonyms, as well as possible changes in their prototypical structures. The book sheds valuable light on the diachronic development of lexical near-synonyms, a dimension that has up to now been relatively disregarded.
Cross-Disciplinary Advances in Applied Natural Language Processing: Issues and Approaches defines the role of ANLP within NLP, and alongside other disciplines such as linguistics, computer science, and cognitive science. The description also includes the categorization of current ANLP research, and examples of current research in ANLP. This book is a useful reference for teachers, students, and materials developers in fields spanning linguistics, computer science, and cognitive science.
Tense and aspect are means by which language refers to time-how an event takes place in the past, present, or future. They play a key role in understanding the grammar and structure of all languages, and interest in them reaches across linguistics. The Oxford Handbook of Tense and Aspect is a comprehensive, authoritative, and accessible guide to the topics and theories that currently form the front line of research into tense, aspect, and related areas. The volume contains 36 chapters, divided into 6 sections, written by internationally known experts in theoretical linguistics.
This book presents consolidated acoustic data for all phones in Standard Colloquial Bengali (SCB), commonly known as Bangla, a Bengali variety used by 350 million people in India, Bangladesh, and the Bengali diaspora. The book analyzes the real speech of selected native speakers of the Bangla dialect to ensure that a proper acoustical database is available for the development of speech technologies. The acoustic data presented consist of averages and their normal spread, represented by the standard deviations of the necessary acoustic parameters, including e.g. formant information for multiple native speakers of both sexes. The study employs two important speech technologies: (1) text-to-speech synthesis (TTS) and (2) automatic speech recognition (ASR). The procedures, particularly those related to the use of these technologies, are described in sufficient detail to enable researchers to create technical acoustic databases for any other Indian dialect. The book offers a unique resource for scientists and industrial practitioners who are interested in the acoustic analysis and processing of Indian dialects and wish to develop similar dialect databases of their own.