Computational Linguistics
On social media, new forms of communication arise rapidly, many of which are intense, dispersed, and create new communities on a global scale. Such communities can act as distinct information bubbles with their own perspective on the world, and it is difficult for people to find and monitor all these perspectives and relate the different claims made. Within this digital jungle of perspectives on truth, it is hard to make informed decisions on important matters like vaccinations, democracy, and climate change. Understanding and modeling this phenomenon in its full complexity requires an interdisciplinary approach, one that utilizes the ample data provided by digital communication to offer new insights and opportunities. This interdisciplinary book gives a comprehensive view of social media communication: the different forms it takes, its impact, and the technology used to mine it, and it sets out a roadmap to a more transparent Web.
Corpora are ubiquitous in linguistic research, yet to date, there has been no consensus on how to conceptualize corpus representativeness and collect corpus samples. This pioneering book bridges this gap by introducing a conceptual and methodological framework for corpus design and representativeness. Written by experts in the field, it shows how corpora can be designed and built in a way that is both optimally suited to specific research agendas, and adequately representative of the types of language use in question. It considers questions such as 'what types of texts should be included in the corpus?', and 'how many texts are required?' - highlighting that the degree of representativeness rests on the dual pillars of domain considerations and distribution considerations. The authors introduce, explain, and illustrate all aspects of this corpus representativeness framework in a step-by-step fashion, using examples and activities to help readers develop practical skills in corpus design and evaluation.
Corpus Linguistics has revolutionised the world of language study and is an essential component of work in Applied Linguistics. This book, now in its second edition, provides a thorough introduction to all the key research issues in Corpus Linguistics, from the point of view of Applied Linguistics. The field has progressed a great deal since the first edition, so this edition has been completely rewritten to reflect these advances, whilst still maintaining the emphasis on hands-on corpus research of the first edition. It includes chapters on qualitative and quantitative research, applications in language teaching, discourse studies, and beyond. It also includes an extensive discussion of the place of Corpus Linguistics in linguistic theory, and provides numerous detailed examples of corpus studies throughout. Providing an accessible but thorough grounding to the fascinating, fast-moving field of Corpus Linguistics, this book is essential reading for the student and the researcher alike.
This book covers theoretical work, applications, approaches, and techniques for computational models of information and its presentation by language (artificial, human, or natural in other ways). Computational and technological developments that incorporate natural language are proliferating. Adequate coverage encounters difficult problems related to ambiguities and dependency on context and agents (humans or computational systems). The goal is to promote computational systems of intelligent natural language processing and related models of computation, language, thought, mental states, reasoning, and other cognitive processes.
Automating Linguistics offers an in-depth study of the history of the mathematisation and automation of the sciences of language. In the wake of the first mathematisation of the 1930s, two waves followed: machine translation in the 1950s and the development of computational linguistics and natural language processing in the 1960s. These waves proved pivotal for the work on large computerised corpora in the 1990s, which was enabled by the unprecedented technological development of computers and software. Early machine translation was devised as a war technology originating in the sciences of war, amidst an amalgam of mathematics, physics, logic, neurosciences, acoustics, and emerging sciences such as cybernetics and information theory. It was intended to provide mass translations for strategic purposes during the Cold War. Linguistics, in turn, did not belong to the sciences of war, and played a minor role in the pioneering machine translation projects. Comparing the two trends, the present book reveals how the sciences of language gradually integrated the technologies of computing and software, resulting in a second-wave mathematisation of the study of language, which may be called mathematisation-automation. The integration took on various shapes contingent upon cultural and linguistic traditions (USA, ex-USSR, Great Britain, and France). By contrast, the work on large corpora in the 1990s, though enabled by the unprecedented development of computing and software, was primarily a continuation of traditional approaches in the sciences of language, such as the study of spoken and written texts, lexicography, and statistical studies of vocabulary.
This SpringerBrief presents the data-information-and-time (DIT) model, which precisely clarifies the semantics of the terms data and information and their relation to the passage of real time. According to the DIT model, a data item is a symbol that appears as a pattern (e.g., a visual, sound, gesture, or any bit pattern) in physical space. It is generated by a human or a machine in the current contextual situation and is linked to a concept in the human mind or to a set of operations of a machine. An information item delivers the sense or idea that a human mind extracts from a given natural language proposition that contains meaningful data items. Since the given tangible, intangible, and temporal contexts are part of the explanation of a data item, a change of context can affect the meaning of data and the sense of a proposition. The DIT model provides a framework for showing how the flow of time can change the truth-value of a proposition. The book compares our notions of data, information, and time in differing contexts: in human communication, in the operation of a computer system, and in a biological system. In the final section, a few simple examples demonstrate how the lessons learned from the DIT model can help improve the design of a computer system.
This handbook is a comprehensive practical resource on corpus linguistics. It features a range of basic and advanced approaches, methods, and techniques in corpus linguistics, from corpus compilation principles to quantitative data analyses. The Handbook is organized into six parts. Parts I to III feature chapters that discuss key issues and the know-how related to various topics around corpus design, methods, and corpus types. Parts IV and V offer a user-friendly introduction to the quantitative analysis of corpus data: for each statistical technique discussed, chapters provide a practical guide with R and come with supplementary online material. Part VI focuses on how to write a corpus linguistic paper and how to meta-analyze corpus linguistic research. The volume can serve as a course book as well as a resource for individual study, and it will be essential reading for students of corpus linguistics as well as experienced researchers who want to expand their knowledge of the field.
Labelling data is one of the most fundamental activities in science. It has underpinned practice, particularly in medicine, for decades, as well as research in corpus linguistics since at least the development of the Brown corpus. With the shift towards Machine Learning in Artificial Intelligence (AI), the creation of datasets for training and evaluating AI systems, known in AI as corpora, has become a central activity in the field as well. Early AI datasets were created on an ad-hoc basis to tackle specific problems. As larger and more reusable datasets were created, requiring greater investment, the need arose for a more systematic approach to dataset creation to ensure higher quality. A range of statistical methods was adopted, often but not exclusively from the medical sciences, to ensure that the labels used were not subjective, or to choose among different labels provided by the coders. A wide variety of such methods is now in regular use. This book provides a survey of the statistical methods most widely used to support annotation practice. As far as the authors know, it is the first book attempting to cover the two families of methods in widest use. The first family is concerned with developing labelling schemes and, in particular, with ensuring that sufficient agreement can be observed among the coders. The second family includes methods for analyzing the output of coders once the scheme has been agreed upon, particularly (though not exclusively) to identify the most likely label for an item among those the coders provide. The focus of the book is primarily on Natural Language Processing, the area of AI devoted to developing models of language interpretation and production, but many if not most of the methods discussed are also applicable to other areas of AI, and indeed to other areas of Data Science.
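To make the two families of methods concrete, here is a minimal sketch (not taken from the book; the labels and data are invented) of Cohen's kappa, a widely used chance-corrected agreement coefficient, alongside a simple majority vote over coders' labels:

```python
from collections import Counter

def cohen_kappa(labels_a, labels_b):
    """Chance-corrected agreement between two coders (Cohen, 1960)."""
    assert len(labels_a) == len(labels_b)
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    # Expected agreement if both coders labelled at random
    # according to their own marginal label distributions.
    expected = sum(freq_a[k] * freq_b[k] for k in freq_a) / n**2
    return (observed - expected) / (1 - expected)

def majority_label(votes):
    """Most frequent label among coders for one item."""
    return Counter(votes).most_common(1)[0][0]

# Hypothetical annotations of five items by two coders.
coder1 = ["pos", "neg", "pos", "neu", "pos"]
coder2 = ["pos", "neg", "neu", "neu", "pos"]
print(cohen_kappa(coder1, coder2))            # 0.6875
print(majority_label(["pos", "pos", "neg"]))  # "pos"
```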
Sentence comprehension - the way we process and understand spoken and written language - is a central and important area of research within psycholinguistics. This book explores the contribution of computational linguistics to the field, showing how computational models of sentence processing can help scientists in their investigation of human cognitive processes. It presents the leading computational model of retrieval processes in sentence processing, the Lewis and Vasishth cue-based retrieval model, and develops a principled methodology for parameter estimation and model comparison/evaluation using benchmark data, enabling researchers to test their own models of retrieval against the present model. It also provides readers with an overview of the last twenty years of research on retrieval processes in sentence comprehension, along with source code that allows researchers to extend the model and carry out new research. Comprehensive in its scope, this book is essential reading for researchers in cognitive science.
At present, Web 2.0 technologies are making traditional research genres evolve and form complex genre assemblages with other genres online. This book takes the perspective of genre analysis to provide a timely examination of professional and public communication of science. It gives an updated overview of the increasing diversification of genres for communicating scientific research today by reviewing relevant theories that contribute to an understanding of genre evolution and innovation in Web 2.0. The book also offers a much-needed critical enquiry into the dynamics of languages for academic and research communication and reflects on current language-related issues such as academic Englishes, ELF lects, translanguaging, polylanguaging, and the multilingualisation of science. Additionally, it complements the critical reflections with data from small-scale specialised corpora and exploratory survey research. The book also includes pedagogical orientations for teaching/training researchers in the STEMM disciplines and proposes several avenues for future enquiry into research genres across languages.
This book sheds new light on corpus-assisted translation pedagogy, an intersection of three distinct but cognate disciplines: corpus linguistics, translation, and pedagogy. Taking an innovative and empirical approach to translation teaching, the study utilizes mixed methods, including translation experiments, surveys, and in-depth focus groups. The results demonstrate the unique advantages of using corpora for translation teaching while also calling attention to possible pitfalls. The book enriches our understanding of corpus application in the setting of translation between Chinese and English, two languages that are distinctly different from one another, and readers will discover new horizons in this burgeoning and interdisciplinary field of research. It appeals to a broad readership: scholars and researchers interested in translation technology and in widening the scope of translation studies, translation trainers in search of effective teaching approaches, and a growing number of cross-disciplinary postgraduate students longing to improve their translation skills and competence.
This book discusses the state of the art of automated essay scoring, its challenges, and its potential. One of the earliest applications of artificial intelligence to language data (along with machine translation and speech recognition), automated essay scoring has evolved into both a revenue-generating industry and a vast field of research, with many subfields and connections to other NLP tasks. The book reviews the developments in this field against the backdrop of Ellis Page's seminal 1966 paper titled "The Imminence of Grading Essays by Computer." Part 1 establishes what automated essay scoring is about, why it exists, where the technology stands, and what some of the main issues are. In Part 2, the book presents guided exercises to illustrate how one would go about building and evaluating a simple automated scoring system, while Part 3 offers readers a survey of the literature on different types of scoring models, the aspects of essay quality studied in prior research, and the implementation and evaluation of a scoring engine. Part 4 offers a broader view of the field, including some neighboring areas, and Part 5 closes with a summary and discussion. The book grew out of a week-long course on automated evaluation of language production at the North American Summer School for Logic, Language, and Information (NASSLLI), attended by advanced undergraduates and early-stage graduate students from a variety of disciplines. Teachers of natural language processing, in particular, will find that the book offers a useful foundation for a supplemental module on automated scoring. Professionals and students in linguistics, applied linguistics, educational technology, and other related disciplines will also find the material useful.
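As a flavour of the guided exercises described above, here is a minimal, hypothetical sketch of a Page-style scoring system: shallow proxy features fed to a least-squares linear model. The features, essays, and scores are invented for illustration and are not taken from the book.

```python
import numpy as np

def essay_features(text):
    """Very shallow features of the kind early scoring systems used."""
    words = text.split()
    n_words = len(words)
    avg_word_len = sum(len(w) for w in words) / max(n_words, 1)
    type_token = len(set(w.lower() for w in words)) / max(n_words, 1)
    return [n_words, avg_word_len, type_token]

# Toy training data: (essay, human score) pairs -- invented for illustration.
train = [
    ("The cat sat on the mat.", 1.0),
    ("Rising temperatures alter precipitation patterns worldwide.", 3.5),
    ("In conclusion, the evidence strongly supports the hypothesis "
     "that urbanisation accelerates local warming.", 4.0),
]
X = np.array([essay_features(t) + [1.0] for t, _ in train])  # bias term
y = np.array([s for _, s in train])

# Least-squares fit of a linear scoring model (Page's original systems
# were essentially regressions over proxy features like these).
weights, *_ = np.linalg.lstsq(X, y, rcond=None)

new_essay = "Cities trap heat in concrete and asphalt."
score = np.array(essay_features(new_essay) + [1.0]) @ weights
print(round(float(score), 2))
```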
This book explores some of the ethical, legal, and social implications of chatbots, or conversational artificial agents. It reviews the possibility of establishing meaningful social relationships with chatbots and investigates the consequences of those relationships for contemporary debates in the philosophy of Artificial Intelligence. The author introduces current technological challenges of AI and discusses how technological progress and social change influence our understanding of social relationships. He then argues that chatbots introduce epistemic uncertainty into human social discourse, but that this can be ameliorated by introducing a new ontological classification or 'status' for chatbots. This step forward would allow humans to reap the benefits of this technological development, without the attendant losses. Finally, the author considers the consequences of chatbots on human-human relationships, providing analysis on robot rights, human-centered design, and the social tension between robophobes and robophiles.
The goal of text ranking is to generate an ordered list of texts retrieved from a corpus in response to a query. Although the most common formulation of text ranking is search, instances of the task can also be found in many natural language processing (NLP) applications. This book provides an overview of text ranking with neural network architectures known as transformers, of which BERT (Bidirectional Encoder Representations from Transformers) is the best-known example. The combination of transformers and self-supervised pretraining has been responsible for a paradigm shift in NLP, information retrieval (IR), and beyond. The book provides a synthesis of existing work as a single point of entry for practitioners who wish to gain a better understanding of how to apply transformers to text ranking problems and researchers who wish to pursue work in this area. It covers a wide range of modern techniques, grouped into two high-level categories: transformer models that perform reranking in multi-stage architectures, and dense retrieval techniques that perform ranking directly. Two themes pervade the book: techniques for handling long documents, beyond the typical sentence-by-sentence processing in NLP, and techniques for addressing the tradeoff between effectiveness (i.e., result quality) and efficiency (e.g., query latency, model and index size). Although transformer architectures and pretraining techniques are recent innovations, many aspects of how they are applied to text ranking are relatively well understood and represent mature techniques. However, many open research questions remain, and thus in addition to laying out the foundations of pretrained transformers for text ranking, this book also attempts to prognosticate where the field is heading.
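A minimal sketch of the reranking stage of such a multi-stage architecture, assuming the Hugging Face transformers library and the publicly released cross-encoder/ms-marco-MiniLM-L-6-v2 checkpoint (an assumption for illustration; the book is not tied to any one model):

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

MODEL = "cross-encoder/ms-marco-MiniLM-L-6-v2"
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForSequenceClassification.from_pretrained(MODEL)
model.eval()

query = "how do transformers rank text"
candidates = [  # in practice these come from a cheap first-stage retriever
    "BERT-style cross-encoders score query-document pairs jointly.",
    "The weather in Cape Town is mild in October.",
    "Dense retrievers embed queries and documents separately.",
]

# Score each (query, candidate) pair jointly -- the "reranking" stage.
inputs = tokenizer([query] * len(candidates), candidates,
                   padding=True, truncation=True, return_tensors="pt")
with torch.no_grad():
    scores = model(**inputs).logits.squeeze(-1)

for score, text in sorted(zip(scores.tolist(), candidates), reverse=True):
    print(f"{score:6.2f}  {text}")
```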
The book presents current research and developments in multilingual speech recognition. The author presents a Multilingual Phone Recognition System (Multi-PRS) developed using a common multilingual phone-set derived from the International Phonetic Alphabet (IPA)-based transcriptions of six Indian languages: Kannada, Telugu, Bengali, Odia, Urdu, and Assamese. The author shows how the performance of Multi-PRS can be improved using tandem features, and compares Monolingual Phone Recognition Systems (Mono-PRS) against Multi-PRS, and baseline against tandem systems. Methods are proposed to predict Articulatory Features (AFs) from spectral features using Deep Neural Networks (DNNs), and multitask learning is explored to improve the prediction accuracy of the AFs. The AFs are then used to improve the performance of Multi-PRS through two combination methods: lattice rescoring and tandem combination. The author goes on to develop and evaluate both the Language Identification followed by Monolingual phone recognition (LID-Mono) approach and the common multilingual phone-set based multilingual phone recognition system.
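The tandem idea mentioned above can be sketched in a few lines: append a network's (log-compressed) phone posteriors to the spectral features, frame by frame. Everything below, including the random stand-in for DNN outputs and the feature dimensions, is hypothetical and only illustrates the shape of the technique.

```python
import numpy as np

rng = np.random.default_rng(0)
n_frames, n_mfcc, n_phones = 100, 13, 42

mfcc = rng.normal(size=(n_frames, n_mfcc))      # spectral features
logits = rng.normal(size=(n_frames, n_phones))  # stand-in for DNN outputs
posteriors = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)

# Log-compress the posteriors (common in tandem systems) and
# concatenate them with the spectral features, frame by frame.
tandem = np.hstack([mfcc, np.log(posteriors + 1e-10)])
print(tandem.shape)  # (100, 55): 13 spectral + 42 posterior dimensions
```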
This book presents a taxonomy framework and survey of methods relevant to explaining the decisions and analyzing the inner workings of Natural Language Processing (NLP) models. It is intended to provide a snapshot of Explainable NLP, though the field continues to grow rapidly, and to be both readable by first-year M.Sc. students and interesting to an expert audience. The book opens by motivating a focus on providing a consistent taxonomy, pointing out inconsistencies and redundancies in previous taxonomies. It goes on to present (i) a taxonomy or framework for thinking about how approaches to explainable NLP relate to one another; (ii) brief surveys of each of the classes in the taxonomy, with a focus on methods that are relevant for NLP; and (iii) a discussion of the inherent limitations of some classes of methods, as well as how best to evaluate them. Finally, the book closes by providing a list of resources for further research on explainability.
This Element provides a basic introduction to sentiment analysis, aimed at helping students and professionals in corpus linguistics understand what sentiment analysis is, how it is conducted, and where it can be applied. It begins with a definition of sentiment analysis and a discussion of the domains where sentiment analysis is conducted and used the most. It then introduces the two main methods commonly used in sentiment analysis, supervised machine learning and unsupervised (lexicon-based) learning, followed by a step-by-step explanation of how to perform sentiment analysis with R. The Element then provides two detailed example cases of sentiment and emotion analysis, one using an unsupervised method and the other a supervised learning method.
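Although the Element works in R, the lexicon-based method it describes is easy to sketch in a few lines of Python. The four-entry lexicon and the crude negation rule below are toy assumptions for illustration, not the Element's actual resources.

```python
# Toy polarity lexicon; real lexicons hold thousands of scored entries.
LEXICON = {"good": 1.0, "excellent": 2.0, "bad": -1.0, "terrible": -2.0}

def sentiment(text):
    """Sum word polarities, flipping polarity after a preceding "not"."""
    tokens = text.lower().split()
    score = 0.0
    for i, tok in enumerate(tokens):
        value = LEXICON.get(tok.strip(".,!?"), 0.0)
        if i > 0 and tokens[i - 1] == "not":  # crude negation handling
            value = -value
        score += value
    return score

print(sentiment("The plot was excellent but the acting was not good."))  # 1.0
```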
This pioneering volume lays out a set of methodological principles to guide the description of interpersonal grammar in different languages. It compares interpersonal systems and structures across a range of world languages, showing how discourse, interpersonal relationships between the speakers, and the purpose of their communication, all play a role in shaping the grammatical structures used in interaction. Following an introduction setting out these principles, each chapter focuses on a particular language - Khorchin Mongolian, Mandarin, Tagalog, Pitjantjatjara, Spanish, Brazilian Portuguese, British Sign Language and Scottish Gaelic - and explores mood, polarity, tagging, vocation, assessment and comment systems. The book provides a model for functional grammatical description that can be used to inform work on system and structure across languages as a foundation for functional language typology.
This book deals with two fundamental issues in the semiotics of the image. The first is the relationship between image and observer: how does one look at an image? To answer this question, the book sets out to transpose the theory of enunciation formulated in linguistics to the visual field. It also aims to clarify the gains made in contemporary visual semiotics relative to the semiology of Roland Barthes and Emile Benveniste. The second issue addressed is the relation between the forces, forms, and materiality of images. How do different physical mediums (pictorial, photographic, and digital) influence visual forms? How does materiality affect the generativity of forms? On the forces within images, the book draws on the philosophical thought of Gilles Deleuze and Rene Thom as well as the experiment of Aby Warburg's Atlas Mnemosyne. The theories discussed are tested on a variety of corpora for analysis, including both paintings and photographs, taken from traditional as well as contemporary sources in a variety of social sectors (the arts and the sciences). Finally, the semiotic methodology is contrasted with the computational analysis of large collections of images (Big Data), such as the "Media Visualization" analyses proposed by Lev Manovich and Cultural Analytics in Computer Science, in order to evaluate the impact of automatic analysis of visual forms on Digital Art History and, more generally, on the image sciences.
This book presents the concept of the double hierarchy linguistic term set and its extensions, which can deal with dynamic and complex decision-making problems. With the rapid development of science and technology and the ever-faster updating of information, decision-making problems have become increasingly complex. The book provides a comprehensive and systematic introduction to the latest research in the field, including measurement methods, consistency methods, group consensus and large-scale group consensus decision-making methods, as well as their practical applications. Intended for engineers, technicians, and researchers in the fields of computational linguistics, operations research, information science, and management science and engineering, it also serves as a textbook for postgraduate and senior undergraduate university students.
Weighted finite-state transducers (WFSTs) are commonly used by engineers and computational linguists for processing and generating speech and text. This book first provides a detailed introduction to this formalism. It then introduces Pynini, a Python library for compiling finite-state grammars and for combining, optimizing, applying, and searching finite-state transducers. This book illustrates this library's conventions and use with a series of case studies. These include the compilation and application of context-dependent rewrite rules, the construction of morphological analyzers and generators, and text generation and processing applications.
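As a taste of those case studies, here is a minimal context-dependent rewrite rule compiled and applied with Pynini, assuming the library is installed; the intervocalic voicing rule itself is a toy example invented here, not one taken from the book.

```python
import pynini

# Closed alphabet over which the rule applies (lower-case letters and space).
sigma = pynini.union(*"abcdefghijklmnopqrstuvwxyz ").closure()

# Toy context-dependent rewrite rule: rewrite "s" as "z" between vowels.
vowel = pynini.union(*"aeiou")
rule = pynini.cdrewrite(pynini.cross("s", "z"), vowel, vowel, sigma)

# Apply the rule by composing an input string with the transducer.
print((pynini.accep("the roses rise") @ rule).string())  # "the rozes rize"
```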
From tech giants to plucky startups, the world is full of companies boasting that they are on their way to replacing human interpreters, but are they right? Interpreters vs Machines offers a solid introduction to recent theory and research on human and machine interpreting, and then invites the reader to explore the future of interpreting. With a foreword by Dr Henry Liu, the 13th International Federation of Translators (FIT) President, and written by consultant interpreter and researcher Jonathan Downie, this book offers a unique combination of research and practical insight into the field of interpreting. Written in an innovative, accessible style with humorous touches and real-life case studies, this book is structured around the metaphor of playing and winning a computer game. It takes interpreters of all experience levels on a journey to better understand their own work, learn how computers attempt to interpret and explore possible futures for human interpreters. With five levels and split into 14 chapters, Interpreters vs Machines is key reading for all professional interpreters as well as students and researchers of Interpreting and Translation Studies, and those with an interest in machine interpreting.
Now in its second edition, Text Analysis with R provides a practical introduction to computational text analysis using the open source programming language R. R is an extremely popular programming language, used throughout the sciences; due to its accessibility, it is now used increasingly in other research areas as well. In this volume, readers immediately begin working with text, and each chapter examines a new technique or process, allowing readers to obtain a broad exposure to core R procedures and a fundamental understanding of the possibilities of computational text analysis at both the micro and the macro scale. Each chapter builds on its predecessor as readers move from small-scale "microanalysis" of single texts to large-scale "macroanalysis" of text corpora, and each concludes with a set of practice exercises that reinforce and expand upon the chapter lessons. The book's focus is on making the technical palatable, useful, and immediately gratifying. Text Analysis with R is written with students and scholars of literature in mind but will be applicable to other humanists and social scientists wishing to extend their methodological toolkit to include quantitative and computational approaches to the study of text. Computation provides access to information in text that readers simply cannot gather using traditional qualitative methods of close reading and human synthesis. This new edition features two new chapters: one that introduces dplyr and tidyr in the context of parsing and analyzing dramatic texts to extract speaker and receiver data, and one on sentiment analysis using the syuzhet package. It is also filled with updated material in every chapter to integrate new developments in the field, current practices in R style, and the use of more efficient algorithms.
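The speaker-extraction idea from the new dplyr/tidyr chapter can be illustrated outside R as well; this Python sketch, over an invented play snippet, counts the words spoken per character. The line format and regular expression are assumptions for illustration only.

```python
import re
from collections import Counter

play = """HAMLET. To be, or not to be, that is the question.
OPHELIA. Good my lord, how does your honour for this many a day?
HAMLET. I humbly thank you; well, well, well."""

# Assume each line opens with an upper-case speaker name and a period.
speaker_re = re.compile(r"^([A-Z]+)\.\s+(.*)$")

words_by_speaker = Counter()
for line in play.splitlines():
    match = speaker_re.match(line)
    if match:
        speaker, speech = match.groups()
        words_by_speaker[speaker] += len(speech.split())

print(words_by_speaker)  # Counter({'HAMLET': 17, 'OPHELIA': 12})
```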
This book builds on decades of research and provides contemporary theoretical foundations for practical applications to intelligent technologies and advances in artificial intelligence (AI). Reflecting the growing realization that computational models of human reasoning and interactions can be improved by integrating heterogeneous information resources and AI techniques, its ultimate goal is to promote integrated computational approaches to intelligent computerized systems. The book covers a range of interrelated topics, in particular, computational reasoning, language, syntax, semantics, memory, and context information. The respective chapters use and develop logically oriented methods and techniques, and the topics selected are from those areas of logic that contribute to AI and provide its mathematical foundations. The intended readership includes researchers working in the areas of traditional logical foundations, and on new approaches to intelligent computational systems.
You may like...
- Trends in E-Tools and Resources for… by Gloria Corpas Pastor, Isabel Duran Munoz (Hardcover, R3,025)
- The Natural Language for Artificial… by Dioneia Motta Monte-Serrat, Carlo Cattani (Paperback, R2,767)
- Foundation Models for Natural Language… by Gerhard Paaß, Sven Giesselbach (Hardcover, R884)
- From Data to Evidence in English… by Carla Suhr, Terttu Nevalainen, … (Hardcover, R3,929)
- Artificial Intelligence for Healthcare… by Boris Galitsky, Saveli Goldberg (Paperback, R2,991)