A comprehensive compilation of new developments in data linkage methodology.
The increasing availability of large administrative databases has led to a dramatic rise in the use of data linkage, yet the standard texts on linkage are still those which describe the seminal work from the 1950s and 1960s, with some updates. Linkage and analysis of data across sources remains problematic due to lack of discriminatory and accurate identifiers, missing data and regulatory issues. Recent developments in data linkage methodology have concentrated on bias and analysis of linked data, novel approaches to organising relationships between databases and privacy-preserving linkage. Methodological Developments in Data Linkage brings together a collection of contributions from members of the international data linkage community, covering cutting edge methodology in this field. It presents opportunities and challenges provided by linkage of large and often complex datasets, including analysis problems, legal and security aspects, models for data access and the development of novel research areas. New methods for handling uncertainty in analysis of linked data, solutions for anonymised linkage and alternative models for data collection are also discussed.
Key Features:
* Presents cutting edge methods for a topic of increasing importance to a wide range of research areas, with applications to data linkage systems internationally
* Covers the essential issues associated with data linkage today
* Includes examples based on real data linkage systems, highlighting the opportunities, successes and challenges that the increasing availability of linkage data provides
* Novel approach incorporates technical aspects of linkage, management and analysis of linked data
This book will be of core interest to academics, government employees, data holders, data managers, analysts and statisticians who use administrative data. It will also appeal to researchers in a variety of areas, including epidemiology, biostatistics, social statistics, informatics, policy and public health.
S-PLUS is a powerful environment for the statistical and graphical analysis of data. It provides the tools to implement many statistical ideas which have been made possible by the widespread availability of workstations having good graphics and computational capabilities. This book is a guide to using S-PLUS to perform statistical analyses and provides both an introduction to the use of S-PLUS and a course in modern statistical methods. S-PLUS is available for both Windows and UNIX workstations, and both versions are covered in depth. The aim of the book is to show how to use S-PLUS as a powerful and graphical data analysis system. Readers are assumed to have a basic grounding in statistics, and so the book is intended for would-be users of S-PLUS and for both students and researchers using statistics. Throughout, the emphasis is on presenting practical problems and full analyses of real data sets. Many of the methods discussed are state-of-the-art approaches to topics such as linear, nonlinear, and smooth regression models, tree-based methods, multivariate analysis and pattern recognition, survival analysis, time series and spatial statistics. Throughout, modern techniques such as robust methods, non-parametric smoothing, and bootstrapping are used where appropriate. This third edition is intended for users of S-PLUS 4.5, 5.0, 2000 or later, although S-PLUS 3.3/4 are also considered. The major change from the second edition is coverage of the current versions of S-PLUS. The material has been extensively rewritten using new examples and the latest computationally intensive methods. The companion volume on S Programming will provide an in-depth guide for those writing software in the S language. The authors have written several software libraries that enhance S-PLUS; these and all the datasets used are available on the Internet in versions for Windows and UNIX.
There are extensive on-line complements covering advanced material, user-contributed extensions, further exercises, and new features of S-PLUS as they are introduced. Dr. Venables is now Statistician with CSIRO in Queensland, having been at the Department of Statistics, University of Adelaide, for many years previously. He has given many short courses on S-PLUS in Australia, Europe, and the USA. Professor Ripley holds the Chair of Applied Statistics at the University of Oxford, and is the author of four other books on spatial statistics, simulation, pattern recognition, and neural networks.
It is universally accepted today that parallel processing is here to stay but that software for parallel machines is still difficult to develop. However, there is little recognition of the fact that changes in processor architecture can significantly ease the development of software. In the seventies the availability of processors that could address a large name space directly eliminated the problem of name management at one level and paved the way for the routine development of large programs. Similarly, today, processor architectures that can facilitate cheap synchronization and provide a global address space can simplify compiler development for parallel machines. If the cost of synchronization remains high, the programming of parallel machines will remain significantly less abstract than programming sequential machines. In this monograph Bob Iannucci presents the design and analysis of an architecture that can be a better building block for parallel machines than any von Neumann processor. There is another very interesting motivation behind this work. It is rooted in the long and venerable history of dataflow graphs as a formalism for expressing parallel computation. The field has bloomed since 1974, when Dennis and Misunas proposed a truly novel architecture using dataflow graphs as the parallel machine language. The novelty and elegance of dataflow architectures has, however, also kept us from asking the real question: "What can dataflow architectures buy us that von Neumann architectures can't?" In the following I explain in a roundabout way how Bob and I arrived at this question.
This book provides a comprehensive introduction to opinion analysis for online reviews. It offers the newest research on opinion mining, including theories, algorithms and datasets. A new feature presentation method is highlighted for sentiment classification. Then, a three-phase framework for sentiment classification is proposed, in which a set of sentiment classifiers is selected automatically to make predictions. These predictions are integrated via ensemble learning. Finally, to address the combinatorial explosion that arises when choosing among classifiers, a greedy algorithm is devised to select the base classifiers.
This book provides a first-hand account of business analytics and its implementation, and a brief account of the theoretical framework underpinning each component of business analytics. The themes of the book include (1) learning the contours and boundaries of business analytics which are in scope; (2) understanding the organization design aspects of an analytical organization; (3) providing knowledge on the domain focus of developing business activities for financial impact in functional analysis; and (4) deriving a whole gamut of business use cases in a variety of situations to apply the techniques. The book gives a complete, insightful understanding of developing and implementing analytical solutions.
First Published in 2004. Learning how to analyze qualitative data by computer can be fun. That is one assumption underpinning this introduction to qualitative analysis, which takes account of how computing techniques have enhanced and transformed the field. The author provides a practical discussion of the main procedures for analyzing qualitative data by computer, with most of its examples taken from humour or everyday life. He examines ways in which computers can contribute to greater rigour and creativity, as well as greater efficiency in analysis. He discusses some of the pitfalls and paradoxes as well as the practicalities of computer-based qualitative analysis. The perspective of "Qualitative Data Analysis" is pragmatic rather than prescriptive, introducing different possibilities without advocating one particular approach. The result is a largely discipline-neutral text, which is suitable for arts and social science students and first-time qualitative analysts.
Though the exact nature and delineation of Big Data is still unclear, it seems likely that Big Data will have an enormous impact on our daily lives. Exploring the Boundaries of Big Data serves as preparatory work for The Netherlands Scientific Council for Government Policy's advice to the Dutch government, which has asked the Council to address questions regarding Big Data, security and privacy. It is divided into five parts, each part engaging with a different perspective on Big Data: the technical, empirical, legal, regulatory and international perspective.
Optimization techniques are at the core of data science, including data analysis and machine learning. An understanding of basic optimization techniques and their fundamental properties provides important grounding for students, researchers, and practitioners in these areas. This text covers the fundamentals of optimization algorithms in a compact, self-contained way, focusing on the techniques most relevant to data science. An introductory chapter demonstrates that many standard problems in data science can be formulated as optimization problems. Next, many fundamental methods in optimization are described and analyzed, including: gradient and accelerated gradient methods for unconstrained optimization of smooth (especially convex) functions; the stochastic gradient method, a workhorse algorithm in machine learning; the coordinate descent approach; several key algorithms for constrained optimization problems; algorithms for minimizing nonsmooth functions arising in data science; foundations of the analysis of nonsmooth functions and optimization duality; and the back-propagation approach, relevant to neural networks.
This book explains how to perform data de-noising at large scale with a satisfactory level of accuracy. Three main issues are considered. Firstly, how to eliminate the propagation of error from one stage to the next while developing a filtered model. Secondly, how to maintain the positional importance of data whilst purifying it. Finally, preservation of memory in the data is crucial to extract smart data from noisy big data. If, after the application of any form of smoothing or filtering, the memory of the corresponding data changes heavily, then the final data may lose some important information. This may lead to erroneous conclusions. Even when some loss of information due to smoothing or filtering is anticipated, denoising cannot be avoided, because any analysis of big data in the presence of noise can be misleading. So, the entire process demands very careful execution with efficient and smart models in order to effectively deal with it.
The increasing availability of data in our current, information overloaded society has led to the need for valid tools for its modelling and analysis. Data mining and applied statistical methods are the appropriate tools to extract knowledge from such data. This book provides an accessible introduction to data mining methods in a consistent and application oriented statistical framework, using case studies drawn from real industry projects and highlighting the use of data mining methods in a variety of business applications.
* Introduces data mining methods and applications.
* Covers classical and Bayesian multivariate statistical methodology as well as machine learning and computational data mining methods.
* Includes many recent developments such as association and sequence rules, graphical Markov models, lifetime value modelling, credit risk, operational risk and web mining.
* Features detailed case studies based on applied projects within industry.
* Incorporates discussion of data mining software, with case studies analysed using R.
* Is accessible to anyone with a basic knowledge of statistics or data analysis.
* Includes an extensive bibliography and pointers to further reading within the text.
"Applied Data Mining for Business and Industry, 2nd edition" is aimed at advanced undergraduate and graduate students of data mining, applied statistics, database management, computer science and economics. The case studies will provide guidance to professionals working in industry on projects involving large volumes of data, such as customer relationship management, web design, risk management, marketing, economics and finance.
There is increasing pressure to protect computer networks against unauthorized intrusion, and some work in this area is concerned with engineering systems that are robust to attack. However, no system can be made invulnerable. Data Analysis for Network Cyber-Security focuses on monitoring and analyzing network traffic data, with the intention of preventing, or quickly identifying, malicious activity. Such work involves the intersection of statistics, data mining and computer science. Fundamentally, network traffic is relational, embodying a link between devices. As such, graph analysis approaches are a natural candidate. However, such methods do not scale well to the demands of real problems, and the critical aspect of the timing of communications events is not accounted for in these approaches. This book gathers papers from leading researchers to provide both background to the problems and a description of cutting-edge methodology. The contributors are from diverse institutions and areas of expertise and were brought together at a workshop held at the University of Bristol in March 2013 to address the issues of network cyber security. The workshop was supported by the Heilbronn Institute for Mathematical Research.
Newcomers to quantitative analysis need practical guidance on how to analyze data in the real world, yet most introductory books focus on lengthy derivations and justifications instead of practical techniques. Covering the technical and professional skills needed by analysts in the academic, private, and public sectors, Applying Analytics: A Practical Introduction systematically teaches novices how to apply algorithms to real data and how to recognize potential pitfalls. It offers one of the first textbooks for the emerging first course in analytics. The text concentrates on the interpretation, strengths, and weaknesses of analytical techniques, along with challenges encountered by analysts in their daily work. The author shares various lessons learned from applying analytics in the real world. He supplements the technical material with coverage of professional skills traditionally learned through experience, such as project management, analytic communication, and using analysis to inform decisions. Example data sets used in the text are available for download online so that readers can test their own analytic routines. Suitable for beginning analysts in the sciences, business, engineering, and government, this book provides an accessible, example-driven introduction to the emerging field of analytics. It shows how to interpret data and identify trends across a range of fields.
The authors provide an understanding of big data and MapReduce by clearly presenting the basic terminologies and concepts. They have employed over 100 illustrations and many worked-out examples to convey the concepts and methods used in big data, the inner workings of MapReduce, and single node/multi-node installation on physical/virtual machines. This book covers almost all the necessary information on Hadoop MapReduce for most online certification exams. Upon completing this book, readers will find it easy to understand other big data processing tools such as Spark, Storm, etc. Ultimately, readers will be able to:
* understand what big data is and the factors that are involved
* understand the inner workings of MapReduce, which is essential for certification exams
* learn the features and weaknesses of MapReduce
* set up Hadoop clusters with 100s of physical/virtual machines
* create a virtual machine in AWS
* write MapReduce with Eclipse in a simple way
* understand other big data processing tools and their applications
Enterprise Resource Planning (ERP), Supply Chain Management (SCM), Customer Relationship Management (CRM), Business Intelligence (BI) and Big Data analytics (BDA) are business-related tasks and processes, which are supported by standardized software solutions. The book explains that this requires business-oriented thinking and acting from IT specialists and data scientists. It is a good idea to let students experience this directly from the business perspective, for example as executives of a virtual company in a role-playing game. The second edition of the book has been completely revised, restructured and supplemented with current topics such as blockchains in supply chains and the correlation between Big Data analytics, artificial intelligence and machine learning. The structure of the book is based on the gradual implementation and integration of the respective information systems from the business and management perspectives. Part I contains chapters with detailed descriptions of the topics supplemented by online tests and exercises. Part II introduces role play and the online gaming and simulation environment. Supplementary teaching material, presentations, templates, and video clips are available online in the gaming area. The gaming and business simulation Kdibisglobal.com, newly created for this book, now includes a beer division, a bottled water division, a soft drink division and a manufacturing division for barcode cash register scanners with their specific business processes and supply chains.
First book to examine game analysis, modern didactic reflections on learning, and big data in a key topic in science and society today. Provides understanding on how to use game analysis when applied to different sports and how to use the approach for video, event and positional data. Presents translational work that has implications for academics, programmers and applied practitioners.
There is a lack of an exposition on interdisciplinary and innovative methods of data mining and visualization for biodata. This book fills the gap by introducing an interdisciplinary set of the most recent methods and references on novel techniques from artificial intelligence, data mining, engineering, pattern recognition, and ontological data mining fields that are applicable to bioinformatics. The latest novel approaches are explained in detail, their advantages and disadvantages are summarized, and pointers to the future development of new applications are given. By widening the pool from which biologists and bioinformaticians can adopt methods for biodata mining and visualization, computational data mining experts in nonbiological fields are also encouraged to utilize their expertise in order to contribute to the progress of computational biology, thus enhancing the collaboration between these two disciplines.
Regarding the set of all feature attributes in a given database as the universal set, this monograph discusses various nonadditive set functions that describe the interaction among the contributions from feature attributes towards a considered target attribute. Then, the relevant nonlinear integrals are investigated. These integrals can be applied as aggregation tools in information fusion and data mining, such as synthetic evaluation, nonlinear multiregressions, and nonlinear classifications. Some methods of fuzzification are also introduced for nonlinear integrals such that fuzzy data can be treated and fuzzy information is retrievable. The book is suitable as a text for graduate courses in mathematics, computer science, and information science. It is also useful to researchers in the relevant area.
Poor data quality is known to compromise the credibility and efficiency of commercial and public endeavours. The importance of managing data quality has also increased manifold as the diversity of sources, formats and volume of data grows. This volume addresses data quality in the light of collaborative information systems, where data creation and ownership are increasingly difficult to establish.
Data Warehousing has been around for 20 years and has become part of the information technology infrastructure. Data warehousing originally grew in response to the corporate need for information--not data--and it supplies integrated, granular, and historical data to the corporation.
The last decade has witnessed the rise of big data in game development as the increasing proliferation of Internet-enabled gaming devices has made it easier than ever before to collect large amounts of player-related data. At the same time, the emergence of new business models and the diversification of the player base have exposed a broader potential audience, which attaches great importance to being able to tailor game experiences to a wide range of preferences and skill levels. This, in turn, has led to a growing interest in data mining techniques, as they offer new opportunities for deriving actionable insights to inform game design, to ensure customer satisfaction, to maximize revenues, and to drive technical innovation. By now, data mining and analytics have become vital components of game development. The amount of work being done in this area nowadays makes this an ideal time to put together a book on this subject. Data Analytics Applications in Gaming and Entertainment seeks to provide a cross section of current data analytics applications in game production. It is intended as a companion for practitioners, academic researchers, and students seeking knowledge on the latest practices in game data mining. The chapters have been chosen in such a way as to cover a wide range of topics and to provide readers with a glimpse at the variety of applications of data mining in gaming. A total of 25 authors from industry and academia have contributed 12 chapters covering topics such as player profiling, approaches for analyzing player communities and their social structures, matchmaking, churn prediction and customer lifetime value estimation, communication of analytical results, and visual approaches to game analytics. This book's perspectives and concepts will spark heightened interest in game analytics and foment innovative ideas that will advance the exciting field of online gaming and entertainment.
The ability to store, manage and give access to the huge quantity of data collected by astronomical observatories is one of the major challenges of modern astronomy. At the same time, the growing complexity of data systems implies a change of concepts: the scientist has to manipulate data as well as information. Developments of the "World Wide Web" bring answers to these problems. The book presents a wide selection of databases, archives, data centres and information systems. Descriptions are included, together with their scientific context and motivations. This volume should prove a useful tool for astronomers, librarians, data specialists and computer engineers.
This is the first comprehensive book dedicated entirely to the field of decision trees in data mining and covers all aspects of this important technique. Decision trees have become one of the most powerful and popular approaches in knowledge discovery and data mining, the science and technology of exploring large and complex bodies of data in order to discover useful patterns. The area is of great importance because it enables modeling and knowledge extraction from the abundance of data available. Both theoreticians and practitioners are continually seeking techniques to make the process more efficient, cost-effective and accurate. Decision trees, originally implemented in decision theory and statistics, are highly effective tools in other areas such as data mining, text mining, information extraction, machine learning, and pattern recognition. This book invites readers to explore the many benefits in data mining that decision trees offer:
This book is the culmination of three years of research effort on a multidisciplinary project in which physicists, mathematicians, computer scientists and social scientists worked together to arrive at a unifying picture of complex networks. The contributed chapters form a reference for the various problems in data analysis, visualization and modeling of complex networks.