Data engineering has grown rapidly in the past decade, leaving many software engineers, data scientists, and analysts looking for a comprehensive view of this practice. With this practical book, you will learn how to plan and build systems to serve the needs of your organization and customers by evaluating the best technologies available in the framework of the data engineering lifecycle. Authors Joe Reis and Matt Housley walk you through the data engineering lifecycle and show you how to stitch together a variety of cloud technologies to serve the needs of downstream data consumers. You will understand how to apply the concepts of data generation, ingestion, orchestration, transformation, storage, governance, and deployment that are critical in any data environment regardless of the underlying technology. This book will help you:
- Assess data engineering problems using an end-to-end data framework of best practices
- Cut through marketing hype when choosing data technologies, architecture, and processes
- Use the data engineering lifecycle to design and build a robust architecture
- Incorporate data governance and security across the data engineering lifecycle
A unique, integrated approach to exploratory data mining and data quality
Data analysts at information-intensive businesses are frequently asked to analyze new data sets that are often dirty: composed of numerous tables possessing unknown properties. Prior to analysis, this data must be cleaned and explored, often a long and arduous task. Ensuring data quality is a notoriously messy problem that can only be addressed by drawing on methods from many disciplines, including statistics, exploratory data mining, database management, and metadata coding. Where other books on data mining and analysis focus primarily on the last stage of the analysis procedure, Exploratory Data Mining and Data Cleaning uses a uniquely integrated approach to data exploration and data cleaning to develop a suitable modeling strategy that will help analysts to more effectively determine and implement the final technique. The authors, both seasoned data analysts at a major corporation, draw on their own professional experience throughout.
A groundbreaking addition to the existing literature, Exploratory Data Mining and Data Cleaning serves as an important reference for data analysts who need to analyze large amounts of unfamiliar data, operations managers, and students in undergraduate or graduate-level courses dealing with data analysis and data mining.
Measuring the abundance of individuals and the diversity of species are core components of most ecological research projects and conservation monitoring. This book brings together in one place, for the first time, the methods used to estimate the abundance of individuals in nature. The statistical basis of each method is detailed along with practical considerations for survey design and data collection. Methods are illustrated using data ranging from Alaskan shrubs to Yellowstone grizzly bears, not forgetting Costa Rican ants and Prince Edward Island lobsters. Where necessary, example code for use with the open source software R is supplied. When appropriate, reference is made to other widely used programs. After opening with a brief synopsis of relevant statistical methods, the first section deals with the abundance of stationary items such as trees, shrubs, coral, etc. Following a discussion of the use of quadrats and transects in the contexts of forestry sampling and the assessment of plant cover, there are chapters addressing line-intercept sampling, the use of nearest-neighbour distances, and variable sized plots. The second section deals with individuals that move, such as birds, mammals, reptiles, fish, etc. Approaches discussed include double-observer sampling, removal sampling, capture-recapture methods and distance sampling. The final section deals with the measurement of species richness; species diversity; species-abundance distributions; and other aspects of diversity such as evenness, similarity, turnover and rarity. This is an essential reference for anyone involved in advanced undergraduate or postgraduate ecological research and teaching, or those planning and carrying out data analysis as part of conservation survey and monitoring programmes.
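The capture-recapture methods mentioned in the blurb above have, at their simplest, a closed-form estimator. The sketch below implements Chapman's bias-corrected version of the Lincoln-Petersen estimator; the function name is ours, and the book's treatment is far more general:

```python
def chapman_estimate(n1, n2, m2):
    """Chapman's bias-corrected Lincoln-Petersen estimate of population size.

    n1: animals caught and marked in the first sample
    n2: animals caught in the second sample
    m2: marked animals recaptured in the second sample
    """
    return (n1 + 1) * (n2 + 1) / (m2 + 1) - 1

# Example: mark 100 animals, later catch 62, of which 20 carry marks.
print(chapman_estimate(100, 62, 20))  # -> 302.0
```

The intuition: the fraction of marked animals in the second sample (m2/n2) estimates the fraction of the whole population that was marked (n1/N), and the +1 terms reduce small-sample bias.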
This new edition covers some of the key topics relating to the latest version of MS Office through Excel 2019, including the creation of custom ribbons by injecting XML code into Excel workbooks and how to link Excel VBA macros to customize ribbon objects. It now also provides examples of using ADO, DAO, and SQL queries to retrieve data from databases for analysis. Operations such as fully automated linear and non-linear curve fitting, linear and non-linear mapping, charting, plotting, sorting, and filtering of data have been updated to leverage the newest Excel VBA object models. The text provides examples of automated data analysis and the preparation of custom reports suitable for legal archiving and dissemination. Functionality demonstrated in this edition includes:
- Find and extract information from raw data files
- Format data in color (conditional formatting)
- Perform non-linear and linear regressions on data
- Create custom functions for specific applications
- Generate datasets for regressions and functions
- Create custom reports for regulatory agencies
- Leverage email to send generated reports
- Return data to Excel using ADO, DAO, and SQL queries
- Create database files for processed data
- Create tables, records, and fields in databases
- Add data to databases in fields or records
- Leverage external computational engines
- Call functions in MATLAB (R) and Origin (R) from Excel
Comprehensive Coverage of the Entire Area of Classification
Research on the problem of classification tends to be fragmented across such areas as pattern recognition, databases, data mining, and machine learning. Addressing the work of these different communities in a unified way, Data Classification: Algorithms and Applications explores the underlying algorithms of classification as well as applications of classification in a variety of problem domains, including text, multimedia, social network, and biological data. This comprehensive book focuses on three primary aspects of data classification:
- Methods: The book first describes common techniques used for classification, including probabilistic methods, decision trees, rule-based methods, instance-based methods, support vector machine methods, and neural networks.
- Domains: The book then examines specific methods used for data domains such as multimedia, text, time-series, network, discrete sequence, and uncertain data. It also covers large data sets and data streams due to the recent importance of the big data paradigm.
- Variations: The book concludes with insight on variations of the classification process. It discusses ensembles, rare-class learning, distance function learning, active learning, visual learning, transfer learning, and semi-supervised learning, as well as evaluation aspects of classifiers.
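As a taste of the instance-based methods in the list above, here is a minimal k-nearest-neighbour classifier in plain Python; the function name and toy data are invented for illustration and are not from the book:

```python
from collections import Counter
from math import dist  # Euclidean distance (Python 3.8+)

def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest training points.

    train: list of (point, label) pairs, where point is a coordinate tuple
    """
    neighbours = sorted(train, key=lambda pl: dist(pl[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]

# Two small clusters with labels "a" and "b".
train = [((0, 0), "a"), ((0, 1), "a"),
         ((5, 5), "b"), ((6, 5), "b"), ((5, 6), "b")]
print(knn_predict(train, (1, 1)))  # -> a  (near the "a" cluster)
print(knn_predict(train, (5, 5)))  # -> b  (inside the "b" cluster)
```

Instance-based methods like this defer all work to prediction time: there is no training step beyond storing the labelled examples.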
Large data sets arriving at ever-increasing speeds require a new set of efficient data analysis techniques. Data analytics is becoming an essential component of every organization, with applications in areas such as health care, financial trading, the Internet of Things, smart cities, and cyber-physical systems. However, these diverse application domains give rise to new research challenges. In this context, the book provides a broad picture of the concepts, techniques, applications, and open research directions in this area. In addition, it serves as a single source of reference for acquiring knowledge of emerging Big Data Analytics technologies.
This book focuses on computer-intensive statistical methods, such as validation, model selection, and the bootstrap, that help overcome obstacles that could not previously be addressed with methods such as regression and time series modelling in the areas of economics, meteorology, and transportation.
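The bootstrap mentioned above is easy to sketch: resample the data with replacement many times and measure how the statistic of interest varies across resamples. A minimal illustration, with function name and data invented for the demo:

```python
import random

def bootstrap_se(data, stat, reps=2000, seed=0):
    """Bootstrap estimate of the standard error of `stat` over `reps` resamples."""
    rng = random.Random(seed)
    n = len(data)
    # Each resample draws n observations with replacement from the data.
    stats = [stat([rng.choice(data) for _ in range(n)]) for _ in range(reps)]
    m = sum(stats) / reps
    return (sum((s - m) ** 2 for s in stats) / (reps - 1)) ** 0.5

data = [2.1, 3.4, 2.9, 4.0, 3.3, 2.7, 3.8, 3.1]
mean = lambda xs: sum(xs) / len(xs)
print(round(bootstrap_se(data, mean), 3))  # close to the analytic SE of the mean
```

For the mean this can be checked against the analytic standard error, sd/sqrt(n); the real appeal of the bootstrap is that the same recipe works for statistics with no closed-form error formula.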
Keyword Spotting (KWS) has been proposed as a flexible and more error-tolerant alternative to full transcription. In most cases, it makes it possible to retrieve arbitrary query words in handwritten historical documents. This comprehensive compendium gives a self-contained preamble and a visually attractive description of the field of graph-based KWS. The volume offers a profound insight into each step of the whole KWS pipeline, viz. image preprocessing, graph representation, and graph matching. Written by two world-renowned co-authors, this unique title combines two very current research fields: graph-based pattern recognition and document analysis. The book serves as attractive teaching material for graduate students, as well as a useful reference text for professionals, academics, and researchers.
This is a book about how ecologists can integrate remote sensing and GIS in their research. It will allow readers to get started with the application of remote sensing and to understand its potential and limitations. Using practical examples, the book covers all necessary steps from planning field campaigns to deriving ecologically relevant information through remote sensing and modelling of species distributions. An Introduction to Spatial Data Analysis introduces spatial data handling using the open source software Quantum GIS (QGIS). In addition, readers will be guided through their first steps in the R programming language. The authors explain the fundamentals of spatial data handling and analysis, empowering the reader to turn data acquired in the field into actual spatial data. Readers will learn to process and analyse spatial data of different types and interpret the data and results. After finishing this book, readers will be able to address questions such as "What is the distance to the border of the protected area?", "Which points are located close to a road?", "Which fraction of land cover types exist in my study area?" using different software and techniques. This book is for novice spatial data users and does not assume any prior knowledge of spatial data itself or practical experience working with such data sets. Readers will likely include student and professional ecologists, geographers and any environmental scientists or practitioners who need to collect, visualize and analyse spatial data. The software used is the widely applied open source scientific programs QGIS and R. All scripts and data sets used in the book will be provided online at book.ecosens.org. 
This book covers specific methods including:
- what to consider before collecting in situ data
- how to work with spatial data collected in situ
- the difference between raster and vector data
- how to acquire further vector and raster data
- how to create relevant environmental information
- how to combine and analyse in situ and remote sensing data
- how to create useful maps for field work and presentations
- how to use QGIS and R for spatial analysis
- how to develop analysis scripts
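One of the questions posed above, "Which points are located close to a road?", can be sketched in plain Python. The book itself works in QGIS and R; this toy version only measures distance to road vertices rather than to the road segments, and all names and coordinates are invented:

```python
from math import dist  # Euclidean distance (Python 3.8+)

def points_near(points, road, buffer):
    """Keep points within `buffer` of any vertex of the road polyline.

    A real GIS analysis would measure distance to the road *segments*;
    vertex distance keeps this sketch short.
    """
    return [p for p in points if min(dist(p, v) for v in road) <= buffer]

road = [(0, 0), (1, 0), (2, 0), (3, 0)]        # polyline vertices
points = [(0.5, 0.2), (1.5, 3.0), (2.9, 0.4)]  # observation points
print(points_near(points, road, buffer=1.0))   # -> [(0.5, 0.2), (2.9, 0.4)]
```

This is the kind of operation QGIS exposes as a buffer-and-select tool and R exposes through its spatial packages; the underlying computation is just a distance test.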
Rough Set Theory, introduced by Pawlak in the early 1980s, has become an important part of soft computing within the last 25 years. However, much of the focus has been on the theoretical understanding of Rough Sets, and a survey of Rough Sets and their applications within business and industry has been much desired. "Rough Sets: Selected Methods and Applications in Management and Engineering" provides context to Rough Set theory, with each chapter exploring a real-world application of Rough Sets. "Rough Sets" is relevant to managers striving to improve their businesses, industry researchers looking to improve the efficiency of their solutions, and university researchers wanting to apply Rough Sets to real-world problems.
Data is constantly increasing and data analysts are in higher demand than ever. This book is an essential guide to the role of data analyst. Aspiring data analysts will discover what data analysts do all day, what skills they will need for the role, and what regulations they will be required to adhere to. Practising data analysts can explore useful data analysis tools, methods and techniques, brush up on best practices and look at how they can advance their career.
Online social networking sites like Facebook, LinkedIn, and Twitter offer millions of members the opportunity to befriend one another, send messages to each other, and post content on the site - actions which generate mind-boggling amounts of data every day. To make sense of the massive data from these sites, we resort to social media mining.
Data and its technologies now play a large and growing role in humanities research and teaching. This book addresses the needs of humanities scholars who seek deeper expertise in the area of data modeling and representation. The authors, all experts in digital humanities, offer a clear explanation of key technical principles, a grounded discussion of case studies, and an exploration of important theoretical concerns. The book opens with an orientation, giving the reader a history of data modeling in the humanities and a grounding in the technical concepts necessary to understand and engage with the second part of the book. The second part of the book is a wide-ranging exploration of topics central for a deeper understanding of data modeling in digital humanities. Chapters cover data modeling standards and the role they play in shaping digital humanities practice, traditional forms of modeling in the humanities and how they have been transformed by digital approaches, ontologies which seek to anchor meaning in digital humanities resources, and how data models inhabit the other analytical tools used in digital humanities research. It concludes with a glossary chapter that explains specific terms and concepts for data modeling in the digital humanities context. This book is a unique and invaluable resource for teaching and practising data modeling in a digital humanities context.
This book presents an accessible introduction to data-driven storytelling. Resulting from unique discussions between data visualization researchers and data journalists, it offers an integrated definition of the topic, presents vivid examples and patterns for data storytelling, and calls out key challenges and new opportunities for researchers and practitioners.
A comprehensive compilation of new developments in data linkage methodology
The increasing availability of large administrative databases has led to a dramatic rise in the use of data linkage, yet the standard texts on linkage are still those which describe the seminal work of the 1950s and 1960s, with some updates. Linkage and analysis of data across sources remain problematic due to the lack of discriminatory and accurate identifiers, missing data, and regulatory issues. Recent developments in data linkage methodology have concentrated on bias and analysis of linked data, novel approaches to organising relationships between databases, and privacy-preserving linkage. Methodological Developments in Data Linkage brings together a collection of contributions from members of the international data linkage community, covering cutting-edge methodology in this field. It presents opportunities and challenges provided by linkage of large and often complex datasets, including analysis problems, legal and security aspects, models for data access, and the development of novel research areas. New methods for handling uncertainty in analysis of linked data, solutions for anonymised linkage, and alternative models for data collection are also discussed. Key features:
- Presents cutting-edge methods for a topic of increasing importance to a wide range of research areas, with applications to data linkage systems internationally
- Covers the essential issues associated with data linkage today
- Includes examples based on real data linkage systems, highlighting the opportunities, successes and challenges that the increasing availability of linkage data provides
- Takes a novel approach incorporating technical aspects of linkage, management, and analysis of linked data
This book will be of core interest to academics, government employees, data holders, data managers, analysts and statisticians who use administrative data.
It will also appeal to researchers in a variety of areas, including epidemiology, biostatistics, social statistics, informatics, policy and public health.
S-PLUS is a powerful environment for the statistical and graphical analysis of data. It provides the tools to implement many statistical ideas which have been made possible by the widespread availability of workstations having good graphics and computational capabilities. This book is a guide to using S-PLUS to perform statistical analyses and provides both an introduction to the use of S-PLUS and a course in modern statistical methods. S-PLUS is available for both Windows and UNIX workstations, and both versions are covered in depth. The aim of the book is to show how to use S-PLUS as a powerful and graphical data analysis system. Readers are assumed to have a basic grounding in statistics, and so the book is intended for would-be users of S-PLUS, both students and researchers using statistics. Throughout, the emphasis is on presenting practical problems and full analyses of real data sets. Many of the methods discussed are state-of-the-art approaches to topics such as linear, nonlinear, and smooth regression models, tree-based methods, multivariate analysis and pattern recognition, survival analysis, time series and spatial statistics. Throughout, modern techniques such as robust methods, non-parametric smoothing, and bootstrapping are used where appropriate. This third edition is intended for users of S-PLUS 4.5, 5.0, 2000 or later, although S-PLUS 3.3/4 are also considered. The major change from the second edition is coverage of the current versions of S-PLUS. The material has been extensively rewritten using new examples and the latest computationally intensive methods. The companion volume on S Programming will provide an in-depth guide for those writing software in the S language. The authors have written several software libraries that enhance S-PLUS; these and all the datasets used are available on the Internet in versions for Windows and UNIX.
There are extensive on-line complements covering advanced material, user-contributed extensions, further exercises, and new features of S-PLUS as they are introduced. Dr. Venables is now Statistician with CSIRO in Queensland, having been at the Department of Statistics, University of Adelaide, for many years previously. He has given many short courses on S-PLUS in Australia, Europe, and the USA. Professor Ripley holds the Chair of Applied Statistics at the University of Oxford, and is the author of four other books on spatial statistics, simulation, pattern recognition, and neural networks.
This is an introductory textbook in data analysis and statistics, designed for students in the first year of a social sciences degree. The key concepts of data analysis are explained in a clear and straightforward manner, avoiding unnecessary jargon. Very little mathematical knowledge is assumed on the part of the reader, and a variety of examples are used to illustrate the ideas and techniques. One of the central aims of the text is to ensure that students understand the basic principles of exploratory data analysis before they become involved in manipulating large quantities of data. Hence the author generally uses small data sets to introduce the key concepts, and gradually moves towards the handling of larger data sets. The text uses the Minitab computer analysis package, which is one of the most popular statistical packages available in institutions of higher and further education. Students are told how to use the package before they are introduced to the ideas and methods of data analysis. This clear, jargon-free textbook will be an invaluable aid for students beginning courses in the social sciences and related fields, such as social policy, business studies and health care.
It is universally accepted today that parallel processing is here to stay but that software for parallel machines is still difficult to develop. However, there is little recognition of the fact that changes in processor architecture can significantly ease the development of software. In the seventies, the availability of processors that could address a large name space directly eliminated the problem of name management at one level and paved the way for the routine development of large programs. Similarly, today, processor architectures that can facilitate cheap synchronization and provide a global address space can simplify compiler development for parallel machines. If the cost of synchronization remains high, the programming of parallel machines will remain significantly less abstract than programming sequential machines. In this monograph Bob Iannucci presents the design and analysis of an architecture that can be a better building block for parallel machines than any von Neumann processor. There is another very interesting motivation behind this work. It is rooted in the long and venerable history of dataflow graphs as a formalism for expressing parallel computation. The field has bloomed since 1974, when Dennis and Misunas proposed a truly novel architecture using dataflow graphs as the parallel machine language. The novelty and elegance of dataflow architectures has, however, also kept us from asking the real question: "What can dataflow architectures buy us that von Neumann architectures can't?" In the following I explain in a roundabout way how Bob and I arrived at this question.
This book provides a comprehensive introduction to opinion analysis for online reviews. It offers the newest research on opinion mining, including theories, algorithms, and datasets. A new feature presentation method is highlighted for sentiment classification. Then, a three-phase framework for sentiment classification is proposed, in which a set of sentiment classifiers is selected automatically to make predictions. These predictions are integrated via ensemble learning. Finally, to tame the combinatorial explosion encountered, a greedy algorithm is devised to select the base classifiers.
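The combination of greedy base-classifier selection and ensemble voting described above can be sketched as follows. This is a toy illustration under our own assumptions, not the authors' algorithm: the classifiers are dictionary lookups over invented "reviews", and all names are ours.

```python
def accuracy(predict, data):
    """Fraction of (x, label) pairs that `predict` gets right."""
    return sum(predict(x) == y for x, y in data) / len(data)

def vote(ensemble, x):
    preds = [clf(x) for clf in ensemble]
    return max(preds, key=preds.count)  # majority vote; first prediction wins ties

def greedy_select(candidates, val_data, max_size=5):
    """Greedily add the classifier that most improves validation accuracy.

    Ties are kept (score == best) so complementary classifiers can join and
    later out-vote each other's individual mistakes.
    """
    ensemble, best = [], 0.0
    while len(ensemble) < max_size:
        pool = [c for c in candidates if c not in ensemble]
        if not pool:
            break
        scored = [(accuracy(lambda x, t=ensemble + [c]: vote(t, x), val_data), c)
                  for c in pool]
        score, clf = max(scored, key=lambda sc: sc[0])
        if score < best:
            break
        ensemble.append(clf)
        best = score
    return ensemble, best

# Toy setup: each classifier misjudges exactly one review; together they vote it right.
truth = {"r1": "pos", "r2": "neg", "r3": "pos", "r4": "neg", "r5": "pos"}
def make_clf(wrong):
    flip = {"pos": "neg", "neg": "pos"}
    return lambda r: flip[truth[r]] if r == wrong else truth[r]

candidates = [make_clf("r5"), make_clf("r4"), make_clf("r3")]
ensemble, score = greedy_select(candidates, list(truth.items()))
print(len(ensemble), score)  # -> 3 1.0
```

Each base classifier alone scores 0.8 on the toy data; the three-member majority vote corrects every individual error, which is the effect ensemble learning relies on.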
Optimization techniques are at the core of data science, including data analysis and machine learning. An understanding of basic optimization techniques and their fundamental properties provides important grounding for students, researchers, and practitioners in these areas. This text covers the fundamentals of optimization algorithms in a compact, self-contained way, focusing on the techniques most relevant to data science. An introductory chapter demonstrates that many standard problems in data science can be formulated as optimization problems. Next, many fundamental methods in optimization are described and analyzed, including: gradient and accelerated gradient methods for unconstrained optimization of smooth (especially convex) functions; the stochastic gradient method, a workhorse algorithm in machine learning; the coordinate descent approach; several key algorithms for constrained optimization problems; algorithms for minimizing nonsmooth functions arising in data science; foundations of the analysis of nonsmooth functions and optimization duality; and the back-propagation approach, relevant to neural networks.
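The gradient method named above can be illustrated on the simplest data-science objective, a one-parameter least-squares fit. This is a minimal sketch with invented names and data, not the book's notation:

```python
def grad_descent(grad, w, step=0.1, iters=200):
    """Plain gradient descent: repeatedly take w <- w - step * grad(w)."""
    for _ in range(iters):
        g = grad(w)
        w = [wi - step * gi for wi, gi in zip(w, g)]
    return w

# Fit y = w*x by minimizing f(w) = sum_i (w*x_i - y_i)^2.
# Its gradient is f'(w) = 2 * sum_i x_i * (w*x_i - y_i).
xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]  # noiseless data generated by y = 2*x
grad = lambda w: [2 * sum(x * (w[0] * x - y) for x, y in zip(xs, ys))]

w = grad_descent(grad, [0.0], step=0.01)
print(round(w[0], 4))  # -> 2.0
```

Because the objective is smooth and convex, a small enough constant step size converges to the minimizer; the accelerated, stochastic, and coordinate variants the book covers change how the step and gradient are computed, not this basic update.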
First Published in 2004. Learning how to analyze qualitative data by computer can be fun. That is one assumption underpinning this introduction to qualitative analysis, which takes account of how computing techniques have enhanced and transformed the field. The author provides a practical discussion of the main procedures for analyzing qualitative data by computer, with most of its examples taken from humour or everyday life. He examines ways in which computers can contribute to greater rigour and creativity, as well as greater efficiency in analysis. He discusses some of the pitfalls and paradoxes as well as the practicalities of computer-based qualitative analysis. The perspective of "Qualitative Data Analysis" is pragmatic rather than prescriptive, introducing different possibilities without advocating one particular approach. The result is a largely discipline-neutral text, which is suitable for arts and social science students and first-time qualitative analysts.
Though the exact nature and delineation of Big Data is still unclear, it seems likely that Big Data will have an enormous impact on our daily lives. Exploring the Boundaries of Big Data serves as preparatory work for The Netherlands Scientific Council for Government Policy's advice to the Dutch government, which has asked the Council to address questions regarding Big Data, security and privacy. It is divided into five parts, each part engaging with a different perspective on Big Data: the technical, empirical, legal, regulatory and international perspective.
There is increasing pressure to protect computer networks against unauthorized intrusion, and some work in this area is concerned with engineering systems that are robust to attack. However, no system can be made invulnerable. Data Analysis for Network Cyber-Security focuses on monitoring and analyzing network traffic data, with the intention of preventing, or quickly identifying, malicious activity. Such work involves the intersection of statistics, data mining and computer science. Fundamentally, network traffic is relational, embodying a link between devices. As such, graph analysis approaches are a natural candidate. However, such methods do not scale well to the demands of real problems, and the critical aspect of the timing of communications events is not accounted for in these approaches. This book gathers papers from leading researchers to provide both background to the problems and a description of cutting-edge methodology. The contributors are from diverse institutions and areas of expertise and were brought together at a workshop held at the University of Bristol in March 2013 to address the issues of network cyber security. The workshop was supported by the Heilbronn Institute for Mathematical Research.