This book on statistical disclosure control presents the theory,
applications and software implementation of the traditional
approach to (micro)data anonymization, including data perturbation
methods, disclosure risk, data utility, information loss and
methods for simulating synthetic data. Introducing readers to the R
packages sdcMicro and simPop, the book also features numerous
examples and exercises with solutions, as well as case studies with
real-world data, accompanied by the underlying R code to allow
readers to reproduce all results. The demand for and volume of data
from surveys, registers or other sources containing sensitive
information on persons or enterprises have increased significantly
over the last several years. At the same time, privacy protection
principles and regulations have imposed restrictions on the access
and use of individual data. Proper and secure microdata
dissemination calls for the application of statistical disclosure
control methods to the data before release. This book is intended
for practitioners at statistical agencies and other national and
international organizations that deal with confidential data. It
will also be of interest to researchers working in statistical
disclosure control and the health sciences.
This book explores visualization and imputation techniques for
missing values and presents practical applications using the
statistical software R. It explains the concepts of common
imputation methods with a focus on visualization, description of
data problems and practical solutions using R, including modern
methods of robust imputation, imputation based on deep learning and
imputation for complex data. By describing the advantages,
disadvantages and pitfalls of each method, the book presents a
clear picture of which imputation methods are applicable to the
specific data set at hand. The material covered includes the
pre-analysis of data; visualization of missing values in incomplete
data; single and multiple imputation; deductive imputation and
outlier replacement; model-based methods, including methods based on
robust estimates; non-linear methods such as tree-based and deep
learning methods; imputation of compositional data; evaluation of
imputation quality, from visual diagnostics to precision measures,
coverage rates and prediction performance; and a description of
different model- and design-based simulation designs for the
evaluation. The book also features a topic-focused introduction to
R, and R code is provided in each chapter to explain the practical
application of the described methodology. Addressed to researchers,
practitioners and students who work with incomplete data, the book
offers an introduction to the subject as well as a discussion of
recent developments in the field. It is suitable for beginners to
the topic and advanced readers alike.
This book presents the statistical analysis of compositional data
using the log-ratio approach. It includes a wide range of classical
and robust statistical methods adapted for compositional data
analysis, such as supervised and unsupervised methods like PCA,
correlation analysis, classification and regression. In addition,
it considers special data structures like high-dimensional
compositions and compositional tables. The methodology introduced
is also frequently compared to methods which ignore the specific
nature of compositional data. It focuses on practical aspects of
compositional data analysis rather than on detailed theoretical
derivations; thus, issues such as graphical visualization and
preprocessing (treatment of missing values, zeros, outliers and
similar artifacts) form an important part of the book. Since it is
primarily intended for researchers and students from applied fields
like geochemistry, chemometrics, biology and natural sciences,
economics, and social sciences, all the proposed methods are
accompanied by worked-out examples in R using the package
robCompositions.
Harness actionable insights from your data with computational
statistics and simulations using R.
About This Book
* Learn five different simulation techniques (Monte Carlo, Discrete
  Event Simulation, System Dynamics, Agent-Based Modeling, and
  Resampling) in depth using real-world case studies
* A unique book that teaches you the essential and fundamental
  concepts in statistical modeling and simulation
Who This Book Is For
This book is for users who are familiar with computational methods.
If you want to learn about the advanced features of R, including
computer-intensive Monte Carlo methods as well as computational tools
for statistical simulation, then this book is for you. Good knowledge
of R programming is assumed.
What You Will Learn
* Explore advanced R features for simulating data and extracting
  insights from it
* Get to know the advanced features of R, including high-performance
  computing and advanced data manipulation
* See random number simulation used to simulate distributions, data
  sets, and populations
* Simulate close-to-reality populations as the basis for agent-based
  micro-, model- and design-based simulations
* Design statistical solutions with R to solve scientific and
  real-world problems
* Get comprehensive coverage of several R statistical packages such
  as boot, simPop, VIM, data.table, dplyr, parallel, StatDA, simecol,
  simecolModels, deSolve and many more
In Detail
Data Science with R aims to teach you how to begin performing data
science tasks by taking advantage of R's powerful ecosystem of
packages. R is one of the most widely used programming languages in
data science and a powerful tool for tackling the complexities of
varied real-world data sets. The book provides users with a
computational and methodological framework for statistical
simulation. Through this book, you will get to grips with the
software environment R. After learning the background of popular
methods in computational statistics, you will see applications in R
that help you better understand the methods and gain experience
working with real-world data and real-world problems. This book helps
uncover the large-scale patterns in complex systems where
interdependencies and variation are critical. An effective
simulation is driven by data-generating processes that accurately
reflect real physical populations. You will learn how to plan and
structure a simulation project to aid in the decision-making process
as well as in the presentation of results.
Style and approach
This book takes a practical, hands-on approach to explaining
statistical computing methods, gives advice on their usage, and
provides computational tools to help you solve common problems in
statistical simulation and computer-intensive methods.
The aim of statistical disclosure control is to maintain the
required level of statistical privacy while making data available to
researchers. This can be achieved through minimal modifications of
the data that do not change the multivariate data structure. In this
book, the well-developed R package sdcMicro is introduced. This
package makes it possible to keep microdata confidential very
effectively. The concept is
thoroughly explained and its application is demonstrated using
real-world data. In addition, the robustification of disclosure
methods is described: many SDC methods for microdata developed so
far can be strongly influenced by outliers, resulting in a high loss
of information in the perturbed data.
Missing values are the second topic of this book. The application
of visualisation tools for the analysis of missing values,
preceding the choice of an imputation method, is highlighted. In
addition, new methods for the imputation of compositional data are
introduced. Given the linear dependence of the variables in
compositional data, reasonable imputations can be made by taking the
special nature of such data into account.