Data quality is one of the most important problems in data
management. A database system typically aims to support the
creation, maintenance, and use of large amounts of data, focusing on
the quantity of data. However, real-life data are often dirty:
inconsistent, duplicated, inaccurate, incomplete, or stale. Dirty
data in a database routinely generate misleading or biased
analytical results and decisions, and lead to loss of revenues,
credibility and customers. With this comes the need for data
quality management. In contrast to traditional data management
tasks, data quality management enables the detection and correction
of errors in the data, syntactic or semantic, in order to improve
the quality of the data and hence add value to business processes.
While data quality has been a problem for decades, the
prevalent use of the Web has increased the risks, on an
unprecedented scale, of creating and propagating dirty data. This
monograph gives an overview of fundamental issues underlying
central aspects of data quality, namely, data consistency, data
deduplication, data accuracy, data currency, and information
completeness. We promote a uniform logical framework for dealing
with these issues, based on data quality rules. The text is
organized into seven chapters, focusing on relational data. Chapter
One introduces data quality issues. A conditional dependency theory
is developed in Chapter Two, for capturing data inconsistencies. It
is followed by practical techniques in Chapter Three for discovering
conditional dependencies, and for detecting inconsistencies and
repairing data based on conditional dependencies. Matching
dependencies are introduced in Chapter Four, as matching rules for
data deduplication. A theory of relative information completeness
is studied in Chapter Five, revising the classical Closed World
Assumption and the Open World Assumption, to characterize
incomplete information in the real world. A data currency model is
presented in Chapter Six, to identify the current values of
entities in a database and to answer queries with the current
values, in the absence of reliable timestamps. Finally,
interactions between these data quality issues are explored in
Chapter Seven. Important theoretical results and practical algorithms
are covered, but formal proofs are omitted. The bibliographical
notes contain pointers to papers in which the results were
presented and proven, as well as references to materials for
further reading. This text is intended for a seminar course at the
graduate level. It is also intended to serve as a useful resource for
researchers and practitioners who are interested in the study of
data quality. The fundamental research on data quality draws on
several areas, including mathematical logic, computational
complexity and database theory. It has raised as many questions as
it has answered, and is a rich source of questions and vitality.
Table of Contents: Data Quality: An Overview / Conditional
Dependencies / Cleaning Data with Conditional Dependencies / Data
Deduplication / Information Completeness / Data Currency /
Interactions between Data Quality Issues