Entity Resolution (ER) lies at the core of data integration and
cleaning and, thus, a large body of research examines ways to
improve its effectiveness and time efficiency. The initial ER
methods primarily target Veracity in the context of structured
(relational) data that are described by a schema of well-known
quality and meaning. To achieve high effectiveness, they leverage
schema, expert, and/or external knowledge. Some of these methods
have been extended to address Volume, processing large datasets through
multi-core or massive parallelization approaches, such as the
MapReduce paradigm. However, these early schema-based approaches
are inapplicable to Web Data, which abound in voluminous, noisy,
semi-structured, and highly heterogeneous information. To address
the additional challenge of Variety, recent works on ER adopt a
novel, loosely schema-aware functionality that emphasizes
scalability and robustness to noise. Another line of present
research focuses on the additional challenge of Velocity, aiming to
process data collections of a continuously increasing volume. The
latest works, though, take advantage of the significant
breakthroughs in Deep Learning and Crowdsourcing, incorporating
external knowledge to enhance the existing works to a significant
extent. This synthesis lecture organizes ER methods into four
generations based on the challenges posed by these four Vs. For
each generation, we outline the corresponding ER workflow, discuss
the state-of-the-art methods per workflow step, and present current
research directions. The discussion of these methods takes into
account a historical perspective, explaining the evolution of the
methods over time along with their similarities and differences.
The lecture also discusses the available ER tools and benchmark
datasets that allow expert as well as novice users to make use of
the available solutions.