This is the second book based on the 5S (Societies, Scenarios,
Spaces, Structures, Streams) approach to digital libraries (DLs).
Leveraging the first volume, on Theoretical Foundations, we focus
on the key issues of evaluation and integration. These
cross-cutting issues serve as a bridge for those interested in DLs,
connecting the introduction and formal discussion in the first
book, with the coverage of key technologies in the third book, and
of illustrative applications in the fourth book. These two topics
have central importance in the DL field, allowing it to be treated
scientifically as well as practically. In the scholarly world, we
only really understand something if we know how to measure and
evaluate it. In the Internet era of distributed information
systems, we can only be practical at scale if we integrate across
both systems and their associated content. Evaluation of DLs must
take place at multiple levels, so we can address the different
entities and their associated measures. Thus, for digital objects,
we assess accessibility, pertinence, preservability, relevance,
significance, similarity, and timeliness. Other measures are
specific to higher-level constructs like metadata, collections,
catalogs, repositories, and services. We tie these together through
a case study of the 5SQual tool, which we designed and implemented
to perform an automatic quantitative evaluation of DLs. Thus,
across the Information Life Cycle, we describe metrics and software
useful to assess the quality of DLs, and demonstrate utility with
regard to representative application areas: archaeology and
education. Though integration has been a challenge since the
earliest work on DLs, we provide the first comprehensive 5S-based
formal description of the DL integration problem, cast in the
context of related work. Since archaeology is a fundamentally
distributed enterprise, we describe ETANADL, for integrating Near
Eastern Archeology sites and information. Thus, we show how
5S-based modeling can lead to integrated services and content.
While the first book adopts a minimalist and formal approach to
DLs, and provides a systematic and functional method to design and
implement DL exploring services, here we broaden to practical DLs
with richer metamodels, demonstrating the power of 5S for
integration and evaluation.

This book deals with a hard problem that is inherent to human
language: ambiguity. In particular, we focus on author name
ambiguity, a type of ambiguity that exists in digital bibliographic
repositories, which occurs when an author publishes works under
distinct names or distinct authors publish works under similar
names. This problem has several causes, including
the lack of standards and common practices, and the decentralized
generation of bibliographic content. As a consequence, the quality
of the main services of digital bibliographic repositories such as
search, browsing, and recommendation may be severely affected by
author name ambiguity. The focal point of the book is on automatic
methods, since manual solutions do not scale to the size of the
current repositories or the speed at which they are updated.
Accordingly, we provide a broad view of the problem of automatic
disambiguation of author names, summarizing the results of more
than a decade of research on this topic conducted by our group,
reported in more than a dozen publications that have received
over 900 citations so far, according to Google Scholar. We start by
discussing the motivation for the problem (Chapter 1). Next, we formally
define the author name disambiguation task (Chapter 2) and use this
formalization to provide a brief, taxonomically organized, overview
of the literature on the topic (Chapter 3). We then organize,
summarize, and integrate our own group's efforts to develop
solutions that produced state-of-the-art disambiguation quality
at the time they were proposed. Thus, Chapter 4
covers HHC - Heuristic-based Clustering, an author name
disambiguation method that is based on two specific real-world
assumptions regarding scientific authorship. Then, Chapter 5
describes SAND - Self-training Author Name Disambiguator, and
Chapter 6 presents two incremental author name disambiguation
methods, namely INDi - Incremental Unsupervised Name Disambiguation
and INC - Incremental Nearest Cluster. Finally, Chapter 7 provides
an overview of recent author name disambiguation methods that
take newer approaches, such as graph-based
representations, alternative predefined similarity functions,
visualization facilities, and artificial neural
networks. The chapters are followed by three appendices that cover,
respectively: (i) a pattern matching function for comparing proper
names, used by some of the methods addressed in this book; (ii)
a tool for generating synthetic collections of citation records for
distinct experimental tasks; and (iii) a number of datasets
commonly used to evaluate author name disambiguation methods. In
summary, the book organizes a large body of knowledge and work on
author name disambiguation from the last decade, in the hope of
consolidating a solid basis for future developments in the field.