Explorations in Automatic Thesaurus Discovery presents an automated
method for creating a first-draft thesaurus from raw text. It
describes the natural language processing steps of tokenization,
surface syntactic analysis, and syntactic attribute extraction. From
these attributes, word and term similarities are calculated and a
thesaurus is created showing important common terms and their
relation to each other, common verb-noun pairings, common
expressions, and word family members. The techniques are tested on
twenty different corpora, ranging from baseball newsgroups,
assassination archives, medical X-ray reports, and abstracts on AIDS
to encyclopedia articles on animals, and even the text of the book
itself. The corpora range from 40,000 to 6 million characters of
text, and results are presented for each in the Appendix. The
methods described in the book have undergone extensive evaluation.
Their time and space complexity are shown to be modest. The results
are shown to converge to a stable state as the corpus grows. The
similarities calculated are compared to those produced by
psychological testing. A method of evaluation using Artificial
Synonyms is tested. Gold-standard evaluations show that the
techniques significantly outperform non-linguistic-based techniques
for the most important words in corpora. Explorations in Automatic
Thesaurus Discovery includes applications to information retrieval
using established testbeds, enrichment of existing thesauri, and
semantic analysis. Also included are applications showing how to
create, implement, and test a first-draft thesaurus.
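The step from extracted attributes to word similarity can be
illustrated with a minimal sketch. It assumes the syntactic attributes
have already been extracted as (word, attribute) pairs, and it uses
plain Jaccard overlap as a simple stand-in, since the blurb does not
say which similarity measure the book uses. The function names, the
attribute labels such as "adj:malignant", and the sample data are all
hypothetical.

```python
from collections import defaultdict


def build_attribute_sets(pairs):
    """Group (word, attribute) pairs into one attribute set per word."""
    attrs = defaultdict(set)
    for word, attribute in pairs:
        attrs[word].add(attribute)
    return dict(attrs)


def jaccard(a, b):
    """Jaccard overlap of two attribute sets (an assumed stand-in measure)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def most_similar(word, attrs, top_n=3):
    """Rank the other words by attribute overlap with `word`."""
    scores = [
        (other, jaccard(attrs[word], attrs[other]))
        for other in attrs
        if other != word
    ]
    # Highest score first; ties broken alphabetically for stable output.
    scores.sort(key=lambda item: (-item[1], item[0]))
    return scores[:top_n]


# Hypothetical attributes, as a surface syntactic analysis might yield them.
PAIRS = [
    ("tumor", "adj:malignant"),
    ("tumor", "obj-of:remove"),
    ("tumor", "adj:large"),
    ("lesion", "adj:malignant"),
    ("lesion", "obj-of:remove"),
    ("lesion", "adj:small"),
    ("report", "obj-of:write"),
    ("report", "adj:large"),
]

attrs = build_attribute_sets(PAIRS)
ranking = most_similar("tumor", attrs)
```

Here "tumor" and "lesion" share two of their four distinct attributes,
so "lesion" ranks above "report" as the closest neighbor of "tumor". A
first-draft thesaurus entry would list such neighbors for each
frequent word.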
Most of the papers in this volume were first presented at the
Workshop on Cross-Linguistic Information Retrieval that was held
August 22, 1996, during the SIGIR'96 Conference. Alan Smeaton of
Dublin University and Paraic Sheridan of the ETH, Zurich, were the
two other members of the Scientific Committee for this workshop.
SIGIR is the Association for Computing Machinery (ACM) Special
Interest Group on Information Retrieval, which has held
conferences yearly since 1977. Three additional papers have been
added: Chapter 4, "Distributed Cross-Lingual Information Retrieval,"
describes the EMIR retrieval system, one of the first general
cross-language systems to be implemented and evaluated; Chapter 6,
"Mapping Vocabularies Using Latent Semantic Indexing," which
originally appeared as a technical report in the Laboratory for
Computational Linguistics at Carnegie Mellon University in 1991, is
included here because it was one of the earliest, though
hard-to-find, publications showing the application of Latent
Semantic Indexing to the problem of cross-language retrieval; and
Chapter 10, "A Weighted Boolean Model for Cross Language Text
Retrieval," describes a recent approach to solving the translation
term weighting problem, specific to Cross-Language Information
Retrieval.

Gregory Grefenstette

CONTRIBUTORS

Lisa Ballesteros, Center for Intelligent Information Retrieval, Computer Science Department, University of Massachusetts
W. Bruce Croft, Center for Intelligent Information Retrieval, Computer Science Department, University of Massachusetts
David Hull, Xerox Research Centre Europe, Grenoble Laboratory
Gregory Grefenstette, Xerox Research Centre Europe, Grenoble Laboratory
Thomas K. Landauer, Department of Psychology and Institute of Cognitive Science, University of Colorado, Boulder
Mark W. Davis, Computing Research Lab, New Mexico State University
Michael L. Littman
Bonnie J.
This book presents revised versions of the lectures given at the 8th ELSNET European Summer School on Language and Speech Communication held on the Island of Chios, Greece, in summer 2000. Besides an introductory survey, the book presents lectures on data analysis for multimedia libraries, pronunciation modeling for large vocabulary speech recognition, statistical language modeling, very large scale information retrieval, reduction of information variation in text, and a concluding chapter on open questions in research for linguistics in information access. The book gives newcomers to language and speech communication a clear overview of the main technologies and problems in the area. Researchers and professionals active in the area will appreciate the book as a concise review of the technologies used in text- and speech-triggered information access.