Data mining is a mature technology. The prediction problem, looking
for predictive patterns in data, has been widely studied. Strong
methods are available to the practitioner. These methods process
structured numerical information, where uniform measurements are
taken over a sample of data. Text is often described as
unstructured information. So, it would seem, text and numerical
data are different, requiring different methods. Or are they? In
our view, a prediction problem can be solved by the same methods,
whether the data are structured numerical measurements or
unstructured text. Text and documents can be transformed into
measured values, such as the presence or absence of words, and the
same methods that have proven successful for predictive data mining
can be applied to text. Yet, there are key differences. Evaluation
techniques must be adapted to the chronological order of
publication and to alternative measures of error. Because the data
are documents, more specialized analytical methods may be preferred
for text. Moreover, the methods must be modified to accommodate very
high dimensions: tens of thousands of words and documents. Still,
the central themes are similar.
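The transformation described above, from documents to measured values based on the presence or absence of words, can be sketched roughly as follows. This is a minimal illustration and not the authors' software: the function names (`tokenize`, `build_vocabulary`, `to_binary_vector`), the simple regular-expression tokenizer, and the two sample documents are all assumptions made for the example.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens (a simplifying assumption)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_vocabulary(documents):
    """Collect every distinct word across the documents, in sorted order."""
    return sorted({token for doc in documents for token in tokenize(doc)})


def to_binary_vector(text, vocabulary):
    """Return 1 for each vocabulary word present in the text and 0 for each absent word."""
    present = set(tokenize(text))
    return [1 if word in present else 0 for word in vocabulary]


# Hypothetical sample documents, used only for illustration.
documents = [
    "Stock prices rise on strong earnings",
    "Earnings fall as prices drop",
]

vocabulary = build_vocabulary(documents)
vectors = [to_binary_vector(doc, vocabulary) for doc in documents]

for doc, vector in zip(documents, vectors):
    print(vector, doc)
```

Each document becomes a fixed-length row of 0/1 measurements, which is the kind of structured numerical input that standard prediction methods expect. With a real collection the vocabulary would run to tens of thousands of words, so a sparse representation would likely be preferable to plain lists.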
One consequence of the pervasive use of computers is that most
documents originate in digital form. Widespread use of the Internet
makes them readily available. Text mining - the process of
analyzing unstructured natural-language text - is concerned with
how to extract information from these documents. Developed from the
authors' highly successful Springer reference on text mining,
Fundamentals of Predictive Text Mining is an introductory textbook
and guide to this rapidly evolving field. Integrating topics
spanning the varied disciplines of data mining, machine learning,
databases, and computational linguistics, this uniquely useful book
also provides practical advice for text mining. In-depth
discussions are presented on issues of document classification,
information retrieval, clustering and organizing documents,
information extraction, web-based data-sourcing, and prediction and
evaluation. Background on data mining is beneficial, but not
essential. Where advanced concepts are discussed that require
mathematical maturity for a proper understanding, intuitive
explanations are also provided for less advanced readers.
Topics and features:
+ Presents a comprehensive, practical and easy-to-read introduction
to text mining.
+ Includes chapter summaries, useful historical and bibliographic
remarks, and classroom-tested exercises for each chapter.
+ Explores the application and utility of each method, as well as
the optimum techniques for specific scenarios.
+ Provides several descriptive case studies that take readers from
problem description to systems deployment in the real world.
+ Includes access to industrial-strength text-mining software that
runs on any computer.
+ Describes methods that rely on basic statistical techniques, thus
allowing for relevance to all languages (not just English).
+ Contains links to free downloadable software and other
supplementary instruction material.
Fundamentals
of Predictive Text Mining is an essential resource for IT
professionals and managers, as well as a key text for advanced
undergraduate computer science students and beginning graduate
students. Dr. Sholom M. Weiss is a Research Staff Member with the
IBM Predictive Modeling group, in Yorktown Heights, New York, and
Professor Emeritus of Computer Science at Rutgers University. Dr.
Nitin Indurkhya is Professor at the School of Computer Science and
Engineering, University of New South Wales, Australia, as well as
founder and president of data-mining consulting company Data-Miner
Pty Ltd. Dr. Tong Zhang is Associate Professor at the Department of
Statistics and Biostatistics at Rutgers University, New Jersey.
The potential business advantages of data mining are well
documented in publications for executives and managers. However,
developers implementing major data-mining systems need concrete
information about the underlying technical principles and their
practical manifestations in order to either integrate commercially
available tools or write data-mining programs from scratch. This
book is the first technical guide to provide a complete,
generalized roadmap for developing data-mining applications,
together with advice on performing these large-scale, open-ended
analyses for real-world data warehouses.
Note: If you already own Predictive Data Mining: A Practical Guide,
please see ISBN 1-55860-477-4 to order the accompanying software.
To order the book/software package, please see ISBN 1-55860-478-2.
+ Focuses on the preparation and organization of data and the
development of an overall strategy for data mining.
+ Reviews sophisticated prediction methods that search for patterns
in big data.
+ Describes how to accurately estimate future performance of
proposed solutions.
+ Illustrates the data-mining process and its potential pitfalls
through real-life case studies.