Showing 1 - 16 of 16 matches in All Departments
An emerging topic in software engineering and data mining, specification mining tackles software maintenance and reliability issues that cost economies billions of dollars each year. The first unified reference on the subject, Mining Software Specifications: Methodologies and Applications describes recent approaches for mining specifications of software systems. Experts in the field illustrate how to apply state-of-the-art data mining and machine learning techniques to address software engineering concerns. In the first set of chapters, the book introduces a number of studies on mining finite state machines that employ techniques, such as grammar inference, partial order mining, source code model checking, abstract interpretation, and more. The remaining chapters present research on mining temporal rules/patterns, covering techniques that include path-aware static program analyses, lightweight rule/pattern mining, statistical analysis, and other interesting approaches. Throughout the book, the authors discuss how to employ dynamic analysis, static analysis, and combinations of both to mine software specifications. According to the US National Institute of Standards and Technology in 2002, software bugs have cost the US economy 59.5 billion dollars a year. This volume shows how specification mining can help find bugs and improve program understanding, thereby reducing unnecessary financial losses. The book encourages the industry adoption of specification mining techniques and the assimilation of these techniques in standard integrated development environments (IDEs).
The Definitive Volume on Cutting-Edge Exploratory Analysis of Massive Spatial and Spatiotemporal Databases
Since the publication of the first edition of Geographic Data Mining and Knowledge Discovery, new techniques for geographic data warehousing (GDW), spatial data mining, and geovisualization (GVis) have been developed. In addition, there has been a rise in the use of knowledge discovery techniques due to the increasing collection and storage of data on spatiotemporal processes and mobile objects. Incorporating these novel developments, this second edition reflects the current state of the art in the field.
Geographic data mining and knowledge discovery is a promising young discipline with many challenging research problems. This book shows that this area represents an important direction in the development of a new generation of spatial analysis tools for data-rich environments. Exploring various problems and possible solutions, it will motivate researchers to develop new methods and applications in this emerging field.
Drawn from the US National Science Foundation's Symposium on Next Generation of Data Mining and Cyber-Enabled Discovery for Innovation (NGDM 07), Next Generation of Data Mining explores emerging technologies and applications in data mining as well as potential challenges faced by the field. Gathering perspectives from top experts across different disciplines, the book debates upcoming challenges and outlines computational methods. The contributors look at how ecology, astronomy, social science, medicine, finance, and more can benefit from the next generation of data mining techniques. They examine the algorithms, middleware, infrastructure, and privacy policies associated with ubiquitous, distributed, and high performance data mining. They also discuss the impact of new technologies, such as the semantic web, on data mining and provide recommendations for privacy-preserving mechanisms. The dramatic increase in the availability of massive, complex data from various sources is creating computing, storage, communication, and human-computer interaction challenges for data mining. Providing a framework to better understand these fundamental issues, this volume surveys promising approaches to data mining problems that span an array of disciplines.
Data Mining: Concepts and Techniques, Fourth Edition introduces concepts, principles, and methods for mining patterns, knowledge, and models from various kinds of data for diverse applications. Specifically, it delves into the processes for uncovering patterns and knowledge from massive collections of data, known as knowledge discovery from data, or KDD. It focuses on the feasibility, usefulness, effectiveness, and scalability of data mining techniques for large data sets. After an introduction to the concept of data mining, the authors explain the methods for preprocessing, characterizing, and warehousing data. They then partition the data mining methods into several major tasks, introducing concepts and methods for mining frequent patterns, associations, and correlations for large data sets; data classification and model construction; cluster analysis; and outlier detection. Concepts and methods for deep learning are systematically introduced in a dedicated chapter. Finally, the book covers the trends, applications, and research frontiers in data mining.
The real-world data, though massive, is largely unstructured, in the form of natural-language text. It is challenging but highly desirable to mine structures from massive text data, without extensive human annotation and labeling. In this book, we investigate the principles and methodologies of mining structures of factual knowledge (e.g., entities and their relationships) from massive, unstructured text corpora. Departing from many existing structure extraction methods that have heavy reliance on human annotated data for model training, our effort-light approach leverages human-curated facts stored in external knowledge bases as distant supervision and exploits rich data redundancy in large text corpora for context understanding. This effort-light mining approach leads to a series of new principles and powerful methodologies for structuring text corpora, including (1) entity recognition, typing and synonym discovery, (2) entity relation extraction, and (3) open-domain attribute-value mining and information extraction. This book introduces this new research frontier and points out some promising research directions.
A lot of digital ink has been spilled on "big data" over the past few years. Most of this surge owes its origin to the various types of unstructured data in the wild, among which the proliferation of text-heavy data is particularly overwhelming, attributed to the daily use of web documents, business reviews, news, social posts, etc., by so many people worldwide. A core challenge presents itself: How can one efficiently and effectively turn massive, unstructured text into structured representation so as to further lay the foundation for many other downstream text mining applications? In this book, we investigate one promising paradigm for representing unstructured text, that is, through automatically identifying high-quality phrases from innumerable documents. In contrast to a list of frequent n-grams without proper filtering, users are often more interested in results based on variable-length phrases with certain semantics such as scientific concepts, organizations, slogans, and so on. We propose new principles and powerful methodologies to achieve this goal, from the scenario where a user can provide meaningful guidance to a fully automated setting through distant learning. This book also introduces applications enabled by the mined phrases and points out some promising research directions.
The "big data" era is characterized by an explosion of information in the form of digital data collections, ranging from scientific knowledge, to social media, news, and everyone's daily life. Examples of such collections include scientific publications, enterprise logs, news articles, social media, and general web pages. Valuable knowledge about multi-typed entities is often hidden in the unstructured or loosely structured, interconnected data. Mining latent structures around entities uncovers hidden knowledge such as implicit topics, phrases, entity roles and relationships. In this monograph, we investigate the principles and methodologies of mining latent entity structures from massive unstructured and interconnected data. We propose a text-rich information network model for modeling data in many different domains. This leads to a series of new principles and powerful methodologies for mining latent structures, including (1) latent topical hierarchy, (2) quality topical phrases, (3) entity roles in hierarchical topical communities, and (4) entity relations. This book also introduces applications enabled by the mined structures and points out some promising research directions.
Outlier (or anomaly) detection is a very broad field which has been studied in the context of a large number of research areas like statistics, data mining, sensor networks, environmental science, distributed systems, spatio-temporal mining, etc. Initial research in outlier detection focused on time series-based outliers (in statistics). Since then, outlier detection has been studied on a large variety of data types including high-dimensional data, uncertain data, stream data, network data, time series data, spatial data, and spatio-temporal data. While there have been many tutorials and surveys for general outlier detection, we focus on outlier detection for temporal data in this book. A large number of applications generate temporal datasets. For example, in our everyday life, various kinds of records like credit, personnel, financial, judicial, medical, etc., are all temporal. This stresses the need for an organized and detailed study of outliers with respect to such temporal data. In the past decade, there has been a lot of research on various forms of temporal data including consecutive data snapshots, series of data snapshots and data streams. Besides the initial work on time series, researchers have focused on rich forms of data including multiple data streams, spatio-temporal data, network data, community distribution data, etc. Compared to general outlier detection, techniques for temporal outlier detection are very different. In this book, we will present an organized picture of both recent and past research in temporal outlier detection. We start with the basics and then lead the reader up to the main ideas in state-of-the-art outlier detection techniques. We motivate the importance of temporal outlier detection and outline the challenges beyond usual outlier detection. Then, we present a taxonomy of proposed techniques for temporal outlier detection.
Such techniques broadly include statistical techniques (like AR models, Markov models, histograms, neural networks), distance- and density-based approaches, grouping-based approaches (clustering, community detection), network-based approaches, and spatio-temporal outlier detection approaches. We summarize by presenting a wide collection of applications where temporal outlier detection techniques have been applied to discover interesting outliers. Table of Contents: Preface / Acknowledgments / Figure Credits / Introduction and Challenges / Outlier Detection for Time Series and Data Sequences / Outlier Detection for Data Streams / Outlier Detection for Distributed Data Streams / Outlier Detection for Spatio-Temporal Data / Outlier Detection for Temporal Network Data / Applications of Outlier Detection for Temporal Data / Conclusions and Research Directions / Bibliography / Authors' Biographies
Our ability to generate and collect data has been increasing rapidly. Not only are all of our business, scientific, and government transactions now computerized, but the widespread use of digital cameras, publication tools, and bar codes also generates data. On the collection side, scanned text and image platforms, satellite remote sensing systems, and the World Wide Web have flooded us with a tremendous amount of data. This explosive growth has generated an even more urgent need for new techniques and automated tools that can help us transform this data into useful information and knowledge. Like the first edition, voted the most popular data mining book by KDnuggets readers, this book explores concepts and techniques for the discovery of patterns hidden in large data sets, focusing on issues relating to their feasibility, usefulness, effectiveness, and scalability. However, since the publication of the first edition, great progress has been made in the development of new data mining methods, systems, and applications. This new edition substantially enhances the first edition, and new chapters have been added to address recent developments on mining complex types of data, including stream data, sequence data, graph structured data, social network data, and multi-relational data.
With the recent flourishing research activities on Web search and mining, social network analysis, information network analysis, information retrieval, link analysis, and structural data mining, research on link mining has been rapidly growing, forming a new field of data mining. Traditional data mining focuses on "flat" or "isolated" data in which each data object is represented as an independent attribute vector. However, many real-world data sets are inter-connected, much richer in structure, involving objects of heterogeneous types and complex links. Hence, the study of link mining will have a high impact on various important applications such as Web and text mining, social network analysis, collaborative filtering, and bioinformatics.
As an emerging research field, there are currently no books focusing on the theory and techniques as well as the related applications for link mining, especially from an interdisciplinary point of view. On the other hand, due to the high popularity of linkage data, extensive applications ranging from governmental organizations to commercial businesses to people's daily life call for exploring the techniques of mining linkage data. Therefore, researchers and practitioners need a comprehensive book to systematically study, further develop, and apply the link mining techniques to these applications.
This book contains contributed chapters from a variety of prominent researchers in the field. While the chapters are written by different researchers, the topics and content are organized in such a way as to present the most important models, algorithms, and applications on link mining in a structured and concise way. Given the lack of structurally organized information on the topic of link mining, the book will provide insights which are not easily accessible otherwise. We hope that the book will provide a useful reference to not only researchers, professors, and advanced level students in computer science but also practitioners in industry.
We would like to convey our appreciation to all authors for their valuable contributions. We would also like to acknowledge that this work is supported by NSF through grants IIS-0905215, IIS-0914934, and DBI-0960443.
Chicago, Illinois: Philip S. Yu
Urbana-Champaign, Illinois: Jiawei Han
Pittsburgh, Pennsylvania: Christos Faloutsos
Contents
Part I Link-Based Clustering
1 Machine Learning Approaches to Link-Based Clustering
  Zhongfei (Mark) Zhang, Bo Long, Zhen Guo, Tianbing Xu, and Philip S. Yu
2 Scalable Link-Based Similarity Computation and Clustering
  Xiaoxin Yin, Jiawei Han, and Philip S. Yu
3 Community Evolution and Change Point Detection in Time-Evolving Graphs
  Jimeng Sun, Spiros Papadimitriou, Philip S. Yu, and Christos Faloutsos
Part II Graph Mining and Community Analysis
4 A Survey of Link Mining Tasks for Analyzing Noisy and Incomplete Networks
  Galileo Mark Namata, Hossam Sharara, and Lise Getoor
5 Markov Logic: A Language and Algorithms for Link Mining
  Pedro Domingos, Daniel Lowd, Stanley Kok, Aniruddh Nath, Hoifung Poon, Matthew Richardson, and Parag Singla
6 Understanding Group Structures and Properties in Social Media
  Lei Tang and Huan Liu
7 Time Sensitive Ranking with Application to Publication Search
  Xin Li, Bing Liu, and Philip S. Yu
8 Proximity Tracking on Dynamic Bipartite Graphs: Problem Definitions and Fast Solutions
  Hanghang Tong, Spiros Papadimitriou, Philip S. Yu, and Christos Faloutsos
9 Discriminative Frequent Pattern-Based Graph Classification
  Hong Cheng, Xifeng Yan, and Jiawei Han
Part III Link Analysis for Data Cleaning and Information Integration
10 Information Integration for Graph Databases
  Ee-Peng Lim, Aixin Sun, Anwitaman Datta, and Kuiyu Chang
11 Veracity Analysis and Object Distinction
  Xiaoxin Yin, Jiawei Han, and Philip S. Yu
Part IV Social Network Analysis
12 Dynamic Community Identification...
This book brings all of the elements of data mining together in a
single volume, saving the reader the time and expense of making
multiple purchases. It consolidates both introductory and advanced
topics, thereby covering the gamut of data mining and machine
learning tactics, from data integration and pre-processing, to
fundamental algorithms, to optimization techniques and web mining
methodology.
Unstructured text, as one of the most important data forms, plays a crucial role in data-driven decision making in domains ranging from social networking and information retrieval to scientific research and healthcare informatics. In many emerging applications, people's information needs from text data are becoming multidimensional: they demand useful insights along multiple aspects from a text corpus. However, acquiring such multidimensional knowledge from massive text data remains a challenging task. This book presents data mining techniques that turn unstructured text data into multidimensional knowledge. We investigate two core questions. (1) How does one identify task-relevant text data with declarative queries in multiple dimensions? (2) How does one distill knowledge from text data in a multidimensional space? To address the above questions, we develop a text cube framework. First, we develop a cube construction module that organizes unstructured data into a cube structure, by discovering latent multidimensional and multi-granular structure from the unstructured text corpus and allocating documents into the structure. Second, we develop a cube exploitation module that models multiple dimensions in the cube space, thereby distilling multidimensional knowledge from user-selected data. Together, these two modules constitute an integrated pipeline: leveraging the cube structure, users can perform multidimensional, multigranular data selection with declarative queries; and with cube exploitation algorithms, users can extract multidimensional patterns from the selected data for decision making. The proposed framework has two distinctive advantages when turning text data into multidimensional knowledge: flexibility and label-efficiency. First, it enables acquiring multidimensional knowledge flexibly, as the cube structure allows users to easily identify task-relevant data along multiple dimensions at varied granularities and further distill multidimensional knowledge.
Second, the algorithms for cube construction and exploitation require little supervision; this makes the framework appealing for many applications where labeled data are expensive to obtain.
This book provides a principled data-driven framework that progressively constructs, enriches, and applies taxonomies without leveraging massive human annotated data. Traditionally, people construct domain-specific taxonomies by extensive manual curation, which is time-consuming and costly. In today's information era, people are inundated with vast amounts of text data. Despite the usefulness of taxonomies, people haven't yet exploited their full power due to the heavy curation needed for creating and maintaining them. To bridge this gap, the authors discuss automated taxonomy discovery and exploration, with an emphasis on label-efficient machine learning methods and their real-world usages. A taxonomy organizes entities and concepts hierarchically. Taxonomies are ubiquitous in our daily life, ranging from product taxonomies used by online retailers and topic taxonomies deployed by news outlets and social media to scientific taxonomies deployed by digital libraries across various domains. When properly analyzed, these taxonomies can play a vital role for science, engineering, business intelligence, policy design, e-commerce, and more. Intuitive examples are used throughout, enabling readers to grasp concepts more easily.