Often considered more of an art than a science, clustering has been dominated in the literature by learning through example, with techniques chosen almost by trial and error. Even the two most popular, and most closely related, methods (K-Means for partitioning and Ward's method for hierarchical clustering) have lacked the theoretical underpinning required to establish a firm relationship between them and relevant interpretation aids. Other approaches, such as spectral clustering or consensus clustering, are usually treated as unrelated to each other or to these two methods. Clustering: A Data Recovery Approach, Second Edition presents a unified modeling approach for the most popular clustering methods: K-Means and hierarchical techniques, especially divisive clustering. It significantly expands coverage of the mathematics of data recovery, and includes a new chapter covering more recent popular network clustering approaches: spectral, modularity, uniform, additive, and consensus, all treated within the same data recovery approach. Another added chapter covers cluster validation and interpretation, including recent developments in ontology-driven interpretation of clusters. Altogether, these additions add about a hundred pages to the book, even though fragments unrelated to the main topics were removed. Illustrated with a set of small real-world datasets and more than a hundred examples, the book is oriented towards students, practitioners, and theoreticians of cluster analysis. Covering topics that are beyond the scope of most texts, the author's explanations of data recovery methods, theory-based advice, pre- and post-processing issues, and clear, practical instructions for real-world data mining make this book ideally suited for teaching, self-study, and professional reference.
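The K-Means partitioning method the blurb refers to alternates two steps: assign each point to its nearest centroid, then move each centroid to the mean of its cluster. The following is a minimal self-contained sketch of that loop, not the book's data recovery formulation; the sample points, `k`, and iteration count are illustrative assumptions.

```python
import random

def kmeans(points, k, iters=20, seed=0):
    """Plain K-Means: alternate assignment and centroid-update steps."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initialize centroids at k data points
    for _ in range(iters):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            j = min(range(k),
                    key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            clusters[j].append(p)
        # Update step: each centroid moves to the mean of its cluster.
        for j, cl in enumerate(clusters):
            if cl:  # keep the old centroid if a cluster is empty
                centroids[j] = tuple(sum(d) / len(cl) for d in zip(*cl))
    return centroids, clusters

# Two well-separated blobs: centroids should settle near (0, 0) and (10, 10).
pts = [(0.0, 0.1), (0.2, -0.1), (-0.1, 0.0),
       (10.0, 10.1), (9.8, 10.0), (10.1, 9.9)]
centroids, clusters = kmeans(pts, k=2)
```

Ward's hierarchical method, by contrast, would merge clusters bottom-up so as to minimize the same squared-error criterion that the update step above minimizes, which is the relationship the book formalizes.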
"This text should be required reading for everyone in contemporary business." --Peter Woodhull, CEO, Modus21 "The one book that clearly describes and links Big Data concepts to business utility." --Dr. Christopher Starr, PhD "Simply, this is the best Big Data book on the market!" --Sam Rostam, Cascadian IT Group "...one of the most contemporary approaches I've seen to Big Data fundamentals..." --Joshua M. Davis, PhD The Definitive Plain-English Guide to Big Data for Business and Technology Professionals Big Data Fundamentals provides a pragmatic, no-nonsense introduction to Big Data. Best-selling IT author Thomas Erl and his team clearly explain key Big Data concepts, theory and terminology, as well as fundamental technologies and techniques. All coverage is supported with case study examples and numerous simple diagrams. The authors begin by explaining how Big Data can propel an organization forward by solving a spectrum of previously intractable business problems. Next, they demystify key analysis techniques and technologies and show how a Big Data solution environment can be built and integrated to offer competitive advantages. 
- Discovering Big Data's fundamental concepts and what makes it different from previous forms of data analysis and data science
- Understanding the business motivations and drivers behind Big Data adoption, from operational improvements through innovation
- Planning strategic, business-driven Big Data initiatives
- Addressing considerations such as data management, governance, and security
- Recognizing the 5 "V" characteristics of datasets in Big Data environments: volume, velocity, variety, veracity, and value
- Clarifying Big Data's relationships with OLTP, OLAP, ETL, data warehouses, and data marts
- Working with Big Data in structured, unstructured, semi-structured, and metadata formats
- Increasing value by integrating Big Data resources with corporate performance monitoring
- Understanding how Big Data leverages distributed and parallel processing
- Using NoSQL and other technologies to meet Big Data's distinct data processing requirements
- Leveraging statistical approaches of quantitative and qualitative analysis
- Applying computational analysis methods, including machine learning
Your logical, linear guide to the fundamentals of data science programming. Data science is exploding--in a good way--with a forecast of 1.7 megabytes of new information created every second for each human being on the planet by 2020 and 11.5 million job openings by 2026. It clearly pays dividends to be in the know. This friendly guide charts a path through the fundamentals of data science and then delves into the actual work: linear regression, logistic regression, machine learning, neural networks, recommender engines, and cross-validation of models. Data Science Programming All-In-One For Dummies covers the key data science, machine learning, and deep learning programming languages: Python and R. It helps you decide which programming language is best for specific data science needs, and gives you guidelines for building your own projects to solve problems in real time.
- Get grounded: the ideal start for new data professionals
- What lies ahead: learn about specific areas that data is transforming
- Be meaningful: find out how to tell your data story
- See clearly: pick up the art of visualization
Whether you're a beginning student or already mid-career, get your copy now and add even more meaning to your life--and everyone else's!
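The cross-validation of models mentioned above rests on a simple mechanic: partition the samples into k folds and hold each fold out in turn as a test set. A minimal sketch of the fold-splitting step, assuming index-based data access (the function name and sizes are illustrative, not from the book):

```python
def k_fold_splits(n_samples, k):
    """Yield (train_idx, test_idx) index lists for k-fold cross-validation."""
    # Earlier folds absorb the remainder so every sample is tested exactly once.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    idx, start = list(range(n_samples)), 0
    for size in fold_sizes:
        test = idx[start:start + size]           # held-out fold
        train = idx[:start] + idx[start + size:] # everything else
        yield train, test
        start += size

# 10 samples, 3 folds -> fold sizes 4, 3, 3.
splits = list(k_fold_splits(10, 3))
```

A model would be fitted on each `train` list and scored on the matching `test` list, with the k scores averaged into one cross-validated estimate.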
Advances in Machine Learning and Data Mining for Astronomy documents numerous successful collaborations among computer scientists, statisticians, and astronomers who illustrate the application of state-of-the-art machine learning and data mining techniques in astronomy. Due to the massive amount and complexity of data in most scientific disciplines, the material discussed in this text transcends traditional boundaries between various areas in the sciences and computer science. The book's introductory part provides context to issues in the astronomical sciences that are also important to health, social, and physical sciences, particularly probabilistic and statistical aspects of classification and cluster analysis. The next part describes a number of astrophysics case studies that leverage a range of machine learning and data mining technologies. In the last part, developers of algorithms and practitioners of machine learning and data mining show how these tools and techniques are used in astronomical applications. With contributions from leading astronomers and computer scientists, this book is a practical guide to many of the most important developments in machine learning, data mining, and statistics. It explores how these advances can solve current and future problems in astronomy and looks at how they could lead to the creation of entirely new algorithms within the data mining community.
Written by renowned data science experts Foster Provost and Tom Fawcett, Data Science for Business introduces the fundamental principles of data science, and walks you through the "data-analytic thinking" necessary for extracting useful knowledge and business value from the data you collect. This guide also helps you understand the many data-mining techniques in use today. Based on an MBA course Provost has taught at New York University over the past ten years, Data Science for Business provides examples of real-world business problems to illustrate these principles. You'll not only learn how to improve communication between business stakeholders and data scientists, but also how to participate intelligently in your company's data science projects. You'll also discover how to think data-analytically, and fully appreciate how data science methods can support business decision-making.
- Understand how data science fits in your organization - and how you can use it for competitive advantage
- Treat data as a business asset that requires careful investment if you're to gain real value
- Approach business problems data-analytically, using the data-mining process to gather good data in the most appropriate way
- Learn general concepts for actually extracting knowledge from data
- Apply data science principles when interviewing data science job candidates
Spectral Feature Selection for Data Mining introduces a novel feature selection technique that establishes a general platform for studying existing feature selection algorithms and developing new algorithms for emerging problems in real-world applications. This technique represents a unified framework for supervised, unsupervised, and semisupervised feature selection. The book explores the latest research achievements, sheds light on new research directions, and stimulates readers to make the next creative breakthroughs. It presents the intrinsic ideas behind spectral feature selection, its theoretical foundations, its connections to other algorithms, and its use in handling both large-scale data sets and small sample problems. The authors also cover feature selection and feature extraction, including basic concepts, popular existing algorithms, and applications. A timely introduction to spectral feature selection, this book illustrates the potential of this powerful dimensionality reduction technique in high-dimensional data processing. Readers learn how to use spectral feature selection to solve challenging problems in real-life applications and discover how general feature selection and extraction are connected to spectral feature selection.
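The intrinsic idea behind spectral feature selection can be illustrated with a simplified Laplacian-score-style criterion: a feature is preferred if it varies smoothly over a graph of sample similarities, i.e. similar samples take similar feature values. The sketch below, on assumed toy data with an assumed RBF similarity, is a rough illustration of that idea, not the book's algorithms (lower score = better feature):

```python
import math

def laplacian_scores(X, sigma=1.0):
    """Score each feature by how smoothly it varies over a sample-similarity
    graph, normalized by the feature's variance (lower is better)."""
    n = len(X)
    # RBF similarity between samples, computed over all features.
    S = [[math.exp(-sum((a - b) ** 2 for a, b in zip(X[i], X[j])) / sigma)
          for j in range(n)] for i in range(n)]
    scores = []
    for r in range(len(X[0])):
        f = [row[r] for row in X]
        mean = sum(f) / n
        var = sum((v - mean) ** 2 for v in f) or 1e-12
        # Penalize large feature differences between highly similar samples.
        smooth = sum(S[i][j] * (f[i] - f[j]) ** 2
                     for i in range(n) for j in range(n))
        scores.append(smooth / var)
    return scores

# Feature 0 separates the two sample groups; feature 1 behaves like noise,
# so feature 1 should receive the larger (worse) score.
X = [(0.0, 0.5), (0.1, -0.4), (0.2, 0.3),
     (5.0, -0.5), (5.1, 0.4), (5.2, -0.3)]
scores = laplacian_scores(X)
```

Ranking features by such graph-based scores is what lets one framework cover supervised, unsupervised, and semi-supervised selection: only the construction of the similarity graph changes.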
The latest inventions in internet technology influence most business and daily activities. Internet security, internet data management, web search, data grids, cloud computing, and web-based applications play vital roles, especially in business and industry, as more transactions go online and mobile. Issues related to ubiquitous computing are becoming critical. Internet technology and data engineering should reinforce the efficiency and effectiveness of business processes. These technologies should help people make better and more accurate decisions by presenting necessary information and the possible consequences of their decisions. Intelligent information systems should help us better understand and manage information with ubiquitous data repositories and cloud computing. This book is a compilation of recent research findings in internet technology and data engineering. It provides state-of-the-art accounts of computational algorithms/tools, database management and database technologies, intelligent information systems, data engineering applications, internet security, internet data management, web search, data grids, cloud computing, web-based applications, and other related topics.
This book focuses on utilizing statistical modelling of software source code to resolve issues associated with the software development process. Writing and maintaining software source code is a costly business; software developers need to constantly rely on large existing code bases. Statistical modelling identifies the patterns in software artifacts and utilizes them for predicting possible issues.
This open access book describes the results of natural language processing and machine learning methods applied to clinical text from electronic patient records. It is divided into twelve chapters. Chapters 1-4 discuss the history and background of the original paper-based patient records, their purpose, and how they are written and structured. These initial chapters do not require any technical or medical background knowledge. The remaining eight chapters are more technical in nature and describe various medical classifications and terminologies such as ICD diagnosis codes, SNOMED CT, MeSH, UMLS, and ATC. Chapters 5-10 cover basic tools for natural language processing and information retrieval, and how to apply them to clinical text. The differences between rule-based and machine learning-based methods, and between supervised and unsupervised machine learning methods, are also explained. Next, ethical concerns regarding the use of sensitive patient records for research purposes are discussed, including methods for de-identifying electronic patient records and safely storing patient records. The book's closing chapters present a number of applications in clinical text mining and summarise the lessons learned from the previous chapters. The book provides a comprehensive overview of technical issues arising in clinical text mining, and offers a valuable guide for advanced students in health informatics, computational linguistics, and information retrieval, and for researchers entering these fields.
Introduction to Bio-Ontologies explores the computational background of ontologies. Emphasizing computational and algorithmic issues surrounding bio-ontologies, this self-contained text helps readers understand ontological algorithms and their applications. The first part of the book defines ontology and bio-ontologies. It also explains the importance of mathematical logic for understanding concepts of inference in bio-ontologies, discusses the probability and statistics topics necessary for understanding ontology algorithms, and describes ontology languages, including OBO (the preeminent language for bio-ontologies), RDF, RDFS, and OWL. The second part covers significant bio-ontologies and their applications. The book presents the Gene Ontology; upper-level ontologies, such as the Basic Formal Ontology and the Relation Ontology; and current bio-ontologies, including several anatomy ontologies, Chemical Entities of Biological Interest, Sequence Ontology, Mammalian Phenotype Ontology, and Human Phenotype Ontology. The third part of the text introduces the major graph-based algorithms for bio-ontologies. The authors discuss how these algorithms are used in overrepresentation analysis, model-based procedures, semantic similarity analysis, and Bayesian networks for molecular biology and biomedical applications. With a focus on computational reasoning topics, the final part describes the ontology languages of the Semantic Web and their applications for inference. It covers the formal semantics of RDF and RDFS, OWL inference rules, a key inference algorithm, the SPARQL query language, and the state of the art for querying OWL ontologies. This book provides readers with the foundation to use ontologies as a starting point for new bioinformatics research projects or to support current molecular genetics research projects.
By supplying a self-contained introduction to OBO ontologies and the Semantic Web, it bridges the gap between both fields and helps readers see what each can contribute to the analysis and understanding of biomedical data.
The book focuses on how machine learning and the Internet of Things (IoT) have empowered the advancement of information-driven solutions, including key concepts and advancements. Ontologies used in heterogeneous IoT environments are discussed, including interpretation, context awareness, analysis of various data sources, machine learning algorithms, and intelligent services and applications. Further, it includes unsupervised and semi-supervised machine learning techniques, with a study of semantic analysis and a thorough analysis of reviews. Divided into sections such as machine learning, security, IoT, and data mining, the concepts are explained with practical implementations, including results. Key features:
- Follows an algorithmic approach for data analysis in machine learning
- Introduces machine learning methods in applications
- Addresses emerging issues in computing such as deep learning, machine learning, the Internet of Things, and data analytics
- Focuses on unsupervised and semi-supervised machine learning techniques for unseen and seen data sets
- Covers case studies relating to human health, transportation, and Internet applications
Data mining is one of the most rapidly growing research areas in computer science and statistics. In Volume 2 of this three-volume series, we have brought together contributions from some of the most prestigious researchers in theoretical data mining. Each of the chapters is self-contained. Statisticians and applied scientists/engineers will find this volume valuable. Additionally, it provides a sourcebook for graduate students interested in the current direction of research in data mining.
Providing a complete review of existing work in music emotion developed in psychology and engineering, Music Emotion Recognition explains how to account for the subjective nature of emotion perception in the development of automatic music emotion recognition (MER) systems. Among the first publications dedicated to automatic MER, it begins with a comprehensive introduction to the essential aspects of MER, including background, key techniques, and applications. This ground-breaking reference examines emotion from a dimensional perspective. It defines emotions in music as points in a 2D plane formed by the two most fundamental emotion dimensions according to psychologists: valence and arousal. The authors present a computational framework that generalizes emotion recognition from the categorical domain to real-valued 2D space. They also:
- Introduce novel emotion-based music retrieval and organization methods
- Describe a ranking-based emotion annotation and model training method
- Present methods that integrate information extracted from lyrics, chord sequence, and genre metadata for improved accuracy
- Consider an emotion-based music retrieval system that is particularly useful for mobile devices
The book details techniques for addressing the issues related to the ambiguity and granularity of emotion description, the heavy cognitive load of emotion annotation, the subjectivity of emotion perception, and the semantic gap between low-level audio signals and high-level emotion perception. Complete with more than 360 useful references, 12 example MATLAB (R) codes, and a listing of key abbreviations and acronyms, this cutting-edge guide supplies the technical understanding and tools needed to develop your own automatic MER system based on the automatic recognition model.
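The dimensional view described above treats each emotion as a point in the valence-arousal plane; once a regression model predicts such a point for a song, relating it back to a categorical label is a nearest-neighbor lookup. A toy sketch of that last step (the anchor coordinates below are hypothetical illustrations, not values from the book):

```python
import math

# Hypothetical anchor points on the valence-arousal plane, both axes in [-1, 1].
EMOTION_ANCHORS = {
    "happy": (0.8, 0.6),      # positive valence, high arousal
    "angry": (-0.7, 0.7),     # negative valence, high arousal
    "sad": (-0.7, -0.6),      # negative valence, low arousal
    "relaxed": (0.7, -0.5),   # positive valence, low arousal
}

def nearest_emotion(valence, arousal):
    """Map a predicted (valence, arousal) point to its closest category."""
    return min(EMOTION_ANCHORS,
               key=lambda e: math.dist((valence, arousal), EMOTION_ANCHORS[e]))

# A song predicted as mildly positive and calm lands in the "relaxed" region.
label = nearest_emotion(0.6, -0.4)
```

The book's contribution is the harder upstream problem: learning the real-valued (valence, arousal) prediction itself from audio, lyrics, and metadata, which this lookup merely consumes.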
Biomarker discovery is an important area of biomedical research that may lead to significant breakthroughs in disease analysis and targeted therapy. Biomarkers are biological entities whose alterations are measurable and are characteristic of a particular biological condition. Discovering, managing, and interpreting knowledge of new biomarkers are challenging and attractive problems in the emerging field of biomedical informatics. This volume is a collection of state-of-the-art research into the application of data mining to the discovery and analysis of new biomarkers. Presenting new results, models, and algorithms, the included contributions focus on biomarker data integration, information retrieval methods, and statistical machine learning techniques. This volume is intended for students and researchers in bioinformatics, proteomics, and genomics, as well as engineers and applied scientists interested in the interdisciplinary application of data mining techniques.
This volume comprises the 6th IFIP International Conference on Intelligent Information Processing. As the world proceeds quickly into the Information Age, it encounters both successes and challenges, and it is well recognized nowadays that intelligent information processing provides the key to the Information Age and to mastering many of these challenges. Intelligent information processing supports the most advanced productive tools that are said to be able to change human life and the world itself. However, the path is never a straight one and every new technology brings with it a spate of new research problems to be tackled by researchers; as a result we are not running out of topics; rather the demand is ever increasing. This conference provides a forum for engineers and scientists in academia and industry to present their latest research findings in all aspects of intelligent information processing. We received more than 50 papers, of which 35 are included in this program as regular papers and 4 as short papers. All papers submitted were reviewed by two referees. We are grateful for the dedicated work of both the authors and the referees, and we hope these proceedings will continue to bear fruit over the years to come. A conference such as this cannot succeed without help from many individuals who contributed their valuable time and expertise.
The book provides an overview of the state of the art in map construction algorithms, which use tracking data in the form of trajectories, most commonly GPS trajectories, to generate vector maps. It introduces three emerging algorithmic categories, outlines their general algorithmic ideas, and discusses three representative algorithms in greater detail. To enable quantitative comparison of map construction algorithms, the authors include specific datasets and evaluation measures. The datasets, source code of map construction algorithms, and evaluation measures are publicly available on http://www.mapconstruction.org. The web site serves as a repository for map construction data and algorithms, and researchers can contribute by uploading their own code and benchmark data. Map Construction Algorithms is an excellent resource for professionals working in computational geometry, spatial databases, and GIS. Advanced-level students studying computer science, geography, and mathematics will also find this book a useful tool.
Data mining attempts to discover novel and useful knowledge from data, finding patterns among datasets that can support intelligent decision making. However, reports of real-world case studies are generally not detailed in the literature, because they are usually based on proprietary datasets, making it impossible to publish the results. This situation makes it hard to evaluate, in a precise way, the degree of effectiveness of data mining techniques in real-world applications. Researchers in this field instead usually work with public-domain datasets. This volume offers a wide spectrum of research work developed for data mining in real-world applications. In the following, we give a brief introduction to the chapters that are included in this book.
Like a data-guzzling turbo engine, advanced data mining has been powering post-genome biological studies for two decades. Reflecting this growth, Biological Data Mining presents comprehensive data mining concepts, theories, and applications in current biological and medical research. Each chapter is written by a distinguished team of interdisciplinary data mining researchers who cover state-of-the-art biological topics. The first section of the book discusses challenges and opportunities in analyzing and mining biological sequences and structures to gain insight into molecular functions. The second section addresses emerging computational challenges in interpreting high-throughput Omics data. The book then describes the relationships between data mining and related areas of computing, including knowledge representation, information retrieval, and data integration for structured and unstructured biological data. The last part explores emerging data mining opportunities for biomedical applications. This volume examines the concepts, problems, progress, and trends in developing and applying new data mining techniques to the rapidly growing field of genome biology. By studying the concepts and case studies presented, readers will gain significant insight and develop practical solutions for similar biological data mining projects in the future.
This book covers deep-learning-based approaches for sentiment analysis, a relatively new, but fast-growing research area, which has significantly changed in the past few years. The book presents a collection of state-of-the-art approaches, focusing on the best-performing, cutting-edge solutions for the most common and difficult challenges faced in sentiment analysis research. Providing detailed explanations of the methodologies, the book is a valuable resource for researchers as well as newcomers to the field.
Introducing the fundamental concepts and algorithms of data mining Introduction to Data Mining, 2nd Edition, gives a comprehensive overview of the background and general themes of data mining and is designed to be useful to students, instructors, researchers, and professionals. Presented in a clear and accessible way, the book outlines fundamental concepts and algorithms for each topic, thus providing the reader with the necessary background for the application of data mining to real problems. The text helps readers understand the nuances of the subject, and includes important sections on classification, association analysis, and cluster analysis. This edition improves on the first iteration of the book, published over a decade ago, by addressing the significant changes in the industry as a result of advanced technology and data growth.
Most life science researchers will agree that biology is not a truly theoretical branch of science. The hype around computational biology and bioinformatics beginning in the nineties of the 20th century was to be short lived (1, 2). When almost no value of practical importance, such as the optimal dose of a drug or the three-dimensional structure of an orphan protein, can be computed from fundamental principles, it is still more straightforward to determine them experimentally. Thus, experiments and observations do generate the overwhelming part of insights into biology and medicine. The extrapolation depth and the prediction power of the theoretical argument in life sciences still have a long way to go. Yet, two trends have qualitatively changed the way biological research is done today. The number of researchers has dramatically grown and they, armed with the same protocols, have produced lots of similarly structured data. Furthermore, high-throughput technologies such as DNA sequencing or array-based expression profiling have been around for just a decade. Nevertheless, with their high level of uniform data generation, they reach the threshold of totally describing a living organism at the biomolecular level for the first time in human history. Whereas getting exact data about living systems and the sophistication of experimental procedures have primarily absorbed the minds of researchers previously, the weight increasingly shifts to the problem of interpreting accumulated data in terms of biological function and biomolecular mechanisms.
Describing the state of the art in spatial data mining with a focus on data quality, this volume reflects the substantial progress made in recent years toward effective techniques for spatial information processing. This science deals with models of reality in a GIS, however, and not with reality itself. Therefore, spatial information processes are often imprecise, allowing for much interpretation of abstract figures and data. Quality Aspects in Spatial Data Mining introduces practical and theoretical solutions for making sense of the often chaotic and overwhelming amount of concrete data available to researchers. In this cohesive collection of peer-reviewed chapters, field authorities present the latest advancements and cover such essential areas as data acquisition, geoinformation theory, spatial statistics, and dissemination. Each chapter debuts with an editorial preview of its topic from a conceptual, applied, and methodological point of view, making it easier for researchers to judge which information is most beneficial to their work. The chapters evolve from error propagation and spatial statistics to relevant applications. The book advises the use of granular computing as a means of circumventing spatial complexities. This counterpart to traditional computing allows for the calculation of imprecise probabilities - the kind of information that the spatial information systems community wrestles with much of the time. Under the editorial guidance of internationally respected geoinformatics experts, this indispensable volume addresses quality aspects in the entire spatial data mining process, from data acquisition to end user. It also alleviates what is often field researchers' most daunting task by organizing the wealth of concrete spatial data available into one convenient source, thereby advancing the frontiers of spatial information systems.
This book develops two key machine learning principles: the semi-supervised paradigm and learning with interdependent data. It reveals new applications, primarily web related, that transgress the classical machine learning framework through learning with interdependent data. The book traces how the semi-supervised paradigm and the learning-to-rank paradigm emerged from new web applications, leading to a massive production of heterogeneous textual data. It explains how semi-supervised learning techniques are widely used, but allow only a limited analysis of the information content and thus do not meet the demands of many web-related tasks. Later chapters deal with the development of learning methods for ranking entities in a large collection with respect to a precise information need. In some cases, learning a ranking function can be reduced to learning a classification function over pairs of examples. The book proves that this task can be efficiently tackled in a new framework: learning with interdependent data. Researchers and professionals in machine learning will find these new perspectives and solutions valuable. Learning with Partially Labeled and Interdependent Data is also useful for advanced-level students of computer science, particularly those focused on statistics and learning.
This textbook offers a comprehensive introduction to Machine Learning techniques and algorithms. This Third Edition covers newer approaches that have become highly topical, including deep learning and auto-encoding, introductory information about temporal learning and hidden Markov models, and a much more detailed treatment of reinforcement learning. The book is written in an easy-to-understand manner with many examples and pictures, and with a lot of practical advice and discussions of simple applications. The main topics include Bayesian classifiers, nearest-neighbor classifiers, linear and polynomial classifiers, decision trees, rule-induction programs, artificial neural networks, support vector machines, boosting algorithms, unsupervised learning (including Kohonen networks and auto-encoding), deep learning, reinforcement learning, temporal learning (including long short-term memory), hidden Markov models, and the genetic algorithm. Special attention is devoted to performance evaluation, statistical assessment, and to many practical issues ranging from feature selection and feature construction to bias, context, multi-label domains, and the problem of imbalanced classes.