Text Mining and Visualization: Case Studies Using Open-Source Tools provides an introduction to text mining using some of the most popular and powerful open-source tools: KNIME, RapidMiner, Weka, R, and Python. The contributors, all highly experienced with text mining and open-source software, explain how text data are gathered and processed from a wide variety of sources, including books, server access logs, websites, social media sites, and message boards. Each chapter presents a case study that you can follow as a step-by-step, reproducible example, and you can readily apply and extend the techniques to other problems. All the examples are available on a supplementary website. The book shows you how to exploit your text data, offering successful application examples and blueprints for tackling your own text mining tasks with open and freely available tools. It also brings you up to date on the latest and most powerful tools, the data mining process, and specific text mining activities.
Traditional methods for handling spatial data are encumbered by the assumption of separate origins for horizontal and vertical measurements, but modern measurement systems operate in a 3-D spatial environment. The 3-D Global Spatial Data Model: Principles and Applications, Second Edition presents a new model for handling digital spatial data, the global spatial data model (GSDM). The GSDM preserves the integrity of three-dimensional spatial data while also providing additional benefits such as simpler equations, worldwide standardization, and the ability to track spatial data accuracy with greater specificity and convenience. This second edition expands to new topics that meet a growing need in the GIS, professional surveying, machine control, and Big Data communities, while continuing to embrace the Earth-centered, Earth-fixed (ECEF) coordinate system as the fundamental point of origin for one-, two-, and three-dimensional data sets. Suitable for both beginning and advanced readers, the book also provides guidance and insight on how to link to data collected and stored in legacy systems.
Feature engineering plays a vital role in big data analytics. Machine learning and data mining algorithms cannot work without data, little can be achieved if there are few features to represent the underlying data objects, and the quality of those algorithms' results largely depends on the quality of the available features. Feature Engineering for Machine Learning and Data Analytics provides a comprehensive introduction to feature engineering, including feature generation, feature extraction, feature transformation, feature selection, and feature analysis and evaluation. The book presents key concepts, methods, examples, and applications, as well as chapters on feature engineering for major data types such as texts, images, sequences, time series, graphs, streaming data, software engineering data, Twitter data, and social media data. It also covers generic feature generation approaches, as well as methods for generating tried-and-tested, hand-crafted, domain-specific features. The first chapter defines the concepts of features and feature engineering, offers an overview of the book, and provides pointers to topics not covered in it. The next six chapters are devoted to feature engineering, including feature generation, for specific data types. The subsequent four chapters cover generic approaches to feature engineering, namely feature selection, transformation-based feature engineering, deep-learning-based feature engineering, and pattern-based feature generation and engineering. The last three chapters discuss feature engineering for social bot detection, software management, and Twitter-based applications, respectively. This book can be used as a reference by data analysts, big data scientists, data preprocessing workers, project managers, project developers, prediction modelers, professors, researchers, graduate students, and upper-level undergraduate students. It can also be used as the primary text for courses on feature engineering, or as a supplement for courses on machine learning, data mining, and big data analytics.
High-Performance Computing for Big Data: Methodologies and Applications explores emerging high-performance architectures for data-intensive applications, novel efficient analytical strategies to boost data processing, and cutting-edge applications in diverse fields, such as machine learning, life science, neural networks, and neuromorphic engineering. The book is organized into two main sections. The first section covers Big Data architectures, including cloud computing systems and heterogeneous accelerators. It also covers emerging 3D IC design principles for memory architectures and devices. The second section illustrates emerging and practical applications of Big Data across several domains, including bioinformatics, deep learning, and neuromorphic engineering.
Features:

* Covers a wide range of Big Data architectures, including distributed systems like Hadoop/Spark
* Includes accelerator-based approaches for big data applications, such as GPU-based acceleration techniques, and hardware acceleration such as FPGA/CGRA/ASICs
* Presents emerging memory architectures and devices such as NVM and STT-RAM, as well as 3D IC design principles
* Describes advanced algorithms for different big data application domains
* Illustrates novel analytics techniques for Big Data applications, as well as scheduling, mapping, and partitioning methodologies

Featuring contributions from leading experts, this book presents state-of-the-art research on the methodologies and applications of high-performance computing for big data applications.
About the Editor: Dr. Chao Wang is an Associate Professor in the School of Computer Science at the University of Science and Technology of China. He is an Associate Editor of ACM Transactions on Design Automation of Electronic Systems (TODAES), Applied Soft Computing, Microprocessors and Microsystems, IET Computers & Digital Techniques, and International Journal of Electronics. Dr. Chao Wang's recognitions include the Youth Innovation Promotion Association, CAS; an ACM China Rising Star Honorable Mention (2016); and a best IP nomination at DATE 2015. He currently serves on the CCF Technical Committee on Computer Architecture and the CCF Task Force on Formal Methods. He is a Senior Member of IEEE, CCF, and ACM.
Modern biological databases comprise not only data, but also sophisticated query facilities and bioinformatics data analysis tools. This book explores the world of bioinformatics database systems. It summarizes the popular and innovative bioinformatics repositories currently available, including primary genetic and protein sequence databases, phylogenetic databases, structure and pathway databases, microarray databases, and boutique databases. It also explores the data quality and information integration issues currently involved in managing bioinformatics databases, including the data quality problems that have been observed and efforts in the data cleaning field. Biological data integration issues are also covered in depth, and the book demonstrates how data integration can create new repositories to address the needs of the biological communities. It also presents typical data integration architectures employed in current bioinformatics databases. The latter part of the book covers biological data mining and biological data processing approaches using cloud-based technologies. General data mining approaches are discussed, as well as specific data mining methodologies that have been successfully deployed in biological data mining applications. Two biological data mining case studies are also included to illustrate how data, query, and analysis methods are integrated into user-friendly systems. Aimed at researchers and developers of bioinformatics database systems, the book is also useful as a supplementary textbook for a one-semester upper-level undergraduate course or an introductory graduate bioinformatics course.
Data Mining with R: Learning with Case Studies, Second Edition uses practical examples to illustrate the power of R and data mining. An extensive update to the best-selling first edition, this new edition is divided into two parts. The first part features introductory material, including a new chapter that introduces data mining, complementing the existing introduction to R. The second part contains the case studies, and the new edition substantially revises their R code, bringing it up to date with packages that have recently emerged in R. The book does not assume any prior knowledge of R. Readers who are new to R and data mining should be able to follow the case studies, which are designed to be self-contained so the reader can start anywhere in the book. The book is accompanied by a set of freely available R source files that can be obtained from the book's web site. These files include all the code used in the case studies, and they facilitate the "do-it-yourself" approach followed in the book. Designed for users of data analysis tools, as well as researchers and developers, the book should be useful for anyone interested in entering the "world" of R and data mining. About the Author: Luis Torgo is an associate professor in the Department of Computer Science at the University of Porto in Portugal. He teaches Data Mining in R in the NYU Stern School of Business' MS in Business Analytics program. An active researcher in machine learning and data mining for more than 20 years, Dr. Torgo is also a researcher in the Laboratory of Artificial Intelligence and Data Analysis (LIAAD) of INESC Porto LA.
This book equips readers to handle complex multi-view data representation, centered on several major visual applications, and shares many tips and insights through a unified learning framework. This framework can model most existing multi-view learning and domain adaptation approaches, deepening readers' understanding of their similarities and differences in terms of data organization, problem settings, and research goals. A comprehensive review covers key recent research on multi-view data analysis, namely multi-view clustering, multi-view classification, zero-shot learning, and domain adaptation. Further practical challenges in multi-view data analysis are discussed, including incomplete, unbalanced, and large-scale multi-view learning. Learning Representation for Multi-View Data Analysis covers a wide range of applications in the research fields of big data, human-centered computing, pattern recognition, digital marketing, web mining, and computer vision.
Interest in predictive analytics of big data has grown exponentially in the four years since the publication of Statistical and Machine-Learning Data Mining: Techniques for Better Predictive Modeling and Analysis of Big Data, Second Edition. In the third edition of this bestseller, the author has completely revised, reorganized, and repositioned the original chapters and produced 13 new chapters of creative and useful machine-learning data mining techniques. In sum, the 43 chapters of simple yet insightful quantitative techniques make this book unique in the data mining literature.
What is new in the Third Edition:

* The current chapters have been completely rewritten.
* The core content has been extended with strategies and methods for problems drawn from the top predictive analytics conference and statistical modeling workshops.
* Thirteen new chapters have been added, including coverage of data science and its rise, market share estimation, share of wallet modeling without survey data, latent market segmentation, statistical regression modeling that deals with incomplete data, decile analysis assessment in terms of the predictive power of the data, and a user-friendly version of text mining that does not require an advanced background in natural language processing (NLP).
* SAS subroutines are included, which can be easily converted to other languages.

As in the previous edition, this book offers detailed background, discussion, and illustration of specific methods for solving the most commonly experienced problems in predictive modeling and analysis of big data. The author addresses each methodology and assigns its application to a specific type of problem. To better ground readers, the book provides an in-depth discussion of the basic methodologies of predictive modeling and analysis. While this type of overview has been attempted before, this approach offers a truly nitty-gritty, step-by-step method that both tyros and experts in the field can enjoy playing with.
The book illustrates the inter-relationship between several data management, analytics, and decision support techniques and methods commonly adopted in Cybersecurity-oriented frameworks. The recent advent of Big Data paradigms and the use of data science methods have resulted in a higher demand for effective data-driven models that support decision-making at a strategic level. This motivates the need for novel data analytics and decision support approaches in a myriad of real-life scenarios and problems, and Cybersecurity-related domains are no exception. This contributed volume comprises nine chapters, written by leading international researchers, covering recent advances in Cybersecurity-related applications of data analytics and decision support approaches. In addition to theoretical studies and overviews of the relevant literature, it includes a selection of application-oriented research contributions. The chapters focus on diverse and critical Cybersecurity problems, such as Intrusion Detection, Insider Threats, Collusion Detection, Run-Time Malware Detection, E-Learning, Online Examinations, removal of noisy Cybersecurity data, Secure Smart Power Systems, and Security Visualization and Monitoring. Researchers and professionals alike will find the chapters an essential read for further research on the topic.
This book presents recent developments in the theoretical, algorithmic, and application aspects of Big Data in Complex and Social Networks. The book consists of four parts, covering a wide range of topics. The first part focuses on data storage and data processing. It explores how efficient data storage can fundamentally support intensive data access and queries, which enables sophisticated analysis. It also looks at how data processing and visualization help to communicate information clearly and efficiently. The second part is devoted to the extraction of essential information and the prediction of web content. It shows how Big Data analysis can be used to understand the interests, location, and search history of users and to provide more accurate predictions of user behavior. The latter two parts of the book cover the protection of privacy and security, and emergent applications of big data and social networks. They analyze how to model rumor diffusion, identify misinformation in massive data, and design intervention strategies. Applications of big data and social networks in multilayer networks and multiparty systems are also covered in depth.
During the past decade there has been an explosion in computation and information technology. With it have come vast amounts of data in a variety of fields such as medicine, biology, finance, and marketing. The challenge of understanding these data has led to the development of new tools in the field of statistics, and spawned new areas such as data mining, machine learning, and bioinformatics. Many of these tools have common underpinnings but are often expressed with different terminology. This book describes the important ideas in these areas in a common conceptual framework. While the approach is statistical, the emphasis is on concepts rather than mathematics. Many examples are given, with a liberal use of color graphics. It is a valuable resource for statisticians and anyone interested in data mining in science or industry. The book's coverage is broad, from supervised learning (prediction) to unsupervised learning. The many topics include neural networks, support vector machines, classification trees, and boosting (the first comprehensive treatment of this topic in any book). This major new edition features many topics not covered in the original, including graphical models, random forests, ensemble methods, least angle regression and path algorithms for the lasso, non-negative matrix factorization, and spectral clustering. There is also a chapter on methods for "wide" data (p bigger than n), including multiple testing and false discovery rates. Trevor Hastie, Robert Tibshirani, and Jerome Friedman are professors of statistics at Stanford University. They are prominent researchers in this area: Hastie and Tibshirani developed generalized additive models and wrote a popular book of that title. Hastie co-developed much of the statistical modeling software and environment in R/S-PLUS and invented principal curves and surfaces. Tibshirani proposed the lasso and is co-author of the very successful An Introduction to the Bootstrap. Friedman is the co-inventor of many data-mining tools, including CART, MARS, projection pursuit, and gradient boosting.
This guide shows how to combine data science with social science to gain unprecedented insight into customer behavior, so you can change it. Joanne Rodrigues-Craig bridges the gap between predictive data science and statistical techniques that reveal why important things happen (why customers buy more, or why they immediately leave your site), so you can get more of the behaviors you want and fewer of those you don't. Drawing on extensive enterprise experience and deep knowledge of demographics and sociology, Rodrigues-Craig shows how to create better theories and metrics, so you can accelerate the process of gaining insight, altering behavior, and earning business value. You'll learn how to:

* Develop complex, testable theories for understanding individual and social behavior in web products
* Think like a social scientist and contextualize individual behavior in today's social environments
* Build more effective metrics and KPIs for any web product or system
* Conduct more informative and actionable A/B tests
* Explore causal effects, reflecting a deeper understanding of the differences between correlation and causation
* Alter user behavior in a complex web product
* Understand how relevant human behaviors develop, and the prerequisites for changing them
* Choose the right statistical techniques for common tasks such as multistate and uplift modeling
* Use advanced statistical techniques to model multidimensional systems
* Do all of this in R (with sample code available in a separate code manual)

Volume II of this series discusses the technology used to implement a big data analysis capability within a service-oriented organization. It discusses the technical architecture necessary to implement a big data analysis capability, some issues and challenges in big data analysis and utilization that an organization will face, and how to capture value from it. It will help readers understand what technology is required for a basic capability and what the expected benefits are from establishing a big data capability within their organization.
Web mining aims to discover useful information and knowledge from Web hyperlinks, page contents, and usage data. Although Web mining uses many conventional data mining techniques, it is not purely an application of traditional data mining, owing to the semi-structured and unstructured nature of Web data. The field has also developed many of its own algorithms and techniques. Liu has written a comprehensive text on Web mining, which consists of two parts. The first part covers the data mining and machine learning foundations, where all the essential concepts and algorithms of data mining and machine learning are presented. The second part covers the key topics of Web mining, where Web crawling, search, social network analysis, structured data extraction, information integration, opinion mining and sentiment analysis, Web usage mining, query log mining, computational advertising, and recommender systems are all treated both in breadth and in depth. His book thus brings all the related concepts and algorithms together to form an authoritative and coherent text. The book offers a rich blend of theory and practice. It is suitable for students, researchers, and practitioners interested in Web mining and data mining, both as a learning text and as a reference book. Professors can readily use it for classes on data mining, Web mining, and text mining. Additional teaching materials such as lecture slides, datasets, and implemented algorithms are available online.
Volume I of this two-volume series focuses on the role of big data in service delivery systems. It discusses the definition of big data and provides an orientation to it, its applications in service delivery systems, how to obtain results that can affect and enhance service delivery, and how to build an effective big data organization. This volume will assist readers in fitting big data analysis into their service-based organizations. It will also help readers understand how to improve the use of big data to enhance their service-oriented organizations.
Geographic Information has an important role to play in linking and combining datasets through shared location, but the potential is still far from fully realized because the data is not well organized and the technology to aid this process has not been available. Developments in the Semantic Web and Linked Data, however, are making it possible to integrate data based on Geographic Information in a way that is more accessible to users. Drawing on the industry experience of a geographer and a computer scientist, Linked Data: A Geographic Perspective is a practical guide to implementing Geographic Information as Linked Data.
Combine Geographic Information from Multiple Sources Using Linked Data
After an introduction to the building blocks of Geographic Information, the Semantic Web, and Linked Data, the book explores how Geographic Information can become part of the Semantic Web as Linked Data. In easy-to-understand terms, the authors explain the complexities of modeling Geographic Information using Semantic Web technologies and publishing it as Linked Data. They review the software tools currently available for publishing and modeling Linked Data and provide a framework to help you evaluate new tools in a rapidly developing market. They also give an overview of the important languages and syntaxes you will need to master. Throughout, extensive examples demonstrate why and how you can use ontologies and Linked Data to manipulate and integrate real-world Geographic Information data from multiple sources.
A Practical, Readable Guide for Geographers, Software Engineers, and Laypersons
A coherent, readable introduction to a complex subject, this book supplies the durable knowledge and insight you need to think about Geographic Information through the lens of the Semantic Web. It provides a window to Linked Data for geographers, as well as a geographic perspective for so
Covering research at the frontier of this field, Privacy-Aware Knowledge Discovery: Novel Applications and New Techniques presents state-of-the-art privacy-preserving data mining techniques for application domains, such as medicine and social networks, that face the increasing heterogeneity and complexity of new forms of data. Renowned authorities from prominent organizations not only cover well-established results; they also explore complex domains where privacy issues are generally clear and well defined but the solutions are still preliminary and in continuous development. Divided into seven parts, the book provides in-depth coverage of the most novel reference scenarios for privacy-preserving techniques. The first part gives general techniques that can be applied to the various applications discussed in the rest of the book. The second part focuses on the sanitization of network traces and privacy in data stream mining. After the third part, on privacy in spatio-temporal data mining and mobility data analysis, the book examines time series analysis in the fourth part, explaining how a perturbation method and a segment-based method can tackle privacy issues of time series data. The fifth part, on biomedical data, addresses genomic data as well as the problem of privacy-aware sharing of health data. In the sixth part, on web applications, the book deals with query log mining and web recommender systems. The final part, on social networks, analyzes privacy issues related to the management of social network data from different perspectives. While several new results have recently emerged in the privacy, database, and data mining research communities, a uniform presentation of up-to-date techniques and applications is lacking. Filling this void, Privacy-Aware Knowledge Discovery presents novel algorithms, patterns, and models, along with a significant collection of open problems for future investigation.
This book offers a clear and comprehensive introduction to broad learning, one of the novel learning problems studied in data mining and machine learning. Broad learning aims to fuse multiple large-scale information sources of diverse varieties and to carry out synergistic data mining tasks across these fused sources in one unified analysis. The book takes online social networks as an application example to introduce the latest alignment and knowledge discovery algorithms. Besides an overview of broad learning, machine learning, and social network basics, specific topics covered include network alignment, link prediction, community detection, information diffusion, viral marketing, and network embedding.
Compiled by world-class leaders in the field of collaborative information retrieval and search (CIS), this book centres on the notion that information seeking is not always a solitary activity, and that working in collaboration to perform information-seeking tasks should be studied and supported. Covering theories, models, and applications, the book is divided into three parts:

* Best Practices and Studies: provides an overview of current knowledge and the state of the art in the field.
* New Domains: covers some of the new and exciting opportunities for applying CIS.
* New Thoughts: focuses on new research directions from scholars in academia and industry around the world.

Collaborative Information Seeking provides a valuable reference for students, teachers, and researchers interested in collaborative work, information seeking and retrieval, and human-computer interaction.
This book features multi-omics big-data integration and data-mining techniques. In the omics age, the sheer volume of multi-omics data from various sources is a new challenge, but it also provides clues for several biomedical and clinical applications. The book focuses on data integration and data mining methods for multi-omics research, explaining in detail and with supporting examples the “What”, “Why” and “How” of the topic. The contents are organized into eight chapters: an introductory chapter, four chapters dedicated to omics integration techniques that focus on several omics data resources and data-mining methods, and three chapters dedicated to applications of multi-omics analyses, demonstrated with several data mining methods. This book is an attempt to bridge the gap between biomedical multi-omics big data and data-mining techniques, for the best practice of contemporary bioinformatics and in-depth insights into biomedical questions. It will be of interest to researchers and practitioners who want to conduct multi-omics studies in cancer, inflammatory disease, and microbiome research.
This book provides a unique, in-depth discussion of multiview learning, one of the fastest-developing branches of machine learning. Multiview learning has been shown to have good theoretical underpinnings and great practical success. This book describes the models and algorithms of multiview learning in real data analysis. Because it incorporates multiple views to improve generalization performance, multiview learning is also known as data fusion or data integration from multiple feature sets. This self-contained book is applicable to multi-modal learning research and requires minimal prior knowledge of the basic concepts in the field. It is also a valuable reference for researchers working in machine learning and in various application domains.
This book explores the possibility of using social media data for detecting socio-economic recovery activities. In the last decade, there has been intensive research focusing on social media during and after disasters. This approach, which views people's communication on social media as a sensor for real-time situations, has been widely adopted as the "people as sensors" approach. Furthermore, to improve recovery efforts after large-scale disasters, detecting communities' real-time recovery situations is essential, since conventional socio-economic recovery indicators, such as government statistics, are not published in real time. Thanks to its timeliness, social media data can fill this gap. Motivated by this possibility, the book focuses especially on the relationships between people's communication on Twitter and Facebook pages and socio-economic recovery activities as reflected in used-car market data and housing market data, in the case of two major disasters: the Great East Japan Earthquake and Tsunami of 2011 and Hurricane Sandy in 2012. The book pursues an interdisciplinary approach, combining, for example, disaster recovery studies, crisis informatics, and economics. In terms of its contributions, first, the book sheds light on the "people as sensors" approach for detecting socio-economic recovery activities, which has not been thoroughly studied to date but has the potential to improve situation awareness during the recovery phase. Second, the book proposes new socio-economic recovery indicators: used-car market data and housing market data. Third, in the context of using social media during the recovery phase, the results demonstrate the importance of distinguishing between social media data posted by people at or near disaster-stricken areas and data posted by those farther away.
This volume unpacks an intriguing challenge for the field of media research: combining media research with the study of complex networks. Bringing together research on the small-world idea and digital culture, it questions the assumption that we are separated from any other person on the planet by just a few steps, and that this distance decreases within digital social networks. The book argues that the role of languages is decisive for understanding how people connect, and it examines the consequences this has for the ways knowledge spreads digitally. This volume offers a first conceptual venue for analysing emerging phenomena at the innovative intersection of media and complex network research.
This book constitutes selected, revised and extended papers from the 13th International Conference on Computer Supported Education, CSEDU 2021, held as a virtual event in April 2021. The 27 revised full papers were carefully reviewed and selected from 143 submissions. They were organized in topical sections as follows: artificial intelligence in education; information technologies supporting learning; learning/teaching methodologies and assessment; social context and learning environments; ubiquitous learning; current topics.
You may like...
PVD for Microelectronics: Sputter…
Stephen M. Rossnagel, Ronald Powell, …
Hardcover
R3,545
Discovery Miles 35 450
Killing For Culture - From Edison to…
David Kerekes, David Slater
Paperback
R995
Discovery Miles 9 950
Using WAVES and VHDL for Effective…
James P. Hanna, Robert G. Hillman, …
Hardcover
R4,538
Discovery Miles 45 380
The Charley Chase Scrapbook (hardback)
Brian Anthony, Bill Walker
Hardcover
R1,753
Discovery Miles 17 530