Pattern recognition in data is a well-known classical problem that falls under the ambit of data analysis. As the data we handle change, the nature of the patterns, their recognition, and the types of analysis are bound to change as well. Because data collection channels have recently multiplied and diversified, many real-world data mining tasks can easily acquire multiple databases from various sources. In these cases, data mining becomes more challenging for several essential reasons. We may encounter sensitive data originating from different sources that cannot be amalgamated. Even if we are allowed to place different data together, we cannot analyze them jointly when the local identities of patterns must be retained. Thus, pattern recognition in multiple databases gives rise to a suite of new, challenging problems different from those encountered before. Association rule mining, global pattern discovery, and mining patterns of selected items provide different pattern-discovery techniques for multiple data sources. Some interesting item-based data analyses are also covered in this book. Interesting patterns, such as exceptional patterns, icebergs, and periodic patterns, have been reported recently. The book presents a thorough analysis of influence between items in time-stamped databases. Recent research on mining multiple related databases is covered, while some earlier contributions to the area are highlighted and contrasted with the most recent developments.
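As a hedged illustration of the kind of association-rule analysis mentioned above (not the book's own algorithms), the following minimal Python sketch computes support and confidence for a candidate rule over a toy transaction set; the transactions and the rule are invented purely for illustration.

```python
# Toy transaction database (invented for illustration only).
transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"butter", "milk"},
    {"bread", "butter"},
    {"bread", "butter", "milk"},
]

def support(itemset, transactions):
    """Fraction of transactions containing every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Support of the full rule divided by support of its antecedent."""
    return (support(set(antecedent) | set(consequent), transactions)
            / support(antecedent, transactions))

# Example rule: {bread} -> {milk}
print("support:", support({"bread", "milk"}, transactions))          # 0.6
print("confidence:", confidence({"bread"}, {"milk"}, transactions))  # 0.75
```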
This book is a collection of representative and novel works in the fields of data mining, knowledge discovery, clustering, and classification. Discussing both theoretical and practical aspects of "Knowledge Discovery and Management" (KDM), it is intended for readers interested in these fields, including PhD and MSc students and researchers from public or private laboratories. The contributions included are extended and reworked versions of six of the best papers originally presented in French at the EGC'2016 conference held in Reims (France) in January 2016. This was the 16th edition of this successful annual conference, which also featured workshops and other events aimed at promoting exchanges between researchers and companies concerned with KDM and its applications in business, administration, industry, and public organizations. For more details about the EGC society, please consult egc.asso.fr.
As we entered the 21st century, the rapid growth of information technology changed our lives more than we had ever anticipated. Recently, across all industries, heterogeneous technologies have converged with information technology, resulting in a new paradigm: information technology convergence. In this process, emerging issues in the structure of data, systems, networks, and infrastructure have become the most challenging tasks. Proceedings of the International Conference on IT Convergence and Security 2011 approaches the subject through problems in technological convergence and the convergence of security technology, examining new issues that arise as techniques converge. The general scope is convergence security and the latest information technology, with the following key features and benefits: 1. introduction of the most recent information technology and related ideas; 2. applications and problems related to technology convergence, with case studies; 3. introduction of convergence security, which brings existing security techniques together. Overall, after reading Proceedings of the International Conference on IT Convergence and Security 2011, readers will understand the most up-to-date information strategies and technologies of convergence security.
This book provides an overview of how comparable corpora can be used to overcome the lack of parallel resources when building machine translation systems for under-resourced languages and domains. It presents a wealth of methods and open tools for building comparable corpora from the Web, evaluating comparability and extracting parallel data that can be used for the machine translation task. It is divided into several sections, each covering a specific task such as building, processing, and using comparable corpora, focusing particularly on under-resourced language pairs and domains. The book is intended for anyone interested in data-driven machine translation for under-resourced languages and domains, especially for developers of machine translation systems, computational linguists and language workers. It offers a valuable resource for specialists and students in natural language processing, machine translation, corpus linguistics and computer-assisted translation, and promotes the broader use of comparable corpora in natural language processing and computational linguistics.
The volume of data has grown with the increasing use of web applications and communication devices, and new techniques for managing data are needed to ensure its adequate use. Modern Technologies for Big Data Classification and Clustering is an essential reference source for the latest scholarly research on handling large data sets with conventional data mining, and it provides information about new technologies developed for the management of large data. Featuring coverage of a broad range of topics such as text and web data analytics, risk analysis, and opinion mining, this publication is ideally designed for professionals, researchers, and students seeking current research on various concepts of big data analytics. Topics covered include, but are not limited to: data visualization, distributed computing systems, opinion mining, privacy and security, risk analysis, social network analysis, text data analytics, and web data analytics.
Representation learning in heterogeneous graphs (HG) is intended to provide a meaningful vector representation for each node so as to facilitate downstream applications such as link prediction, personalized recommendation, node classification, etc. This task, however, is challenging not only because of the need to incorporate heterogeneous structural (graph) information consisting of multiple types of node and edge, but also the need to consider heterogeneous attributes or types of content (e.g. text or image) associated with each node. Although considerable advances have been made in homogeneous (and heterogeneous) graph embedding, attributed graph embedding and graph neural networks, few are capable of simultaneously and effectively taking into account heterogeneous structural (graph) information as well as the heterogeneous content information of each node. In this book, we provide a comprehensive survey of current developments in HG representation learning. More importantly, we present the state-of-the-art in this field, including theoretical models and real applications that have been showcased at the top conferences and journals, such as TKDE, KDD, WWW, IJCAI and AAAI. The book has two major objectives: (1) to provide researchers with an understanding of the fundamental issues and a good point of departure for working in this rapidly expanding field, and (2) to present the latest research on applying heterogeneous graphs to model real systems and learning structural features of interaction systems. To the best of our knowledge, it is the first book to summarize the latest developments and present cutting-edge research on heterogeneous graph representation learning. To gain the most from it, readers should have a basic grasp of computer science, data mining and machine learning.
This book discusses the application of data systems and data-driven infrastructure to existing industrial systems in order to optimize workflow, realize hidden potential, and make existing systems free from vulnerabilities. It covers the application of data in the health sector, public transportation, financial institutions, and the battle against natural disasters, among others. Topics include real-time applications from the current big data perspective; improving security in IoT devices; data backup techniques for systems; artificial intelligence-based outlier prediction; machine learning in OpenFlow networks; and the application of deep learning in blockchain-enabled applications. This book is intended for a variety of readers, including industry professionals, organizations, and students.
Advances in hardware technology have led to the ability to collect data using a variety of sensor technologies. In particular, sensor nodes have become cheaper and more efficient, and have even been integrated into everyday devices such as mobile phones. This has led to much larger-scale collection and mining of sensor data sets. The human-centric aspect of sensor data has created tremendous opportunities for integrating the social aspects of sensor data collection into the mining process. Managing and Mining Sensor Data is a contributed volume by prominent leaders in this field, targeting advanced-level students in computer science as a secondary textbook or reference. Practitioners and researchers working in this field will also find this book useful.
This book presents established and state-of-the-art methods in Language Technology (including text mining, corpus linguistics, computational linguistics, and natural language processing), and demonstrates how they can be applied by humanities scholars working with textual data. The landscape of humanities research has recently changed thanks to the proliferation of big data and large textual collections such as Google Books, Early English Books Online, and Project Gutenberg. These resources have yet to be fully explored by new generations of scholars, and the authors argue that Language Technology has a key role to play in the exploration of large-scale textual data. The authors use a series of illustrative examples from various humanistic disciplines (mainly but not exclusively from History, Classics, and Literary Studies) to demonstrate basic and more complex use-case scenarios. This book will be useful to graduate students and researchers in humanistic disciplines working with textual data, including History, Modern Languages, Literary studies, Classics, and Linguistics. This is also a very useful book for anyone teaching or learning Digital Humanities and interested in the basic concepts from computational linguistics, corpus linguistics, and natural language processing.
This is the second edition of the comprehensive treatment of statistical inference using permutation techniques. It makes available to practitioners a variety of useful and powerful data analytic tools that rely on very few distributional assumptions. Although many of these procedures have appeared in journal articles, they are not readily available to practitioners. This new and updated edition places increased emphasis on the use of alternative permutation statistical tests based on metric Euclidean distance functions that have excellent robustness characteristics. These alternative permutation techniques provide many powerful multivariate tests including multivariate multiple regression analyses.
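As a hedged sketch of the general permutation-testing idea treated in depth by that book (not the authors' own procedures), the following Python snippet runs a simple two-sample permutation test on synthetic data, using the absolute difference in group means as the test statistic; the data and sample sizes are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data, invented for illustration only.
group_a = rng.normal(loc=0.0, scale=1.0, size=30)
group_b = rng.normal(loc=0.5, scale=1.0, size=30)

def permutation_test(a, b, n_permutations=10_000, rng=rng):
    """Two-sample permutation test using the difference of means."""
    observed = abs(a.mean() - b.mean())
    pooled = np.concatenate([a, b])
    count = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)                      # random relabelling of the pooled sample
        perm_a, perm_b = pooled[:len(a)], pooled[len(a):]
        if abs(perm_a.mean() - perm_b.mean()) >= observed:
            count += 1
    return count / n_permutations                # permutation p-value

print("p-value:", permutation_test(group_a, group_b))
```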
Dirty data is a problem that costs businesses thousands, if not millions, every year. In organisations large and small across the globe you will hear talk of data quality issues. What you will rarely hear about is the consequences, or how to fix them. Between the Spreadsheets: Classifying and Fixing Dirty Data draws on Susan Walsh's decade of experience in data classification to present a fool-proof method for cleaning and classifying your data. The book covers everything from the very basics of data classification to normalisation and taxonomies, and presents the author's proven COAT methodology, helping ensure an organisation's data is Consistent, Organised, Accurate and Trustworthy. A series of data horror stories outlines what can go wrong in managing data and, if it does, how it can be fixed. After reading this book, regardless of your level of experience, you will not only be able to work with your data more efficiently, but also understand the impact your work with it has and how it affects the rest of the organisation. Written in an engaging and highly practical manner, Between the Spreadsheets gives readers of all levels a deep understanding of the dangers of dirty data and the confidence and skills to work with it more efficiently and effectively.
Data mining relies on several mathematical disciplines, many of which are presented in this second edition. Topics include partially ordered sets, combinatorics, general topology, metric spaces, linear spaces, and graph theory. To motivate the reader, a significant number of applications of these mathematical tools are included, ranging from association rules and clustering algorithms to classification, data constraints, and logical data analysis. The book is intended as a reference for researchers and graduate students. The current edition is a significant expansion of the first; we have strived to make the book self-contained, and only a general knowledge of mathematics is required. More than 700 exercises are included and form an integral part of the material. Many exercises are in reality supplemental material, and their solutions are included.
These proceedings gather outstanding research papers presented at the Second International Conference on Data Engineering 2015 (DaEng-2015) and offer a consolidated overview of the latest developments in databases, information retrieval, data mining and knowledge management. The conference brought together researchers and practitioners from academia and industry to address key challenges in these fields, discuss advanced data engineering concepts and form new collaborations. The topics covered include but are not limited to: * Data engineering * Big data * Data and knowledge visualization * Data management * Data mining and warehousing * Data privacy & security * Database theory * Heterogeneous databases * Knowledge discovery in databases * Mobile, grid and cloud computing * Knowledge management * Parallel and distributed data * Temporal data * Web data, services and information engineering * Decision support systems * E-Business engineering and management * E-commerce and e-learning * Geographical information systems * Information management * Information quality and strategy * Information retrieval, integration and visualization * Information security * Information systems and technologies
This book organizes key concepts, theories, standards, methodologies, trends, challenges and applications of data mining and knowledge discovery in databases. It first surveys, then provides comprehensive yet concise algorithmic descriptions of methods, including classic methods plus the extensions and novel methods developed recently. It also gives in-depth descriptions of data mining applications in various interdisciplinary industries.
Overcoming many challenges, data mining has already established itself as a capable discipline in many domains. Dynamic and Advanced Data Mining for Progressing Technological Development: Innovations and Systemic Approaches discusses advances in modern data mining research in today's rapidly growing global and technological environment. Offering a critical mass of the most sought-after knowledge, this publication serves as an important reference for leading research on information search and retrieval techniques.
As data mining algorithms are typically applied to sizable volumes of high-dimensional data, they can result in large storage requirements and inefficient computation times. This unique text/reference addresses the challenges of generating data abstractions using the fewest possible database scans, compressing data through novel lossy and non-lossy schemes, and carrying out clustering and classification directly in the compressed domain. The schemes presented are shown to be efficient in terms of both space and time, while providing the same or better classification accuracy, as illustrated using high-dimensional handwritten digit data and a large intrusion detection dataset. Topics and features: presents a concise introduction to data mining paradigms, data compression, and mining compressed data; describes a non-lossy compression scheme based on run-length encoding of patterns with binary-valued features (see the sketch below); proposes a lossy compression scheme that treats a pattern as a sequence of features and identifies subsequences; examines whether the identification of prototypes and features can be achieved simultaneously through lossy compression and efficient clustering; discusses ways to make use of domain knowledge in generating abstractions; reviews optimal prototype selection using genetic algorithms; suggests possible ways of dealing with big data problems using multiagent systems. A must-read for all researchers involved in data mining and big data, the book presents each algorithm within a discussion of the wider context, implementation details, and experimental results, further supported by bibliographic notes and a glossary.
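The non-lossy scheme mentioned above is based on run-length encoding of binary-valued patterns. As a hedged sketch of that general idea (not the book's exact algorithm), the following Python snippet compresses a binary feature vector into run lengths and reconstructs it exactly; the example pattern is invented for illustration.

```python
def rle_encode(bits):
    """Run-length encode a binary sequence as (first_bit, run lengths)."""
    if not bits:
        return None, []
    runs, current, length = [], bits[0], 0
    for b in bits:
        if b == current:
            length += 1
        else:
            runs.append(length)
            current, length = b, 1
    runs.append(length)
    return bits[0], runs

def rle_decode(first_bit, runs):
    """Reconstruct the original binary sequence from its run lengths."""
    out, bit = [], first_bit
    for length in runs:
        out.extend([bit] * length)
        bit = 1 - bit
    return out

# Example binary feature pattern (invented for illustration).
pattern = [0, 0, 0, 1, 1, 0, 1, 1, 1, 1, 0, 0]
first, runs = rle_encode(pattern)
assert rle_decode(first, runs) == pattern
print(first, runs)   # 0 [3, 2, 1, 4, 2]
```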
This book introduces readers to advanced data science techniques for signal mining in connection with agriculture. It shows how to apply heuristic modeling to improve farm-level efficiency, and how to use sensors and data intelligence to provide closed-loop feedback, while also providing recommendation techniques that yield actionable insights. The book also proposes certain macroeconomic pricing models, which data-mine macroeconomic signals and the influence of global economic trends on small-farm sustainability to provide actionable insights to farmers, helping them avoid financial disasters due to recurrent economic crises. The book is intended to equip current and future software engineering teams and operations research experts with the skills and tools they need in order to fully utilize advanced data science, artificial intelligence, heuristics, and economic models to develop software capabilities that help to achieve sustained food security for future generations.
This volume presents techniques and theories drawn from mathematics, statistics, computer science, and information science to analyze problems in business, economics, finance, insurance, and related fields. The authors present proposals for solutions to common problems in these fields, showing how mathematical, statistical, and actuarial modeling and concepts from data science can be used to construct and apply appropriate models with real-life data, and how the design and implementation of computer algorithms can be employed to evaluate decision-making processes. This book is unique in that it brings data science, and data scientists coming from different backgrounds, together with basic and advanced concepts and tools used in econometrics, operational research, and actuarial sciences. It is therefore a must-read for scholars, students, and practitioners interested in a better understanding of the techniques and theories of these fields.
"Introduction to Data Mining" presents fundamental concepts and algorithms for those learning data mining for the first time. Each concept is explored thoroughly and supported with numerous examples. The text requires only a modest background in mathematics. Each major topic is organized into two chapters, beginning with basic concepts that provide necessary background for understanding each data mining technique, followed by more advanced concepts and algorithms.
Enterprise Architecture, Integration, Interoperability and the Networked Enterprise have become the theme of many conferences in the past few years. These conferences were organised by IFIP TC5 with the support of its two working groups: WG 5.12 (Architectures for Enterprise Integration) and WG 5.8 (Enterprise Interoperability), both concerned with aspects of the topic: how is it possible to architect and implement businesses that are flexible and able to change, to interact, and to use one another's services in a dynamic manner for the purpose of (joint) value creation. The original question of enterprise integration in the 1980s was: how can we achieve and integrate information and material flow in the enterprise? Various methods and reference models were developed or proposed, ranging from tightly integrated monolithic system architectures, through cell-based manufacturing, to on-demand interconnection of businesses to form virtual enterprises in response to market opportunities. Two camps have emerged in the endeavour to achieve the same goal, namely, interoperability between businesses (where interoperability is the ability to exchange information in order to use one another's services or to jointly implement a service). One school of researchers addresses the technical aspects of creating dynamic (and static) interconnections between disparate businesses (or parts thereof).
Deep learning models are at the core of artificial intelligence research today. It is well known that deep learning techniques that are disruptive for Euclidean data, such as images, or sequence data, such as text, are not immediately applicable to graph-structured data. This gap has driven a wave of research on deep learning for graphs, including graph representation learning, graph generation, and graph classification. The new neural network architectures for graph-structured data (graph neural networks, GNNs for short) have performed remarkably well on these tasks, as demonstrated by applications in social networks, bioinformatics, and medical informatics. Despite these successes, GNNs still face many challenges, ranging from foundational methodologies to theoretical understanding of the power of graph representation learning. This book provides a comprehensive introduction to GNNs. It first discusses the goals of graph representation learning and then reviews the history, current developments, and future directions of GNNs. The second part presents and reviews fundamental methods and theories concerning GNNs, while the third part describes various frontiers built on GNNs. The book concludes with an overview of recent developments in a number of applications using GNNs. This book is suitable for a wide audience, including undergraduate and graduate students, postdoctoral researchers, professors and lecturers, and industrial and government practitioners who are new to this area or who already have some basic background but want to learn more about advanced and promising techniques and applications.
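As a hedged, minimal illustration of the message-passing idea behind GNNs described above (not the book's own code), the following NumPy sketch implements one graph-convolution-style propagation step, H' = ReLU(normalized A x H x W), on a tiny toy graph; the adjacency matrix, features, and weights are invented for illustration.

```python
import numpy as np

def gcn_layer(adj, features, weights):
    """One GCN-style step: add self-loops, normalise, aggregate, transform, ReLU."""
    a_hat = adj + np.eye(adj.shape[0])           # add self-loops
    deg = a_hat.sum(axis=1)
    d_inv_sqrt = np.diag(deg ** -0.5)            # symmetric degree normalisation
    a_norm = d_inv_sqrt @ a_hat @ d_inv_sqrt
    return np.maximum(a_norm @ features @ weights, 0.0)

# Toy graph with 4 nodes and 2-dimensional node features (invented for illustration).
adj = np.array([[0, 1, 0, 0],
                [1, 0, 1, 1],
                [0, 1, 0, 0],
                [0, 1, 0, 0]], dtype=float)
features = np.random.default_rng(0).normal(size=(4, 2))
weights = np.random.default_rng(1).normal(size=(2, 3))

print(gcn_layer(adj, features, weights).shape)   # (4, 3): one 3-dim embedding per node
```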
You may like...
Summarizing Biological Networks
Sourav S Bhowmick, Boon-Siew Seah
Hardcover
R3,764
Discovery Miles 37 640
Math Everywhere - Deterministic and…
G. Aletti, Martin Burger, …
Hardcover
R2,919
Discovery Miles 29 190
An educator's guide to effective…
S.A. Coetzee, E.J. van Niekerk
Paperback
R649
Discovery Miles 6 490
Architectural Wireless Networks…
Santosh Kumar Das, Sourav Samanta, …
Hardcover
R5,132
Discovery Miles 51 320
We Belong - 50 Strategies to Create…
Laurie Barron, Patti Kinney
Paperback
Languages and Compilers for Parallel…
Gheorghe Almasi, Calin Cascaval, …
Paperback
R1,547
Discovery Miles 15 470