This book describes various methods and recent advances in predictive computing and information security. It highlights various predictive application scenarios to discuss these breakthroughs in real-world settings. Further, it addresses state-of-the-art techniques and the design, development and innovative use of technologies for enhancing predictive computing and information security. Coverage also includes the frameworks for eTransportation and eHealth, security techniques, and algorithms for predictive computing and information security based on Internet-of-Things and Cloud computing. As such, the book offers a valuable resource for graduate students and researchers interested in exploring predictive modeling techniques and architectures to solve information security, privacy and protection issues in future communication.
This book discusses the effective use of modern ICT solutions for business needs, including the efficient use of IT resources, decision support systems, business intelligence, data mining and advanced data processing algorithms, as well as the processing of large datasets (including data from social networks such as Twitter and Facebook). The ability to generate, record and process qualitative and quantitative data, including in the areas of big data, the Internet of Things (IoT) and cloud computing, offers a real prospect of significant improvements for business, as well as for the operation of a company within Industry 4.0. The book presents new ideas, approaches, solutions and algorithms in the area of knowledge representation, management and processing, quantitative and qualitative data processing (including sentiment analysis), problems of simulation performance, and the use of advanced signal processing to increase the speed of computation. The solutions presented are also aimed at the effective use of business process modeling and notation (BPMN), business process semantization and investment project portfolio selection. It is a valuable resource for researchers, data analysts, entrepreneurs and IT professionals alike, and the research findings presented make it possible to reduce costs, increase the accuracy of investment, optimize resources and streamline operations and marketing.
Statistics and hypothesis testing are routinely used in areas (such as linguistics) that are traditionally not mathematically intensive. In such fields, when faced with experimental data, many students and researchers tend to rely on commercial packages to carry out statistical data analysis, often without understanding the logic of the statistical tests they rely on. As a consequence, results are often misinterpreted, and users have difficulty in flexibly applying techniques relevant to their own research; instead, they use whatever they happen to have learned. A simple solution is to teach the fundamental ideas of statistical hypothesis testing without using too much mathematics. This book provides a non-mathematical, simulation-based introduction to basic statistical concepts and encourages readers to try out the simulations themselves using the source code and data provided (the freely available programming language R is used throughout). Since the code presented in the text almost always requires the use of previously introduced programming constructs, diligent students also acquire basic programming abilities in R. The book is intended for advanced undergraduate and graduate students in any discipline, although the focus is on linguistics, psychology, and cognitive science. It is designed for self-instruction, but it can also be used as a textbook for a first course on statistics. Earlier versions of the book have been used in undergraduate and graduate courses in Europe and the US.
"Vasishth and Broe have written an attractive introduction to the foundations of statistics. It is concise, surprisingly comprehensive, self-contained and yet quite accessible. Highly recommended." Harald Baayen, Professor of Linguistics, University of Alberta, Canada
"By using the text students not only learn to do the specific things outlined in the book, they also gain a skill set that empowers them to explore new areas that lie beyond the book's coverage." Colin Phillips, Professor of Linguistics, University of Maryland, USA
This open access book summarizes knowledge about several file systems and file formats commonly used in mobile devices. In addition to the fundamental description of the formats, there are hints about the forensic value of possible artefacts, along with an outline of tools that can decode the relevant data. The book is organized into two distinct parts.
Part I describes several different file systems that are commonly used in mobile devices.
* APFS is the file system that is used in all modern Apple devices including iPhones, iPads, and even Apple computers, like the MacBook series.
* Ext4 is very common in Android devices and is the successor of the Ext2 and Ext3 file systems that were commonly used on Linux-based computers.
* The Flash-Friendly File System (F2FS) is a Linux file system designed explicitly for NAND Flash memory, common in removable storage devices and mobile devices, which Samsung Electronics developed in 2012.
* The QNX6 file system is present in smartphones delivered by Blackberry (e.g. devices that are using Blackberry 10) and modern vehicle infotainment systems that use QNX as their operating system.
Part II describes five different file formats that are commonly used on mobile devices.
* SQLite is nearly omnipresent in mobile devices, with an overwhelming majority of all mobile applications storing their data in such databases.
* The second leading file format in the mobile world is the Property List, which is predominantly found on Apple devices.
* Java Serialization is a popular technique for storing object states in the Java programming language. Mobile application (app) developers very often resort to this technique to make their application state persistent.
* The Realm database format has emerged over recent years as a possible successor to the now ageing SQLite format and has begun to appear as part of some modern applications on mobile devices.
* Protocol Buffers provide a format for taking compiled data and serializing it by turning it into bytes represented in decimal values, which is a technique commonly used in mobile devices.
The aim of this book is to act as a knowledge base and reference guide for digital forensic practitioners who need knowledge about a specific file system or file format. It is also hoped to provide useful insight and knowledge for students or other aspiring professionals who want to work within the field of digital forensics. The book is written with the assumption that the reader will have some existing knowledge and understanding about computers, mobile devices, file systems and file formats.
Data mining has become a popular research topic in recent years for the treatment of the "data rich and information poor" syndrome. Currently, application-oriented engineers are only concerned with their immediate problems, which results in an ad hoc method of problem solving. Researchers, on the other hand, lack an understanding of the practical issues of data mining for real-world problems and often concentrate on issues that are of no significance to the practitioners. In this volume, we hope to remedy these problems by (1) presenting a theoretical foundation of data mining, and (2) providing important new directions for data mining research. A group of well-respected data mining theoreticians was invited to present their views on the fundamental science of data mining. We have also called on researchers with practical data mining experience to present important new data mining topics.
Pattern recognition in data is a well-known classical problem that falls under the ambit of data analysis. As we need to handle different data, the nature of patterns, their recognition and the types of data analyses are bound to change. Since the number of data collection channels has increased in recent times and become more diversified, many real-world data mining tasks can easily acquire multiple databases from various sources. In these cases, data mining becomes more challenging for several essential reasons. We may encounter sensitive data originating from different sources that cannot be amalgamated. Even if we are allowed to place different data together, we are certainly not able to analyze them when local identities of patterns are required to be retained. Thus, pattern recognition in multiple databases gives rise to a suite of new, challenging problems different from those encountered before. Association rule mining, global pattern discovery and mining patterns of select items provide different pattern discovery techniques in multiple data sources. Some interesting item-based data analyses are also covered in this book. Interesting patterns, such as exceptional patterns, icebergs and periodic patterns, have been recently reported. The book presents a thorough influence analysis between items in time-stamped databases. The recent research on mining multiple related databases is covered, while some previous contributions to the area are highlighted and contrasted with the most recent developments.
This book is a collection of representative and novel works in the field of data mining, knowledge discovery, clustering and classification. Discussing both theoretical and practical aspects of "Knowledge Discovery and Management" (KDM), it is intended for researchers interested in these fields, including PhD and MSc students, and researchers from public or private laboratories. The contributions included are extended and reworked versions of six of the best papers that were originally presented in French at the EGC'2016 conference held in Reims (France) in January 2016. This was the 16th edition of this successful conference, which takes place each year, and also featured workshops and other events with the aim of promoting exchanges between researchers and companies concerned with KDM and its applications in business, administration, industry and public organizations. For more details about the EGC society, please consult egc.asso.fr.
As we entered the 21st century, the rapid growth of information technology made our lives more convenient than we had ever imagined. Recently, in all fields of industry, heterogeneous technologies have converged with information technology, resulting in a new paradigm: information technology convergence. In the process of information technology convergence, the latest issues in the structure of data, systems, networks, and infrastructure have become the most challenging task. Proceedings of the International Conference on IT Convergence and Security 2011 approaches the subject matter with problems in technical convergence and convergences of security technology by looking at new issues that arise from techniques converging. The general scope is convergence security and the latest information technology, with the following most important features and benefits:
1. Introduction of the most recent information technology and its related ideas
2. Applications and problems related to technology convergence, and its case studies
3. Introduction of converging existing security techniques through convergence security
Overall, after reading Proceedings of the International Conference on IT Convergence and Security 2011, readers will understand the state-of-the-art information strategies and technologies of convergence security.
This book provides an overview of how comparable corpora can be used to overcome the lack of parallel resources when building machine translation systems for under-resourced languages and domains. It presents a wealth of methods and open tools for building comparable corpora from the Web, evaluating comparability and extracting parallel data that can be used for the machine translation task. It is divided into several sections, each covering a specific task such as building, processing, and using comparable corpora, focusing particularly on under-resourced language pairs and domains. The book is intended for anyone interested in data-driven machine translation for under-resourced languages and domains, especially for developers of machine translation systems, computational linguists and language workers. It offers a valuable resource for specialists and students in natural language processing, machine translation, corpus linguistics and computer-assisted translation, and promotes the broader use of comparable corpora in natural language processing and computational linguistics.
Data has increased due to the growing use of web applications and communication devices. It is necessary to develop new techniques of managing data in order to ensure adequate usage. Modern Technologies for Big Data Classification and Clustering is an essential reference source for the latest scholarly research on handling large data sets with conventional data mining and provides information about the new technologies developed for the management of large data. Featuring coverage on a broad range of topics such as text and web data analytics, risk analysis, and opinion mining, this publication is ideally designed for professionals, researchers, and students seeking current research on various concepts of big data analytics. The many academic areas covered in this publication include, but are not limited to:
* Data visualization
* Distributed computing systems
* Opinion mining
* Privacy and security
* Risk analysis
* Social network analysis
* Text data analytics
* Web data analytics
Learn how to apply the principles of machine learning to time series modeling with this indispensable resource. Machine Learning for Time Series Forecasting with Python is an incisive and straightforward examination of one of the most crucial elements of decision-making in finance, marketing, education, and healthcare: time series modeling. Despite the centrality of time series forecasting, few business analysts are familiar with the power or utility of applying machine learning to time series modeling. Author Francesca Lazzeri, a distinguished machine learning scientist and economist, corrects that deficiency by providing readers with a comprehensive and approachable explanation and treatment of the application of machine learning to time series forecasting. Written for readers who have little to no experience in time series forecasting or machine learning, the book comprehensively covers all the topics necessary to:
* Understand time series forecasting concepts, such as stationarity, horizon, trend, and seasonality
* Prepare time series data for modeling
* Evaluate time series forecasting models' performance and accuracy
* Understand when to use neural networks instead of traditional time series models in time series forecasting
Machine Learning for Time Series Forecasting with Python is full of real-world examples, resources and concrete strategies to help readers explore and transform data and develop usable, practical time series forecasts. Perfect for entry-level data scientists, business analysts, developers, and researchers, this book is an invaluable and indispensable guide to the fundamental and advanced concepts of machine learning applied to time series modeling.
Representation learning in heterogeneous graphs (HG) is intended to provide a meaningful vector representation for each node so as to facilitate downstream applications such as link prediction, personalized recommendation, node classification, etc. This task, however, is challenging not only because of the need to incorporate heterogeneous structural (graph) information consisting of multiple types of node and edge, but also the need to consider heterogeneous attributes or types of content (e.g. text or image) associated with each node. Although considerable advances have been made in homogeneous (and heterogeneous) graph embedding, attributed graph embedding and graph neural networks, few are capable of simultaneously and effectively taking into account heterogeneous structural (graph) information as well as the heterogeneous content information of each node. In this book, we provide a comprehensive survey of current developments in HG representation learning. More importantly, we present the state-of-the-art in this field, including theoretical models and real applications that have been showcased at the top conferences and journals, such as TKDE, KDD, WWW, IJCAI and AAAI. The book has two major objectives: (1) to provide researchers with an understanding of the fundamental issues and a good point of departure for working in this rapidly expanding field, and (2) to present the latest research on applying heterogeneous graphs to model real systems and learning structural features of interaction systems. To the best of our knowledge, it is the first book to summarize the latest developments and present cutting-edge research on heterogeneous graph representation learning. To gain the most from it, readers should have a basic grasp of computer science, data mining and machine learning.
This book discusses the application of data systems and data-driven infrastructure in existing industrial systems in order to optimize workflow, utilize hidden potential, and make existing systems free from vulnerabilities. The book discusses the application of data in the health sector, public transportation, financial institutions, and in battling natural disasters, among others. Topics include real-time applications in the current big data perspective; improving security in IoT devices; data backup techniques for systems; artificial intelligence-based outlier prediction; machine learning in OpenFlow networks; and the application of deep learning in blockchain-enabled applications. This book is intended for a variety of readers, including industry professionals, organizations, and students.
Advances in hardware technology have led to an ability to collect data with the use of a variety of sensor technologies. In particular, sensor nodes have become cheaper and more efficient, and have even been integrated into everyday devices, such as mobile phones. This has led to a much larger scale of applicability and mining of sensor data sets. The human-centric aspect of sensor data has created tremendous opportunities in integrating social aspects of sensor data collection into the mining process. Managing and Mining Sensor Data is a contributed volume by prominent leaders in this field, targeting advanced-level students in computer science as a secondary textbook or reference. Practitioners and researchers working in this field will also find this book useful.
Dirty data is a problem that costs businesses thousands, if not millions, every year. In organisations large and small across the globe you will hear talk of data quality issues. What you will rarely hear about is the consequences or how to fix it. Between the Spreadsheets: Classifying and Fixing Dirty Data draws on classification expert Susan Walsh's decade of experience in data classification to present a fool-proof method for cleaning and classifying your data. The book covers everything from the very basics of data classification to normalisation and taxonomies, and presents the author's proven COAT methodology, helping ensure an organisation's data is Consistent, Organised, Accurate and Trustworthy. A series of data horror stories outlines what can go wrong in managing data, and if it does, how it can be fixed. After reading this book, regardless of your level of experience, not only will you be able to work with your data more efficiently, but you will also understand the impact the work you do with it has, and how it affects the rest of the organisation. Written in an engaging and highly practical manner, Between the Spreadsheets gives readers of all levels a deep understanding of the dangers of dirty data and the confidence and skills to work more efficiently and effectively with it.
This book presents established and state-of-the-art methods in Language Technology (including text mining, corpus linguistics, computational linguistics, and natural language processing), and demonstrates how they can be applied by humanities scholars working with textual data. The landscape of humanities research has recently changed thanks to the proliferation of big data and large textual collections such as Google Books, Early English Books Online, and Project Gutenberg. These resources have yet to be fully explored by new generations of scholars, and the authors argue that Language Technology has a key role to play in the exploration of large-scale textual data. The authors use a series of illustrative examples from various humanistic disciplines (mainly but not exclusively from History, Classics, and Literary Studies) to demonstrate basic and more complex use-case scenarios. This book will be useful to graduate students and researchers in humanistic disciplines working with textual data, including History, Modern Languages, Literary studies, Classics, and Linguistics. This is also a very useful book for anyone teaching or learning Digital Humanities and interested in the basic concepts from computational linguistics, corpus linguistics, and natural language processing.
This is the second edition of the comprehensive treatment of statistical inference using permutation techniques. It makes available to practitioners a variety of useful and powerful data analytic tools that rely on very few distributional assumptions. Although many of these procedures have appeared in journal articles, they are not readily available to practitioners. This new and updated edition places increased emphasis on the use of alternative permutation statistical tests based on metric Euclidean distance functions that have excellent robustness characteristics. These alternative permutation techniques provide many powerful multivariate tests including multivariate multiple regression analyses.
Data mining essentially relies on several mathematical disciplines, many of which are presented in this second edition of this book. Topics include partially ordered sets, combinatorics, general topology, metric spaces, linear spaces, and graph theory. To motivate the reader, a significant number of applications of these mathematical tools are included, ranging from association rules and clustering algorithms to classification, data constraints, and logical data analysis. The book is intended as a reference for researchers and graduate students. The current edition is a significant expansion of the first edition. We strove to make the book self-contained, and only a general knowledge of mathematics is required. More than 700 exercises are included and they form an integral part of the material. Many exercises are in reality supplemental material and their solutions are included.
These proceedings gather outstanding research papers presented at the Second International Conference on Data Engineering 2015 (DaEng-2015) and offer a consolidated overview of the latest developments in databases, information retrieval, data mining and knowledge management. The conference brought together researchers and practitioners from academia and industry to address key challenges in these fields, discuss advanced data engineering concepts and form new collaborations. The topics covered include but are not limited to:
* Data engineering
* Big data
* Data and knowledge visualization
* Data management
* Data mining and warehousing
* Data privacy & security
* Database theory
* Heterogeneous databases
* Knowledge discovery in databases
* Mobile, grid and cloud computing
* Knowledge management
* Parallel and distributed data
* Temporal data
* Web data, services and information engineering
* Decision support systems
* E-Business engineering and management
* E-commerce and e-learning
* Geographical information systems
* Information management
* Information quality and strategy
* Information retrieval, integration and visualization
* Information security
* Information systems and technologies
This book organizes key concepts, theories, standards, methodologies, trends, challenges and applications of data mining and knowledge discovery in databases. It first surveys, then provides comprehensive yet concise algorithmic descriptions of methods, including classic methods plus the extensions and novel methods developed recently. It also gives in-depth descriptions of data mining applications in various interdisciplinary industries.
Overcoming many challenges, data mining has already established discipline capability in many domains. "Dynamic and Advanced Data Mining for Progressing Technological Development: Innovations and Systemic Approaches" discusses advances in modern data mining research in today's rapidly growing global and technological environment. A critical mass of the most sought after knowledge, this publication serves as an important reference tool to leading research within information search and retrieval techniques.
As data mining algorithms are typically applied to sizable volumes of high-dimensional data, these can result in large storage requirements and inefficient computation times. This unique text/reference addresses the challenges of data abstraction generation using the least number of database scans, compressing data through novel lossy and non-lossy schemes, and carrying out clustering and classification directly in the compressed domain. Schemes are presented which are shown to be efficient both in terms of space and time, while simultaneously providing the same or better classification accuracy, as illustrated using high-dimensional handwritten digit data and a large intrusion detection dataset. Topics and features: presents a concise introduction to data mining paradigms, data compression, and mining compressed data; describes a non-lossy compression scheme based on run-length encoding of patterns with binary valued features; proposes a lossy compression scheme that recognizes a pattern as a sequence of features and identifies subsequences; examines whether the identification of prototypes and features can be achieved simultaneously through lossy compression and efficient clustering; discusses ways to make use of domain knowledge in generating abstraction; reviews optimal prototype selection using genetic algorithms; suggests possible ways of dealing with big data problems using multiagent systems. A must-read for all researchers involved in data mining and big data, the book presents each algorithm within a discussion of the wider context, implementation details and experimental results. These are further supported by bibliographic notes and a glossary.