Informed decision making rests on three important elements: intuition, trust, and analytics. Intuition is based on experiential learning, and recent research has shown that those who rely on their "gut feelings" may do better than those who don't. Analytics, however, is also important for informing decisions in a data-driven environment. The third element, trust, is critical for knowledge sharing to take place. Together, intuition, analytics, and trust make a perfect combination for decision making. This book gathers leading researchers who explore the role of these three elements in the decision-making process.
The growth of machines and users of the Internet has led to the proliferation of all sorts of data concerning individuals, institutions, companies, governments, universities, and all kinds of known objects and events happening everywhere in daily life. Scientific knowledge is not an exception to the data boom. The phenomenon of data growth in science pushes forth as the number of scientific papers published doubles every 9-15 years, and the need for methods and tools to understand what is reported in scientific literature becomes evident. As the number of academicians and innovators swells, so does the number of publications of all types, yielding outlets of documents and depots of authors and institutions that need to be found in Bibliometric databases. These databases are dug into and treated to hand over metrics of research performance by means of Scientometrics that analyze the toil of individuals, institutions, journals, countries, and even regions of the world. The objective of this book is to help students, professors, university managers, government, industry, and stakeholders in general understand the main Bibliometric databases, the key research indicators, and the main players in university rankings, along with the methodologies and approaches that they employ in producing ranking tables. The book is divided into two sections. The first looks at Scientometric databases, including Scopus and Google Scholar as well as institutional repositories. The second section examines the application of Scientometrics to world-class universities and the role that Scientometrics can play in competition among them. It looks at university rankings and the methodologies used to create these rankings. Individual chapters examine specific rankings that include QS World University, Scimago Institutions, Webometrics, U-Multirank, and U.S. News & World Report. The book concludes with a discussion of university performance in the age of research analytics.
Algorithms and Applications for Academic Search, Recommendation and Quantitative Association Rule Mining presents novel algorithms for academic search, recommendation and association rule mining that have been developed and optimized for different commercial as well as academic purpose systems. Along with the design and implementation of algorithms, a major part of the work presented in the book involves the development of new systems for both commercial and academic use. In the first part of the book the author introduces a novel hierarchical heuristic scheme for re-ranking academic publications retrieved from standard digital libraries. The scheme is based on the hierarchical combination of a custom implementation of the term frequency heuristic, a time-depreciated citation score and a graph-theoretic computed score that relates the paper's index terms with each other. In order to evaluate the performance of the introduced algorithms, a meta-search engine has been designed and developed that submits user queries to standard digital repositories of academic publications and re-ranks the top-n results using the introduced hierarchical heuristic scheme. In the second part of the book the design of novel recommendation algorithms with application in different types of e-commerce systems is described. The newly introduced algorithms are part of a Movie Recommendation system, the first such system to be commercially deployed in Greece by a major Triple Play services provider. The initial version of the system uses a novel hybrid recommender (user, item and content based) and provides daily recommendations to all active subscribers of the provider (currently more than 30,000). The recommenders presented are hybrid by nature, using an ensemble configuration of different content, user as well as item-based recommenders in order to provide more accurate recommendation results.
The final part of the book presents the design of a quantitative association rule mining algorithm. Quantitative association rules are a special type of association rule of the form antecedent implies consequent, where the rule consists of a set of numerical or quantitative attributes. The introduced mining algorithm processes a specific number of user histories in order to generate a set of association rules with a minimum required support and confidence value. The generated rules show strong relationships that exist between the consequent and the antecedent of each rule, representing different items that have been consumed at specific price levels. This research book will appeal to researchers, graduate students, professionals, engineers and computer programmers.
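To make the terms support and confidence concrete, here is a minimal Python sketch of how the two measures are commonly computed for a single rule. It is not the book's mining algorithm; the function names, the `histories` data and the item names are hypothetical and exist only for illustration. Each item is modelled as a (title, price level) pair to echo the "items consumed at specific price levels" mentioned in the blurb.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = frozenset(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Support of (antecedent union consequent) divided by support of antecedent."""
    base = support(transactions, antecedent)
    if base == 0:
        return 0.0  # the rule never fires, so report zero confidence
    return support(transactions, set(antecedent) | set(consequent)) / base


# Hypothetical user histories: each item is a (title, price_level) pair.
histories = [
    frozenset({("movie_a", "low"), ("movie_b", "high")}),
    frozenset({("movie_a", "low"), ("movie_b", "high"), ("movie_c", "low")}),
    frozenset({("movie_a", "low"), ("movie_c", "low")}),
    frozenset({("movie_b", "high")}),
]

# Candidate rule: (movie_a at low price) implies (movie_b at high price).
antecedent = {("movie_a", "low")}
consequent = {("movie_b", "high")}

rule_support = support(histories, antecedent | consequent)
rule_confidence = confidence(histories, antecedent, consequent)
```

A mining algorithm of the kind described would keep only those candidate rules whose support and confidence both reach user-chosen minimum thresholds.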
This compendium is a completely revised version of an earlier book, Data Mining in Time Series Databases, by the same editors. It provides a unique collection of new articles written by leading experts that account for the latest developments in the field of time series and data stream mining. The emerging topics covered by the book include weightless neural modeling for mining data streams, using ensemble classifiers for imbalanced and evolving data streams, document stream mining with active learning, and many more. In particular, it addresses the domain of streaming data, which has recently become one of the emerging topics in Data Science, Big Data, and related areas. Existing titles do not provide sufficient information on this topic.
This book presents a series of studies that demonstrate the value of interactions between knowledge management and the arts and humanities. The carefully compiled chapters show, on the one hand, how traditional methods from the arts and humanities (e.g. theatrical improvisation, clay modelling, theory of aesthetics) can be used to enhance knowledge creation and evolution. On the other, the chapters discuss knowledge management models and practices such as virtual knowledge space (BA) design, social networking and knowledge sharing, data mining and knowledge discovery tools. The book also demonstrates how these practices can yield valuable benefits in terms of organizing and analyzing big arts and humanities data in a digital environment.
Imagine what you could do if scalability wasn't a problem. With this hands-on guide, you'll learn how the Cassandra database management system handles hundreds of terabytes of data while remaining highly available across multiple data centers. This third edition, updated for Cassandra 4.0, provides the technical details and practical examples you need to put this database to work in a production environment. Authors Jeff Carpenter and Eben Hewitt demonstrate the advantages of Cassandra's nonrelational design, with special attention to data modeling. If you're a developer, DBA, or application architect looking to solve a database scaling issue or future-proof your application, this guide helps you harness Cassandra's speed and flexibility.
- Understand Cassandra's distributed and decentralized structure
- Use the Cassandra Query Language (CQL) and cqlsh, the CQL shell
- Create a working data model and compare it with an equivalent relational model
- Develop sample applications using client drivers for languages including Java, Python, and Node.js
- Explore cluster topology and learn how nodes exchange data
This book has two main goals: to define data science through the work of data scientists and their results, namely data products, while simultaneously providing the reader with relevant lessons learned from applied data science projects at the intersection of academia and industry. As such, it is not a replacement for a classical textbook (i.e., it does not elaborate on fundamentals of methods and principles described elsewhere), but systematically highlights the connection between theory, on the one hand, and its application in specific use cases, on the other. With these goals in mind, the book is divided into three parts: Part I pays tribute to the interdisciplinary nature of data science and provides a common understanding of data science terminology for readers with different backgrounds. These six chapters are geared towards drawing a consistent picture of data science and were predominantly written by the editors themselves. Part II then broadens the spectrum by presenting views and insights from diverse authors - some from academia and some from industry, ranging from financial to health and from manufacturing to e-commerce. Each of these chapters describes a fundamental principle, method or tool in data science by analyzing specific use cases and drawing concrete conclusions from them. The case studies presented, and the methods and tools applied, represent the nuts and bolts of data science. Finally, Part III was again written from the perspective of the editors and summarizes the lessons learned that have been distilled from the case studies in Part II. The section can be viewed as a meta-study on data science across a broad range of domains, viewpoints and fields. Moreover, it provides answers to the question of what the mission-critical factors for success in different data science undertakings are. 
The book targets professionals as well as students of data science: first, practicing data scientists in industry and academia who want to broaden their scope and expand their knowledge by drawing on the authors' combined experience. Second, decision makers in businesses who face the challenge of creating or implementing a data-driven strategy and who want to learn from success stories spanning a range of industries. Third, students of data science who want to understand both the theoretical and practical aspects of data science, vetted by real-world case studies at the intersection of academia and industry.
Data analysis is of utmost importance in the mining of big data, where knowledge discovery and inference are the basis for intelligent systems that support real-world applications. The process involves knowledge acquisition, representation, inference and data. The Bayesian network (BN) is a key technology for knowledge representation, paving the way to cope with incomplete and fuzzy data in solving real-life problems. This book presents the Bayesian network as a technology to support data-intensive and incremental learning in knowledge discovery, inference and data fusion in uncertain environments.
This book presents a broad range of deep-learning applications related to vision, natural language processing, gene expression, arbitrary object recognition, driverless cars, semantic image segmentation, deep visual residual abstraction, brain-computer interfaces, big data processing, hierarchical deep learning networks as game-playing artefacts using regret matching, and building GPU-accelerated deep learning frameworks. Deep learning, an advanced machine learning technique that combines a class of learning algorithms with the use of many layers of nonlinear units, has gained considerable attention in recent times. Unlike other books on the market, this volume addresses the challenges of deep learning implementation, computation time, and the complexity of reasoning and modeling different types of data. As such, it is a valuable and comprehensive resource for engineers, researchers, graduate students and Ph.D. scholars.
This volume features selected, refereed papers on various aspects of statistics, matrix theory and its applications to statistics, as well as related numerical linear algebra topics and numerical solution methods, which are relevant for problems arising in statistics and in big data. The contributions were originally presented at the 25th International Workshop on Matrices and Statistics (IWMS 2016), held in Funchal (Madeira), Portugal on June 6-9, 2016. The IWMS workshop series brings together statisticians, computer scientists, data scientists and mathematicians, helping them better understand each other's tools, and fostering new collaborations at the interface of matrix theory and statistics.
Cognitive Information Systems in Management Sciences summarizes the body of work in this area, taking an analytical approach to interpreting the data, while also providing an approach that can be used for practical implementation in the fields of computing, economics, and engineering. Using numerous illustrative examples, and following both theoretical and practical results, Dr. Lidia Ogiela discusses the concepts and principles of cognitive information systems, the relationship between intelligent computer data analysis, and how to utilize computational intelligent approaches to enhance information retrieval. Real world implementation use cases round out the book, with valuable scenarios covering management science, computer science, and engineering. Indexing: The books of this series are submitted to EI-Compendex and SCOPUS
The book presents the results of studies on selected problems (such as predictive model of transcription initiation and termination, protein recognition codes, protein structure prediction, feature selection for disease prediction, information retrieval from medical imaging) of Bioinformatics and Information Retrieval. Information Retrieval is one of the contemporary answers to new challenges in threat evaluation of composite systems. This book provides a practical course in computational data analysis suitable for students or researchers with no previous exposure to computer programming. It describes in detail the theoretical basis for statistical analysis techniques used throughout the textbook, from basic principles. It presents walk-throughs of data analysis tasks using different tools to help in taking decisions in healthcare management.
Web mining is the application of data mining strategies to extract knowledge from web information, i.e. web content, web structure, and web usage data. With the emergence of the web as the predominant and converging platform for communication, business and scholastic information dissemination, especially in the last five years, an ever-increasing number of research groups are working on different aspects of web mining, mainly in three directions: mining of web content, web structure and web usage. In this context there are a good number of frameworks and benchmarks related to website metrics, which are certainly important for B2B, B2C and e-commerce paradigms in general. Owing to the popularity of this topic there are a few books on the market dealing with such performance metrics and related issues. This book, however, omits such routine topics and places more emphasis on the classification and clustering of websites in order to arrive at a true picture of the websites in light of their usability. In a nutshell, Web Mining: A Synergic Approach Resorting to Classifications and Clustering showcases an effective methodology for classification and clustering of web sites from a usability point of view. While the clustering and classification are accomplished using the open source tool WEKA, the basic dataset for the selected websites has been generated using a free tool, site-analyzer. As a case study, several commercial websites have been analyzed. The dataset preparation using site-analyzer and classification through WEKA by embedding different algorithms is one of the unique selling points of this book. This text covers the complete spectrum of web mining from its inception through data mining and takes the reader up to the application level.
Salient features of the book include:
- Literature review of research work in the area of web mining
- Business websites domain researched, and data collected using site-analyzer tool
- Accessibility, design, text, multimedia, and networking are assessed
- Datasets are filtered further by selecting vital attributes which are Search Engine Optimized for processing using the Weka attributed tool
- Datasets with labels have been classified using J48, RBFNetwork, NaiveBayes, and SMO techniques using Weka
- A comparative analysis of all classifiers is reported
- Commercial applications for improving website performance based on SEO are given
Build predictive models from time-based patterns in your data. Master statistical models including new deep learning approaches for time series forecasting.
In Time Series Forecasting in Python you will learn how to:
- Recognize a time series forecasting problem and build a performant predictive model
- Create univariate forecasting models that account for seasonal effects and external variables
- Build multivariate forecasting models to predict many time series at once
- Leverage large datasets by using deep learning for forecasting time series
- Automate the forecasting process
About the technology: Time series forecasting reveals hidden trends and makes predictions about the future from your data. This powerful technique has proven incredibly valuable across multiple fields, from tracking business metrics to healthcare and the sciences. Modern Python libraries and powerful deep learning tools have opened up new methods and utilities for making practical time series forecasts.
About the book: Time Series Forecasting in Python teaches you to build powerful predictive models from time-based data. Every model you create is relevant, useful, and easy to implement with Python. You'll explore interesting real-world datasets like Google's daily stock price and economic data for the USA, quickly progressing from the basics to developing large-scale models that use deep learning tools like TensorFlow. The book teaches you to apply time series forecasting and get immediate, meaningful predictions. You'll learn both traditional statistical and new deep learning models for time series forecasting, all fully illustrated with Python source code. Test your skills with hands-on projects for forecasting air travel, volume of drug prescriptions, and the earnings of Johnson & Johnson. By the time you're done, you'll be ready to build accurate and insightful forecasting models with tools from the Python ecosystem.
This book presents recent achievements in the processing of representative user generated content (UGC) on e-commerce websites. This large volume of UGC is valuable information for data mining to help customer and object profiling. The book provides a comprehensive overview of the concept of customer credibility, object-oriented review summarization technology and a content-based collaborative filtering algorithm. It covers a feedback mechanism designed to discover customer credibility, which is used to define the professional degree of review content; product-oriented review summarization for restaurants or trip arrangements; and content-based collaborative filtering for product recommendation.
This book presents state-of-the-art research on intrusion detection using reinforcement learning, fuzzy and rough set theories, and genetic algorithms. Reinforcement learning is employed to incrementally learn the computer network behavior, while rough and fuzzy sets are utilized to handle the uncertainty involved in the detection of traffic anomalies to secure data resources from possible attacks. Genetic algorithms make it possible to optimally select the network traffic parameters to reduce the risk of network intrusion. The book is unique in terms of its content, organization, and writing style. Primarily intended for graduate electrical and computer engineering students, it is also useful for doctoral students pursuing research in intrusion detection and practitioners interested in network security and administration. The book covers a wide range of applications, from general computer security to server, network, and cloud security.
Group method of data handling (GMDH) is a typical inductive modeling method built on the principles of self-organization. Since its introduction, inductive modelling has been developed to support complex systems in prediction, clusterization, system identification, as well as data mining and knowledge extraction technologies in social science, science, engineering, and medicine. This is the first book to explore GMDH using the MATLAB (matrix laboratory) language. Readers will learn how to implement GMDH in MATLAB as a method of dealing with big data analytics. Error-free source code in MATLAB has been included in supplementary material (accessible online) to assist users in their understanding of GMDH and to make it easy for users to further develop variations of GMDH algorithms.
What is text mining, and how can it be used? What relevance do these methods have to everyday work in information science and the digital humanities? How does one develop competences in text mining? Working with Text provides a series of cross-disciplinary perspectives on text mining and its applications. As text mining raises legal and ethical issues, the legal background of text mining and the responsibilities of the engineer are discussed in this book. Chapters provide an introduction to the use of the popular GATE text mining package with data drawn from social media, the use of text mining to support semantic search, the development of an authority system to support content tagging, and recent techniques in automatic language evaluation. Focused studies describe text mining on historical texts, automated indexing using constrained vocabularies, and the use of natural language processing to explore the climate science literature. Interviews are included that offer a glimpse into the real-life experience of working within commercial and academic text mining.
The Art and Science of Analyzing Software Data provides valuable information on analysis techniques often used to derive insight from software data. This book shares best practices in the field generated by leading data scientists, collected from their experience training software engineering students and practitioners to master data science. The book covers topics such as the analysis of security data, code reviews, app stores, log files, and user telemetry, among others. It covers a wide variety of techniques such as co-change analysis, text analysis, topic analysis, and concept analysis, as well as advanced topics such as release planning and generation of source code comments. It includes stories from the trenches from expert data scientists illustrating how to apply data analysis in industry and open source, present results to stakeholders, and drive decisions.
This open access book explores cutting-edge solutions and best practices for big data and data-driven AI applications for the data-driven economy. It provides the reader with a basis for understanding how technical issues can be overcome to offer real-world solutions to major industrial areas. The book starts with an introductory chapter that provides an overview of the book by positioning the following chapters in terms of their contributions to technology frameworks which are key elements of the Big Data Value Public-Private Partnership and the upcoming Partnership on AI, Data and Robotics. The remainder of the book is then arranged in two parts. The first part "Technologies and Methods" contains horizontal contributions of technologies and methods that enable data value chains to be applied in any sector. The second part "Processes and Applications" details experience reports and lessons from using big data and data-driven approaches in processes and applications. Its chapters are co-authored with industry experts and cover domains including health, law, finance, retail, manufacturing, mobility, and smart cities. Contributions emanate from the Big Data Value Public-Private Partnership and the Big Data Value Association, which have acted as the European data community's nucleus to bring together businesses with leading researchers to harness the value of data to benefit society, business, science, and industry. The book is of interest to two primary audiences, first, undergraduate and postgraduate students and researchers in various fields, including big data, data science, data engineering, and machine learning and AI. Second, practitioners and industry experts engaged in data-driven systems, software design and deployment projects who are interested in employing these advanced methods to address real-world problems.
During the past decade there has been an explosion in computation and information technology. With it have come vast amounts of data in a variety of fields such as medicine, biology, finance, and marketing. The challenge of understanding these data has led to the development of new tools in the field of statistics, and spawned new areas such as data mining, machine learning, and bioinformatics. Many of these tools have common underpinnings but are often expressed with different terminology. This book describes the important ideas in these areas in a common conceptual framework. While the approach is statistical, the emphasis is on concepts rather than mathematics. Many examples are given, with a liberal use of color graphics. It is a valuable resource for statisticians and anyone interested in data mining in science or industry. The book's coverage is broad, from supervised learning (prediction) to unsupervised learning. The many topics include neural networks, support vector machines, classification trees and boosting, the first comprehensive treatment of this topic in any book. This major new edition features many topics not covered in the original, including graphical models, random forests, ensemble methods, least angle regression & path algorithms for the lasso, non-negative matrix factorization, and spectral clustering. There is also a chapter on methods for "wide" data (p bigger than n), including multiple testing and false discovery rates. Trevor Hastie, Robert Tibshirani, and Jerome Friedman are professors of statistics at Stanford University. They are prominent researchers in this area: Hastie and Tibshirani developed generalized additive models and wrote a popular book of that title. Hastie co-developed much of the statistical modeling software and environment in R/S-PLUS and invented principal curves and surfaces. Tibshirani proposed the lasso and is co-author of the very successful An Introduction to the Bootstrap.
Friedman is the co-inventor of many data-mining tools including CART, MARS, projection pursuit and gradient boosting.
Decision trees have become one of the most powerful and popular approaches in knowledge discovery and data mining, the science of exploring large and complex bodies of data in order to discover useful patterns. Decision tree learning continues to evolve over time. Existing methods are constantly being improved and new methods introduced. This 2nd Edition is dedicated entirely to the field of decision trees in data mining, covering all aspects of this important technique as well as improved or new methods and techniques developed after the publication of our first edition. In this new edition, all chapters have been revised and new topics brought in. New topics include Cost-Sensitive Active Learning, Learning with Uncertain and Imbalanced Data, Using Decision Trees beyond Classification Tasks, Privacy Preserving Decision Tree Learning, Lessons Learned from Comparative Studies, and Learning Decision Trees for Big Data. A walk-through guide to existing open-source data mining software is also included in this edition. This book invites readers to explore the many benefits in data mining that decision trees offer:
Biologists are stepping up their efforts in understanding the biological processes that underlie disease pathways in the clinical contexts. This has resulted in a flood of biological and clinical data from genomic and protein sequences, DNA microarrays, protein interactions, biomedical images, to disease pathways and electronic health records. To exploit these data for discovering new knowledge that can be translated into clinical applications, there are fundamental data analysis difficulties that have to be overcome. Practical issues such as handling noisy and incomplete data, processing compute-intensive tasks, and integrating various data sources, are new challenges faced by biologists in the post-genome era. This book will cover the fundamentals of state-of-the-art data mining techniques which have been designed to handle such challenging data analysis problems, and demonstrate with real applications how biologists and clinical scientists can employ data mining to enable them to make meaningful observations and discoveries from a wide array of heterogeneous data from molecular biology to pharmaceutical and clinical domains.
Originating from Facebook, LinkedIn, Twitter, Instagram, YouTube, and many other networking sites, the social media shared by users and the associated metadata are collectively known as user generated content (UGC). To analyze UGC and glean insight about user behavior, robust techniques are needed to tackle the huge amount of real-time, multimedia, and multilingual data. Researchers must also know how to assess the social aspects of UGC, such as user relations and influential users. Mining User Generated Content is the first focused effort to compile state-of-the-art research and address future directions of UGC. It explains how to collect, index, and analyze UGC to uncover social trends and user habits. Divided into four parts, the book focuses on the mining and applications of UGC. The first part presents an introduction to this new and exciting topic. Covering the mining of UGC of different medium types, the second part discusses the social annotation of UGC, social network graph construction and community mining, mining of UGC to assist in music retrieval, and the popular but difficult topic of UGC sentiment analysis. The third part describes the mining and searching of various types of UGC, including knowledge extraction, search techniques for UGC content, and a specific study on the analysis and annotation of Japanese blogs. The fourth part on applications explores the use of UGC to support question-answering, information summarization, and recommendations.