Algorithms and Applications for Academic Search, Recommendation and Quantitative Association Rule Mining presents novel algorithms for academic search, recommendation and association rule mining that have been developed and optimized for both commercial and academic systems. Along with the design and implementation of algorithms, a major part of the work presented in the book involves the development of new systems for both commercial and academic use. In the first part of the book the author introduces a novel hierarchical heuristic scheme for re-ranking academic publications retrieved from standard digital libraries. The scheme is based on the hierarchical combination of a custom implementation of the term frequency heuristic, a time-depreciated citation score and a graph-theoretic computed score that relates the paper's index terms with each other. In order to evaluate the performance of the introduced algorithms, a meta-search engine has been designed and developed that submits user queries to standard digital repositories of academic publications and re-ranks the top-n results using the introduced hierarchical heuristic scheme. In the second part of the book the design of novel recommendation algorithms with application in different types of e-commerce systems is described. The newly introduced algorithms are part of a Movie Recommendation system, the first such system to be commercially deployed in Greece by a major Triple Play services provider. The initial version of the system uses a novel hybrid recommender (user, item and content based) and provides daily recommendations to all active subscribers of the provider (currently more than 30,000). The recommenders presented are hybrid by nature, using an ensemble configuration of different content, user as well as item-based recommenders in order to provide more accurate recommendation results.
The final part of the book presents the design of a quantitative association rule mining algorithm. Quantitative association rules are a special type of association rule of the form "antecedent implies consequent", in which the antecedent and the consequent consist of sets of numerical or quantitative attributes. The introduced mining algorithm processes a specific number of user histories in order to generate a set of association rules that meet a minimum required support and confidence value. The generated rules show strong relationships between the consequent and the antecedent of each rule, representing different items that have been consumed at specific price levels. This research book will appeal to researchers, graduate students, professionals, engineers and computer programmers.
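The support and confidence thresholds mentioned in this description can be illustrated with a minimal sketch. This is not the book's algorithm: the user histories, item names and price levels below are invented for illustration, and the functions only compute the two standard measures for a single candidate rule.

```python
from typing import FrozenSet, List, Tuple

# Hypothetical representation: each item is an (item name, price level) pair,
# and each user history is the set of items that user consumed.
Item = Tuple[str, str]
History = FrozenSet[Item]


def support(histories: List[History], items: FrozenSet[Item]) -> float:
    """Fraction of histories that contain every item in `items`."""
    if not histories:
        return 0.0
    return sum(1 for h in histories if items <= h) / len(histories)


def confidence(
    histories: List[History],
    antecedent: FrozenSet[Item],
    consequent: FrozenSet[Item],
) -> float:
    """Support of (antecedent and consequent) divided by support of antecedent."""
    antecedent_support = support(histories, antecedent)
    if antecedent_support == 0:
        return 0.0
    return support(histories, antecedent | consequent) / antecedent_support


# Invented example data, not taken from the book.
histories = [
    frozenset({("film_a", "low"), ("film_b", "high")}),
    frozenset({("film_a", "low"), ("film_b", "high")}),
    frozenset({("film_a", "low")}),
    frozenset({("film_c", "mid")}),
]
antecedent = frozenset({("film_a", "low")})
consequent = frozenset({("film_b", "high")})

rule_support = support(histories, antecedent | consequent)
rule_confidence = confidence(histories, antecedent, consequent)

# Assumed thresholds; a rule is kept only if it meets both.
MIN_SUPPORT = 0.4
MIN_CONFIDENCE = 0.6
rule_is_kept = rule_support >= MIN_SUPPORT and rule_confidence >= MIN_CONFIDENCE
```

Here the rule holds in two of the four histories (support 0.5) and in two of the three histories that contain the antecedent (confidence about 0.67), so it passes both assumed thresholds.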
This compendium is a completely revised version of an earlier book, Data Mining in Time Series Databases, by the same editors. It provides a unique collection of new articles written by leading experts that account for the latest developments in the field of time series and data stream mining. The emerging topics covered by the book include weightless neural modeling for mining data streams, using ensemble classifiers for imbalanced and evolving data streams, document stream mining with active learning, and many more. In particular, it addresses the domain of streaming data, which has recently become one of the emerging topics in Data Science, Big Data, and related areas. Existing titles do not provide sufficient information on this topic.
This book presents a series of studies that demonstrate the value of interactions between knowledge management and the arts and humanities. The carefully compiled chapters show, on the one hand, how traditional methods from the arts and humanities - e.g. theatrical improvisation, clay modelling, theory of aesthetics - can be used to enhance knowledge creation and evolution. On the other, the chapters discuss knowledge management models and practices such as virtual knowledge space (BA) design, social networking and knowledge sharing, data mining and knowledge discovery tools. The book also demonstrates how these practices can yield valuable benefits in terms of organizing and analyzing big arts and humanities data in a digital environment.
This book has two main goals: to define data science through the work of data scientists and their results, namely data products, while simultaneously providing the reader with relevant lessons learned from applied data science projects at the intersection of academia and industry. As such, it is not a replacement for a classical textbook (i.e., it does not elaborate on fundamentals of methods and principles described elsewhere), but systematically highlights the connection between theory, on the one hand, and its application in specific use cases, on the other. With these goals in mind, the book is divided into three parts: Part I pays tribute to the interdisciplinary nature of data science and provides a common understanding of data science terminology for readers with different backgrounds. These six chapters are geared towards drawing a consistent picture of data science and were predominantly written by the editors themselves. Part II then broadens the spectrum by presenting views and insights from diverse authors - some from academia and some from industry, ranging from financial to health and from manufacturing to e-commerce. Each of these chapters describes a fundamental principle, method or tool in data science by analyzing specific use cases and drawing concrete conclusions from them. The case studies presented, and the methods and tools applied, represent the nuts and bolts of data science. Finally, Part III was again written from the perspective of the editors and summarizes the lessons learned that have been distilled from the case studies in Part II. The section can be viewed as a meta-study on data science across a broad range of domains, viewpoints and fields. Moreover, it provides answers to the question of what the mission-critical factors for success in different data science undertakings are. 
The book targets professionals as well as students of data science: first, practicing data scientists in industry and academia who want to broaden their scope and expand their knowledge by drawing on the authors' combined experience. Second, decision makers in businesses who face the challenge of creating or implementing a data-driven strategy and who want to learn from success stories spanning a range of industries. Third, students of data science who want to understand both the theoretical and practical aspects of data science, vetted by real-world case studies at the intersection of academia and industry.
Handbook of Statistical Analysis and Data Mining Applications, Second Edition, is a comprehensive professional reference book that guides business analysts, scientists, engineers and researchers, both academic and industrial, through all stages of data analysis, model building and implementation. The handbook helps users discern technical and business problems, understand the strengths and weaknesses of modern data mining algorithms and employ the right statistical methods for practical application. This book is an ideal reference for users who want to address massive and complex datasets with novel statistical approaches and be able to objectively evaluate analyses and solutions. It has clear, intuitive explanations of the principles and tools for solving problems using modern analytic techniques and discusses their application to real problems in ways accessible and beneficial to practitioners across several areas, from science and engineering to medicine, academia and commerce.
Data analysis is of utmost importance in the mining of big data, where knowledge discovery and inference are the basis for intelligent systems that support real-world applications. The process involves knowledge acquisition, representation, inference and data fusion. The Bayesian network (BN) plays a key role in knowledge representation, paving the way to cope with incomplete, fuzzy data when solving real-life problems. This book presents the Bayesian network as a technology to support data-intensive and incremental learning in knowledge discovery, inference and data fusion in uncertain environments.
Imagine what you could do if scalability wasn't a problem. With this hands-on guide, you'll learn how the Cassandra database management system handles hundreds of terabytes of data while remaining highly available across multiple data centers. This third edition, updated for Cassandra 4.0, provides the technical details and practical examples you need to put this database to work in a production environment. Authors Jeff Carpenter and Eben Hewitt demonstrate the advantages of Cassandra's nonrelational design, with special attention to data modeling. If you're a developer, DBA, or application architect looking to solve a database scaling issue or future-proof your application, this guide helps you harness Cassandra's speed and flexibility.
- Understand Cassandra's distributed and decentralized structure
- Use the Cassandra Query Language (CQL) and cqlsh, the CQL shell
- Create a working data model and compare it with an equivalent relational model
- Develop sample applications using client drivers for languages including Java, Python, and Node.js
- Explore cluster topology and learn how nodes exchange data
This book presents a broad range of deep-learning applications related to vision, natural language processing, gene expression, arbitrary object recognition, driverless cars, semantic image segmentation, deep visual residual abstraction, brain-computer interfaces, big data processing, hierarchical deep learning networks as game-playing artefacts using regret matching, and building GPU-accelerated deep learning frameworks. Deep learning, an advanced machine learning technique that combines a class of learning algorithms with the use of many layers of nonlinear units, has gained considerable attention in recent times. Unlike other books on the market, this volume addresses the challenges of deep learning implementation, computation time, and the complexity of reasoning and modeling different types of data. As such, it is a valuable and comprehensive resource for engineers, researchers, graduate students and Ph.D. scholars.
The book presents the results of studies on selected problems (such as predictive model of transcription initiation and termination, protein recognition codes, protein structure prediction, feature selection for disease prediction, information retrieval from medical imaging) of Bioinformatics and Information Retrieval. Information Retrieval is one of the contemporary answers to new challenges in threat evaluation of composite systems. This book provides a practical course in computational data analysis suitable for students or researchers with no previous exposure to computer programming. It describes in detail the theoretical basis for statistical analysis techniques used throughout the textbook, from basic principles. It presents walk-throughs of data analysis tasks using different tools to help in taking decisions in healthcare management.
This volume features selected, refereed papers on various aspects of statistics, matrix theory and its applications to statistics, as well as related numerical linear algebra topics and numerical solution methods, which are relevant for problems arising in statistics and in big data. The contributions were originally presented at the 25th International Workshop on Matrices and Statistics (IWMS 2016), held in Funchal (Madeira), Portugal on June 6-9, 2016. The IWMS workshop series brings together statisticians, computer scientists, data scientists and mathematicians, helping them better understand each other's tools, and fostering new collaborations at the interface of matrix theory and statistics.
Web mining is the application of data mining strategies to extract knowledge from web information, i.e. web content, web structure, and web usage data. With the emergence of the web as the predominant and converging platform for communication, business and scholastic information dissemination, especially in the last five years, an ever-increasing number of research groups are working on different aspects of web mining, mainly in three directions: mining of web content, web structure and web usage. In this context there are a good number of frameworks and benchmarks related to website metrics, which are certainly significant for B2B, B2C and e-commerce in general. Owing to the popularity of this topic there are a few books on the market that deal with such performance metrics and other related issues. This book, however, omits all such routine topics and lays more emphasis on the classification and clustering of websites in order to arrive at a true perception of the websites in light of their usability. In a nutshell, Web Mining: A Synergic Approach Resorting to Classifications and Clustering showcases an effective methodology for classification and clustering of websites from the point of view of their usability. While the clustering and classification are accomplished using the open source tool WEKA, the basic dataset for the selected websites has been generated using a free tool, site-analyzer. As a case study, several commercial websites have been analyzed. The dataset preparation using site-analyzer and classification through WEKA by embedding different algorithms is one of the unique selling points of this book. This text projects a complete spectrum of web mining from its very inception through data mining and takes the reader up to the application level.
Salient features of the book include:
- Literature review of research work in the area of web mining
- Business websites domain researched, and data collected using site-analyzer tool
- Accessibility, design, text, multimedia, and networking are assessed
- Datasets are filtered further by selecting vital attributes which are Search Engine Optimized for processing using the Weka attributed tool
- Datasets with labels have been classified using J48, RBFNetwork, NaiveBayes, and SMO techniques using Weka
- A comparative analysis of all classifiers is reported
- Commercial applications for improving website performance based on SEO are given
This book presents state-of-the-art research on intrusion detection using reinforcement learning, fuzzy and rough set theories, and genetic algorithms. Reinforcement learning is employed to incrementally learn the computer network behavior, while rough and fuzzy sets are utilized to handle the uncertainty involved in the detection of traffic anomalies to secure data resources from possible attack. Genetic algorithms make it possible to optimally select the network traffic parameters to reduce the risk of network intrusion. The book is unique in terms of its content, organization, and writing style. Primarily intended for graduate electrical and computer engineering students, it is also useful for doctoral students pursuing research in intrusion detection and practitioners interested in network security and administration. The book covers a wide range of applications, from general computer security to server, network, and cloud security.
This book presents recent achievements in the processing of representative user generated content (UGC) on e-commerce websites. This large volume of UGC is valuable information for data mining to help customer/object profiling. The book provides a comprehensive overview of the concept of customer credibility, object-oriented review summarization technology and a content-based collaborative filtering algorithm. It covers a feedback mechanism designed to discover customer credibility, which is used to define the professional degree of review content; product-oriented review summarization for restaurants or trip arrangements; and content-based collaborative filtering for product recommendation.
Group method of data handling (GMDH) is a typical inductive modeling method built on the principles of self-organization. Since its introduction, inductive modelling has been developed to support complex systems in prediction, clusterization, system identification, as well as data mining and knowledge extraction technologies in social science, science, engineering, and medicine. This is the first book to explore GMDH using MATLAB (matrix laboratory) language. Readers will learn how to implement GMDH in MATLAB as a method of dealing with big data analytics. Error-free source codes in MATLAB have been included in supplementary material (accessible online) to assist users in their understanding of GMDH and to make it easy for users to further develop variations of GMDH algorithms.
This open access book explores cutting-edge solutions and best practices for big data and data-driven AI applications for the data-driven economy. It provides the reader with a basis for understanding how technical issues can be overcome to offer real-world solutions to major industrial areas. The book starts with an introductory chapter that provides an overview of the book by positioning the following chapters in terms of their contributions to technology frameworks which are key elements of the Big Data Value Public-Private Partnership and the upcoming Partnership on AI, Data and Robotics. The remainder of the book is then arranged in two parts. The first part "Technologies and Methods" contains horizontal contributions of technologies and methods that enable data value chains to be applied in any sector. The second part "Processes and Applications" details experience reports and lessons from using big data and data-driven approaches in processes and applications. Its chapters are co-authored with industry experts and cover domains including health, law, finance, retail, manufacturing, mobility, and smart cities. Contributions emanate from the Big Data Value Public-Private Partnership and the Big Data Value Association, which have acted as the European data community's nucleus to bring together businesses with leading researchers to harness the value of data to benefit society, business, science, and industry. The book is of interest to two primary audiences, first, undergraduate and postgraduate students and researchers in various fields, including big data, data science, data engineering, and machine learning and AI. Second, practitioners and industry experts engaged in data-driven systems, software design and deployment projects who are interested in employing these advanced methods to address real-world problems.
Technology is not limited to technology companies; it impacts sectors such as healthcare, agriculture, and security. In the last few decades, countries, too, have started developing technologies or integrating technologies into their systems. As a result, all countries, regardless of size, need to understand the management of engineering and technology concepts. Digital Transformations reviews fundamentals and applications through existing and emerging technologies all around the world. Big data availability and the emergence of new tools provide opportunities to detect the emergence of new technologies. Some of the major elements of such analyses include bibliometrics, patent analysis and social network analysis. The authors focus on these three tools and demonstrate their use through applications such as Blockchain, Artificial Intelligence, Robotics, 3D printing, Wireless Power, Autonomous and Electric Driving, and Smart Homes. Through the examination of cases based on emerging technologies, the book provides a spectrum of these recent applications and serves as a reference for professionals, researchers and students on fundamentals of technology utilization tools.
A hands-on guide to web scraping and text mining for both beginners and experienced users of R
* Introduces fundamental concepts of the main architecture of the web and databases and covers HTTP, HTML, XML, JSON, SQL.
* Provides basic techniques to query web documents and data sets (XPath and regular expressions).
* An extensive set of exercises is presented to guide the reader through each technique.
* Explores both supervised and unsupervised techniques as well as advanced techniques such as data scraping and text management.
* Case studies are featured throughout along with examples for each technique presented.
* R code and solutions to exercises featured in the book are provided on a supporting website.
During the past decade there has been an explosion in computation and information technology. With it have come vast amounts of data in a variety of fields such as medicine, biology, finance, and marketing. The challenge of understanding these data has led to the development of new tools in the field of statistics, and spawned new areas such as data mining, machine learning, and bioinformatics. Many of these tools have common underpinnings but are often expressed with different terminology. This book describes the important ideas in these areas in a common conceptual framework. While the approach is statistical, the emphasis is on concepts rather than mathematics. Many examples are given, with a liberal use of color graphics. It is a valuable resource for statisticians and anyone interested in data mining in science or industry. The book's coverage is broad, from supervised learning (prediction) to unsupervised learning. The many topics include neural networks, support vector machines, classification trees and boosting (the first comprehensive treatment of this topic in any book). This major new edition features many topics not covered in the original, including graphical models, random forests, ensemble methods, least angle regression & path algorithms for the lasso, non-negative matrix factorization, and spectral clustering. There is also a chapter on methods for "wide" data (p bigger than n), including multiple testing and false discovery rates. Trevor Hastie, Robert Tibshirani, and Jerome Friedman are professors of statistics at Stanford University. They are prominent researchers in this area: Hastie and Tibshirani developed generalized additive models and wrote a popular book of that title. Hastie co-developed much of the statistical modeling software and environment in R/S-PLUS and invented principal curves and surfaces. Tibshirani proposed the lasso and is co-author of the very successful An Introduction to the Bootstrap. Friedman is the co-inventor of many data-mining tools including CART, MARS, projection pursuit and gradient boosting.
Sentiment analysis is the computational study of people's opinions, sentiments, emotions, moods, and attitudes. This fascinating problem offers numerous research challenges, but promises insight useful to anyone interested in opinion analysis and social media analysis. This comprehensive introduction to the topic takes a natural-language-processing point of view to help readers understand the underlying structure of the problem and the language constructs commonly used to express opinions, sentiments, and emotions. The book covers core areas of sentiment analysis and also includes related topics such as debate analysis, intention mining, and fake-opinion detection. It will be a valuable resource for researchers and practitioners in natural language processing, computer science, management sciences, and the social sciences. In addition to traditional computational methods, this second edition includes recent deep learning methods to analyze and summarize sentiments and opinions, and also new material on emotion and mood analysis techniques, emotion-enhanced dialogues, and multimodal emotion analysis.
Decision trees have become one of the most powerful and popular approaches in knowledge discovery and data mining, the science of exploring large and complex bodies of data in order to discover useful patterns. Decision tree learning continues to evolve over time. Existing methods are constantly being improved and new methods introduced. This 2nd Edition is dedicated entirely to the field of decision trees in data mining and covers all aspects of this important technique, as well as improved or new methods and techniques developed after the publication of our first edition. In this new edition, all chapters have been revised and new topics brought in. New topics include Cost-Sensitive Active Learning, Learning with Uncertain and Imbalanced Data, Using Decision Trees beyond Classification Tasks, Privacy Preserving Decision Tree Learning, Lessons Learned from Comparative Studies, and Learning Decision Trees for Big Data. A walk-through guide to existing open-source data mining software is also included in this edition. This book invites readers to explore the many benefits in data mining that decision trees offer:
Originating from Facebook, LinkedIn, Twitter, Instagram, YouTube, and many other networking sites, the social media shared by users and the associated metadata are collectively known as user generated content (UGC). To analyze UGC and glean insight about user behavior, robust techniques are needed to tackle the huge amount of real-time, multimedia, and multilingual data. Researchers must also know how to assess the social aspects of UGC, such as user relations and influential users. Mining User Generated Content is the first focused effort to compile state-of-the-art research and address future directions of UGC. It explains how to collect, index, and analyze UGC to uncover social trends and user habits. Divided into four parts, the book focuses on the mining and applications of UGC. The first part presents an introduction to this new and exciting topic. Covering the mining of UGC of different medium types, the second part discusses the social annotation of UGC, social network graph construction and community mining, mining of UGC to assist in music retrieval, and the popular but difficult topic of UGC sentiment analysis. The third part describes the mining and searching of various types of UGC, including knowledge extraction, search techniques for UGC content, and a specific study on the analysis and annotation of Japanese blogs. The fourth part on applications explores the use of UGC to support question-answering, information summarization, and recommendations.
Biologists are stepping up their efforts to understand the biological processes that underlie disease pathways in clinical contexts. This has resulted in a flood of biological and clinical data, from genomic and protein sequences, DNA microarrays, protein interactions and biomedical images to disease pathways and electronic health records. To exploit these data for discovering new knowledge that can be translated into clinical applications, there are fundamental data analysis difficulties that have to be overcome. Practical issues such as handling noisy and incomplete data, processing compute-intensive tasks, and integrating various data sources are new challenges faced by biologists in the post-genome era. This book will cover the fundamentals of state-of-the-art data mining techniques which have been designed to handle such challenging data analysis problems, and demonstrate with real applications how biologists and clinical scientists can employ data mining to enable them to make meaningful observations and discoveries from a wide array of heterogeneous data from molecular biology to pharmaceutical and clinical domains.
Today, big data affects countless aspects of our daily lives. This book provides a comprehensive and cutting-edge study on big data analytics, based on the research findings and applications developed by the author and his colleagues in related areas. It addresses the concepts of big data analytics and/or data science, multi-criteria optimization for learning, expert and rule-based data analysis, support vector machines for classification, feature selection, data stream analysis, learning analysis, sentiment analysis, link analysis, and evaluation analysis. The book also explores lessons learned in applying big data to business, engineering and healthcare. Lastly, it addresses the advanced topic of intelligence-quotient (IQ) tests for artificial intelligence. Since each aspect mentioned above concerns a specific domain of application, taken together, the algorithms, procedures, analysis and empirical studies presented here offer a general picture of big data developments. Accordingly, the book can not only serve as a textbook for graduates with a fundamental grasp of training in big data analytics, but can also show practitioners how to use the proposed techniques to deal with real-world big data problems.