Leverage the full power of Bayesian analysis for competitive advantage. Bayesian methods can solve problems you can't reliably handle any other way. Building on your existing Excel analytics skills and experience, Microsoft Excel MVP Conrad Carlberg helps you make the most of Excel's Bayesian capabilities and move toward R to do even more. Step by step, with real-world examples, Carlberg shows you how to use Bayesian analytics to solve a wide array of real problems. Carlberg clarifies terminology that often bewilders analysts, and offers sample R code to take advantage of the rethinking package in R and its gateway to Stan. As you incorporate these Bayesian approaches into your analytical toolbox, you'll build a powerful competitive advantage for your organization and for yourself. Explore key ideas and strategies that underlie Bayesian analysis. Distinguish prior, likelihood, and posterior distributions, and compare algorithms for driving sampling inputs. Use grid approximation to solve simple univariate problems, and understand its limits as parameters increase. Perform complex simulations and regressions with quadratic approximation and Richard McElreath's quap function. Manage text values as if they were numeric. Learn today's gold-standard Bayesian sampling technique: Markov Chain Monte Carlo (MCMC). Use MCMC to optimize execution speed in high-complexity problems. Discover when frequentist methods fail and Bayesian methods are essential, and when to use both in tandem.
The Definitive Volume on Cutting-Edge Exploratory Analysis of Massive Spatial and Spatiotemporal Databases. Since the publication of the first edition of Geographic Data Mining and Knowledge Discovery, new techniques for geographic data warehousing (GDW), spatial data mining, and geovisualization (GVis) have been developed. In addition, there has been a rise in the use of knowledge discovery techniques due to the increasing collection and storage of data on spatiotemporal processes and mobile objects. Incorporating these novel developments, this second edition reflects the current state of the art in the field.
Geographic data mining and knowledge discovery is a promising young discipline with many challenging research problems. This book shows that this area represents an important direction in the development of a new generation of spatial analysis tools for data-rich environments. Exploring various problems and possible solutions, it will motivate researchers to develop new methods and applications in this emerging field.
The Definitive Resource on Text Mining Theory and Applications from Foremost Researchers in the Field. Giving a broad perspective of the field from numerous vantage points, Text Mining: Classification, Clustering, and Applications focuses on statistical methods for text mining and analysis. It examines methods to automatically cluster and classify text documents and applies these methods in a variety of areas, including adaptive information filtering, information distillation, and text search. The book begins with chapters on the classification of documents into predefined categories. It presents state-of-the-art algorithms and their use in practice. The next chapters describe novel methods for clustering documents into groups that are not predefined. These methods seek to automatically determine topical structures that may exist in a document corpus. The book concludes by discussing various text mining applications that have significant implications for future research and industrial use. There is no doubt that text mining will continue to play a critical role in the development of future information systems, and advances in research will be instrumental to their success. This book captures the technical depth and immense practical potential of text mining, guiding readers to a sound appreciation of this burgeoning field.
Identifying some of the most influential algorithms that are widely used in the data mining community, The Top Ten Algorithms in Data Mining provides a description of each algorithm, discusses its impact, and reviews current and future research. Thoroughly evaluated by independent reviewers, each chapter focuses on a particular algorithm and is written by either the original authors of the algorithm or world-class researchers who have extensively studied the respective algorithm. The book concentrates on the following important algorithms: C4.5, k-Means, SVM, Apriori, EM, PageRank, AdaBoost, kNN, Naive Bayes, and CART. Examples illustrate how each algorithm works and highlight its overall performance in a real-world application. The text covers key topics (including classification, clustering, statistical learning, association analysis, and link mining) in data mining research and development as well as in data mining, machine learning, and artificial intelligence courses. By naming the leading algorithms in this field, this book encourages the use of data mining techniques in a broader realm of real-world applications. It should inspire more data mining researchers to further explore the impact and novel research issues of these algorithms.
This book presents the results of discussions and presentations from the latest ISDT event (2014), which was dedicated to the 94th birthday anniversary of Prof. Lotfi A. Zadeh, father of fuzzy logic. The book consists of three main chapters, namely: Chapter 1: Integrated Systems Design; Chapter 2: Knowledge, Competence and Business Process Management; Chapter 3: Integrated Systems Technologies. Each article presents novel and scientific research results with respect to the target goal of improving our common understanding of KT integration.
Introduction to Bio-Ontologies explores the computational background of ontologies. Emphasizing computational and algorithmic issues surrounding bio-ontologies, this self-contained text helps readers understand ontological algorithms and their applications. The first part of the book defines ontology and bio-ontologies. It also explains the importance of mathematical logic for understanding concepts of inference in bio-ontologies, discusses the probability and statistics topics necessary for understanding ontology algorithms, and describes ontology languages, including OBO (the preeminent language for bio-ontologies), RDF, RDFS, and OWL. The second part covers significant bio-ontologies and their applications. The book presents the Gene Ontology; upper-level ontologies, such as the Basic Formal Ontology and the Relation Ontology; and current bio-ontologies, including several anatomy ontologies, Chemical Entities of Biological Interest, Sequence Ontology, Mammalian Phenotype Ontology, and Human Phenotype Ontology. The third part of the text introduces the major graph-based algorithms for bio-ontologies. The authors discuss how these algorithms are used in overrepresentation analysis, model-based procedures, semantic similarity analysis, and Bayesian networks for molecular biology and biomedical applications. With a focus on computational reasoning topics, the final part describes the ontology languages of the Semantic Web and their applications for inference. It covers the formal semantics of RDF and RDFS, OWL inference rules, a key inference algorithm, the SPARQL query language, and the state of the art for querying OWL ontologies. Web Resource: Software and data designed to complement material in the text are available on the book's website: http://bio-ontologies-book.org The site provides the R Robo package developed for the book, along with a compressed archive of data and ontology files used in some of the exercises. 
It also offers teaching/presentation slides and links to other relevant websites. This book provides readers with the foundation to use ontologies as a starting point for new bioinformatics research projects or to support current molecular genetics research projects. By supplying a self-contained introduction to OBO ontologies and the Semantic Web, it bridges the gap between both fields and helps readers see what each can contribute to the analysis and understanding of biomedical data.
Build predictive models from time-based patterns in your data. Master statistical models including new deep learning approaches for time series forecasting. In Time Series Forecasting in Python you will learn how to: recognize a time series forecasting problem and build a performant predictive model; create univariate forecasting models that account for seasonal effects and external variables; build multivariate forecasting models to predict many time series at once; leverage large datasets by using deep learning for forecasting time series; and automate the forecasting process. Time Series Forecasting in Python teaches you to build powerful predictive models from time-based data. Every model you create is relevant, useful, and easy to implement with Python. You'll explore interesting real-world datasets like Google's daily stock price and economic data for the USA, quickly progressing from the basics to developing large-scale models that use deep learning tools like TensorFlow. About the technology: Time series forecasting reveals hidden trends and makes predictions about the future from your data. This powerful technique has proven incredibly valuable across multiple fields, from tracking business metrics to healthcare and the sciences. 
Modern Python libraries and powerful deep learning tools have opened up new methods and utilities for making practical time series forecasts. About the book: Time Series Forecasting in Python teaches you to apply time series forecasting and get immediate, meaningful predictions. You'll learn both traditional statistical and new deep learning models for time series forecasting, all fully illustrated with Python source code. Test your skills with hands-on projects for forecasting air travel, volume of drug prescriptions, and the earnings of Johnson & Johnson. By the time you're done, you'll be ready to build accurate and insightful forecasting models with tools from the Python ecosystem.
Extremal Optimization: Fundamentals, Algorithms, and Applications introduces state-of-the-art extremal optimization (EO) and modified EO (MEO) solutions from fundamentals, methodologies, and algorithms to applications based on numerous classic publications and the authors' recent original research results. It promotes the movement of EO from academic study to practical applications. The book covers four aspects, beginning with a general review of real-world optimization problems and popular solutions with a focus on computational complexity, such as "NP-hard" and the "phase transitions" occurring on the search landscape. Next, it introduces computational extremal dynamics and its applications in EO from principles, mechanisms, and algorithms to the experiments on some benchmark problems such as TSP, spin glass, Max-SAT (maximum satisfiability), and graph partition. It then presents studies on the fundamental features of search dynamics and mechanisms in EO with a focus on self-organized optimization, evolutionary probability distribution, and structure features (e.g., backbones), which are based on the authors' recent research results. Finally, it discusses applications of EO and MEO in multiobjective optimization, systems modeling, intelligent control, and production scheduling. The authors present the advanced features of EO in solving NP-hard problems through problem formulation, algorithms, and simulation studies on popular benchmarks and industrial applications. They also focus on the development of MEO and its applications. This book can be used as a reference for graduate students, research developers, and practical engineers who work on developing optimization solutions for those complex systems with hardness that cannot be solved with mathematical optimization or other computational intelligence, such as evolutionary computations.
Recent years have seen the advent and development of many devices able to record and store an ever increasing amount of information. The fast progress of these technologies is ubiquitous throughout all fields of science and applied contexts, ranging from medicine, biology and life sciences, to economics and industry. The data provided by these instruments have different forms: 2D-3D images generated by diagnostic medical scanners, computer vision or satellite remote sensing, microarray data and gene sets, integrated clinical and administrative data from public health databases, real time monitoring data of a bio-marker, system control datasets. All these data share the common characteristic of being complex and often highly dimensional. The analysis of complex and highly dimensional data poses new challenges to the statistician and requires the development of novel models and techniques, fueling many fascinating and fast growing research areas of modern statistics. An incomplete list includes for example: functional data analysis, that deals with data having a functional nature, such as curves and surfaces; shape analysis of geometric forms, that relates to shape matching and shape recognition, applied to computational vision and medical imaging; data mining, that studies algorithms for the automatic extraction of information from data, eliciting rules and patterns out of massive datasets; risk analysis, for the evaluation of health, environmental, and engineering risks; graphical models, that allow problems involving large-scale models with millions of random variables linked in complex ways to be approached; reliability of complex systems, whose evaluation requires the use of many statistical and probabilistic tools; optimal design of computer simulations to replace expensive and time consuming physical experiments. The contributions published in this volume are the result of a selection based on the presentations (about one hundred) given at the conference "S.Co. 2009: Complex data modeling and computationally intensive methods for estimation and prediction", held at the Politecnico di Milano on September 14-16, 2009. That of 2009 is its sixth edition, the first one being held in Venice in 1999. S.Co. is a forum for the discussion of new developments and applications of statistical methods and computational techniques for complex and highly dimensional datasets. The book is addressed to statisticians working at the forefront of the statistical analysis of complex and highly dimensional data and offers a wide variety of statistical models, computer intensive methods and applications. We wish to thank all associate editors and referees for their valuable contributions that made this volume possible. Milan and Venice, May 2010. Pietro Mantovan, Piercesare Secchi.
Contents: Space-time texture analysis in thermal infrared imaging for classification of Raynaud's Phenomenon (Graziano Aretusi, Lara Fontanella, Luigi Ippoliti and Arcangelo Merla); Mixed-effects modelling of Kevlar fibre failure times through Bayesian non-parametrics (Raffaele Argiento, Alessandra Guglielmi and Antonio Pievatolo); Space filling and locally optimal designs for Gaussian Universal Kriging (Alessandro Baldi Antognini and Maroussa Zagoraiou); Exploitation, integration and statistical analysis of the Public Health Database and STEMI Archive in the Lombardia region (Pietro Barbieri, Niccolò Grieco, Francesca Ieva, Anna Maria Paganoni and Piercesare Secchi); Bootstrap algorithms for variance estimation in πPS sampling (Alessandro Barbiero and Fulvia Mecatti); Fast Bayesian functional data analysis of basal body temperature (James M. Ciera); A parametric Markov chain to model age- and state-dependent wear processes (Massimiliano Giorgio, Maurizio Guida and Gianpaolo Pulcini); Case studies in Bayesian computation using INLA (Sara Martino and Håvard Rue); A graphical models approach for comparing gene sets (M. Sofia Massa, Monica Chiogna and Chiara Romualdi); Predictive densities and prediction limits based on predictive likelihoods (Paolo Vidoni); Computer-intensive conditional inference (G. Alastair Young and Thomas J. DiCiccio); Monte Carlo simulation methods for reliability estimation and failure prognostics (Enrico Zio).
List of Contributors: Alessandro Baldi Antognini, Department of Statistical Sciences, University of Bologna, Bologna, Italy; Graziano Aretusi, Department of Quantitative Methods and Economic Theory, University G. d'Annunzio, Chieti-Pescara, Italy; Raffaele Argiento, CNR IMATI, Milan, Italy; Pietro Barbieri, Ufficio Qualità, Cernusco sul Naviglio, Italy; Alessandro Barbiero, Department of Economics, Business and Statistics, University of Milan, Milan, Italy; Monica Chiogna, Department of Statistical Sciences, University of Padova, Padova, Italy; James M. Ciera, Department of Statistical Sciences, University of Padova, Padova, Italy; Thomas J. DiCiccio, Department of Social Statistics, Cornell University, Ithaca, USA; Lara Fontanella, Department of Quantitative Methods and Economic Theory, University G. d'Annunzio, Chieti-Pescara, Italy; Massimiliano Giorgio, Department of Aerospace and Mechanical Engineering, Second University of Naples, Aversa (CE), Italy; Niccolò Grieco, A. O. Niguarda Ca' Granda, Milan, Italy; Maurizio Guida, Department of Electrical and Information Engineering, University of Salerno, Fisciano (SA), Italy; Alessandra Guglielmi, Department of Mathematics, Politecnico di Milano, Milan, Italy, also affiliated to CNR IMATI, Milano; Francesca Ieva, MOX - Department of Mathematics, Politecnico di Milano, Milan, Italy; Luigi Ippoliti, Department of Quantitative Methods and Economic Theory, University G. d'Annunzio, Chieti-Pescara, Italy; Sara Martino, Department of Mathematical Sciences, Norwegian University for Science and Technology, Trondheim, Norway; M. Sofia Massa, Department of Statistical Sciences, University of Padova, Padova, Italy; Antonio Pievatolo, CNR IMATI, Milan, Italy; Gianpaolo Pulcini, Istituto Motori, National Research Council (CNR), Naples, Italy; Chiara Romualdi, Department of Biology, University of Padova, Padova, Italy; Håvard Rue, Department of Mathematical Sciences, Norwegian University for Science and Technology, Trondheim, Norway; Piercesare Secchi, MOX - Department of Mathematics, Politecnico di Milano, Milan, Italy; Paolo Vidoni, Department of Statistics, University of Udine, Udine, Italy; Fulvia Mecatti, Department of Statistics.
Today's malware mutates randomly to avoid detection, but reactively adaptive malware is more intelligent, learning and adapting to new computer defenses on the fly. Using the same algorithms that antivirus software uses to detect viruses, reactively adaptive malware deploys those algorithms to outwit antivirus defenses and to go undetected. This book provides details of the tools, the types of malware the tools will detect, implementation of the tools in a cloud computing framework and the applications for insider threat detection.
Human Capital Systems, Analytics, and Data Mining provides human capital professionals, researchers, and students with a comprehensive and portable guide to human capital systems, analytics and data mining. The main purpose of this book is to provide a rich tool set of methods and tutorials for Human Capital Management Systems (HCMS) database modeling, analytics, interactive dashboards, and data mining that is independent of any human capital software vendor offerings and is equally usable and portable among both commercial and internally developed HCMS. The book begins with an overview of HCMS, including coverage of human resource systems history and current HCMS computing environments. It next explores relational and dimensional database management concepts and principles. HCMS instructional databases developed by the author for use in graduate-level HCMS and compensation courses are used for database modeling and dashboard design exercises. Knowledge discovery and research tutorials and exercises using Online Analytical Processing (OLAP) and data mining tools, built by replicating the author's original pay equity research, are included. New findings concerning gender-based pay equity research through the lens of Comparable Worth and Occupational Mobility are covered extensively in the Human Capital Metrics, Analytics and Data Mining chapters.
From the Foreword: "While large-scale machine learning and data mining have greatly impacted a range of commercial applications, their use in the field of Earth sciences is still in the early stages. This book, edited by Ashok Srivastava, Ramakrishna Nemani, and Karsten Steinhaeuser, serves as an outstanding resource for anyone interested in the opportunities and challenges for the machine learning community in analyzing these data sets to answer questions of urgent societal interest...I hope that this book will inspire more computer scientists to focus on environmental applications, and Earth scientists to seek collaborations with researchers in machine learning and data mining to advance the frontiers in Earth sciences." --Vipin Kumar, University of Minnesota Large-Scale Machine Learning in the Earth Sciences provides researchers and practitioners with a broad overview of some of the key challenges in the intersection of Earth science, computer science, statistics, and related fields. It explores a wide range of topics and provides a compilation of recent research in the application of machine learning in the field of Earth Science. Making predictions based on observational data is a theme of the book, and the book includes chapters on the use of network science to understand and discover teleconnections in extreme climate and weather events, as well as using structured estimation in high dimensions. The use of ensemble machine learning models to combine predictions of global climate models using information from spatial and temporal patterns is also explored. The second part of the book features a discussion on statistical downscaling in climate with state-of-the-art scalable machine learning, as well as an overview of methods to understand and predict the proliferation of biological species due to changes in environmental conditions. The problem of using large-scale machine learning to study the formation of tornadoes is also explored in depth. 
The last part of the book covers the use of deep learning algorithms to classify images that have very high resolution, as well as the unmixing of spectral signals in remote sensing images of land cover. The authors also apply long-tail distributions to geoscience resources, in the final chapter of the book.
Big Data of Complex Networks presents and explains the methods from the study of big data that can be used in analysing massive structural data sets, including both very large networks and sets of graphs. As well as applying statistical analysis techniques like sampling and bootstrapping in an interdisciplinary manner to produce novel techniques for analyzing massive amounts of data, this book also explores the possibilities offered by special aspects such as computer memory in investigating large sets of complex networks. Intended for computer scientists, statisticians and mathematicians interested in big data and networks, Big Data of Complex Networks is also a valuable tool for researchers in the fields of visualization, data analysis, computer vision and bioinformatics. Key features: provides a complete discussion of both the hardware and software used to organize big data; describes a wide range of useful applications for managing big data and resultant data sets; maintains a firm focus on massive data and large networks; unveils innovative techniques to help readers handle big data. Matthias Dehmer received his PhD in computer science from the Darmstadt University of Technology, Germany. Currently, he is Professor at UMIT - The Health and Life Sciences University, Austria, and the Universität der Bundeswehr München. His research interests are in graph theory, data science, complex networks, complexity, statistics and information theory. Frank Emmert-Streib received his PhD in theoretical physics from the University of Bremen, and is currently Associate Professor at Tampere University of Technology, Finland. His research interests are in the field of computational biology, machine learning and network medicine. Stefan Pickl holds a PhD in mathematics from the Darmstadt University of Technology, and is currently a Professor at the Universität der Bundeswehr München. 
His research interests are in operations research, systems biology, graph theory and discrete optimization. Andreas Holzinger received his PhD in cognitive science from Graz University and his habilitation (second PhD) in computer science from Graz University of Technology. He is head of the Holzinger Group HCI-KDD at the Medical University Graz and Visiting Professor for Machine Learning in Health Informatics at Vienna University of Technology.
Utilizing the ubiquity of social media in modern society, the emerging interdisciplinary field of social computing offers the promise of important human-centered applications. "Human-Centered Social Media Analytics" provides a timely and unique survey of next-generation social computational methodologies. The text explains the fundamentals of this field, and describes state-of-the-art methods for inferring social status, relationships, preferences, intentions, personalities, needs, and lifestyles from human information in unconstrained visual data. The collected chapters present a range of different viewpoints examining the various possibilities and challenges to machine understanding of humans in a social context. Topics and features: includes perspectives from an international and interdisciplinary selection of pre-eminent authorities; presents balanced coverage of both detailed theoretical analysis and real-world applications; examines social relationships in human-centered media for the development of socially-aware video, location-based, and multimedia applications; reviews techniques for recognizing the social roles played by people in an event, and for classifying human-object interaction activities; discusses the prediction and recognition of human attributes via social media analytics, including social relationships, facial age and beauty, and occupation; requires no prior background knowledge of the area. This authoritative text/reference will be a valuable resource for researchers and graduate students interested in social media and networking, computer vision and biometrics, big data, and HCI. Practitioners in these fields, as well as in image processing and computer graphics, will also find the book of great interest.
In this age of information overload, people use a variety of strategies to make choices about what to buy, how to spend their leisure time, and even whom to date. Recommender systems automate some of these strategies with the goal of providing affordable, personal, and high-quality recommendations. This book offers an overview of approaches to developing state-of-the-art recommender systems. The authors present current algorithmic approaches for generating personalized buying proposals, such as collaborative and content-based filtering, as well as more interactive and knowledge-based approaches. They also discuss how to measure the effectiveness of recommender systems and illustrate the methods with practical case studies. The final chapters cover emerging topics such as recommender systems in the social web and consumer buying behavior theory. Suitable for computer science researchers and students interested in getting an overview of the field, this book will also be useful for professionals looking for the right technology to build real-world recommender systems.
Rules - the clearest, most explored and best understood form of knowledge representation - are particularly important for data mining, as they offer the best tradeoff between human and machine understandability. This book presents the fundamentals of rule learning as investigated in classical machine learning and modern data mining. It introduces a feature-based view, as a unifying framework for propositional and relational rule learning, thus bridging the gap between attribute-value learning and inductive logic programming, and providing complete coverage of most important elements of rule learning. The book can be used as a textbook for teaching machine learning, as well as a comprehensive reference to research in the field of inductive rule learning. As such, it targets students, researchers and developers of rule learning algorithms, presenting the fundamental rule learning concepts in sufficient breadth and depth to enable the reader to understand, develop and apply rule learning techniques to real-world data.
Collecting the latest developments in the field, Multimedia Data Mining: A Systematic Introduction to Concepts and Theory defines multimedia data mining, its theory, and its applications. Two of the most active researchers in multimedia data mining explore how this young area has rapidly developed in recent years. The book first discusses the theoretical foundations of multimedia data mining, presenting commonly used feature representation, knowledge representation, statistical learning, and soft computing techniques. It then provides application examples that showcase the great potential of multimedia data mining technologies. In this part, the authors show how to develop a semantic repository training method and a concept discovery method in an imagery database. They demonstrate how knowledge discovery helps achieve the goal of imagery annotation. The authors also describe an effective solution to large-scale video search, along with an application of audio data classification and categorization. This novel, self-contained book examines how the merging of multimedia and data mining research can promote the understanding and advance the development of knowledge discovery in multimedia data.
This book seeks to promote the exploitation of data science in healthcare systems. The focus is on advancing the automated analytical methods used to extract new knowledge from data for healthcare applications. To do so, the book draws on several interrelated disciplines, including machine learning, big data analytics, statistics, pattern recognition, computer vision, and Semantic Web technologies, and focuses on their direct application to healthcare. Building on three tutorial-like chapters on data science in healthcare, the following eleven chapters highlight success stories on the application of data science in healthcare, where data science and artificial intelligence technologies have proven to be very promising. This book is primarily intended for data scientists involved in the healthcare or medical sector. By reading this book, they will gain essential insights into the modern data science technologies needed to advance innovation for both healthcare businesses and patients. A basic grasp of data science is recommended in order to fully benefit from this book.
Learn how to use, deploy, and maintain Apache Spark with this comprehensive guide, written by the creators of the open-source cluster-computing framework. With an emphasis on improvements and new features in Spark 2.0, authors Bill Chambers and Matei Zaharia break down Spark topics into distinct sections, each with unique goals. You'll explore the basic operations and common functions of Spark's structured APIs, as well as Structured Streaming, a new high-level API for building end-to-end streaming applications. Developers and system administrators will learn the fundamentals of monitoring, tuning, and debugging Spark, and explore machine learning techniques and scenarios for employing MLlib, Spark's scalable machine-learning library. Get a gentle overview of big data and Spark. Learn about DataFrames, SQL, and Datasets (Spark's core APIs) through worked examples. Dive into Spark's low-level APIs, RDDs, and execution of SQL and DataFrames. Understand how Spark runs on a cluster. Debug, monitor, and tune Spark clusters and applications. Learn the power of Structured Streaming, Spark's stream-processing engine. Learn how you can apply MLlib to a variety of problems, including classification or recommendation.
DATA VISUALIZATION: Exploring and Explaining with Data is designed to introduce best practices in data visualization to undergraduate and graduate students. This is one of the first books on data visualization designed for college courses. The book contains material on effective design, choice of chart type, effective use of color, how to explore data visually, and how to explain concepts and results visually in a compelling way with data. The book explains both the "why" of data visualization and the "how." That is, the book provides lucid explanations of the guiding principles of data visualization through the use of interesting examples.
The latest inventions in internet technology influence most business and daily activities. Internet security, internet data management, web search, data grids, cloud computing, and web-based applications play vital roles, especially in business and industry, as more transactions go online and mobile. Issues related to ubiquitous computing are becoming critical. Internet technology and data engineering should reinforce the efficiency and effectiveness of business processes. These technologies should help people make better and more accurate decisions by presenting the necessary information and the possible consequences of those decisions. Intelligent information systems should help us better understand and manage information with ubiquitous data repositories and cloud computing. This book is a compilation of recent research findings in Internet Technology and Data Engineering. It provides state-of-the-art accounts of computational algorithms and tools, database management and database technologies, intelligent information systems, data engineering applications, internet security, internet data management, web search, data grids, cloud computing, web-based applications, and other related topics.
This book offers practical guidelines on creating value from the application of data science based on selected artificial intelligence methods. In Part I, the author introduces a problem-driven approach to implementing AI-based data science and offers practical explanations of key technologies: machine learning, deep learning, decision trees and random forests, evolutionary computation, swarm intelligence, and intelligent agents. In Part II, he describes the main steps in creating AI-based data science solutions for business problems, including problem knowledge acquisition, data preparation, data analysis, model development, and model deployment lifecycle. Finally, in Part III the author illustrates the power of AI-based data science with successful applications in manufacturing and business. He also shows how to introduce this technology in a business setting and guides the reader on how to build the appropriate infrastructure and develop the required skillsets. The book is ideal for data scientists who will implement the proposed methodology and techniques in their projects. It is also intended to help business leaders and entrepreneurs who want to create competitive advantage by using AI-based data science, as well as academics and students looking for an industrial view of this discipline.
With today's consumers spending more time on their mobiles than on their PCs, new methods of empirical stochastic modeling have emerged that can provide marketers with detailed information about the products, content, and services their customers desire. Data Mining Mobile Devices defines the collection of machine-sensed environmental data pertaining to human social behavior. It explains how the integration of data mining and machine learning can enable the modeling of conversation context, proximity sensing, and geospatial location throughout large communities of mobile users. Examines the construction and leveraging of mobile sites. Describes how to use mobile apps to gather key data about consumers' behavior and preferences. Discusses mobile mobs, which can be differentiated as distinct marketplaces, including Apple®, Google®, Facebook®, Amazon®, and Twitter®. Provides detailed coverage of mobile analytics via clustering, text, and classification AI software and techniques. Mobile devices serve as detailed diaries of a person, continuously and intimately broadcasting where, how, when, and what products, services, and content your consumers desire. The future is mobile: data mining starts and stops in consumers' pockets. Describing how to analyze Wi-Fi and GPS data from websites and apps, the book explains how to model mined data through the use of artificial intelligence software. It also discusses the monetization of mobile devices' desires and preferences that can lead to the triangulated marketing of content, products, or services to billions of consumers in a relevant, anonymous, and personal manner.
The three-volume set provides a systematic overview of theories and techniques in social network analysis. Volume 1 of the set mainly focuses on the structural characteristics, modeling, and evolution mechanisms of social networks. Techniques and approaches for virtual community detection are discussed in detail as well. It is an essential reference for scientists and professionals in computer science.
You may like...
Contemporary Perspectives in Data Mining
Kenneth D. Lawrence, Ronald K. Klimberg
Hardcover
R2,837
Discovery Miles 28 370
Handbook of Research on Automated…
Mrutyunjaya Panda, Harekrishna Misra
Hardcover
R8,424
Discovery Miles 84 240
Big Data - Concepts, Methodologies…
Information Resources Management Association
Hardcover
R19,115
Discovery Miles 191 150
New Opportunities for Sentiment Analysis…
Aakanksha Sharaff, G. R. Sinha, …
Hardcover
R7,211
Discovery Miles 72 110
Intelligent Analysis of Multimedia…
Siddhartha Bhattacharyya, Hrishikesh Bhaumik, …
Hardcover
R6,091
Discovery Miles 60 910
Implementation of Machine Learning…
Veljko Milutinović, Nenad Mitic, …
Hardcover
R7,211
Discovery Miles 72 110
Opinion Mining and Text Analytics on…
Pantea Keikhosrokiani, Moussa Pourya Asl
Hardcover
R10,065
Discovery Miles 100 650