This is the first rigorous, self-contained treatment of the theory of deep learning. Building the theory up from its foundations, it is essential reading for scientists, instructors, and students interested in artificial intelligence and deep learning. It provides guidance on how to think about scientific questions, and leads readers through the history of the field and its fundamental connections to neuroscience. The author discusses many applications to beautiful problems in the natural sciences, in physics, chemistry, and biomedicine. Examples include the search for exotic particles and dark matter in experimental physics, the prediction of molecular properties and reaction outcomes in chemistry, and the prediction of protein structures and the diagnostic analysis of biomedical images. The text is accompanied by a full set of exercises at different difficulty levels and encourages out-of-the-box thinking.
Many students find it daunting to move from studying environmental science to designing and implementing their own research proposals. This book provides a practical introduction to help develop scientific thinking, aimed at undergraduate and new graduate students in the earth and environmental sciences. Students are guided through the steps of scientific thinking using published scientific literature and real environmental data. The book starts with advice on how to effectively read scientific papers, before outlining how to articulate testable questions and answer them using basic data analysis. The Mauna Loa CO2 dataset is used to demonstrate how to read metadata, prepare data, generate effective graphs and identify dominant cycles on various timescales. Practical, question-driven examples are explored to explain running averages, anomalies, correlations and simple linear models. The final chapter provides a framework for writing persuasive research proposals, making this an essential guide for students embarking on their first research project.
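As a taste of the kind of analysis the book walks through, here is a minimal Python sketch of a running average, a seasonal anomaly, and a straight-line trend; the short synthetic series, the 12-month window, and the library choices (pandas, numpy) are illustrative assumptions rather than material from the book.

```python
# Minimal sketch: running average, anomaly, and linear trend for a monthly CO2-like series.
# The values, the 12-month window, and the index layout are illustrative assumptions.
import numpy as np
import pandas as pd

co2 = pd.Series(
    [315.7, 316.4, 317.1, 317.7, 318.0, 317.4, 316.0, 314.7, 313.7, 313.9, 315.0, 316.2],
    index=pd.date_range("1959-01", periods=12, freq="MS"),
)

running_avg = co2.rolling(window=12, center=True, min_periods=6).mean()  # smooths the seasonal cycle
anomaly = co2 - running_avg                                              # departure from the smoothed curve

# Straight-line trend (ppm per year) by ordinary least squares.
years = co2.index.year + (co2.index.month - 1) / 12
slope, intercept = np.polyfit(years, co2.values, deg=1)
print(f"trend: {slope:+.2f} ppm/year")
```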
This is the first comprehensive overview of the 'science of science,' an emerging interdisciplinary field that relies on big data to unveil the reproducible patterns that govern individual scientific careers and the workings of science. It explores the roots of scientific impact, the role of productivity and creativity, when and what kind of collaborations are effective, the impact of failure and success in a scientific career, and what metrics can tell us about the fundamental workings of science. The book relies on data to draw actionable insights, which can be applied by individuals to further their careers or by decision makers to enhance the role of science in society. With anecdotes and detailed, easy-to-follow explanations of the research, this book is accessible to all scientists and graduate students, policymakers, and administrators with an interest in the wider scientific enterprise.
Text is everywhere, and it is a fantastic resource for social scientists. However, because it is so abundant, and because language is so variable, it is often difficult to extract the information we want. There is a whole subfield of AI concerned with text analysis (natural language processing). Many of the basic analysis methods developed are now readily available as Python implementations. This Element will teach you when to use which method, the mathematical background of how it works, and the Python code to implement it.
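To make the idea concrete, here is a small sketch of one such method, TF-IDF document vectors compared with cosine similarity; the scikit-learn calls and the toy corpus are illustrative assumptions and are not drawn from the Element itself.

```python
# One basic text-analysis method: TF-IDF document vectors plus cosine similarity.
# The toy corpus and the choice of scikit-learn are illustrative assumptions.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

docs = [
    "The senate passed the budget bill after a long debate.",
    "Parliament debated the budget before passing the bill.",
    "The striker scored twice in the final match of the season.",
]

vectorizer = TfidfVectorizer(stop_words="english")
X = vectorizer.fit_transform(docs)          # sparse document-term matrix
print(cosine_similarity(X[0], X[1]))        # related documents score relatively high
print(cosine_similarity(X[0], X[2]))        # unrelated documents score near zero
```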
Anomaly detection is the detective work of machine learning: finding the unusual, catching the fraud, discovering strange activity in large and complex datasets. But, unlike Sherlock Holmes, you may not know what the puzzle is, much less what "suspects" you're looking for. This O'Reilly report uses practical examples to explain how the underlying concepts of anomaly detection work. From banking security to natural sciences, medicine, and marketing, anomaly detection has many useful applications in this age of big data. And the search for anomalies will intensify once the Internet of Things spawns even more new types of data. The concepts described in this report will help you tackle anomaly detection in your own project: use probabilistic models to predict what's normal and contrast that with what you observe; set an adaptive threshold to determine which data falls outside of the normal range, using the t-digest algorithm; establish normal fluctuations in complex systems and signals (such as an EKG) with a more adaptive probabilistic model; use historical data to discover anomalies in sporadic event streams, such as web traffic; and learn how to use deviations in expected behavior to trigger fraud alerts.
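A rough sketch of the adaptive-threshold idea follows; it uses exact numpy quantiles on synthetic data for clarity, standing in for the streaming t-digest estimate the report describes, and every number in it is an illustrative assumption.

```python
# Sketch of quantile-based thresholding: model "normal" from history, then flag points
# outside the band. Exact numpy quantiles are used here; a t-digest plays the same role
# for streaming data. The synthetic data and the 0.5%/99.5% cutoffs are assumptions.
import numpy as np

rng = np.random.default_rng(0)
history = rng.normal(loc=100.0, scale=5.0, size=10_000)   # past "normal" observations
upper = np.quantile(history, 0.995)
lower = np.quantile(history, 0.005)

new_points = np.array([101.2, 97.8, 131.5, 64.0])
anomalous = (new_points > upper) | (new_points < lower)
print(dict(zip(new_points.tolist(), anomalous.tolist())))  # the extreme values are flagged
```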
An extensive treatment of a key method in the statistician’s toolbox. For more than two decades, the First Edition of Linear Regression Analysis has been an authoritative resource for one of the most common methods of handling statistical data. There have been many advances in the field over the last twenty years, including the development of more efficient and accurate regression computer programs, new ways of fitting regressions, and new methods of model selection and prediction. Linear Regression Analysis, Second Edition, revises and expands this standard text, providing extensive coverage of state-of-the-art theory and applications of linear regression analysis. It requires no specialized knowledge beyond a good grasp of matrix algebra and some acquaintance with straight-line regression and simple analysis of variance models.
Concise, mathematically clear, and comprehensive, Linear Regression Analysis, Second Edition, serves as both a reliable reference for the practitioner and a valuable textbook for the student.
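For readers checking the matrix-algebra prerequisites, here is a minimal numpy sketch of straight-line regression via least squares; the synthetic data are an illustrative assumption and the code does not come from the book.

```python
# Straight-line regression sketch: beta = argmin ||y - X beta||^2, solved with a stable
# least-squares routine. All data are synthetic and purely illustrative.
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(0, 10, 50)
y = 2.0 + 0.7 * x + rng.normal(scale=0.5, size=x.size)

X = np.column_stack([np.ones_like(x), x])        # design matrix with an intercept column
beta, *_ = np.linalg.lstsq(X, y, rcond=None)     # solves the normal equations stably
residuals = y - X @ beta
sigma2 = residuals @ residuals / (len(y) - X.shape[1])
print(f"intercept={beta[0]:.2f}, slope={beta[1]:.2f}, residual variance={sigma2:.3f}")
```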
Everywhere you look people are talking about data. Buzzwords abound - 'data science', 'machine learning', 'artificial intelligence'. But what does any of it really mean, and most importantly what does it mean for your business? Long-established businesses in many industries find themselves competing with new entrants built entirely on data and analytics. This ground-breaking new book levels the playing field in dramatic fashion. The Average is Always Wrong is a completely pragmatic and hands-on guide to harnessing data to transform your business for the better. Experienced CEO and CMO Ian Shepherd takes you behind the jargon and puts together a powerful change programme anyone can enact in their business right now, to reap the rewards of simple but sophisticated uses of data. Filled with practical examples and case studies, readers will come away with a powerful understanding of the real value of data and the analytical techniques that can drive profit growth.
What is latent class analysis? If you asked that question thirty or forty years ago you would have gotten a different answer than you would today. Closer to its time of inception, latent class analysis was viewed primarily as a categorical data analysis technique, often framed as a factor analysis model where both the measured variable indicators and underlying latent variables are categorical. Today, however, it rests within a much broader mixture and diagnostic modeling framework, integrating measured and latent variables that may be categorical and/or continuous, and where latent classes serve to define the subpopulations for whom many aspects of the focal measured and latent variable model may differ. For latent class analysis to take these developmental leaps, contributions were required that were methodological, certainly, as well as didactic. Among the leaders on both fronts was C. Mitchell "Chan" Dayton, at the University of Maryland, whose work in latent class analysis spanning several decades helped the method to expand and reach its current potential. The current volume in the Center for Integrated Latent Variable Research (CILVR) series reflects the diversity that is latent class analysis today, celebrating work related to, made possible by, and inspired by Chan's noted contributions, and signaling the even more exciting future yet to come.
This open access book presents the foundations of the Big Data research and innovation ecosystem and the associated enablers that facilitate delivering value from data for business and society. It provides insights into the key elements for research and innovation, technical architectures, business models, skills, and best practices to support the creation of data-driven solutions and organizations. The book is a compilation of selected high-quality chapters covering best practices, technologies, experiences, and practical recommendations on research and innovation for big data. The contributions are grouped into four parts:
* Part I: Ecosystem Elements of Big Data Value focuses on establishing the big data value ecosystem using a holistic approach to make it attractive and valuable to all stakeholders.
* Part II: Research and Innovation Elements of Big Data Value details the key technical and capability challenges to be addressed for delivering big data value.
* Part III: Business, Policy, and Societal Elements of Big Data Value investigates the need to make more efficient use of big data, understanding that data is an asset with significant potential for the economy and society.
* Part IV: Emerging Elements of Big Data Value explores the critical elements to maximizing the future potential of big data value.
Overall, readers are provided with insights that can support them in creating data-driven solutions, organizations, and productive data ecosystems. The material represents the results of a collective effort undertaken by the European data community as part of the Big Data Value Public-Private Partnership (PPP) between the European Commission and the Big Data Value Association (BDVA) to boost data-driven digital transformation.
This book presents an accessible introduction to data-driven storytelling. Resulting from unique discussions between data visualization researchers and data journalists, it offers an integrated definition of the topic, presents vivid examples and patterns for data storytelling, and calls out key challenges and new opportunities for researchers and practitioners.
Construct, analyze, and visualize networks with networkx, a Python language module. Network analysis is a powerful tool you can apply to a multitude of datasets and situations. Discover how to work with all kinds of networks, including social, product, temporal, spatial, and semantic networks. Convert almost any real-world data into a complex network--such as recommendations on co-using cosmetic products, muddy hedge fund connections, and online friendships. Analyze and visualize the network, and make business decisions based on your analysis. If you're a curious Python programmer, a data scientist, or a CNA specialist interested in mechanizing mundane tasks, you'll increase your productivity exponentially. Complex network analysis used to be done by hand or with non-programmable network analysis tools, but not anymore! You can now automate and program these tasks in Python. Complex networks are collections of connected items, words, concepts, or people. By exploring their structure and individual elements, we can learn about their meaning, evolution, and resilience. Starting with simple networks, convert real-life and synthetic network graphs into networkx data structures. Look at more sophisticated networks and learn more powerful machinery to handle centrality calculation, blockmodeling, and clique and community detection. Get familiar with presentation-quality network visualization tools, both programmable and interactive--such as Gephi, a CNA explorer. Adapt the patterns from the case studies to your problems. Explore big networks with NetworKit, a high-performance networkx substitute. Each part in the book gives you an overview of a class of networks, includes a practical study of networkx functions and techniques, and concludes with case studies from various fields, including social networking, anthropology, marketing, and sports analytics. Combine your CNA and Python programming skills to become a better network analyst, a more accomplished data scientist, and a more versatile programmer. What You Need: You will need a Python 3.x installation with the following additional modules: Pandas (>=0.18), NumPy (>=1.10), matplotlib (>=1.5), networkx (>=1.11), python-louvain (>=0.5), NetworKit (>=3.6), and generalizedsimilarity. We recommend using the Anaconda distribution that comes with all these modules, except for python-louvain, NetworKit, and generalizedsimilarity, and works on all major modern operating systems.
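As a flavour of the workflow, here is a small networkx sketch that builds a graph, computes a centrality, and finds communities; the toy edge list is an illustrative assumption, and greedy modularity maximisation stands in for the python-louvain community detection used in the book.

```python
# Small CNA sketch with networkx: build a graph from an edge list, compute betweenness
# centrality, and detect communities. The toy edge list is an illustrative assumption.
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities

edges = [("ann", "bob"), ("bob", "cat"), ("ann", "cat"),
         ("dan", "eve"), ("eve", "fay"), ("dan", "fay"), ("cat", "dan")]
G = nx.Graph(edges)

centrality = nx.betweenness_centrality(G)
print(max(centrality, key=centrality.get))        # a bridge node between the two triangles

for i, community in enumerate(greedy_modularity_communities(G)):
    print(i, sorted(community))                   # the two tightly knit groups
```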
This unique volume is an introduction for computer scientists, including a formal study of theoretical algorithms for Big Data applications, which allows them to work on such algorithms in the future. It also serves as a useful reference guide for the general computer science population, providing a comprehensive overview of the fascinating world of such algorithms. To achieve these goals, the algorithmic results presented have been carefully chosen so that they demonstrate the important techniques and tools used in Big Data algorithms, and yet do not require tedious calculations or a very deep mathematical background.
Gain the basics of Ruby's map, reduce, and select functions and discover how to use them to solve data-processing problems. This compact hands-on book explains how you can encode certain complex programs in 10 lines of Ruby code, an astonishingly small number. You will walk through problems and solutions which are effective because they use map, reduce, and select. As you read Ruby Data Processing, type in the code, run the code, and ponder the results. Tweak the code to test the code and see how the results change. After reading this book, you will have a deeper understanding of how to break data-processing problems into processing stages, each of which is understandable, debuggable, and composable, and how to combine the stages to solve your data-processing problem. As a result, your Ruby coding will become more efficient and your programs will be more elegant and robust. What you will learn: discover Ruby data processing and how to do it using the map, reduce, and select functions; develop complex solutions including debugging, randomizing, sorting, grouping, and more; and reverse engineer complex data-processing solutions. Who this book is for: those who have at least some prior experience programming in Ruby and who have a background and interest in data analysis and processing using Ruby.
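The book itself is in Ruby, but the staged map/select/reduce style it teaches carries over to other languages; the short Python analogue below is purely illustrative and uses hypothetical order amounts.

```python
# Python analogue of a staged map/select/reduce pipeline; the order amounts and the
# 15% tax / 10.0 cutoff are illustrative assumptions.
from functools import reduce

orders = [12.5, 3.0, 48.0, 7.25, 19.99]

stage1 = map(lambda amount: amount * 1.15, orders)            # map: add 15% tax
stage2 = filter(lambda amount: amount >= 10.0, stage1)        # select: keep large orders
total = reduce(lambda acc, amount: acc + amount, stage2, 0.0) # reduce: sum the survivors
print(round(total, 2))
```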
If you're a business team leader, CIO, business analyst, or developer interested in how Apache Hadoop and Apache HBase-related technologies can address problems involving large-scale data in cost-effective ways, this book is for you. Using real-world stories and situations, authors Ted Dunning and Ellen Friedman show Hadoop newcomers and seasoned users alike how NoSQL databases and Hadoop can solve a variety of business and research issues. You'll learn about early decisions and pre-planning that can make the process easier and more productive. If you're already using these technologies, you'll discover ways to gain the full range of benefits possible with Hadoop. While you don't need a deep technical background to get started, this book does provide expert guidance to help managers, architects, and practitioners succeed with their Hadoop projects. Examine a day in the life of big data: India's ambitious Aadhaar project; review tools in the Hadoop ecosystem such as Apache's Spark, Storm, and Drill to learn how they can help you; pick up a collection of technical and strategic tips that have helped others succeed with Hadoop; learn from several prototypical Hadoop use cases, based on how organizations have actually applied the technology. You can explore real-world stories that reveal how MapR customers combine use cases when putting Hadoop and NoSQL to work, including in production.
Power BI Data Analysis and Visualization provides a roadmap to vendor choices and highlights why Microsoft's Power BI is a very viable, cost-effective option for data visualization. The book covers the fundamentals and most commonly used features of Power BI, but also includes an in-depth discussion of advanced Power BI features such as natural language queries; embedding Power BI dashboards; and live streaming data. It discusses real solutions to extract data from the ERP application, Microsoft Dynamics CRM, and also offers ways to host the Power BI Dashboard as an Azure application, extracting data from popular data sources like Microsoft SQL Server and open-source PostgreSQL. Authored by Microsoft experts, this book uses real-world coding samples and screenshots to spotlight how to create reports, embed them in a webpage, view them across multiple platforms, and more. Business owners, IT professionals, data scientists, and analysts will benefit from this thorough presentation of Power BI and its functions.
This SpringerBrief reviews the knowledge engineering problem of engineering objectivity in top-k query answering: essentially, answers must be computed taking into account the user's preferences and a collection of (subjective) reports provided by other users. Each report is assumed to consist of a set of scores for a list of features, its author's preferences among the features, and other information, all of which is discussed in this brief. These pieces of information for every report are then combined, along with the querying user's preferences and their trust in each report, to rank the query results. Everyday examples of this setup are the online reviews that can be found on sites like Amazon, TripAdvisor, and Yelp, among many others. Throughout this knowledge engineering effort the authors adopt the Datalog+/- family of ontology languages as the underlying knowledge representation and reasoning formalism, and investigate several alternative ways in which rankings can be derived, along with algorithms for top-k (atomic) query answering under these rankings. The brief also investigates assumptions under which these algorithms run in polynomial time in the data complexity. Since it contains a gentle introduction to the main building blocks (OBDA, Datalog+/-, and reasoning with preferences), it should be of value to students, researchers, and practitioners who are interested in the general problem of incorporating user preferences into related formalisms and tools, to practitioners interested in using Ontology-based Data Access to leverage the information contained in reviews of products and services for a better customer experience, and to researchers working in the areas of Ontological Languages, Semantic Web, Data Provenance, and Reasoning with Preferences.
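As a toy illustration of the ranking problem (not the Datalog+/- semantics developed in the brief), the Python snippet below combines per-report feature scores with a user's feature preferences and trust in each reviewer; the weighted-average rule, the feature names, and all values are illustrative assumptions.

```python
# Toy ranking sketch: combine (subjective) report scores with the querying user's
# feature preferences, weighting each report by trust in its author. The aggregation
# rule and every value here are illustrative assumptions, not the brief's formal semantics.
prefs = {"cleanliness": 0.5, "location": 0.3, "price": 0.2}   # querying user's feature weights

items = {
    "hotel_a": [({"cleanliness": 0.9, "location": 0.6, "price": 0.4}, 0.8),   # (scores, trust)
                ({"cleanliness": 0.7, "location": 0.8, "price": 0.5}, 0.4)],
    "hotel_b": [({"cleanliness": 0.5, "location": 0.9, "price": 0.8}, 0.6)],
}

def rank_score(reports):
    """Trust-weighted average of each report's preference-weighted feature scores."""
    total_trust = sum(trust for _, trust in reports)
    return sum(trust * sum(prefs[f] * scores[f] for f in prefs)
               for scores, trust in reports) / total_trust

for item in sorted(items, key=lambda i: rank_score(items[i]), reverse=True):
    print(item, round(rank_score(items[item]), 3))
```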
Data clustering is a highly interdisciplinary field, the goal of which is to divide a set of objects into homogeneous groups such that objects in the same group are similar and objects in different groups are quite distinct. Thousands of theoretical papers and a number of books on data clustering have been published over the past 50 years. However, few books exist to teach people how to implement data clustering algorithms. This book was written for anyone who wants to implement or improve their data clustering algorithms. Using object-oriented design and programming techniques, Data Clustering in C++ exploits the commonalities of all data clustering algorithms to create a flexible set of reusable classes that simplifies the implementation of any data clustering algorithm. Readers can follow the development of the base data clustering classes and several popular data clustering algorithms. Additional topics such as data pre-processing, data visualization, cluster visualization, and cluster interpretation are briefly covered. This book is divided into three parts: Data Clustering and C++ Preliminaries, a review of basic concepts of data clustering, the unified modeling language, object-oriented programming in C++, and design patterns; A C++ Data Clustering Framework, the development of data clustering base classes; and Data Clustering Algorithms, the implementation of several popular data clustering algorithms. A key to learning a clustering algorithm is to implement and experiment with the clustering algorithm. Complete listings of classes, examples, unit test cases, and GNU configuration files are included in the appendices of this book as well as in the downloadable resources. The only requirements to compile the code are a modern C++ compiler and the Boost C++ libraries.
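To illustrate the clustering task itself (not the book's C++ framework), here is a tiny numpy sketch of Lloyd's k-means on synthetic two-dimensional data; k, the data, and the fixed iteration count are illustrative assumptions.

```python
# Tiny k-means (Lloyd's algorithm) sketch on synthetic 2-D data. The data, k=2, and the
# fixed 10 iterations are illustrative assumptions; no relation to the book's C++ classes.
import numpy as np

rng = np.random.default_rng(2)
points = np.vstack([rng.normal([0, 0], 0.5, (50, 2)),
                    rng.normal([4, 4], 0.5, (50, 2))])

k = 2
centroids = points[rng.choice(len(points), size=k, replace=False)]
for _ in range(10):
    # assign each point to its nearest centroid, then recompute the centroids
    labels = np.argmin(((points[:, None, :] - centroids) ** 2).sum(axis=2), axis=1)
    centroids = np.array([points[labels == j].mean(axis=0) for j in range(k)])

print(np.round(centroids, 2))   # approximately the two generating centers
```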
This book constitutes the thoroughly refereed proceedings of the Fourth International Conference on Data Technologies and Applications, DATA 2016, held in Colmar, France, in July 2016. The 9 revised full papers were carefully reviewed and selected from 50 submissions. The papers deal with the following topics: databases, data warehousing, data mining, data management, data security, knowledge and information systems and technologies; advanced application of data.
If you're like most R users, you have deep knowledge and love for statistics. But as your organization continues to collect huge amounts of data, adding tools such as Apache Spark makes a lot of sense. With this practical book, data scientists and professionals working with large-scale data applications will learn how to use Spark from R to tackle big data and big compute problems. Authors Javier Luraschi, Kevin Kuo, and Edgar Ruiz show you how to use R with Spark to solve different data analysis problems. This book covers relevant data science topics, cluster computing, and issues that should interest even the most advanced users. You will learn how to: analyze, explore, transform, and visualize data in Apache Spark with R; create statistical models to extract information and predict outcomes, and automate the process in production-ready workflows; perform analysis and modeling across many machines using distributed computing techniques; use large-scale data from multiple sources and different formats with ease from within Spark; explore alternative modeling frameworks for graph processing, geospatial analysis, and genomics at scale; and dive into advanced topics, including custom transformations, real-time data processing, and creating custom Spark extensions.
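The book works from R via sparklyr; purely as a rough analogue of the "analyze and transform data in Spark" workflow it describes, here is a short PySpark sketch, with the column names, values, and local master setting as illustrative assumptions.

```python
# Rough PySpark analogue of a basic Spark aggregation workflow; not the book's R code.
# Column names, values, and the local[*] master are illustrative assumptions.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.master("local[*]").appName("sketch").getOrCreate()

df = spark.createDataFrame(
    [("A", 10.0), ("A", 12.5), ("B", 7.0), ("B", 9.5), ("B", 8.0)],
    ["group", "value"],
)

summary = df.groupBy("group").agg(F.avg("value").alias("mean_value"),
                                  F.count("*").alias("n"))
summary.show()   # per-group mean and count, computed by the Spark engine
spark.stop()
```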
This book gathers papers presented at the ECC 2016, the Third Euro-China Conference on Intelligent Data Analysis and Applications, which was held in Fuzhou City, China from November 7 to 9, 2016. The aim of the ECC is to provide an internationally respected forum for scientific research in the broad areas of intelligent data analysis, computational intelligence, signal processing, and all associated applications of artificial intelligence (AI). The third installment of the ECC was jointly organized by Fujian University of Technology, China, and VSB-Technical University of Ostrava, Czech Republic. The conference was co-sponsored by Taiwan Association for Web Intelligence Consortium, and Immersion Co., Ltd.
This book constitutes the thoroughly refereed post-conference proceedings of the International Conference on Scalable Information Systems, INFOSCALE 2014, held in September 2014 in Seoul, South Korea. The 9 revised full papers presented were carefully reviewed and selected from 14 submissions. The papers cover a wide range of topics such as scalable data analysis and big data applications.
This book provides comprehensive reviews of recent progress in matrix variate and tensor variate data analysis from applied points of view. Matrix and tensor approaches for data analysis are known to be extremely useful for recently emerging complex and high-dimensional data in various applied fields. The reviews contained herein cover recent applications of these methods in psychology (Chap. 1), audio signals (Chap. 2), image analysis via tensor principal component analysis (Chap. 3), image analysis via decomposition (Chap. 4), and genetic data (Chap. 5). Readers will be able to understand the present status of these techniques as applicable to their own fields. In Chapter 5 especially, a theory of tensor normal distributions, which is basic to statistical inference, is developed, and multi-way regression, classification, clustering, and principal component analysis are exemplified under tensor normal distributions. Chapter 6 treats one-sided tests under matrix variate and tensor variate normal distributions, whose theory under multivariate normal distributions has been a popular topic in statistics since the books of Barlow et al. (1972) and Robertson et al. (1988). Chapters 1, 5, and 6 distinguish this book from ordinary engineering books on these topics.
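As a tiny taste of the matricization step behind methods such as tensor principal component analysis, the numpy sketch below unfolds a random 3-way array along mode 1 and keeps its leading left singular vectors; the tensor shape and the choice of two components are illustrative assumptions.

```python
# Mode-1 unfolding of a 3-way array followed by an SVD, the basic building block behind
# many tensor PCA-style methods. The random tensor and keeping 2 components are assumptions.
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(5, 4, 3))               # a 5 x 4 x 3 data tensor

X1 = X.reshape(X.shape[0], -1)               # mode-1 unfolding: a 5 x 12 matrix
U, s, Vt = np.linalg.svd(X1, full_matrices=False)
components = U[:, :2]                        # leading two mode-1 components
print(components.shape)                      # (5, 2)
```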
You may like...
Scientific English - A Guide for… by Robert A. Day and Nancy Sakaduski (Hardcover), R1,886 (Discovery Miles 18 860)
Numberblocks and Alphablocks: My First… by Sweet Cherry Publishing (Board book)