When it comes to data analytics, it pays to think big. PySpark blends the powerful Spark big data processing engine with the Python programming language to provide a data analysis platform that can scale up for nearly any task. Data Analysis with Python and PySpark is your guide to delivering successful Python-driven data projects. Data Analysis with Python and PySpark is a carefully engineered tutorial that helps you use PySpark to deliver your data-driven applications at any scale. This clear and hands-on guide shows you how to enlarge your processing capabilities across multiple machines with data from any source, ranging from Hadoop-based clusters to Excel worksheets. You'll learn how to break down big analysis tasks into manageable chunks and how to choose and use the best PySpark data abstraction for your unique needs. The Spark data processing engine is an amazing analytics factory: raw data comes in, and insight comes out. Thanks to its ability to handle massive amounts of data distributed across a cluster, Spark has been adopted as standard by organizations both big and small. PySpark, which wraps the core Spark engine with a Python-based API, puts Spark-based data pipelines in the hands of programmers and data scientists working with the Python programming language. PySpark simplifies Spark's steep learning curve, and provides a seamless bridge between Spark and an ecosystem of Python-based data science tools.
This book provides a first-hand account of business analytics and its implementation, and a brief account of the theoretical framework underpinning each component of business analytics. The themes of the book include (1) learning the contours and boundaries of business analytics which are in scope; (2) understanding the organization design aspects of an analytical organization; (3) providing knowledge on the domain focus of developing business activities for financial impact in functional analysis; and (4) deriving a whole gamut of business use cases in a variety of situations to apply the techniques. The book gives a complete, insightful understanding of developing and implementing analytical solutions.
This book provides a comprehensive introduction to opinion analysis for online reviews. It offers the newest research on opinion mining, including theories, algorithms and datasets. A new feature presentation method is highlighted for sentiment classification. Then, a three-phase framework for sentiment classification is proposed, in which a set of sentiment classifiers is selected automatically to make predictions. These predictions are integrated via ensemble learning. Finally, to address the combinatorial explosion that arises, a greedy algorithm is devised to select the base classifiers.
This book explains how to perform data de-noising at large scale with a satisfactory level of accuracy. Three main issues are considered. Firstly, how to eliminate error propagation from one stage to the next while developing a filtered model. Secondly, how to maintain the positional importance of data whilst purifying it. Finally, how to preserve the memory in the data, which is crucial for extracting smart data from noisy big data. If the memory of the data changes heavily after any form of smoothing or filtering is applied, then the final data may lose some important information, which may lead to erroneous conclusions. Even when some loss of information due to smoothing or filtering is anticipated, denoising cannot be avoided, because any analysis of big data in the presence of noise can be misleading. The entire process therefore demands very careful execution with efficient and smart models.
First Published in 2004. Learning how to analyze qualitative data by computer can be fun. That is one assumption underpinning this introduction to qualitative analysis, which takes account of how computing techniques have enhanced and transformed the field. The author provides a practical discussion of the main procedures for analyzing qualitative data by computer, with most of its examples taken from humour or everyday life. He examines ways in which computers can contribute to greater rigour and creativity, as well as greater efficiency in analysis. He discusses some of the pitfalls and paradoxes as well as the practicalities of computer-based qualitative analysis. The perspective of "Qualitative Data Analysis" is pragmatic rather than prescriptive, introducing different possibilities without advocating one particular approach. The result is a largely discipline-neutral text, which is suitable for arts and social science students and first-time qualitative analysts.
There is increasing pressure to protect computer networks against unauthorized intrusion, and some work in this area is concerned with engineering systems that are robust to attack. However, no system can be made invulnerable. Data Analysis for Network Cyber-Security focuses on monitoring and analyzing network traffic data, with the intention of preventing, or quickly identifying, malicious activity. Such work involves the intersection of statistics, data mining and computer science. Fundamentally, network traffic is relational, embodying a link between devices. As such, graph analysis approaches are a natural candidate. However, such methods do not scale well to the demands of real problems, and the critical aspect of the timing of communications events is not accounted for in these approaches. This book gathers papers from leading researchers to provide both background to the problems and a description of cutting-edge methodology. The contributors are from diverse institutions and areas of expertise and were brought together at a workshop held at the University of Bristol in March 2013 to address the issues of network cyber security. The workshop was supported by the Heilbronn Institute for Mathematical Research.
Newcomers to quantitative analysis need practical guidance on how to analyze data in the real world yet most introductory books focus on lengthy derivations and justifications instead of practical techniques. Covering the technical and professional skills needed by analysts in the academic, private, and public sectors, Applying Analytics: A Practical Introduction systematically teaches novices how to apply algorithms to real data and how to recognize potential pitfalls. It offers one of the first textbooks for the emerging first course in analytics. The text concentrates on the interpretation, strengths, and weaknesses of analytical techniques, along with challenges encountered by analysts in their daily work. The author shares various lessons learned from applying analytics in the real world. He supplements the technical material with coverage of professional skills traditionally learned through experience, such as project management, analytic communication, and using analysis to inform decisions. Example data sets used in the text are available for download online so that readers can test their own analytic routines. Suitable for beginning analysts in the sciences, business, engineering, and government, this book provides an accessible, example-driven introduction to the emerging field of analytics. It shows how to interpret data and identify trends across a range of fields.
The main purpose of this book is to investigate, explore and describe approaches and methods that facilitate data understanding through analytics solutions, based on their principles, concepts and applications. Analyzing data also involves the use of software. To cover this aspect of data analytics, the book uses software (Excel, SPSS, Python, etc.) to help readers better understand the analytics process in simple terms and to support useful methods in its application.
Build predictive models from time-based patterns in your data. Master statistical models including new deep learning approaches for time series forecasting.
In Time Series Forecasting in Python you will learn how to:
* Recognize a time series forecasting problem and build a performant predictive model
* Create univariate forecasting models that account for seasonal effects and external variables
* Build multivariate forecasting models to predict many time series at once
* Leverage large datasets by using deep learning for forecasting time series
* Automate the forecasting process
Time Series Forecasting in Python teaches you to build powerful predictive models from time-based data. Every model you create is relevant, useful, and easy to implement with Python. You'll explore interesting real-world datasets like Google's daily stock price and economic data for the USA, quickly progressing from the basics to developing large-scale models that use deep learning tools like TensorFlow.
About the technology: Time series forecasting reveals hidden trends and makes predictions about the future from your data. This powerful technique has proven incredibly valuable across multiple fields, from tracking business metrics to healthcare and the sciences. Modern Python libraries and powerful deep learning tools have opened up new methods and utilities for making practical time series forecasts.
About the book: Time Series Forecasting in Python teaches you to apply time series forecasting and get immediate, meaningful predictions. You'll learn both traditional statistical and new deep learning models for time series forecasting, all fully illustrated with Python source code. Test your skills with hands-on projects for forecasting air travel, volume of drug prescriptions, and the earnings of Johnson & Johnson. By the time you're done, you'll be ready to build accurate and insightful forecasting models with tools from the Python ecosystem.
The authors provide an understanding of big data and MapReduce by clearly presenting the basic terminologies and concepts. They have employed over 100 illustrations and many worked-out examples to convey the concepts and methods used in big data, the inner workings of MapReduce, and single node/multi-node installation on physical/virtual machines. This book covers almost all the necessary information on Hadoop MapReduce for most online certification exams. Upon completing this book, readers will find it easy to understand other big data processing tools such as Spark, Storm, etc. Ultimately, readers will be able to:
* understand what big data is and the factors that are involved
* understand the inner workings of MapReduce, which is essential for certification exams
* learn the features and weaknesses of MapReduce
* set up Hadoop clusters with 100s of physical/virtual machines
* create a virtual machine in AWS
* write MapReduce with Eclipse in a simple way
* understand other big data processing tools and their applications
Enterprise Resource Planning (ERP), Supply Chain Management (SCM), Customer Relationship Management (CRM), Business Intelligence (BI) and Big Data analytics (BDA) are business-related tasks and processes that are supported by standardized software solutions. The book explains that this requires business-oriented thinking and acting from IT specialists and data scientists. It is a good idea to let students experience this directly from the business perspective, for example as executives of a virtual company in a role-playing game. The second edition of the book has been completely revised, restructured and supplemented with current topics such as blockchains in supply chains and the correlation between Big Data analytics, artificial intelligence and machine learning. The structure of the book is based on the gradual implementation and integration of the respective information systems from the business and management perspectives. Part I contains chapters with detailed descriptions of the topics, supplemented by online tests and exercises. Part II introduces role play and the online gaming and simulation environment. Supplementary teaching material, presentations, templates, and video clips are available online in the gaming area. The gaming and business simulation Kdibisglobal.com, newly created for this book, now includes a beer division, a bottled water division, a soft drink division and a manufacturing division for barcode cash register scanners, each with its specific business processes and supply chains.
Data analysis is changing fast. Driven by a vast range of application domains and affordable tools, machine learning has become mainstream. Unsupervised data analysis methods, including cluster analysis, factor analysis, and low-dimensionality mapping, are continually being updated and have reached new heights of achievement in the incredibly rich data world that we inhabit. Statistical Learning and Data Science is a work of reference in the rapidly evolving context of converging methodologies. It gathers contributions from some of the foundational thinkers in the different fields of data analysis to the major theoretical results in the domain. On the methodological front, the volume includes conformal prediction and frameworks for assessing confidence in outputs, together with attendant risk. It illustrates a wide range of applications, including semantics, credit risk, energy production, genomics, and ecology. The book also addresses issues of origin and evolutions in the unsupervised data analysis arena, and presents some approaches for time series, symbolic data, and functional data. Over the history of multidimensional data analysis, more and more complex data have become available for processing. Supervised machine learning, semi-supervised analysis approaches, and unsupervised data analysis provide great capability for addressing the digital data deluge. Exploring the foundations and recent breakthroughs in the field, Statistical Learning and Data Science demonstrates how data analysis can improve personal and collective health and the well-being of our social, business, and physical environments.
The general theme of this book is to present innovative psychometric modeling and methods. In particular, this book includes research and successful examples of modeling techniques for new data sources from digital assessments, such as eye-tracking data, hint uses, and process data from game-based assessments. In addition, innovative psychometric modeling approaches, such as graphical models, item tree models, network analysis, and cognitive diagnostic models, are included. Chapters 1, 2, 4 and 6 are about psychometric models and methods for learning analytics. The first two chapters focus on advanced cognitive diagnostic models for tracking learning and the improvement of attribute classification accuracy. Chapter 4 demonstrates the use of network analysis for learning analytics. Chapter 6 introduces the conjunctive root causes model for the understanding of prerequisite skills in learning. Chapters 3, 5, 8, 9 are about innovative psychometric techniques to model process data. Specifically, Chapters 3 and 5 illustrate the usage of generalized linear mixed effect models and item tree models to analyze eye-tracking data. Chapter 8 discusses the modeling approach of hint uses and response accuracy in learning environment. Chapter 9 demonstrates the identification of observable outcomes in the game-based assessments. Chapters 7 and 10 introduce innovative latent variable modeling approaches, including the graphical and generalized linear model approach and the dynamic modeling approach. In summary, the book includes theoretical, methodological, and applied research and practices that serve as the foundation for future development. These chapters provide illustrations of efforts to model and analyze multiple data sources from digital assessments. When computer-based assessments are emerging and evolving, it is important that researchers can expand and improve the methods for modeling and analyzing new data sources. 
This book provides a useful resource to researchers who are interested in the development of psychometric methods to solve issues in this digital assessment age.
First book to examine game analysis, modern didactic reflections on learning, and big data in a key topic in science and society today. Provides understanding on how to use game analysis when applied to different sports and how to use the approach for video, event and positional data. Presents translational work that has implications for academics, programmers and applied practitioners.
There is a lack of an exposition on interdisciplinary and innovative methods of data mining and visualization for biodata. This book fills the gap by introducing an interdisciplinary set of the most recent methods and references on novel techniques from artificial intelligence, data mining, engineering, pattern recognition, and ontological data mining fields that are applicable to bioinformatics. The latest novel approaches are explained in detail, their advantages and disadvantages are summarized, and pointers to the future development of new applications are given. By widening the pool from which biologists and bioinformaticians can adopt methods for biodata mining and visualization, computational data mining experts in nonbiological fields are also encouraged to utilize their expertise in order to contribute to the progress of computational biology, thus enhancing the collaboration between these two disciplines.
Regarding the set of all feature attributes in a given database as the universal set, this monograph discusses various nonadditive set functions that describe the interaction among the contributions from feature attributes towards a considered target attribute. Then, the relevant nonlinear integrals are investigated. These integrals can be applied as aggregation tools in information fusion and data mining, such as synthetic evaluation, nonlinear multiregressions, and nonlinear classifications. Some methods of fuzzification are also introduced for nonlinear integrals such that fuzzy data can be treated and fuzzy information is retrievable. The book is suitable as a text for graduate courses in mathematics, computer science, and information science. It is also useful to researchers in the relevant area.
This is a book about how ecologists can integrate remote sensing and GIS in their research. It will allow readers to get started with the application of remote sensing and to understand its potential and limitations. Using practical examples, the book covers all necessary steps from planning field campaigns to deriving ecologically relevant information through remote sensing and modelling of species distributions. An Introduction to Spatial Data Analysis introduces spatial data handling using the open source software Quantum GIS (QGIS). In addition, readers will be guided through their first steps in the R programming language. The authors explain the fundamentals of spatial data handling and analysis, empowering the reader to turn data acquired in the field into actual spatial data. Readers will learn to process and analyse spatial data of different types and interpret the data and results. After finishing this book, readers will be able to address questions such as "What is the distance to the border of the protected area?", "Which points are located close to a road?", "Which fraction of land cover types exist in my study area?" using different software and techniques. This book is for novice spatial data users and does not assume any prior knowledge of spatial data itself or practical experience working with such data sets. Readers will likely include student and professional ecologists, geographers and any environmental scientists or practitioners who need to collect, visualize and analyse spatial data. The software used is the widely applied open source scientific programs QGIS and R. All scripts and data sets used in the book will be provided online at book.ecosens.org. 
This book covers specific methods including:
* what to consider before collecting in situ data
* how to work with spatial data collected in situ
* the difference between raster and vector data
* how to acquire further vector and raster data
* how to create relevant environmental information
* how to combine and analyse in situ and remote sensing data
* how to create useful maps for field work and presentations
* how to use QGIS and R for spatial analysis
* how to develop analysis scripts
Poor data quality is known to compromise the credibility and efficiency of commercial and public endeavours. Also, the importance of managing data quality has increased manifold as the diversity of sources, formats and volume of data grows. This volume targets the data quality in the light of collaborative information systems where data creation and ownership is increasingly difficult to establish.
Data Warehousing has been around for 20 years and has become part of the information technology infrastructure. Data warehousing originally grew in response to the corporate need for information--not data--and it supplies integrated, granular, and historical data to the corporation.
Numerical simulation models are used in all engineering disciplines for modeling physical phenomena to learn how the phenomena work, and to identify problems and optimize behavior. Smart Proxy Models provide an opportunity to replicate numerical simulations with very high accuracy and can be run on a laptop within a few minutes, thereby simplifying the use of complex numerical simulations, which can otherwise take tens of hours. This book focuses on Smart Proxy Modeling and provides readers with all the essential details on how to develop Smart Proxy Models using Artificial Intelligence and Machine Learning, as well as how it may be used in real-world cases.
* Covers replication of highly accurate numerical simulations using Artificial Intelligence and Machine Learning
* Details application in reservoir simulation and modeling and computational fluid dynamics
* Includes real case studies based on commercially available simulators
Smart Proxy Modeling is ideal for petroleum, chemical, environmental, and mechanical engineers, as well as statisticians and others working with applications of data-driven analytics.
The ability to store, manage and give access to the huge quantity of data collected by astronomical observatories is one of the major challenges of modern astronomy. At the same time, the growing complexity of data systems implies a change of concepts: the scientist has to manipulate data as well as information. Developments of the "World Wide Web" bring answers to these problems. The book presents a wide selection of databases, archives, data centres and information systems. Descriptions are included, together with their scientific context and motivations. This volume should prove a useful tool for astronomers, librarians, data specialists and computer engineers.
This is the first comprehensive book dedicated entirely to the field of decision trees in data mining and covers all aspects of this important technique. Decision trees have become one of the most powerful and popular approaches in knowledge discovery and data mining, the science and technology of exploring large and complex bodies of data in order to discover useful patterns. The area is of great importance because it enables modeling and knowledge extraction from the abundance of data available. Both theoreticians and practitioners are continually seeking techniques to make the process more efficient, cost-effective and accurate. Decision trees, originally implemented in decision theory and statistics, are highly effective tools in other areas such as data mining, text mining, information extraction, machine learning, and pattern recognition. This book invites readers to explore the many benefits in data mining that decision trees offer:
This book is the culmination of three years of research effort on a multidisciplinary project in which physicists, mathematicians, computer scientists and social scientists worked together to arrive at a unifying picture of complex networks. The contributed chapters form a reference for the various problems in data analysis, visualization and modeling of complex networks.