Topological data analysis (TDA) has emerged recently as a viable tool for analyzing complex data, and the area has grown substantially both in its methodologies and applicability. Providing a computational and algorithmic foundation for techniques in TDA, this comprehensive, self-contained text introduces students and researchers in mathematics and computer science to the current state of the field. The book features a description of mathematical objects and constructs behind recent advances, the algorithms involved, computational considerations, as well as examples of topological structures or ideas that can be used in applications. It provides a thorough treatment of persistent homology together with various extensions - like zigzag persistence and multiparameter persistence - and their applications to different types of data, like point clouds, triangulations, or graph data. Other important topics covered include discrete Morse theory, the Mapper structure, optimal generating cycles, as well as recent advances in embedding TDA within machine learning frameworks.
Text contains a wealth of information about a wide variety of sociocultural constructs. Automated prediction methods can infer these quantities (sentiment analysis is probably the most well-known application). However, there is virtually no limit to the kinds of things we can predict from text: power, trust, and misogyny are all signaled in language. These algorithms easily scale to corpus sizes infeasible for manual analysis. Prediction algorithms have become steadily more powerful, especially with the advent of neural network methods. However, applying these techniques usually requires profound programming knowledge and machine learning expertise. As a result, many social scientists do not apply them. This Element provides the working social scientist with an overview of the most common methods for text classification, an intuition of their applicability, and Python code to execute them. It covers both the ethical foundations of such work as well as the emerging potential of neural network methods.
This book gives readers the "big picture" and a comprehensive survey of the domain of big data processing systems. For the past decade, the Hadoop framework has dominated the world of big data processing, yet recently academia and industry have started to recognize its limitations in several application domains, and thus it is now gradually being replaced by a collection of engines that are dedicated to specific verticals (e.g. structured data, graph data, and streaming data). The book explores this new wave of systems, which it refers to as Big Data 2.0 processing systems. After Chapter 1 presents the general background of the big data phenomenon, Chapter 2 provides an overview of various general-purpose big data processing systems that allow their users to develop various big data processing jobs for different application domains. In turn, Chapter 3 examines various systems that have been introduced to support the SQL flavor on top of the Hadoop infrastructure and provide competitive and scalable performance in the processing of large-scale structured data. Chapter 4 discusses several systems that have been designed to tackle the problem of large-scale graph processing, while the main focus of Chapter 5 is on several systems that have been designed to provide scalable solutions for processing big data streams, and on other sets of systems that have been introduced to support the development of data pipelines between various types of big data processing jobs and systems. Next, Chapter 6 focuses on covering the emerging frameworks and systems in the domain of scalable machine learning and deep learning processing. Lastly, Chapter 7 shares conclusions and an outlook on future research challenges. This new and considerably enlarged second edition not only contains the completely new Chapter 6, but also offers refreshed content on the state of the art in all domains of big data processing over recent years.
Overall, the book offers a valuable reference guide for professionals, students, and researchers in the domain of big data processing systems. Further, its comprehensive content will hopefully encourage readers to pursue further research on the subject.
Measuring the abundance of individuals and the diversity of species are core components of most ecological research projects and conservation monitoring. This book brings together in one place, for the first time, the methods used to estimate the abundance of individuals in nature. The statistical basis of each method is detailed along with practical considerations for survey design and data collection. Methods are illustrated using data ranging from Alaskan shrubs to Yellowstone grizzly bears, not forgetting Costa Rican ants and Prince Edward Island lobsters. Where necessary, example code for use with the open source software R is supplied. When appropriate, reference is made to other widely used programs. After opening with a brief synopsis of relevant statistical methods, the first section deals with the abundance of stationary items such as trees, shrubs, coral, etc. Following a discussion of the use of quadrats and transects in the contexts of forestry sampling and the assessment of plant cover, there are chapters addressing line-intercept sampling, the use of nearest-neighbour distances, and variable sized plots. The second section deals with individuals that move, such as birds, mammals, reptiles, fish, etc. Approaches discussed include double-observer sampling, removal sampling, capture-recapture methods and distance sampling. The final section deals with the measurement of species richness; species diversity; species-abundance distributions; and other aspects of diversity such as evenness, similarity, turnover and rarity. This is an essential reference for anyone involved in advanced undergraduate or postgraduate ecological research and teaching, or those planning and carrying out data analysis as part of conservation survey and monitoring programmes.
The increasing availability of data in our current, information overloaded society has led to the need for valid tools for its modelling and analysis. Data mining and applied statistical methods are the appropriate tools to extract knowledge from such data. This book provides an accessible introduction to data mining methods in a consistent and application oriented statistical framework, using case studies drawn from real industry projects and highlighting the use of data mining methods in a variety of business applications. Introduces data mining methods and applications. Covers classical and Bayesian multivariate statistical methodology as well as machine learning and computational data mining methods. Includes many recent developments such as association and sequence rules, graphical Markov models, lifetime value modelling, credit risk, operational risk and web mining. Features detailed case studies based on applied projects within industry. Incorporates discussion of data mining software, with case studies analysed using R. Is accessible to anyone with a basic knowledge of statistics or data analysis. Includes an extensive bibliography and pointers to further reading within the text. "Applied Data Mining for Business and Industry, 2nd edition" is aimed at advanced undergraduate and graduate students of data mining, applied statistics, database management, computer science and economics. The case studies will provide guidance to professionals working in industry on projects involving large volumes of data, such as customer relationship management, web design, risk management, marketing, economics and finance.
The LNCS journal Transactions on Large-Scale Data- and Knowledge-Centered Systems focuses on data management, knowledge discovery, and knowledge processing, which are core and hot topics in computer science. Since the 1990s, the Internet has become the main driving force behind application development in all domains. An increase in the demand for resource sharing (e.g., computing resources, services, metadata, data sources) across different sites connected through networks has led to an evolution of data- and knowledge management systems from centralized systems to decentralized systems enabling large-scale distributed applications providing high scalability. This, the 48th issue of Transactions on Large-Scale Data- and Knowledge-Centered Systems, contains 8 invited papers dedicated to the memory of Prof. Dr. Roland Wagner. The topics covered include distributed database systems, NewSQL, scalable transaction management, strong consistency, caches, data warehouse, ETL, reinforcement learning, stochastic approximation, multi-agent systems, ontology, model-driven development, organisational modelling, digital government, new institutional economics and data governance.
Imagine you are a business user, consultant, or developer about to enter an SAP S/4HANA implementation project. You are well-versed with SAP's product portfolio and you know that the preferred reporting option in S/4HANA is embedded analytics. But what exactly is embedded analytics? And how can it be implemented? And who can do it: a business user, a functional consultant specialized in financial or logistics processes? Or does a business intelligence expert or a programmer need to be involved? Good questions! This book will answer these questions, one by one. It will also take you on the same journey that the implementation team needs to follow for every reporting requirement that pops up: start with assessing a more standard option and only move on to a less standard option if the requirement cannot be fulfilled. In consecutive chapters, analytical apps delivered by SAP, apps created using Smart Business Services, and Analytical Queries developed either using tiles or in a development environment are explained in detail with practical examples. The book also explains which option is preferred in which situation. The book covers topics such as in-memory computing, cloud, UX, OData, agile development, and more. Author Freek Keijzer writes from the perspective of an implementation consultant, focusing on functionality that has proven itself useful in the field. Practical examples are abundant, ranging from "codeless" to "hardcore coding." What You Will Learn: Know the difference between static reporting and interactive querying on real-time data; understand which options are available for analytics in SAP S/4HANA; understand which option to choose in which situation; know how to implement these options. Who This Book Is For: SAP power users, functional consultants, developers.
Statistical data and evidence-based claims are increasingly central to our everyday lives. Critically examining 'Big Data', this book charts the recent explosion in sources of data, including those precipitated by global developments and technological change. It sets out changes and controversies related to data harvesting and construction, dissemination and data analytics by a range of private, governmental and social organisations in multiple settings. Analysing the power of data to shape political debate, the presentation of ideas to us by the media, and issues surrounding data ownership and access, the authors suggest how data can be used to uncover injustices and to advance social progress.
Predator-prey interactions are ubiquitous, govern the flow of energy up trophic levels, and strongly influence the structure of ecological systems. They are typically quantified using the functional response - the relationship between a predator's foraging rate and the availability of food. As such, the functional response is central to how all ecological communities function - since all communities contain foragers - and a principal driver of the abundance, diversity, and dynamics of ecological communities. The functional response also reflects all the behaviors, traits, and strategies that predators use to hunt prey and that prey use to evade predation. It is thus both a clear reflection of past evolution, including predator-prey arms races, and a major force driving the future evolution of both predator and prey. Despite their importance, there have been remarkably few attempts to synthesize or even briefly review functional responses. This novel and accessible book fills this gap, clearly demonstrating their crucial role as the link between individuals, evolution, and community properties, representing a highly-integrated and measurable aspect of ecological function. It provides a clear entry point for students, a refresher for more advanced researchers, and a motivator for future research. Predator Ecology is an advanced textbook suitable for graduate students and researchers in ecology and evolutionary biology seeking a broad, up-to-date, and authoritative coverage of the field. It will also be of relevance and use to mathematical ecologists, wildlife biologists, and anyone interested in predator-prey interactions.
This book constitutes revised selected papers of the 9th International Conference on Analysis of Images, Social Networks and Texts, AIST 2020, held in Moscow, Russia, in October 2020. Due to the COVID-19 pandemic the conference was held online. The 14 full papers, 9 short papers and 4 poster papers were carefully reviewed and selected from 108 qualified submissions. The papers are organized in topical sections on natural language processing; computer vision; social network analysis; data analysis and machine learning; theoretical machine learning and optimization; process mining; posters.
Demography is everywhere in our lives: from birth to death. Indeed, the universal currencies of survival, development, reproduction, and recruitment shape the performance of all species, from microbes to humans. The number of techniques for demographic data acquisition and analyses across the entire tree of life (microbes, fungi, plants, and animals) has drastically increased in recent decades. These developments have been partially facilitated by the advent of technologies such as GIS and drones, as well as analytical methods including Bayesian statistics and high-throughput molecular analyses. However, despite the universality of demography and the significant research potential that could emerge from unifying: (i) questions across taxa, (ii) data collection protocols, and (iii) analytical tools, demographic methods to date have remained taxonomically siloed and methodologically disintegrated. This is the first book to attempt a truly unified approach to demography and population ecology in order to address a wide range of questions in ecology, evolution, and conservation biology across the entire spectrum of life. This novel book provides the reader with the fundamentals of data collection, model construction, analyses, and interpretation across a wide repertoire of demographic techniques and protocols. It introduces the novice demographer to a broad range of demographic methods, including abundance-based models, life tables, matrix population models, integral projection models, integrated population models, individual based models, and more. Through the careful integration of data collection methods, analytical approaches, and applications, clearly guided throughout with fully reproducible R scripts, the book provides an up-to-date and authoritative overview of the most popular and effective demographic tools. 
Demographic Methods across the Tree of Life is aimed at graduate students and professional researchers in the fields of demography, ecology, animal behaviour, genetics, evolutionary biology, mathematical biology, and wildlife management.
Most economists agree that AI is a general purpose technology (GPT) like the steam engine, electricity, and the computer. AI will drive innovation in all sectors of the economy for the foreseeable future. Practical AI for Business Leaders, Product Managers, and Entrepreneurs is a technical guidebook for the business leader or anyone responsible for leading AI-related initiatives in their organization. The book can also be used as a foundation to explore the ethical implications of AI. Authors Alfred Essa and Shirin Mojarad provide a gentle introduction to foundational topics in AI. Each topic is framed as a triad: concept, theory, and practice. The concept chapters develop the intuition, culminating in a practical case study. The theory chapters reveal the underlying technical machinery. The practice chapters provide code in Python to implement the models discussed in the case study. With this book, readers will learn: the technical foundations of machine learning and deep learning; how to apply the core technical concepts to solve business problems; the different methods used to evaluate AI models; how to understand model development as a tradeoff between accuracy and generalization; how to represent the computational aspects of AI using vectors and matrices; and how to express the models in Python by using machine learning libraries such as scikit-learn, statsmodels, and keras.
Data are not only ubiquitous in society, but are increasingly complex both in size and dimensionality. Dimension reduction offers researchers and scholars the ability to make such complex, high dimensional data spaces simpler and more manageable. This Element offers readers a suite of modern unsupervised dimension reduction techniques along with hundreds of lines of R code, to efficiently represent the original high dimensional data space in a simplified, lower dimensional subspace. Launching from the earliest dimension reduction technique, principal components analysis, and using real social science data, I introduce and walk readers through application of the following techniques: locally linear embedding, t-distributed stochastic neighbor embedding (t-SNE), uniform manifold approximation and projection, self-organizing maps, and deep autoencoders. The result is a well-stocked toolbox of unsupervised algorithms for tackling the complexities of high dimensional data so common in modern society. All code is publicly accessible on GitHub.
This two-volume set, LNCS 12317 and 12318, constitutes the thoroughly refereed proceedings of the 4th International Joint Conference, APWeb-WAIM 2020, held in Tianjin, China, in September 2020. Due to the COVID-19 pandemic the conference was organized as a fully online conference. The 42 full papers presented together with 17 short papers, and 6 demonstration papers were carefully reviewed and selected from 180 submissions. The papers are organized around the following topics: Big Data Analytics; Graph Data and Social Networks; Knowledge Graph; Recommender Systems; Information Extraction and Retrieval; Machine Learning; Blockchain; Data Mining; Text Analysis and Mining; Spatial, Temporal and Multimedia Databases; Database Systems; and Demo.
Many students find it daunting to move from studying environmental science, to designing and implementing their own research proposals. This book provides a practical introduction to help develop scientific thinking, aimed at undergraduate and new graduate students in the earth and environmental sciences. Students are guided through the steps of scientific thinking using published scientific literature and real environmental data. The book starts with advice on how to effectively read scientific papers, before outlining how to articulate testable questions and answer them using basic data analysis. The Mauna Loa CO2 dataset is used to demonstrate how to read metadata, prepare data, generate effective graphs and identify dominant cycles on various timescales. Practical, question-driven examples are explored to explain running averages, anomalies, correlations and simple linear models. The final chapter provides a framework for writing persuasive research proposals, making this an essential guide for students embarking on their first research project.
This is the first comprehensive overview of the 'science of science,' an emerging interdisciplinary field that relies on big data to unveil the reproducible patterns that govern individual scientific careers and the workings of science. It explores the roots of scientific impact, the role of productivity and creativity, when and what kind of collaborations are effective, the impact of failure and success in a scientific career, and what metrics can tell us about the fundamental workings of science. The book relies on data to draw actionable insights, which can be applied by individuals to further their career or decision makers to enhance the role of science in society. With anecdotes and detailed, easy-to-follow explanations of the research, this book is accessible to all scientists and graduate students, policymakers, and administrators with an interest in the wider scientific enterprise.
BIG DATA, ARTIFICIAL INTELLIGENCE AND DATA ANALYSIS SET. Coordinated by Jacques Janssen. Data analysis is a scientific field that continues to grow enormously, most notably over the last few decades, following rapid growth within the tech industry, as well as the wide applicability of computational techniques alongside new advances in analytic tools. Modeling enables data analysts to identify relationships, make predictions, and to understand, interpret and visualize the extracted information more strategically. This book includes the most recent advances on this topic, meeting increasing demand from wide circles of the scientific community. Applied Modeling Techniques and Data Analysis 1 is a collective work by a number of leading scientists, analysts, engineers, mathematicians and statisticians, working on the front end of data analysis and modeling applications. The chapters cover a cross section of current concerns and research interests in the above scientific areas. The collected material is divided into appropriate sections to provide the reader with both theoretical and applied information on data analysis methods, models and techniques, along with appropriate applications.
Take your first steps to becoming a fully qualified data analyst by learning how to explore complex datasets. Key Features: Master each concept through practical exercises and activities; discover various statistical techniques to analyze your data; implement everything you've learned on a real-world case study to uncover valuable insights. Book Description: Every day, businesses operate around the clock, and a huge amount of data is generated at a rapid pace. This book helps you analyze this data and identify key patterns and behaviors that can help you and your business understand your customers at a deep, fundamental level. SQL for Data Analytics, Third Edition is a great way to get started with data analysis, showing how to effectively sort and process information from raw data, even without any prior experience. You will begin by learning how to form hypotheses and generate descriptive statistics that can provide key insights into your existing data. As you progress, you will learn how to write SQL queries to aggregate, calculate, and combine SQL data from sources outside of your current dataset. You will also discover how to work with advanced data types, like JSON. By exploring advanced techniques, such as geospatial analysis and text analysis, you will be able to understand your business at a deeper level. Finally, the book lets you in on the secret to getting information faster and more effectively by using advanced techniques like profiling and automation. By the end of this book, you will be proficient in the efficient application of SQL techniques in everyday business scenarios and looking at data with the critical eye of an analytics professional.
What you will learn: Use SQL to clean, prepare, and combine different datasets; aggregate basic statistics using GROUP BY clauses; perform advanced statistical calculations using a WINDOW function; import data into a database to combine with other tables; export SQL query results into various sources; analyze special data types in SQL, including geospatial, date/time, and JSON data; optimize queries and automate tasks; think about data problems and find answers using SQL. Who this book is for: If you're a database engineer looking to transition into analytics or a backend engineer who wants to develop a deeper understanding of production data and gain practical SQL knowledge, you will find this book useful. This book is also ideal for data scientists or business analysts who want to improve their data analytics skills using SQL. Basic familiarity with SQL (such as basic SELECT, WHERE, and GROUP BY clauses) as well as a good understanding of linear algebra, statistics, and PostgreSQL 14 are necessary to make the most of this SQL data analytics book.
Get started using Python in data analysis with this compact practical guide. This book includes three exercises and a case study on getting data in and out of Python code in the right format. Learn Data Analysis with Python also helps you discover meaning in the data using analysis and shows you how to visualize it. Each lesson is, as much as possible, self-contained to allow you to dip in and out of the examples as your needs dictate. If you are already using Python for data analysis, you will find a number of things that you wish you knew how to do in Python. You can then take these techniques and apply them directly to your own projects. If you aren't using Python for data analysis, this book takes you through the basics at the beginning to give you a solid foundation in the topic. By the time you have worked your way through the book, you will have a better idea of how to use Python for data analysis. What You Will Learn: Get data into and out of Python code; prepare the data and its format; find the meaning of the data; visualize the data using iPython. Who This Book Is For: Those who want to learn data analysis using Python. Some experience with Python is recommended but not required, as is some prior experience with data analysis or data science.
Text is everywhere, and it is a fantastic resource for social scientists. However, because it is so abundant, and because language is so variable, it is often difficult to extract the information we want. There is a whole subfield of AI concerned with text analysis (natural language processing). Many of the basic analysis methods developed are now readily available as Python implementations. This Element will teach you when to use which method, the mathematical background of how it works, and the Python code to implement it.
Analyze vast amounts of data in record time using Apache Spark with Databricks in the Cloud. Learn the fundamentals, and more, of running analytics on large clusters in Azure and AWS, using Apache Spark with Databricks on top. Discover how to squeeze the most value out of your data at a mere fraction of what classical analytics solutions cost, while at the same time getting the results you need, incrementally faster. This book explains how the confluence of these pivotal technologies gives you enormous power, and cheaply, when it comes to huge datasets. You will begin by learning how cloud infrastructure makes it possible to scale your code to large amounts of processing units, without having to pay for the machinery in advance. From there you will learn how Apache Spark, an open source framework, can enable all those CPUs for data analytics use. Finally, you will see how services such as Databricks provide the power of Apache Spark, without you having to know anything about configuring hardware or software. By removing the need for expensive experts and hardware, your resources can instead be allocated to actually finding business value in the data. This book guides you through some advanced topics such as analytics in the cloud, data lakes, data ingestion, architecture, machine learning, and tools, including Apache Spark, Apache Hadoop, Apache Hive, Python, and SQL. Valuable exercises help reinforce what you have learned. 
What You Will Learn: Discover the value of big data analytics that leverage the power of the cloud; get started with Databricks using SQL and Python in either Microsoft Azure or AWS; understand the underlying technology, and how the cloud and Apache Spark fit into the bigger picture; see how these tools are used in the real world; run basic analytics, including machine learning, on billions of rows at a fraction of the cost or for free. Who This Book Is For: Data engineers, data scientists, and cloud architects who want or need to run advanced analytics in the cloud. It is assumed that the reader has data experience, but perhaps minimal exposure to Apache Spark and Azure Databricks. The book is also recommended for people who want to get started in the analytics field, as it provides a strong foundation.
Images play a crucial role in shaping and reflecting political life. Digitization has vastly increased the presence of such images in daily life, creating valuable new research opportunities for social scientists. We show how recent innovations in computer vision methods can substantially lower the costs of using images as data. We introduce readers to the deep learning algorithms commonly used for object recognition, facial recognition, and visual sentiment analysis. We then provide guidance and specific instructions for scholars interested in using these methods in their own research.
In today's digital environment, distributed systems are increasingly present in a wide variety of environments, ranging from public software applications to critical systems. Distributed Systems introduces the underlying concepts, the associated design techniques and the related security issues. Distributed Systems: Design and Algorithms is dedicated to engineers, students, and anyone familiar with algorithms and programming who wants to know more about distributed systems. These systems are characterized by: several components with one or more threads, possibly running on different processors; asynchronous communications with possible additional assumptions (reliability, order preserving, etc.); local views for every component and no shared data between components. This title presents distributed systems from a point of view dedicated to their design and their main principles: the main algorithms are described and placed in their application context, i.e. consistency management and the way they are used in distributed file-systems.