Discover how to describe your data in detail, identify data issues, and solve them using commonly used techniques, tips, and tricks

Key Features
- Get well-versed with various data cleaning techniques to reveal key insights
- Manipulate data of different complexities to shape it into the right form as per your business needs
- Clean, monitor, and validate large data volumes to diagnose problems before moving on to data analysis

Book Description
Getting clean data to reveal insights is essential, as jumping directly into data analysis without proper data cleaning may lead to incorrect results. This book shows you tools and techniques that you can apply to clean and handle data with Python. You'll begin by getting familiar with the shape of data by using practices that can be deployed routinely with most data sources. Then, the book teaches you how to manipulate data to get it into a useful form. You'll also learn how to filter and summarize data to gain insights and better understand what makes sense and what does not, along with discovering how to operate on data to address the issues you've identified. Moving on, you'll perform key tasks, such as handling missing values, validating errors, removing duplicate data, monitoring high volumes of data, and handling outliers and invalid dates. Next, you'll cover recipes on using supervised learning and Naive Bayes analysis to identify unexpected values and classification errors, and generate visualizations for exploratory data analysis (EDA) to visualize unexpected values. Finally, you'll build functions and classes that you can reuse without modification when you have new data. By the end of this Python book, you'll be equipped with all the key skills that you need to clean data and diagnose problems within it.

What you will learn
- Find out how to read and analyze data from a variety of sources
- Produce summaries of the attributes of data frames, columns, and rows
- Filter data and select columns of interest that satisfy given criteria
- Address messy data issues, including working with dates and missing values
- Improve your productivity in Python pandas by using method chaining
- Use visualizations to gain additional insights and identify potential data issues
- Enhance your ability to learn what is going on in your data
- Build user-defined functions and classes to automate data cleaning

Who this book is for
This book is for anyone looking for ways to handle messy, duplicate, and poor data using different Python tools and techniques. The book takes a recipe-based approach to help you learn how to clean and manage data. Working knowledge of Python programming is all you need to get the most out of the book.
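As a flavor of the tasks this blurb describes, here is a minimal illustrative sketch (not taken from the book, which works mainly in pandas) of two common cleaning steps: dropping duplicate records and imputing missing values. The records, field names, and default value are made up for the example.

```python
def clean(records, default_age):
    """Drop exact duplicate records (preserving order) and impute missing ages."""
    seen = set()
    cleaned = []
    for rec in records:
        key = (rec["name"], rec["age"])
        if key in seen:
            continue  # skip duplicate row
        seen.add(key)
        # Impute a missing value with a supplied default
        age = rec["age"] if rec["age"] is not None else default_age
        cleaned.append({"name": rec["name"], "age": age})
    return cleaned

rows = [{"name": "Ann", "age": 34},
        {"name": "Ann", "age": 34},   # duplicate record
        {"name": "Bo", "age": None}]  # missing value
print(clean(rows, default_age=30))
# → [{'name': 'Ann', 'age': 34}, {'name': 'Bo', 'age': 30}]
```

In pandas, the same two steps would typically be `drop_duplicates` followed by `fillna`; the plain-Python version just makes the logic explicit.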
Understand the complexities of modern-day data engineering platforms and explore strategies to deal with them with the help of use case scenarios led by an industry expert in big data

Key Features
- Become well-versed with the core concepts of Apache Spark and Delta Lake for building data platforms
- Learn how to ingest, process, and analyze data that can be later used for training machine learning models
- Understand how to operationalize data models in production using curated data

Book Description
In the world of ever-changing data and schemas, it is important to build data pipelines that can auto-adjust to changes. This book will help you build scalable data platforms that managers, data scientists, and data analysts can rely on. Starting with an introduction to data engineering, along with its key concepts and architectures, this book will show you how to use Microsoft Azure Cloud services effectively for data engineering. You'll cover data lake design patterns and the different stages through which the data needs to flow in a typical data lake. Once you've explored the main features of Delta Lake to build data lakes with fast performance and governance in mind, you'll advance to implementing the lambda architecture using Delta Lake. Packed with practical examples and code snippets, this book takes you through real-world examples based on production scenarios faced by the author in his 10 years of experience working with big data. Finally, you'll cover data lake deployment strategies that play an important role in provisioning the cloud resources and deploying the data pipelines in a repeatable and continuous way. By the end of this data engineering book, you'll know how to effectively deal with ever-changing data and create scalable data pipelines to streamline data science, ML, and artificial intelligence (AI) tasks.

What you will learn
- Discover the challenges you may face in the data engineering world
- Add ACID transactions to Apache Spark using Delta Lake
- Understand effective design strategies to build enterprise-grade data lakes
- Explore architectural and design patterns for building efficient data ingestion pipelines
- Orchestrate a data pipeline for preprocessing data using Apache Spark and Delta Lake APIs
- Automate deployment and monitoring of data pipelines in production
- Get to grips with securing, monitoring, and managing data pipelines efficiently

Who this book is for
This book is for aspiring data engineers and data analysts who are new to the world of data engineering and are looking for a practical guide to building scalable data platforms. If you already work with PySpark and want to use Delta Lake for data engineering, you'll find this book useful. Basic knowledge of Python, Spark, and SQL is expected.
Comprehensive recipes to give you valuable insights on Transformers, Reinforcement Learning, and more

Key Features
- Deep learning solutions from Kaggle Masters and Google Developer Experts
- Get to grips with the fundamentals, including variables, matrices, and data sources
- Learn advanced techniques to make your algorithms faster and more accurate

Book Description
The independent recipes in Machine Learning Using TensorFlow Cookbook will teach you how to perform complex data computations and gain valuable insights into your data. Dive into recipes on training models, model evaluation, sentiment analysis, regression analysis, artificial neural networks, and deep learning - each using Google's machine learning library, TensorFlow. This cookbook covers the fundamentals of the TensorFlow library, including variables, matrices, and various data sources. You'll discover real-world implementations of Keras and TensorFlow and learn how to use estimators to train linear models and boosted trees, both for classification and regression. Explore the practical applications of a variety of deep learning architectures, such as recurrent neural networks and Transformers, and see how they can be used to solve computer vision and natural language processing (NLP) problems. With the help of this book, you will be proficient in using TensorFlow, understand deep learning from the basics, and be able to implement machine learning algorithms in real-world scenarios.

What you will learn
- Take TensorFlow into production
- Implement and fine-tune Transformer models for various NLP tasks
- Apply reinforcement learning algorithms using the TF-Agents framework
- Understand linear regression techniques and use Estimators to train linear models
- Execute neural networks and improve predictions on tabular data
- Master convolutional neural networks and recurrent neural networks through practical recipes

Who this book is for
If you are a data scientist or a machine learning engineer, and you want to skip detailed theoretical explanations in favor of building production-ready machine learning models using TensorFlow, this book is for you. Basic familiarity with Python, linear algebra, statistics, and machine learning is necessary to make the most out of this book.
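The book itself trains linear models with TensorFlow Estimators; purely as a dependency-free illustration of the idea behind linear regression, the toy sketch below fits y ≈ w·x + b by gradient descent in plain Python. The data, learning rate, and step count are made up for the example.

```python
def fit_linear(xs, ys, lr=0.01, steps=5000):
    """Fit y = w*x + b by gradient descent on mean squared error."""
    w = b = 0.0
    n = len(xs)
    for _ in range(steps):
        # Gradients of MSE with respect to w and b
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]   # generated by y = 2x + 1
w, b = fit_linear(xs, ys)
print(round(w, 2), round(b, 2))  # → 2.0 1.0
```

A framework like TensorFlow automates exactly these gradient computations (and scales them to large models), which is why the library-based recipes need only declare the model and the loss.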
Explore expert techniques such as advanced indexing and high availability to build scalable, reliable, and fault-tolerant database applications using PostgreSQL 13

Key Features
- Master advanced PostgreSQL 13 concepts with the help of real-world datasets and examples
- Leverage PostgreSQL's indexing features to fine-tune the performance of your queries
- Extend PostgreSQL's functionality to suit your organization's needs with minimal effort

Book Description
Thanks to its reliability, robustness, and high performance, PostgreSQL has become one of the most advanced open source databases on the market. This updated fourth edition will help you understand PostgreSQL administration and how to build dynamic database solutions for enterprise apps with the latest release of PostgreSQL, including designing both the physical and technical aspects of the system architecture with ease. Starting with an introduction to the new features in PostgreSQL 13, this book will guide you in building efficient and fault-tolerant PostgreSQL apps. You'll explore advanced PostgreSQL features, such as logical replication, database clusters, performance tuning, advanced indexing, monitoring, and user management, to manage and maintain your database. You'll then work with the PostgreSQL optimizer, configure PostgreSQL for high speed, and move from Oracle to PostgreSQL. The book also covers transactions, locking, and indexes, and shows you how to improve performance with query optimization. You'll also focus on how to manage network security and work with backups and replication while exploring useful PostgreSQL extensions that optimize the performance of large databases. By the end of this PostgreSQL book, you'll be able to get the most out of your database by executing advanced administrative tasks.

What you will learn
- Get well versed with advanced SQL functions in PostgreSQL 13
- Get to grips with administrative tasks such as log file management and monitoring
- Work with stored procedures and manage backup and recovery
- Employ replication and failover techniques to reduce data loss
- Perform database migration from Oracle to PostgreSQL with ease
- Replicate PostgreSQL database systems to create backups and scale your database
- Manage and improve server security to protect your data
- Troubleshoot your PostgreSQL instance to find solutions to common and not-so-common problems

Who this book is for
This database administration book is for PostgreSQL developers, database administrators, and professionals who want to implement advanced functionality and master complex administrative tasks with PostgreSQL 13. Prior experience with PostgreSQL and familiarity with the basics of database administration will help you understand the key concepts covered in the book.
One-stop solution for NLP practitioners, ML developers, and data scientists to build effective NLP systems that can perform real-world complex tasks

Key Features
- Apply deep learning algorithms and techniques such as BiLSTMs, CRFs, BPE, and more using TensorFlow 2
- Explore applications such as text generation, summarization, weakly supervised labelling, and more
- Read cutting-edge material, with seminal papers provided in the GitHub repository with full working code

Book Description
Recently, there have been tremendous advances in NLP, and we are now moving from research labs into practical applications. This book comes with a perfect blend of both the theoretical and practical aspects of trending and complex NLP techniques. The book focuses on innovative applications in the field of NLP, language generation, and dialogue systems. It helps you apply the concepts of pre-processing text using techniques such as tokenization, parts-of-speech tagging, and lemmatization using popular libraries such as Stanford NLP and SpaCy. You will build Named Entity Recognition (NER) from scratch using Conditional Random Fields and Viterbi decoding on top of RNNs. The book covers key emerging areas such as generating text for use in sentence completion and text summarization, bridging images and text by generating captions for images, and managing the dialogue aspects of chatbots. You will learn how to apply transfer learning and fine-tuning using TensorFlow 2. Further, it covers practical techniques that can simplify the labelling of textual data. The book also has working code for each tech piece that is adaptable to your use cases. By the end of the book, you will have an advanced knowledge of the tools, techniques, and deep learning architectures used to solve complex NLP problems.

What you will learn
- Grasp important pre-steps in building NLP applications, such as POS tagging
- Use transfer and weakly supervised learning using libraries such as Snorkel
- Perform sentiment analysis using BERT
- Apply encoder-decoder NN architectures and beam search for summarizing texts
- Use Transformer models with attention to bring images and text together
- Build apps that generate captions and answer questions about images using custom Transformers
- Use advanced TensorFlow techniques such as learning rate annealing, custom layers, and custom loss functions to build the latest deep NLP models

Who this book is for
This is not an introductory book; it assumes the reader is familiar with the basics of NLP and has fundamental Python skills, as well as basic knowledge of machine learning and undergraduate-level calculus and linear algebra. The readers who can benefit the most from this book include intermediate ML developers who are familiar with the basics of supervised learning and deep learning techniques, and professionals who already use TensorFlow/Python for purposes such as data science, ML, research, and analysis.
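One technique named above, beam search, is simple enough to sketch without any NLP library: at each decoding step, keep only the `beam_width` highest-scoring partial sequences. The toy per-step scores below are made up stand-ins for a real model's log-probabilities; this is an illustration of the mechanism, not the book's code.

```python
import math

def beam_search(step_scores, beam_width=2):
    """step_scores: one dict per step, mapping token -> log-probability."""
    beams = [([], 0.0)]  # (partial sequence, cumulative log-probability)
    for scores in step_scores:
        # Expand every surviving beam with every candidate token
        candidates = [(seq + [tok], lp + tok_lp)
                      for seq, lp in beams
                      for tok, tok_lp in scores.items()]
        # Keep only the best `beam_width` partial sequences
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:beam_width]
    return beams[0][0]

steps = [{"the": math.log(0.6), "a": math.log(0.4)},
         {"cat": math.log(0.6), "dog": math.log(0.4)},
         {"sat": math.log(0.7), "ran": math.log(0.3)}]
print(beam_search(steps))  # → ['the', 'cat', 'sat']
```

With `beam_width=1` this degenerates to greedy decoding; wider beams can recover sequences whose first tokens score poorly but whose continuations score well.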
Discover techniques to summarize the characteristics of your data using PyPlot, NumPy, SciPy, and pandas

Key Features
- Understand the fundamental concepts of exploratory data analysis using Python
- Find missing values in your data and identify the correlation between different variables
- Practice graphical exploratory analysis techniques using Matplotlib and the Seaborn Python package

Book Description
Exploratory Data Analysis (EDA) is an approach to data analysis that involves the application of diverse techniques to gain insights into a dataset. This book will help you gain practical knowledge of the main pillars of EDA - data cleaning, data preparation, data exploration, and data visualization. You'll start by performing EDA on open source datasets, carrying out simple to advanced analyses to turn data into meaningful insights. You'll then learn various descriptive statistical techniques to describe the basic characteristics of data and progress to performing EDA on time-series data. As you advance, you'll learn how to implement EDA techniques for model development and evaluation and build predictive models to visualize results. Using Python for data analysis, you'll work with real-world datasets, understand data, summarize its characteristics, and visualize it for business intelligence. By the end of this EDA book, you'll have developed the skills required to carry out a preliminary investigation on any dataset, yield insights into data, present your results with visual aids, and build a model that correctly predicts future outcomes.

What you will learn
- Import, clean, and explore data to perform preliminary analysis using powerful Python packages
- Identify and transform erroneous data using different data wrangling techniques
- Explore the use of multiple regression to describe non-linear relationships
- Discover hypothesis testing and explore techniques of time-series analysis
- Understand and interpret results obtained from graphical analysis
- Build, train, and optimize predictive models to estimate results
- Perform complex EDA techniques on open source datasets

Who this book is for
This EDA book is for anyone interested in data analysis, especially students, statisticians, data analysts, and data scientists. The practical concepts presented in this book can be applied in various disciplines to enhance decision-making processes with data analysis and synthesis. Fundamental knowledge of Python programming and statistical concepts is all you need to get started with this book.
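Two of the first descriptive steps in any EDA are summarizing a variable and measuring how two variables move together. As a standard-library-only sketch (the book itself uses NumPy, SciPy, and pandas; the sample data here is invented), the Pearson correlation coefficient can be computed from first principles:

```python
import statistics

def pearson(xs, ys):
    """Pearson correlation coefficient, computed from first principles."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

heights = [150, 160, 170, 180, 190]
weights = [52, 61, 69, 78, 88]
print("mean height:", statistics.fmean(heights))  # → 170.0
print("correlation:", round(pearson(heights, weights), 3))
```

A correlation near 1 (as here) suggests a strong positive linear relationship; in practice you would confirm it visually with a scatter plot, which is where Matplotlib and Seaborn come in.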
Leverage the Azure analytics platform's key analytics services to deliver unmatched intelligence for your data

Key Features
- Learn to ingest, prepare, manage, and serve data for immediate business requirements
- Bring enterprise data warehousing and big data analytics together to gain insights from your data
- Develop end-to-end analytics solutions using Azure Synapse

Book Description
Azure Synapse Analytics, which Microsoft describes as the next evolution of Azure SQL Data Warehouse, is a limitless analytics service that brings enterprise data warehousing and big data analytics together. With this book, you'll learn how to discover insights from your data effectively using this platform. The book starts with an overview of Azure Synapse Analytics, its architecture, and how it can be used to improve business intelligence and machine learning capabilities. Next, you'll go on to choose and set up the correct environment for your business problem. You'll also learn a variety of ways to ingest data from various sources and orchestrate the data using transformation techniques offered by Azure Synapse. Later, you'll explore how to handle both relational and non-relational data using the SQL language. As you progress, you'll perform real-time streaming and execute data analysis operations on your data using various languages, before going on to apply ML techniques to derive accurate and granular insights from data. Finally, you'll discover how to protect sensitive data in real time by using security and privacy features. By the end of this Azure book, you'll be able to build end-to-end analytics solutions while focusing on data prep, data management, data warehousing, and AI tasks.

What you will learn
- Explore the necessary considerations for data ingestion and orchestration while building analytical pipelines
- Understand pipelines and activities in Synapse pipelines and use them to construct end-to-end data-driven workflows
- Query data using various coding languages on Azure Synapse, with a focus on Synapse SQL and Synapse Spark
- Manage and monitor resource utilization and query activity in Azure Synapse
- Connect Power BI workspaces with Azure Synapse and create or modify reports directly from Synapse Studio
- Create and manage IP firewall rules in Azure Synapse

Who this book is for
This book is for data architects, data scientists, data engineers, and business analysts who are looking to get up and running with the Azure Synapse Analytics platform. Basic knowledge of data warehousing will be beneficial to help you understand the concepts covered in this book more effectively.
Get to grips with building and productionizing end-to-end big data solutions in Azure and learn best practices for working with large datasets

Key Features
- Integrate with Azure Synapse Analytics, Cosmos DB, and Azure HDInsight Kafka Cluster to scale and analyze your projects and build pipelines
- Use Databricks SQL to run ad hoc queries on your data lake and create dashboards
- Productionize a solution using CI/CD for deploying notebooks and Azure Databricks Service to various environments

Book Description
Azure Databricks is a unified collaborative platform for performing scalable analytics in an interactive environment. The Azure Databricks Cookbook provides recipes to get hands-on with the analytics process, including ingesting data from various batch and streaming sources and building a modern data warehouse. The book starts by teaching you how to create an Azure Databricks instance using the Azure portal, the Azure CLI, and ARM templates. You'll work through clusters in Databricks and explore recipes for ingesting data from sources, including files, databases, and streaming sources such as Apache Kafka and Event Hubs. The book will help you explore all the features supported by Azure Databricks for building powerful end-to-end data pipelines. You'll also find out how to build a modern data warehouse by using Delta tables and Azure Synapse Analytics. Later, you'll learn how to write ad hoc queries and extract meaningful insights from the data lake by creating visualizations and dashboards with Databricks SQL. Finally, you'll deploy and productionize a data pipeline, as well as deploy notebooks and the Azure Databricks service, using continuous integration and continuous delivery (CI/CD). By the end of this Azure book, you'll be able to use Azure Databricks to streamline different processes involved in building data-driven apps.

What you will learn
- Read and write data from and to various Azure resources and file formats
- Build a modern data warehouse with Delta tables and Azure Synapse Analytics
- Explore jobs, stages, and tasks and see how Spark lazy evaluation works
- Handle concurrent transactions and learn performance optimization in Delta tables
- Learn Databricks SQL and create real-time dashboards in Databricks SQL
- Integrate Azure DevOps for version control, deploying, and productionizing solutions with CI/CD pipelines
- Discover how to use RBAC and ACLs to restrict data access
- Build an end-to-end data processing pipeline for near real-time data analytics

Who this book is for
This recipe-based book is for data scientists, data engineers, big data professionals, and machine learning engineers who want to perform data analytics on their applications. Prior experience of working with Apache Spark and Azure is necessary to get the most out of this book.
Think about your data intelligently and ask the right questions

Key Features
- Master the data cleaning techniques necessary to perform real-world data science and machine learning tasks
- Spot common problems with dirty data and develop flexible solutions from first principles
- Test and refine your newly acquired skills through detailed exercises at the end of each chapter

Book Description
Data cleaning is the all-important first step to successful data science, data analysis, and machine learning. If you work with any kind of data, this book is your go-to resource, arming you with the insights and heuristics experienced data scientists had to learn the hard way. In a light-hearted and engaging exploration of different tools, techniques, and datasets both real and fictitious, Python veteran David Mertz teaches you the ins and outs of data preparation and the essential questions you should be asking of every piece of data you work with. Using a mixture of Python, R, and common command-line tools, Cleaning Data for Effective Data Science follows the data cleaning pipeline from start to end, focusing on helping you understand the principles underlying each step of the process. You'll look at data ingestion of a vast range of tabular, hierarchical, and other data formats, impute missing values, detect unreliable data and statistical anomalies, and generate synthetic features. The long-form exercises at the end of each chapter let you get hands-on with the skills you've acquired along the way, also providing a valuable resource for academic courses.

What you will learn
- Ingest and work with common data formats such as JSON, CSV, SQL and NoSQL databases, PDF, and binary serialized data structures
- Understand how and why we use tools such as pandas, SciPy, scikit-learn, Tidyverse, and Bash
- Apply useful rules and heuristics for assessing data quality and detecting bias, such as Benford's law and the 68-95-99.7 rule
- Identify and handle unreliable data and outliers, examining z-scores and other statistical properties
- Impute sensible values into missing data and use sampling to fix imbalances
- Use dimensionality reduction, quantization, one-hot encoding, and other feature engineering techniques to draw out patterns in your data
- Work carefully with time series data, performing de-trending and interpolation

Who this book is for
This book is designed to benefit software developers, data scientists, aspiring data scientists, teachers, and students who work with data. If you want to improve your rigor in data hygiene or are looking for a refresher, this book is for you. Basic familiarity with statistics, general concepts in machine learning, knowledge of a programming language (Python or R), and some exposure to data science are helpful.
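The z-score heuristic mentioned above follows from the 68-95-99.7 rule: roughly 99.7% of normally distributed values fall within three standard deviations of the mean, so values beyond that are worth flagging as suspect. A small standard-library sketch (our illustration with invented data, not the author's code):

```python
import statistics

def zscore_outliers(values, threshold=3.0):
    """Return the values lying more than `threshold` standard deviations from the mean."""
    mean = statistics.fmean(values)
    stdev = statistics.pstdev(values)  # population standard deviation
    return [v for v in values if abs(v - mean) / stdev > threshold]

data = [10.0, 10.2, 9.8, 10.1, 9.9, 10.0, 10.3, 9.7,
        10.1, 9.9, 10.2, 9.8, 10.0, 10.1, 9.9, 50.0]  # one wild value
print(zscore_outliers(data))  # → [50.0]
```

Note that z-scores are themselves distorted by the outliers they are meant to find (the extreme value inflates the standard deviation), which is one reason the book also discusses other statistical properties for spotting unreliable data.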
This title is part of the Springer Book Archives digitization project, comprising publications that have appeared since the publisher's founding in 1842. With this archive, the publisher provides sources for historical research and for research into the history of the disciplines, which must each be considered in their historical context. This title was published before 1945 and is therefore not promoted by the publisher in view of its period-typical political and ideological orientation.
A beginner's guide to storing, managing, and analyzing data with the updated features of Elastic 7.0

Key Features
- Gain access to new features and updates introduced in Elastic Stack 7.0
- Grasp the fundamentals of the Elastic Stack, including Elasticsearch, Logstash, and Kibana
- Explore useful tips for using Elastic Cloud and deploying the Elastic Stack in production environments

Book Description
The Elastic Stack is a powerful combination of tools for techniques such as distributed search, analytics, logging, and visualization of data. Elastic Stack 7.0 encompasses new features and capabilities that will enable you to find unique insights into analytics using these techniques. This book will give you a fundamental understanding of what the stack is all about, and help you use it efficiently to build powerful real-time data processing applications. The first few sections of the book will help you understand how to set up the stack by installing the tools and exploring their basic configurations. You'll then get up to speed with using Elasticsearch for distributed searching and analytics, Logstash for logging, and Kibana for data visualization. As you work through the book, you will discover the technique of creating custom plugins using Kibana and Beats. This is followed by coverage of Elastic X-Pack, a useful extension for effective security and monitoring. You'll also find helpful tips on how to use Elastic Cloud and deploy the Elastic Stack in production environments. By the end of this book, you'll be well versed with the fundamental Elastic Stack functionalities and the role each component plays in the stack in solving different data processing problems.

What you will learn
- Install and configure an Elasticsearch architecture
- Solve the full-text search problem with Elasticsearch
- Discover powerful analytics capabilities through aggregations using Elasticsearch
- Build a data pipeline to transfer data from a variety of sources into Elasticsearch for analysis
- Create interactive dashboards for effective storytelling with your data using Kibana
- Learn how to secure, monitor, and use the Elastic Stack's alerting and reporting capabilities
- Take applications to an on-premise or cloud-based production environment with the Elastic Stack

Who this book is for
This book is for entry-level data professionals, software engineers, e-commerce developers, and full-stack developers who want to learn about the Elastic Stack and how the real-time processing and search engine works for business analytics and enterprise search applications. Previous experience with the Elastic Stack is not required; however, knowledge of data warehousing and database concepts will be helpful.
Learn how to gain insights from your data as well as machine learning, and become a presentation pro who can create interactive dashboards

Key Features
- Enhance your presentation skills by implementing engaging data storytelling and visualization techniques
- Learn the basics of machine learning and easily apply machine learning models to your data
- Improve productivity by automating your data processes

Book Description
Data Analytics Made Easy is an accessible beginner's guide for anyone working with data. The book interweaves four key elements:

Data visualizations and storytelling - Tired of people not listening to you and ignoring your results? Don't worry; chapters 7 and 8 show you how to enhance your presentations and engage with your managers and co-workers. Learn to create focused content with a well-structured story behind it to captivate your audience.

Automating your data workflows - Improve your productivity by automating your data analysis. This book introduces you to the open-source KNIME Analytics Platform. You'll see how to use this no-code, free-to-use software to create a KNIME workflow of your data processes just by clicking and dragging components.

Machine learning - Data Analytics Made Easy describes popular machine learning approaches in a simplified and visual way before implementing these machine learning models using KNIME. You'll not only be able to understand data scientists' machine learning models; you'll be able to challenge them and build your own.

Creating interactive dashboards - Follow the book's simple methodology to create professional-looking dashboards using Microsoft Power BI, giving users the capability to slice and dice data and drill down into the results.

What you will learn
- Understand the potential of data and its impact on your business
- Import, clean, transform, and combine data feeds, and automate your processes
- Influence business decisions by learning to create engaging presentations
- Build real-world models to improve profitability, create customer segmentation, automate and improve data reporting, and more
- Create professional-looking and business-centric visuals and dashboards
- Open the lid on the black box of AI and learn about and implement supervised and unsupervised machine learning models

Who this book is for
This book is for beginners who work with data and those who need to know how to interpret their business/customer data. The book also covers the high-level concepts of data workflows, machine learning, data storytelling, and visualizations, which are useful for managers. No previous math, statistics, or computer science knowledge is required.