Combine the power of Apache Spark and Python to build effective big data applications

Key Features
* Perform effective data processing, machine learning, and analytics using PySpark
* Overcome challenges in developing and deploying Spark solutions using Python
* Explore recipes for efficiently combining Python and Apache Spark to process data

Book Description
Apache Spark is an open source framework for efficient cluster computing with a strong interface for data parallelism and fault tolerance. The PySpark Cookbook presents effective and time-saving recipes for leveraging the power of Python and putting it to use in the Spark ecosystem. You'll start by learning the Apache Spark architecture and how to set up a Python environment for Spark. You'll then get familiar with the modules available in PySpark and start using them effortlessly. You'll also discover how to abstract data with RDDs and DataFrames, and understand the streaming capabilities of PySpark. You'll then move on to using ML and MLlib to solve machine learning problems in PySpark, and use GraphFrames to solve graph-processing problems. Finally, you will explore how to deploy your applications to the cloud using the spark-submit command. By the end of this book, you will be able to use the Python API for Apache Spark to solve problems associated with building data-intensive applications.

What you will learn
* Configure a local instance of PySpark in a virtual environment
* Install and configure Jupyter in local and multi-node environments
* Create DataFrames from JSON and a dictionary using pyspark.sql
* Explore regression and clustering models available in the ML module
* Use DataFrames to transform data used for modeling
* Connect to PubNub and perform aggregations on streams

Who this book is for
The PySpark Cookbook is for you if you are a Python developer looking for hands-on recipes for using the Apache Spark 2.x ecosystem in the best possible way. A thorough understanding of Python (and some familiarity with Spark) will help you get the best out of the book.
Build data-intensive applications locally and deploy at scale using the combined powers of Python and Spark 2.0

About This Book
* Learn why and how you can efficiently use Python to process data and build machine learning models in Apache Spark 2.0
* Develop and deploy efficient, scalable real-time Spark solutions
* Take your understanding of using Spark with Python to the next level with this jump-start guide

Who This Book Is For
If you are a Python developer who wants to learn about the Apache Spark 2.0 ecosystem, this book is for you. A firm understanding of Python is expected to get the best out of the book. Familiarity with Spark would be useful, but is not mandatory.

What You Will Learn
* Learn about Apache Spark and the Spark 2.0 architecture
* Build and interact with Spark DataFrames using Spark SQL
* Solve graph and deep learning problems using GraphFrames and TensorFrames, respectively
* Read, transform, and understand data and use it to train machine learning models
* Build machine learning models with MLlib and ML
* Submit your applications programmatically using spark-submit
* Deploy locally built applications to a cluster

In Detail
Apache Spark is an open source framework for efficient cluster computing with a strong interface for data parallelism and fault tolerance. This book will show you how to leverage the power of Python and put it to use in the Spark ecosystem. You will start by getting a firm understanding of the Spark 2.0 architecture and how to set up a Python environment for Spark. You will get familiar with the modules available in PySpark. You will learn how to abstract data with RDDs and DataFrames, and understand the streaming capabilities of PySpark. You will also get a thorough overview of the machine learning capabilities of PySpark using ML and MLlib, graph processing using GraphFrames, and polyglot persistence using Blaze. Finally, you will learn how to deploy your applications to the cloud using the spark-submit command. By the end of this book, you will have established a firm understanding of the Spark Python API and how it can be used to build data-intensive applications.

Style and approach
This book takes a comprehensive, step-by-step approach so you understand how the Spark ecosystem can be used with Python to develop efficient, scalable solutions. Every chapter is standalone and written in an easy-to-understand manner, with a focus on both the hows and the whys of each concept.
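The spark-submit deployment step mentioned above is a command-line invocation. The following fragment is purely illustrative; the script name, master, and resource flags are placeholders, not values from the book:

```shell
# Illustrative placeholders throughout: adjust the master, the
# resource flags, and the application file to your own cluster.
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --num-executors 4 \
  --executor-memory 2g \
  my_app.py
```

The same command with `--master local[*]` runs the application on a single machine, which is the usual first step before deploying to a cluster.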
Over 60 practical recipes on data exploration and analysis

About This Book
* Clean dirty data, extract accurate information, and explore the relationships between variables
* Forecast the output of an electric plant and the water flow of American rivers using pandas, NumPy, Statsmodels, and scikit-learn
* Find and extract the most important features from your dataset using the most efficient Python libraries

Who This Book Is For
If you are a beginner or intermediate-level professional who is looking to solve your day-to-day analytical problems with Python, this book is for you. Even with no prior programming or data analytics experience, you will be able to finish each recipe and learn while doing so.

What You Will Learn
* Read, clean, transform, and store your data using pandas and OpenRefine
* Understand your data and explore the relationships between variables using pandas and D3.js
* Explore a variety of techniques to classify and cluster a bank's outbound marketing campaign call data using pandas, mlpy, NumPy, and Statsmodels
* Reduce the dimensionality of your dataset and extract the most important features with pandas, NumPy, and mlpy
* Predict the output of a power plant with regression models and forecast the water flow of American rivers with time series methods using pandas, NumPy, Statsmodels, and scikit-learn
* Explore social interactions and identify fraudulent activities with graph theory concepts using NetworkX and Gephi
* Scrape Internet web pages using urllib and BeautifulSoup, and get to know natural language processing techniques to classify movie ratings using NLTK
* Study simulation techniques through an example of a gas station with agent-based modeling

In Detail
Data analysis is the process of systematically applying statistical and logical techniques to describe and illustrate, condense and recap, and evaluate data. Its importance has been most visible in the information and communication technologies sector, but it is a valuable employee skill in almost every sector of the economy. This book provides a rich set of independent recipes that dive into the world of data analytics and modeling using a variety of approaches, tools, and algorithms. You will learn the basics of data handling and modeling, and will build your skills gradually toward more advanced topics such as simulations, raw text processing, social interaction analysis, and more. First, you will learn some easy-to-follow practical techniques for reading, writing, cleaning, reformatting, exploring, and understanding your data, arguably the most time-consuming (and most important) tasks for any data scientist. In the second section, independent recipes delve into intermediate topics such as classification, clustering, prediction, and more. With the help of these easy-to-follow recipes, you will also learn techniques that can easily be expanded to solve other real-life problems, such as building recommendation engines or predictive models. In the third section, you will explore more advanced topics, from graph theory through natural language processing and discrete choice modeling to simulations. You will also expand your knowledge of identifying the origin of fraud with the help of a graph, scraping Internet websites, and classifying movies based on their reviews. By the end of this book, you will be able to efficiently use the vast array of tools that the Python environment has to offer.

Style and approach
This hands-on recipe guide is divided into three sections that tackle and overcome real-world data modeling problems faced by data analysts and scientists in their everyday work. Each independent recipe is written in an easy-to-follow, step-by-step fashion.
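As a taste of the read-clean-explore workflow described above, here is a minimal pandas sketch. The four-row dataset is invented, standing in for the power-plant readings the recipes forecast; it is not data from the book:

```python
# Sketch of the clean-then-explore step with pandas; the readings
# are invented, and the NaN/None entries mark the "dirty" values
# a cleaning recipe would remove.
import numpy as np
import pandas as pd

raw = pd.DataFrame({
    "output_mw": [480.0, None, 512.5, 495.0],
    "ambient_temp_c": [14.2, 15.1, np.nan, 13.8],
})

clean = raw.dropna()  # drop rows with missing readings
# Explore a relationship between variables: Pearson correlation
corr = clean["output_mw"].corr(clean["ambient_temp_c"])
print(len(clean), corr)
```

Real recipes would go on to plot the relationship and fit a regression model; the point here is only the drop-incomplete-rows-then-inspect pattern.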