Hands-On Big Data Analytics with PySpark - Analyze large datasets and discover techniques for testing, immunizing, and parallelizing Spark jobs (Paperback)
Loot Price: R719
Discovery Miles 7 190
Use PySpark to easily crush messy data at scale, and discover proven techniques to create testable, immutable, and easily parallelizable Spark jobs.

Key Features
- Work with large amounts of agile data using distributed datasets and in-memory caching
- Source data from all popular data hosting platforms, such as HDFS, Hive, JSON, and S3
- Employ the easy-to-use PySpark API to deploy big data analytics for production

Book Description
Apache Spark is an open source
parallel-processing framework that has been around for quite some
time now. One of the many uses of Apache Spark is for data
analytics applications across clustered computers. In this book,
you will not only learn how to use Spark and the Python API to
create high-performance analytics with big data, but also discover
techniques for testing, immunizing, and parallelizing Spark jobs.
You will learn how to source data from all popular data hosting
platforms, including HDFS, Hive, JSON, and S3, and deal with large
datasets with PySpark to gain practical big data experience. This
book will help you work on prototypes on local machines and
subsequently go on to handle messy data in production and at scale.
This book covers installing and setting up PySpark, RDD operations,
big data cleaning and wrangling, and aggregating and summarizing
data into useful reports. You will also learn how to implement some
practical and proven techniques to improve certain aspects of
programming and administration in Apache Spark. By the end of the
book, you will be able to build big data analytical solutions using
the various PySpark offerings and also optimize them effectively.
What you will learn
- Get practical big data experience while working on messy datasets
- Analyze patterns with Spark SQL to improve your business intelligence
- Use PySpark's interactive shell to speed up development time
- Create highly concurrent Spark programs by leveraging immutability
- Discover ways to avoid the most expensive operation in the Spark API: the shuffle operation
- Re-design your jobs to use reduceByKey instead of groupBy
- Create robust processing pipelines by testing Apache Spark jobs

Who this book is for
This
book is for developers, data scientists, business analysts, and anyone who needs to reliably analyze large amounts of real-world data at scale. Whether you're tasked with building your company's business intelligence function, creating great data platforms for your machine learning models, or looking to use code to magnify the impact of your business, this book is for you.