Apache Spark's speed, ease of use, sophisticated analytics, and
multilanguage support make practical knowledge of this
cluster-computing framework a required skill for data engineers and
data scientists. With this hands-on guide, anyone looking for an
introduction to Spark will learn practical algorithms and examples
using PySpark. In each chapter, author Mahmoud Parsian shows you
how to solve a data problem with a set of Spark transformations and
algorithms. You'll learn how to tackle problems involving ETL,
design patterns, machine learning algorithms, data partitioning,
and genomics analysis. Each detailed recipe includes PySpark
algorithms using the PySpark driver and shell script.

With this book, you will:

- Learn how to select Spark transformations for optimized solutions
- Explore powerful transformations and reductions including reduceByKey(), combineByKey(), and mapPartitions()
- Understand data partitioning for optimized queries
- Build and apply a model using PySpark design patterns
- Apply motif-finding algorithms to graph data
- Analyze graph data by using the GraphFrames API
- Apply PySpark algorithms to clinical and genomics data
- Learn how to use and apply feature engineering in ML algorithms
- Understand and use practical and pragmatic data design patterns

General
Imprint: O'Reilly Media
Country of origin: United States
Release date: April 2022
Authors: Mahmoud Parsian
Dimensions: 232 x 178 x 26 mm (L x W x T)
Format: Paperback
Pages: 500
ISBN-13: 978-1-4920-8238-5
Categories: Books
LSN: 1-4920-8238-4
Barcode: 9781492082385