Modern systems contain multicore CPUs and GPUs that have the
potential for parallel computing. But many scientific Python tools
were not designed to leverage this parallelism. With this short but
thorough resource, data scientists and Python programmers will
learn how the Dask open source library for parallel computing
provides APIs that make it easy to parallelize PyData libraries
including NumPy, pandas, and scikit-learn. Authors Holden Karau and
Mika Kimmins show you how to use Dask computations in local systems
and then scale to the cloud for heavier workloads. This practical
book explains why Dask is popular among industry experts and
academics and is used by organizations that include Walmart,
Capital One, Harvard Medical School, and NASA. With this book,
you'll learn:

- What Dask is, where you can use it, and how it compares with other tools
- How to use Dask for batch data parallel processing
- Key distributed system concepts for working with Dask
- Methods for using Dask with higher-level APIs and building blocks
- How to work with integrated libraries such as scikit-learn, pandas, and PyTorch
- How to use Dask with GPUs
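Dask's collection APIs mirror the libraries they parallelize: a Dask array looks like a NumPy array but is split into chunks that can be computed on separate cores. A minimal sketch, assuming `dask` is installed (the array sizes and chunking are illustrative):

```python
import dask.array as da

# Build a 10,000 x 10,000 array split into 1,000 x 1,000 chunks;
# each chunk can be processed on a separate core.
x = da.random.random((10_000, 10_000), chunks=(1_000, 1_000))

# Operations build a lazy task graph instead of computing immediately.
result = (x + x.T).mean(axis=0)

# .compute() executes the graph in parallel and returns a NumPy array.
values = result.compute()
print(values.shape)  # (10000,)
```

The same deferred-execution pattern carries over to `dask.dataframe` (pandas-like) and `dask-ml` (scikit-learn-like), and the same code can later run on a distributed cluster instead of local threads.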

Serverless computing enables developers to concentrate solely on
their applications rather than worrying about where they are
deployed. With the Ray general-purpose serverless implementation in
Python, programmers and data scientists can hide servers, implement
stateful applications, support direct communication between tasks,
and access hardware accelerators. In this book, authors Holden
Karau and Boris Lublinsky show you how to scale existing Python
applications and pipelines, allowing you to stay in the Python
ecosystem while avoiding single points of failure and manual
scheduling. If your data processing has grown beyond what a single
computer can handle, this book is for you. Written by experienced
software architecture practitioners, Scaling Python with Ray is
ideal for software architects and developers eager to explore
successful case studies and learn more about decision and
measurement effectiveness. This book covers distributed processing
(the pure Python implementation of serverless) and shows you how to:

- Implement stateful applications with Ray actors
- Build workflow management in Ray
- Use Ray as a unified platform for batch and streaming
- Implement advanced data processing with Ray
- Build microservices on the Ray platform
- Implement reliable Ray applications

If you're training a machine learning model but aren't sure how to
put it into production, this book will get you there. Kubeflow
provides a collection of cloud native tools for different stages of
a model's lifecycle, from data exploration, feature preparation,
and model training to model serving. This guide helps data
scientists build production-grade machine learning implementations
with Kubeflow and shows data engineers how to make models scalable
and reliable. Using examples throughout the book, authors Holden
Karau, Trevor Grant, Ilan Filonenko, Richard Liu, and Boris
Lublinsky explain how to use Kubeflow to train and serve your
machine learning models on top of Kubernetes in the cloud or in a
development environment on-premises.

- Understand Kubeflow's design, core components, and the problems it solves
- Learn how to set up Kubeflow on a cloud provider or on an in-house cluster
- Train models using Kubeflow with popular tools including scikit-learn, TensorFlow, and Apache Spark
- Learn how to add custom stages such as serving and prediction
- Keep your model up-to-date with Kubeflow Pipelines
- Understand how to validate machine learning pipelines

Apache Spark is amazing when everything clicks. But if you haven't
seen the performance improvements you expected, or still don't feel
confident enough to use Spark in production, this practical book is
for you. Authors Holden Karau and Rachel Warren demonstrate
performance optimizations to help your Spark queries run faster and
handle larger data sizes, while using fewer resources. Ideal for
software engineers, data engineers, developers, and system
administrators working with large-scale data applications, this
book describes techniques that can reduce data infrastructure costs
and developer hours. Not only will you gain a more comprehensive
understanding of Spark, you'll also learn how to make it sing. With
this book, you'll explore:

- How Spark SQL's new interfaces improve performance over Spark's RDD data structure
- The choice between data joins in Core Spark and Spark SQL
- Techniques for getting the most out of standard RDD transformations
- How to work around performance issues in Spark's key/value pair paradigm
- Writing high-performance Spark code without Scala or the JVM
- How to test for functionality and performance when applying suggested improvements
- Using Spark MLlib and Spark ML machine learning libraries
- Spark's Streaming components and external community packages