Discover the capabilities of PySpark and its application in the
realm of data science. This comprehensive guide with hand-picked
examples of daily use cases will walk you through the end-to-end
predictive model-building cycle with the latest techniques and
tricks of the trade. Applied Data Science Using PySpark is divided
into six sections that walk you through the book. In section 1,
you start with the basics of PySpark focusing on data manipulation.
We make you comfortable with the language and then build upon it to
introduce you to the mathematical functions available off the
shelf. In section 2, you will dive into the art of variable
selection where we demonstrate various selection techniques
available in PySpark. In section 3, we take you on a journey
through machine learning algorithms, implementations, and
fine-tuning techniques. We will also talk about different
validation metrics and how to use them for picking the best models.
Sections 4 and 5 go through machine learning pipelines and various
methods available to operationalize the model and serve it through
Docker/an API. In the final section, you will cover reusable
objects for easy experimentation and learn some tricks that can
help you optimize your programs and machine learning pipelines. By
the end of this book, you will have seen the flexibility and
advantages of PySpark in data science applications. This book is
recommended to those who want to unleash the power of parallel
computing by simultaneously working with big datasets.

What You Will Learn
* Build an end-to-end predictive model
* Implement multiple variable selection techniques
* Operationalize models
* Master multiple algorithms and implementations

Who This Book Is For
Data scientists and machine learning and deep learning engineers
who want to learn and use PySpark for real-time analysis of
streaming data.

Integrate MLOps principles into existing or future projects using
MLFlow, operationalize your models, and deploy them in AWS
SageMaker, Google Cloud, and Microsoft Azure. This book guides you
through the process of data analysis, model construction, and
training. The authors begin by introducing you to basic data
analysis on a credit card data set and teach you how to analyze the
features and their relationships to the target variable. You will
learn how to build logistic regression models in scikit-learn and
PySpark, and you will go through the process of hyperparameter
tuning with a validation data set. You will explore three different
deployment setups of machine learning models with varying levels of
automation to help you better understand MLOps. MLFlow is covered
and you will explore how to integrate MLOps into your existing
code, allowing you to easily track metrics, parameters, graphs, and
models. You will be guided through the process of deploying and
querying your models with AWS SageMaker, Google Cloud, and
Microsoft Azure. And you will learn how to integrate your MLOps
setups using Databricks.

What You Will Learn
* Perform basic data analysis and construct models in scikit-learn
  and PySpark
* Train, test, and validate your models (hyperparameter tuning)
* Know what MLOps is and what an ideal MLOps setup looks like
* Easily integrate MLFlow into your existing or future projects
* Deploy your models and perform predictions with them on the cloud

Who This Book Is For
Data scientists and machine learning engineers who want to learn
MLOps and know how to operationalize their models.

Explore big data concepts, platforms, analytics, and their
applications using the power of Hadoop 3

Key Features
* Learn Hadoop 3 to build effective big data analytics solutions
  on-premise and on cloud
* Integrate Hadoop with other big data tools such as R, Python,
  Apache Spark, and Apache Flink
* Exploit big data using Hadoop 3 with real-world examples

Book Description
Apache Hadoop is the most
popular platform for big data processing, and can be combined with
a host of other big data tools to build powerful analytics
solutions. Big Data Analytics with Hadoop 3 shows you how to do
just that, by providing insights into the software as well as its
benefits with the help of practical examples. Once you have taken a
tour of Hadoop 3's latest features, you will get an overview of
HDFS, MapReduce, and YARN, and how they enable faster, more
efficient big data processing. You will then learn how to
integrate Hadoop with open source tools such as Python
and R, to analyze and visualize data and perform statistical
computing on big data. As you get acquainted with all this, you
will explore how to use Hadoop 3 with Apache Spark and Apache Flink
for real-time data analytics and stream processing. In addition to
this, you will understand how to use Hadoop to build analytics
solutions on the cloud and an end-to-end pipeline to perform big
data analysis using practical use cases. By the end of this book,
you will be well versed in the analytical capabilities of the
Hadoop ecosystem. You will be able to build powerful solutions to
perform big data analytics and gain insights with ease.

What you will learn
* Explore the new features of Hadoop 3 along with HDFS, YARN, and
  MapReduce
* Get well versed in the analytical capabilities of the Hadoop
  ecosystem using practical examples
* Integrate Hadoop with R and Python for more efficient big data
  processing
* Learn to use Hadoop with Apache Spark and Apache Flink for
  real-time data analytics
* Set up a Hadoop cluster on AWS cloud
* Perform big data analytics on AWS using Elastic MapReduce

Who this book is for
Big Data Analytics with Hadoop 3 is for you if you are
looking to build high-performance analytics solutions for your
enterprise or business using Hadoop 3's powerful features, or if
you are new to big data analytics. A basic understanding of the Java
programming language is required.

Build efficient data flow and machine learning programs with this
flexible, multi-functional open-source cluster-computing framework

Key Features
* Master the art of real-time big data processing and machine
  learning
* Explore a wide range of use-cases to analyze large data
* Discover ways to optimize your work by using many features of
  Spark 2.x and Scala

Book Description
Apache Spark is an in-memory,
cluster-based data processing system that provides a wide range of
functionalities such as big data processing, analytics, machine
learning, and more. With this Learning Path, you can take your
knowledge of Apache Spark to the next level by learning how to
expand Spark's functionality and building your own data flow and
machine learning programs on this platform. You will work with the
different modules in Apache Spark, such as interactive querying
with Spark SQL, using DataFrames and datasets, implementing
streaming analytics with Spark Streaming, and applying machine
learning and deep learning techniques on Spark using MLlib and
various external tools. By the end of this elaborately designed
Learning Path, you will have all the knowledge you need to master
Apache Spark, and build your own big data processing and analytics
pipeline quickly and without any hassle.

This Learning Path includes content from the following Packt
products:
* Mastering Apache Spark 2.x by Romeo Kienzler
* Scala and Spark for Big Data Analytics by Md. Rezaul Karim,
  Sridhar Alla
* Apache Spark 2.x Machine Learning Cookbook by Siamak Amirghodsi,
  Meenakshi Rajendran, Broderick Hall, Shuen Mei

What you will learn
* Get to grips with all the features of Apache Spark 2.x
* Perform highly optimized real-time big data processing
* Use ML and DL techniques with Spark MLlib and third-party tools
* Analyze structured and unstructured data using SparkSQL and GraphX
* Understand tuning, debugging, and monitoring of big data
  applications
* Build scalable and fault-tolerant streaming applications
* Develop scalable recommendation engines

Who this book is for
If you are an intermediate-level Spark developer looking to
master the advanced capabilities and use-cases of Apache Spark 2.x,
this Learning Path is ideal for you. Big data professionals who
want to learn how to integrate and use the features of Apache Spark
and build a strong big data pipeline will also find this Learning
Path useful. To grasp the concepts explained in this Learning Path,
you must know the fundamentals of Apache Spark and Scala.

Harness the power of Scala to program Spark and analyze tonnes of
data in the blink of an eye!

About This Book
* Learn Scala's sophisticated type system that combines functional
  programming and object-oriented concepts
* Work on a wide array of applications, from simple batch jobs to
  stream processing and machine learning
* Explore the most common as well as some complex use-cases to
  perform large-scale data analysis with Spark

Who This Book Is For
Anyone who wishes to learn how to perform data analysis by
harnessing the power of Spark will find this book extremely useful.
No knowledge of Spark or Scala is assumed, although prior
programming experience (especially with other JVM languages) will
help you pick up concepts more quickly.

What You Will Learn
* Understand object-oriented and functional programming concepts of
  Scala
* Gain an in-depth understanding of the Scala collection APIs
* Work with RDD and DataFrame to learn Spark's core abstractions
* Analyze structured and unstructured data using SparkSQL and GraphX
* Develop scalable and fault-tolerant streaming applications using
  Spark structured streaming
* Learn machine-learning best practices for classification,
  regression, dimensionality reduction, and recommendation systems
  to build predictive models with widely used algorithms in Spark
  MLlib & ML
* Build clustering models to cluster a vast amount of data
* Understand tuning, debugging, and monitoring Spark applications
* Deploy Spark applications on real clusters in Standalone, Mesos,
  and YARN

In Detail
Scala has seen wide adoption over the past few years, especially in
the field of data science and analytics. Spark, built on Scala, has
gained a lot of recognition and is widely used in
production. Thus, if you want to leverage the power of Scala and
Spark to make sense of big data, this book is for you. The first
part introduces you to Scala, helping you understand the
object-oriented and functional programming concepts needed for
Spark application development. It then moves on to Spark to cover
the basic abstractions using RDD and DataFrame. This will help you
develop scalable and fault-tolerant streaming applications by
analyzing structured and unstructured data using SparkSQL, GraphX,
and Spark structured streaming. Finally, the book moves on to some
advanced topics, such as monitoring, configuration, debugging,
testing, and deployment. You will also learn how to develop Spark
applications using SparkR and PySpark APIs, interactive data
analytics using Zeppelin, and in-memory data processing with
Alluxio. By the end of this book, you will have a thorough
understanding of Spark, and you will be able to perform full-stack
data analytics with the confidence that no amount of data is too
big.

Style and approach
Filled with practical examples and use cases, this
book will not only help you get up and running with Spark, but will
also take you farther down the road to becoming a data scientist.