A practical guide for solving complex data processing challenges by
applying the best optimization techniques in Apache Spark.
Key Features
* Learn about the core concepts and the latest developments in Apache Spark
* Master writing efficient big data applications with Spark's built-in modules for SQL, streaming, machine learning, and graph analysis
* Get introduced to a variety of optimizations based on actual experience
Book Description
Apache Spark is a flexible
framework that allows processing of batch and real-time data. Its
unified engine has made it quite popular for big data use cases.
This book will help you to get started with Apache Spark 2.0 and
write big data applications for a variety of use cases. It will
also introduce you to Apache Spark, one of the most popular big
data processing frameworks. Although this book is intended to help
you get started with Apache Spark, it also focuses on
explaining the core concepts. This practical guide provides a quick
start to the Spark 2.0 architecture and its components. It teaches
you how to set up Spark on your local machine. As we move ahead,
you will be introduced to resilient distributed datasets (RDDs) and
DataFrame APIs, and their corresponding transformations and
actions. Then, we move on to the life cycle of a Spark application
and learn about the techniques used to debug slow-running
applications. You will also go through Spark's built-in modules for
SQL, streaming, machine learning, and graph analysis. Finally, the
book will lay out the best practices and optimization techniques
that are key for writing efficient Spark applications. By the end
of this book, you will have a sound fundamental understanding of
the Apache Spark framework and you will be able to write and
optimize Spark applications.
What you will learn
* Learn core concepts such as RDDs, DataFrames, transformations, and more
* Set up a Spark development environment
* Choose the right APIs for your applications
* Understand Spark's architecture and the execution flow of a Spark application
* Explore built-in modules for SQL, streaming, ML, and graph analysis
* Optimize your Spark job for better performance
Who this book is for
If you are a big data enthusiast and love processing huge amounts of
data, this book is for you. If you are a data engineer looking for
the best optimization techniques for your Spark applications, you
will find this book helpful. This book also helps data scientists
who want to implement their machine learning algorithms in Spark.
You need a basic understanding of at least one programming language
such as Scala, Python, or Java.
Easy, hands-on recipes to help you understand Hive and its
integration with frameworks that are widely used in today's big
data world.
About This Book
* Grasp a complete reference of different Hive topics
* Get to know the latest recipes in development in Hive, including CRUD operations
* Understand Hive internals and the integration of Hive with different frameworks used in today's world
Who This Book Is For
The book is intended for those who want to get started with Hive or
who have a basic understanding of the Hive framework. Prior
knowledge of basic SQL commands is also required.
What You Will Learn
* Learn the different features and offerings of the latest Hive
* Understand the working and structure of the Hive internals
* Get an insight into the latest developments in the Hive framework
* Grasp the concepts of the Hive data model
* Master key concepts like partitions, buckets, and statistics
* Know how to integrate Hive with other frameworks such as Spark and Accumulo
In Detail
Hive was developed by Facebook and later open sourced in the
Apache community. Hive provides an SQL-like interface to run queries
on big data frameworks. Its SQL-like syntax, called HiveQL, includes
SQL capabilities such as analytical functions, which are in high
demand in today's big data world. This book provides easy
installation steps for the different types of metastores supported
by Hive, along with simple, easy-to-learn recipes for configuring
Hive clients and services. You will also learn different Hive
optimizations, including partitioning and bucketing. The book also
explains the source code of the latest Hive version. Hive Query
Language is used by other frameworks, including Spark. Towards the
end, you will cover the integration of Hive with these frameworks.
Style and approach
Starting with the basics and covering the core concepts with
practical usage, this book is a complete guide to learning and
exploring Hive's offerings.
Moving beyond MapReduce - learn resource management and big data
processing using YARN.
About This Book
* Deep dive into YARN components, schedulers, life cycle management, and security architecture
* Create your own Hadoop-YARN applications and integrate big data technologies with YARN
* Step-by-step guide to provision, manage, and monitor Hadoop-YARN clusters with ease
Who This Book Is For
This book is intended for those who want to understand what YARN is
and how to use it efficiently for the resource management of large
clusters. For cluster administrators, this book gives a detailed
explanation of provisioning and managing YARN clusters. If you are
a Java developer or an open source contributor, this book will help
you drill down into the YARN architecture, write your own YARN
applications, and understand the application execution phases. This
book will also help big data engineers explore YARN integration
with real-time analytics technologies such as Spark and Storm.
What You Will Learn
* Explore YARN features and offerings
* Manage big data clusters efficiently using the YARN framework
* Create single as well as multi-node Hadoop-YARN clusters on Linux machines
* Understand YARN components and their administration
* Gain insights into application execution flow over a YARN cluster
* Write your own distributed application and execute it over a YARN cluster
* Work with schedulers and queues for efficient scheduling of applications
* Integrate big data projects like Spark and Storm with YARN
In Detail
Today, enterprises
generate huge volumes of data. In order to provide effective
services and to make smarter and more intelligent decisions from
these huge volumes of data, enterprises use big-data analytics. In
recent years, Hadoop has been used for massive data storage and
efficient distributed processing of data. The Yet Another Resource
Negotiator (YARN) framework solves the design problems related to
resource management faced by the Hadoop 1.x framework by providing
a more scalable, efficient, flexible, and highly available resource
management framework for distributed data processing. This book
starts with an overview of the YARN features and explains how YARN
provides a business solution for growing big data needs. You will
learn to provision and manage single, as well as multi-node,
Hadoop-YARN clusters in the easiest way. You will walk through the
YARN administration, life cycle management, application execution,
REST APIs, schedulers, security framework, and so on. You will gain
insights into YARN components and features such as
ResourceManager, NodeManager, ApplicationMaster, Container,
Timeline Server, High Availability, and Resource Localization.
The book explains Hadoop-YARN commands and the configuration
of components, and explores topics such as High Availability,
Resource Localization, and log aggregation. You will then be ready
to develop your own ApplicationMaster and execute it over a
Hadoop-YARN cluster. Towards the end of the book, you will learn
about the security architecture and integration of YARN with big
data technologies like Spark and Storm. This book promises
conceptual as well as practical knowledge of resource management
using YARN.
Style and approach
Starting with the basics and covering the core concepts with
practical usage, this tutorial is a complete guide to learning and
exploring YARN's offerings.