Centralized data warehouses, the long-time de facto standard for housing analytics data, are rapidly giving way to multi-faceted cloud data platforms. Companies that embrace modern cloud data platforms benefit from an integrated view of their business using all of their data, and can take advantage of advanced analytics to drive predictions and as-yet-unimagined data services. Designing Cloud Data Platforms is a hands-on guide to envisioning and designing a modern, scalable data platform that takes full advantage of the flexibility of the cloud. As you read, you'll learn the core components of a cloud data platform design, along with the role of key technologies like Spark and Kafka Streams. You'll also explore setting up processes to manage cloud-based data, keeping it secure, and using advanced analytics and BI tools to analyze it.

About the technology
Access to affordable, dependable,
serverless cloud services has revolutionized the way organizations
can approach data management, and companies both big and small are
raring to migrate to the cloud. But without a properly designed
data platform, data in the cloud can remain just as siloed and
inaccessible as it is today for most organizations. Designing Cloud
Data Platforms lays out the principles of a well-designed platform
that uses the scalable resources of the public cloud to manage all
of an organization's data and presents it as useful business insights.

About the book
In Designing Cloud Data Platforms, you'll
learn how to integrate data from multiple sources into a single,
cloud-based, modern data platform. Drawing on their real-world
experiences designing cloud data platforms for dozens of
organizations, cloud data experts Danil Zburivsky and Lynda Partner
take you through a six-layer approach to creating cloud data
platforms that maximizes flexibility and manageability and reduces
costs. Starting with foundational principles, you'll learn how to
get data into your platform from different databases, files, and
APIs, the essential practices for organizing and processing that
raw data, and how to best take advantage of the services offered by
major cloud vendors. As you progress past the basics you'll take a
deep dive into advanced topics to get the most out of your data
platform, including real-time data management, machine learning
analytics, schema management, and more.

What's inside
- The tools of different public clouds for implementing data platforms
- Best practices for managing structured and unstructured data sets
- Machine learning tools that can be used on top of the cloud
- Cost optimization techniques

About the reader
For data professionals familiar with the basics of cloud computing and distributed data processing systems like Hadoop and Spark.

About the authors
Danil Zburivsky has over 10 years' experience designing and supporting
large-scale data infrastructure for enterprises across the globe.
Lynda Partner is the VP of Analytics-as-a-Service at Pythian, and
has been on the business side of data for over 20 years.
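The layered platform design the blurb describes starts with an ingestion layer that lands raw data from databases, files, and APIs into cloud storage before any processing. As a very small illustration of that idea (not code from the book; the `InMemoryStorage` and `ingest_batch` names are hypothetical, and the in-memory store merely stands in for cloud object storage):

```python
import json
from datetime import datetime, timezone

class InMemoryStorage:
    """Stand-in for cloud object storage (e.g., S3, GCS, or Azure Blob)."""
    def __init__(self):
        self.objects = {}

    def write(self, path, data):
        self.objects[path] = data

def ingest_batch(records, storage, source_name):
    """Land a batch of raw records untouched, tagged with arrival metadata.

    The ingestion layer deliberately does no transformation: downstream
    processing layers work from this raw copy.
    """
    batch_time = datetime.now(timezone.utc).strftime("%Y%m%d%H%M%S")
    path = f"landing/{source_name}/{batch_time}.json"
    payload = json.dumps({"source": source_name,
                          "ingested_at": batch_time,
                          "records": records})
    storage.write(path, payload)
    return path

storage = InMemoryStorage()
path = ingest_batch([{"id": 1, "amount": 9.99}], storage, "orders_api")
print(path)  # e.g., landing/orders_api/20240101120000.json
```

Keeping the landed data raw and immutable is what lets later layers reprocess history when logic or schemas change.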
Understand the complexities of modern-day data engineering platforms and explore strategies to deal with them, with the help of use-case scenarios led by an industry expert in big data.

Key Features
- Become well-versed with the core concepts of Apache Spark and Delta Lake for building data platforms
- Learn how to ingest, process, and analyze data that can later be used for training machine learning models
- Understand how to operationalize data models in production using curated data

Book Description
In the world of ever-changing data and schemas, it is important to build
data pipelines that can auto-adjust to changes. This book will help
you build scalable data platforms that managers, data scientists,
and data analysts can rely on. Starting with an introduction to
data engineering, along with its key concepts and architectures,
this book will show you how to use Microsoft Azure Cloud services
effectively for data engineering. You'll cover data lake design
patterns and the different stages through which the data needs to
flow in a typical data lake. Once you've explored the main features
of Delta Lake to build data lakes with fast performance and
governance in mind, you'll advance to implementing the lambda
architecture using Delta Lake. Packed with practical examples and
code snippets, this book takes you through real-world examples
based on production scenarios faced by the author in his 10 years
of experience working with big data. Finally, you'll cover data
lake deployment strategies that play an important role in
provisioning the cloud resources and deploying the data pipelines
in a repeatable and continuous way. By the end of this data
engineering book, you'll know how to effectively deal with
ever-changing data and create scalable data pipelines to streamline
data science, ML, and artificial intelligence (AI) tasks.

What you will learn
- Discover the challenges you may face in the data engineering world
- Add ACID transactions to Apache Spark using Delta Lake
- Understand effective design strategies to build enterprise-grade data lakes
- Explore architectural and design patterns for building efficient data ingestion pipelines
- Orchestrate a data pipeline for preprocessing data using Apache Spark and Delta Lake APIs
- Automate deployment and monitoring of data pipelines in production
- Get to grips with securing, monitoring, and managing data pipelines and models efficiently

Who this book is for
This book is for aspiring data engineers and data analysts who are new to the world of data engineering and are looking for a practical guide to building scalable data platforms. If you already work with PySpark and want to use Delta Lake for data engineering, you'll find this book useful. Basic knowledge of Python, Spark, and SQL is expected.
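The "auto-adjust to changes" theme above refers to schema evolution: when a new batch arrives with extra columns, the pipeline widens the target schema rather than failing (Delta Lake supports this via its `mergeSchema` write option). A library-free sketch of the underlying idea, with illustrative names and toy data, not the book's code:

```python
def unified_schema(batches):
    """Collect every column name seen across all batches, in first-seen order."""
    columns = []
    for batch in batches:
        for record in batch:
            for col in record:
                if col not in columns:
                    columns.append(col)
    return columns

def conform(record, columns):
    """Pad a record to the unified schema, filling missing columns with None."""
    return {col: record.get(col) for col in columns}

day1 = [{"id": 1, "amount": 10.0}]
day2 = [{"id": 2, "amount": 5.0, "currency": "USD"}]  # schema has drifted

columns = unified_schema([day1, day2])
table = [conform(r, columns) for batch in (day1, day2) for r in batch]
print(columns)   # ['id', 'amount', 'currency']
print(table[0])  # {'id': 1, 'amount': 10.0, 'currency': None}
```

The same widen-and-backfill behavior is what a merge-schema write gives you on a real table, with the added guarantee that the write is atomic.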
This book is a step-by-step tutorial filled with practical examples that show you how to build and manage a Hadoop cluster, along with its intricacies. It is ideal for database administrators, data engineers, and system administrators, and it will act as an invaluable reference if you are planning to use the Hadoop platform in your organization. You are expected to have basic Linux skills, since all the examples in this book use that operating system. It is also useful if you have access to test hardware or virtual machines so you can follow the examples in the book.