A fast-paced guide that will help you learn about Apache Hadoop 3 and its ecosystem

Key Features
* Set up, configure, and get started with Hadoop to get useful insights from large datasets
* Work with the different components of Hadoop, such as MapReduce, HDFS, and YARN
* Learn about the new features introduced in Hadoop 3

Book Description
Apache Hadoop is a widely used distributed data platform. It enables large datasets to be processed efficiently across a cluster of machines, instead of relying on one large computer to store and process the data. This book will get you started with the Hadoop ecosystem and introduce you to the main technical topics, including MapReduce, YARN, and HDFS. The book begins with an overview of big data and Apache Hadoop. Then, you will set up a pseudo-distributed Hadoop development environment and a multi-node enterprise Hadoop cluster. You will see how a parallel programming paradigm such as MapReduce can solve many complex data processing problems. The book also covers the important aspects of the big data software development lifecycle, including quality assurance and control, performance, administration, and monitoring. You will then learn about the Hadoop ecosystem and tools such as Kafka, Sqoop, Flume, Pig, Hive, and HBase. Finally, you will look at advanced topics, including real-time streaming using Apache Storm and data analytics using Apache Spark. By the end of the book, you will be well versed in the different configurations of a Hadoop 3 cluster.

What you will learn
* Store and analyze data at scale using HDFS, MapReduce, and YARN
* Install and configure Hadoop 3 in different modes
* Use YARN effectively to run different applications on a Hadoop-based platform
* Understand and monitor how a Hadoop cluster is managed
* Consume streaming data using Storm, and then analyze it using Spark
* Explore Apache Hadoop ecosystem components, such as Flume, Sqoop, HBase, Hive, and Kafka

Who this book is for
Aspiring big data professionals who want to learn the essentials of Hadoop 3 will find this book useful. Existing Hadoop users who want to get up to speed with the new features introduced in Hadoop 3 will also benefit from this book. Knowledge of Java programming will be an added advantage.

This book takes you on a fantastic journey to discover the attributes of big data using Apache Hive.

Key Features
* Grasp the skills needed to write efficient Hive queries to analyze big data
* Discover how Hive can coexist and work with other tools within the Hadoop ecosystem
* Use practical, example-oriented scenarios to cover all the newly released features of Apache Hive 2.3.3

Book Description
In this book, we prepare you for your journey into big data by first introducing you to the background of the big data domain, along with the process of setting up and getting familiar with your Hive working environment. Next, the book guides you through discovering and transforming the value of big data with the help of examples. It also hones your skills in using the Hive language efficiently. Toward the end, the book focuses on advanced topics, such as performance, security, and extensions in Hive, which will guide you on exciting adventures on this worthwhile big data journey. By the end of the book, you will be familiar with Hive and able to work efficiently to find solutions to big data problems.

What you will learn
* Create and set up the Hive environment
* Discover how to use Hive's data definition language to describe data
* Discover interesting data by joining and filtering datasets in Hive
* Transform data by using Hive sorting, ordering, and functions
* Aggregate and sample data in different ways
* Boost Hive query performance and enhance data security in Hive
* Customize Hive to your needs by using user-defined functions and integrate it with other tools

Who this book is for
If you are a data analyst, developer, or simply someone who wants to quickly get started with Hive to explore and analyze big data in Hadoop, this is the book for you. Since Hive uses an SQL-like query language, some previous experience with SQL will be useful to get the most out of this book.

Get up and running with the Pentaho Data Integration tool using this hands-on, easy-to-read guide

About This Book
* Manipulate your data by exploring, transforming, validating, and integrating it using Pentaho Data Integration 8 CE
* A comprehensive guide exploring the features of Pentaho Data Integration 8 CE
* Connect to any database engine, explore the databases, and perform all kinds of operations on relational databases

Who This Book Is For
This book is a must-have for software developers, business intelligence analysts, IT students, or anyone involved or interested in developing ETL solutions. If you plan on using Pentaho Data Integration for any data manipulation task, this book will help you as well. This book is also a good starting point for data warehouse designers, architects, or anyone who is responsible for data warehouse projects and needs to load data into them.

What You Will Learn
* Explore the features and capabilities of Pentaho Data Integration 8 Community Edition
* Install and get started with PDI
* Learn the ins and outs of Spoon, the graphical designer tool
* Learn to get data from all kinds of data sources, such as plain files, Excel spreadsheets, databases, and XML files
* Use Pentaho Data Integration to perform CRUD (create, read, update, and delete) operations on relational databases
* Populate a data mart with Pentaho Data Integration
* Use Pentaho Data Integration to organize files and folders, run daily processes, deal with errors, and more

In Detail
Pentaho Data Integration (PDI) is an intuitive, graphical environment packed with drag-and-drop design and powerful Extract-Transform-Load (ETL) capabilities. This book shows and explains the new interactive features of Spoon, its revamped look and feel, and the newest features of the tool, including the Transformation and Job Executors and the invaluable Metadata Injection capability. We begin with the installation of the PDI software and then move on to cover all the key PDI concepts. Each chapter introduces new features, enabling you to gradually gain practice with the tool. First, you will learn to do all kinds of data manipulation and work with simple plain files. Then, the book teaches you how to work with relational databases inside PDI. Moreover, you will be given a primer on data warehouse concepts, and you will learn how to load data into a data warehouse. Throughout this book, you will become familiar with PDI's intuitive, graphical, drag-and-drop design environment. By the end of this book, you will have learned everything you need to know to meet your data manipulation requirements. You will also be given best practices and advice for designing and deploying your projects.

Style and approach
A step-by-step guide filled with practical, real-world scenarios and examples.

Data management and analytics simplified with Teradata

Key Features
* Take your understanding of Teradata to the next level and build efficient data warehousing applications for your organization
* Covers recipes on data handling, warehousing, advanced querying, and administrative tasks in Teradata
* Contains practical solutions to tackle common (and not-so-common) problems you might encounter in your day-to-day activities

Book Description
Teradata is an enterprise software company that develops and sells its eponymous relational database management system (RDBMS), which is considered a leading data warehousing solution and provides data management solutions for analytics. This book will help you get all the practical information you need for the creation and implementation of your data warehousing solution using Teradata. The book begins with recipes on quickly setting up a development environment so you can work with different types of data structuring and manipulation functions. You will tackle problems related to efficient querying, stored procedure searching, and navigation techniques. Additionally, you'll master various administrative tasks, such as user and security management, workload management, high availability, performance tuning, and monitoring. This book is designed to take you through the best practices for performing the real daily tasks of a Teradata DBA, and will help you tackle any problem you might encounter in the process.

What you will learn
* Understand Teradata's competitive advantage over other RDBMSs
* Use SQL to process data stored in Teradata tables
* Leverage Teradata's available application utilities and parallelism to work with large datasets
* Apply various performance tuning techniques to optimize queries
* Acquire deeper knowledge and understanding of the Teradata architecture
* Follow easy steps to load, archive, and restore data and implement Teradata protection features
* Gain confidence in running a wide variety of data analytics and developing applications for the Teradata environment

Who this book is for
This book is for database administrators and Teradata users who are looking for a practical, one-stop resource to solve their problems while handling their Teradata solution. If you are looking to learn the basic as well as the advanced tasks involved in Teradata querying or administration, this book will be handy. Some knowledge of relational database concepts will be helpful to get the best out of this book.

Leverage the power of Redis 4.x to develop, optimize, and administer your Redis solutions with ease

Key Features
* Build, deploy, and administer high-performance and scalable applications in Redis
* Covers a range of important tasks, including development and administration of Redis
* A practical guide that takes your understanding of Redis to the next level

Book Description
Redis is considered the world's most popular key-value store database. Its versatility and the wide variety of use cases it enables have made it a popular choice of database for many enterprises. Based on the latest version of Redis, this book provides both step-by-step recipes and the relevant background information required to utilize its features to the fullest. It covers everything from a basic understanding of Redis data types to advanced aspects of Redis high availability, clustering, administration, and troubleshooting. This book will be a great companion for mastering all aspects of Redis. The book starts off by installing and configuring Redis so that you can get started with ease. Moving on, all the data types and features of Redis are introduced in detail. Next, you will learn how to develop applications with Redis in Java, Python, and the Spring Boot web framework. You will also learn replication tasks, which will help you troubleshoot replication issues. Furthermore, you will learn the steps needed to ensure high availability on your cluster and during production deployment. Toward the end of the book, you will learn the essential tasks that will help you troubleshoot your ecosystem efficiently, along with extending Redis by using different modules.

What you will learn
* Install and configure your Redis instance
* Explore various data types and commands in Redis
* Build client-side applications as well as a big data framework with Redis
* Manage data replication and persistence in Redis
* Implement high availability and data sharding in Redis
* Extend Redis with Redis Modules
* Benchmark, debug, fine-tune, and troubleshoot various issues in Redis

Who this book is for
This book is for database administrators, developers, and architects who want to tackle the common and not-so-common problems associated with the different development and administration-related tasks in Redis. A fundamental understanding of Redis is expected to get the best out of this book.

Data Storage: Systems, Management and Security Issues begins with a chapter comparing digital or electronic storage systems, such as magnetic, optical, and flash, with biological data storage systems, such as DNA and human brain memory. The main part of the chapter discusses the following quantitative storage traits: data organisation, functionality, data density, capacity, power consumption, redundancy, integrity, access time, and data transfer rate. Afterwards, various facets of data warehouses, as well as the necessity for security measures, are reviewed. Because security tools are more significant than ever before, the pertinent strategies and economics are discussed. The final chapter supplements this by discussing the reliability and confidentiality of media and storage systems in order to make a stronger case for storage security. Confidentiality, integrity, and availability are identified as the three aspects of security that should be preserved during data transmission, processing, and storage.

Harness the power of SQL Server 2017 Integration Services to build your data integration solutions with ease

About This Book
* Acquaint yourself with all the newly introduced features in SQL Server 2017 Integration Services
* Program and extend your packages to enhance their functionality
* This detailed, step-by-step guide covers everything you need to develop efficient data integration and data transformation solutions for your organization

Who This Book Is For
This book is ideal for software engineers, DW/ETL architects, and ETL developers who need to create a new ETL implementation, or enhance an existing one, with SQL Server 2017 Integration Services. This book would also be good for individuals who develop ETL solutions that use SSIS and are keen to learn the new features and capabilities in SSIS 2017.

What You Will Learn
* Understand the key components of an ETL solution using SQL Server 2016-2017 Integration Services
* Design the architecture of a modern ETL solution
* Gain a good knowledge of the new capabilities and features added to Integration Services
* Implement ETL solutions using Integration Services for both on-premises and Azure data
* Improve the performance and scalability of an ETL solution
* Enhance the ETL solution using a custom framework
* Work on the ETL solution with many other developers using common design paradigms and techniques
* Use scripting effectively to solve complex data issues

In Detail
SQL Server Integration Services is a tool that facilitates data extraction, consolidation, and loading (ETL), SQL Server coding enhancements, data warehousing, and customizations. With the help of the recipes in this book, you'll gain complete hands-on experience of SSIS 2017 as well as the new 2016 features and design and development improvements, including SCD (slowly changing dimensions), tuning, and customizations. At the start, you'll learn to install and set up SSIS as well as other SQL Server resources to make optimal use of this Business Intelligence tool. We'll begin by taking you through the new features in SSIS 2016/2017 and implementing the necessary features to get a modern, scalable ETL solution that fits the modern data warehouse. Over the course of the chapters, you will learn how to design and build SSIS data warehouse packages using SQL Server Data Tools. Additionally, you'll learn to develop SSIS packages designed to maintain a data warehouse using the Data Flow and other control flow tasks. You'll also be shown many recipes on cleansing data and how to get the end result after applying different transformations. The book also covers some real-world scenarios you might face and how to handle various issues that can arise when designing your packages. By the end of this book, you'll know all the key concepts needed to perform data integration and transformation. You'll have explored on-premises big data integration processes to create a classic data warehouse, and will know how to extend the toolbox with custom tasks and transforms.

Style and approach
This cookbook follows a problem-solution approach and tackles all kinds of data integration scenarios by using the capabilities of SQL Server 2016 Integration Services. This book is well supplemented with screenshots, tips, and tricks. Each recipe focuses on a particular task and is written in a very easy-to-follow manner.

A handy reference guide for data analysts and data scientists to help them obtain value from big data analytics using Spark on Hadoop clusters

About This Book
* This book is based on the latest 2.0 version of Apache Spark and the 2.7 version of Hadoop, integrated with the most commonly used tools.
* Learn all Spark stack components, including the latest topics such as DataFrames, Datasets, GraphFrames, Structured Streaming, DataFrame-based ML Pipelines, and SparkR.
* Integrations with frameworks such as HDFS and YARN and tools such as Jupyter, Zeppelin, NiFi, Mahout, HBase Spark Connector, GraphFrames, H2O, and Hivemall.

Who This Book Is For
Though this book is primarily aimed at data analysts and data scientists, it will also help architects, programmers, and practitioners. Knowledge of either Spark or Hadoop would be beneficial. It is assumed that you have a basic programming background in Scala, Python, SQL, or R, along with basic Linux experience. Working experience within big data environments is not mandatory.

What You Will Learn
* Find out and implement the tools and techniques of big data analytics using Spark on Hadoop clusters, with a wide variety of tools used with Spark and Hadoop
* Understand all the Hadoop and Spark ecosystem components
* Get to know all the Spark components: Spark Core, Spark SQL, DataFrames, Datasets, conventional and Structured Streaming, MLlib, ML Pipelines, and GraphX
* See batch and real-time data analytics using Spark Core, Spark SQL, and conventional and Structured Streaming
* Get to grips with data science and machine learning using MLlib, ML Pipelines, H2O, Hivemall, GraphX, and SparkR

In Detail
This book aims to provide the fundamentals of Apache Spark and Hadoop. All Spark components (Spark Core, Spark SQL, DataFrames, Datasets, conventional Streaming, Structured Streaming, MLlib, and GraphX) and Hadoop core components (HDFS, MapReduce, and YARN) are explored in greater depth with implementation examples on Spark + Hadoop clusters. As big data processing moves away from MapReduce to Spark, the advantages of Spark over MapReduce are explained in great depth so you can reap the benefits of in-memory speeds. The DataFrames API, the Data Sources API, and the new Dataset API are explained for building big data analytical applications. Real-time data analytics using Spark Streaming with Apache Kafka and HBase is covered to help you build streaming applications. The new Structured Streaming concept is explained with an IoT (Internet of Things) use case. Machine learning techniques are covered using MLlib, ML Pipelines, and SparkR, and graph analytics is covered with the GraphX and GraphFrames components of Spark. Readers will also get an opportunity to get started with web-based notebooks such as Jupyter and Apache Zeppelin, and the data flow tool Apache NiFi, to analyze and visualize data.

Style and approach
This step-by-step, pragmatic guide will make life easy no matter what your level of experience. You will deep dive into Apache Spark on Hadoop clusters through ample exciting real-life examples.

A practical tutorial that explains data science in simple terms to help programmers and data analysts get started with data science

Unleash the power of serverless integration with Azure

About This Book
* Build and support highly available and scalable API Apps by learning powerful Azure-based cloud integration
* Deploy and deliver applications that integrate seamlessly in the cloud and quickly adapt to your integration needs
* Deploy hybrid applications that work and integrate in the cloud (using Logic Apps and BizTalk Server)

Who This Book Is For
This book is for Microsoft enterprise developers, DevOps engineers, and IT professionals who would like to use Azure App Service and Microsoft cloud integration technologies to create cloud-based web and mobile apps.

What You Will Learn
* Explore new models of robust cloud integration in Microsoft Azure
* Create your own connector and learn how to publish and manage it
* Build reliable, scalable, and secure business workflows using Azure Logic Apps
* Simplify SaaS connectivity with Azure using Logic Apps
* Connect your on-premises system to Azure securely
* Get to know more about Logic Apps and how to connect to on-premises "line-of-business" applications using Microsoft BizTalk Server

In Detail
Microsoft is focusing heavily on enterprise connectivity so that developers can build scalable web and mobile apps and services in the cloud; in short, enterprise connectivity from anywhere and to any device. These integration capabilities are offered through powerful Azure-based services. This book will teach you how to design and implement cloud integration using Microsoft Azure. It starts by showing you how to build, deploy, and secure an API app. Next, it introduces you to Logic Apps and helps you quickly start building your integration applications. We'll then go through the different connectors available for Logic Apps to build your automated business process workflow. Further on, you will see how to create a complex workflow in Logic Apps using Azure Functions. You will then add a SaaS application to your existing cloud applications and create queues and topics in Service Bus using the Azure portal. Towards the end, we'll explore Event Hubs and IoT Hub, and you'll learn more about how to instrument and monitor the business workflow in Logic Apps. Using this book, you will be able to support your apps that connect to data anywhere, be it in the cloud or on-premises.

Style and approach
This practical, hands-on tutorial shows you the full capability of App Service and other Azure-based integration services to build scalable and highly available web and mobile apps. It helps you build and support your applications in the cloud or on-premises. We'll debunk the popular myth that switching to the cloud is risky; it's not!

Over 70 practical recipes to analyze multidimensional data in SQL Server 2016 Analysis Services cubes

About This Book
* Updated for SQL Server 2016, this book helps you take advantage of the new MDX commands and the new features introduced in SSAS
* Perform time-related, context-aware, and business-related calculations with ease to enrich your Business Intelligence solutions
* A collection of techniques to write flexible and high-performing MDX queries in SSAS, with carefully structured examples

Who This Book Is For
This book is for anyone who has been involved in working with multidimensional data. If you are a multidimensional cube developer, a multidimensional database administrator, or a report developer who writes MDX queries to access multidimensional cubes, this book will help you. If you are a power cube user or an experienced business analyst, you will also find this book invaluable in your data analysis. This book is also for you if you are interested in doing more data analysis so that management can make timely and accurate business decisions.

What You Will Learn
* Grasp the fundamental MDX concepts, features, and techniques
* Work with sets
* Work with the Time dimension and create time-aware calculations
* Make analytical reports compact, concise, and efficient
* Navigate cubes
* Master MDX for reporting with Reporting Services (new)
* Perform business analytics
* Design efficient cubes and efficient MDX queries
* Create metadata-driven calculations (new)
* Capture MDX queries and many other techniques

In Detail
If you're often faced with MDX challenges, this is the book for you. It will teach you how to solve various real-world business requirements using MDX queries and calculations. Examples in the book introduce an idea or a problem and then guide you through the process of implementing the solution in a step-by-step manner, inform you about best practices, and offer deep knowledge of how the solution works. Recipes are organized by chapter, each covering a single topic. They start slowly and logically progress to more advanced techniques. Where things get complex, they are broken down: instead of one recipe, there is a series of recipes built one on top of another. This way you are able to see intermediate results and debug potential errors faster. Finally, the cookbook format is here to help you quickly identify the topic of interest and, within it, a wide range of practical solutions; that is, MDX recipes for your success.

Style and approach
This book is written in a cookbook format, where you can browse through and look for solutions to a particular problem in one place. Each recipe is short, to the point, and grouped by relevance. All the recipes are sequenced in a logical progression; you will be able to build up your understanding of the topic incrementally.

Master the intricacies of Elasticsearch 5 and use it to create flexible and scalable search solutions

About This Book
* Master the searching, indexing, and aggregation features in Elasticsearch
* Improve users' search experience with Elasticsearch's functionalities and develop your own Elasticsearch plugins
* A comprehensive, step-by-step guide to mastering the intricacies of Elasticsearch with ease

Who This Book Is For
If you have some prior working experience with Elasticsearch and want to take your knowledge to the next level, this book will be the perfect resource for you. If you are a developer who wants to implement scalable search solutions with Elasticsearch, this book will also help you. Some basic knowledge of the query DSL and data indexing is required to make the best use of this book.

What You Will Learn
* Understand Apache Lucene and Elasticsearch 5's design and architecture
* Use and configure the new and improved default text scoring mechanism in Apache Lucene 6
* Know how to overcome the pitfalls of handling relational data in Elasticsearch
* Learn about choosing the right queries according to the use case, and master the scripting module, including the new default scripting language, Painless
* Explore the right way of scaling production clusters to improve the performance of Elasticsearch
* Master the searching, indexing, and aggregation features in Elasticsearch
* Develop your own Elasticsearch plugins to extend the functionalities of Elasticsearch

In Detail
Elasticsearch is a modern, fast, distributed, scalable, fault-tolerant, open source search and analytics engine. Elasticsearch leverages the capabilities of Apache Lucene and provides a new level of control over how you can index and search even huge sets of data. This book will give you a brief recap of the basics and also introduce you to the new features of Elasticsearch 5. We will guide you through the intermediate and advanced functionalities of Elasticsearch, such as querying, indexing, searching, and modifying data. We'll also explore advanced concepts, including aggregation, index control, sharding, replication, and clustering. We'll show you the monitoring and administration modules available in Elasticsearch, and will also cover backup and recovery. You will gain an understanding of how you can scale your Elasticsearch cluster to suit your use case and improve its performance. We'll also show you how you can create your own analysis plugin in Elasticsearch. By the end of the book, you will have all the knowledge necessary to master Elasticsearch and put it to efficient use.

Style and approach
This comprehensive guide covers intermediate and advanced concepts in Elasticsearch as well as their implementation. An easy-to-follow approach means you'll be able to master even advanced querying, searching, and administration tasks with ease.

Get the most out of the rich development capabilities of SQL Server 2016 to build efficient database applications for your organization

About This Book
* Utilize the new enhancements in Transact-SQL and security features in SQL Server 2016 to build efficient database applications
* Work with temporal tables to get information about data stored in the table at any point in time
* A detailed guide to SQL Server 2016, introducing you to multiple new features and enhancements to improve your overall development experience

Who This Book Is For
This book is for database developers and solution architects who plan to use the new SQL Server 2016 features for developing efficient database applications. It is also ideal for experienced SQL Server developers who want to switch to SQL Server 2016 for its rich development capabilities. Some understanding of basic database concepts and the Transact-SQL language is assumed.

What You Will Learn
* Explore the new development features introduced in SQL Server 2016
* Identify opportunities for In-Memory OLTP technology, significantly enhanced in SQL Server 2016
* Use columnstore indexes to get significant storage and performance improvements
* Extend database design solutions using temporal tables
* Exchange JSON data between applications and SQL Server in a more efficient way
* Migrate historical data transparently and securely to Microsoft Azure by using Stretch Database
* Use the new security features to encrypt data or to gain more granular control over access to rows in a table
* Simplify performance troubleshooting with Query Store
* Discover the potential of R's integration with SQL Server

In Detail
Microsoft SQL Server 2016 is considered the biggest leap in the history of Microsoft's data platform in the ongoing era of big data and data science. Compared to its predecessors, SQL Server 2016 offers developers a unique opportunity to leverage advanced features and build applications that are robust, scalable, and easy to administer. This book introduces you to the new features of SQL Server 2016, which will open up a completely new set of possibilities for you as a developer. It prepares you for the more advanced topics by starting with a quick introduction to SQL Server 2016's new features and a recap of the possibilities you may have already explored with previous versions of SQL Server. The next part introduces you to small delights in the Transact-SQL language and then switches to a completely new technology inside SQL Server: JSON support. We also take a look at Stretch Database, security enhancements, and temporal tables. The last chapters concentrate on implementing advanced topics, including Query Store, columnstore indexes, and In-Memory OLTP. Finally, you will be introduced to R and how to use the R language with Transact-SQL for data exploration and analysis. By the end of this book, you will have the information you need to design efficient, high-performance database applications without any hassle.

Style and approach
This book is a detailed guide to mastering the development features offered by SQL Server 2016, with a unique learn-as-you-do approach. All the concepts are explained in a very easy-to-understand manner and are supplemented with examples to ensure that you, the developer, are able to take the next step in building more powerful, robust applications for your organization with ease.