This SpringerBrief reviews the knowledge engineering problem of
engineering objectivity in top-k query answering; essentially,
answers must be computed taking into account the user's preferences
and a collection of (subjective) reports provided by other users.
Each report is assumed to consist of a set of scores for a list
of features, its author's preferences among those features, and
other information discussed in this brief. These pieces of
information for every report are then combined, along with the
querying user's preferences and their trust in each report, to rank
the query results. Everyday examples of this setup are the online
reviews that can be found in sites like Amazon, Trip Advisor, and
Yelp, among many others. Throughout this knowledge engineering
effort the authors adopt the Datalog+/- family of ontology
languages as the underlying knowledge representation and reasoning
formalism, and investigate several alternative ways in which
rankings can be derived, along with algorithms for top-k (atomic)
query answering under these rankings. This SpringerBrief also
investigates assumptions under which these algorithms run in
polynomial time in data complexity. Since this SpringerBrief
contains a gentle introduction to the main building blocks (OBDA,
Datalog+/-, and reasoning with preferences), it should be of value
to students, researchers, and practitioners who are interested in
the general problem of incorporating user preferences into related
formalisms and tools. Practitioners interested in using
Ontology-Based Data Access to leverage the information contained in
reviews of products and services for a better customer experience
will also find this brief useful, as will researchers working in
the areas of Ontological Languages, Semantic Web, Data Provenance,
and Reasoning with Preferences.
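The setup described above can be sketched in a few lines of plain Python. The scoring scheme below (the querying user's feature preferences weight each report's scores, and trust in each reviewer weights the reports) is a simplifying assumption for illustration, not the book's Datalog+/- based semantics; all names and numbers are invented.

```python
# Trust-weighted, preference-aware top-k ranking: a minimal sketch of
# the idea, not the book's actual formalism.

def rank_items(reports, feature_prefs, trust, k=2):
    """reports: {item: [(reviewer, {feature: score})]}."""
    ranked = []
    for item, item_reports in reports.items():
        total, weight = 0.0, 0.0
        for reviewer, scores in item_reports:
            t = trust.get(reviewer, 0.0)
            # Combine the reviewer's feature scores using the querying
            # user's own feature preferences as weights.
            s = sum(feature_prefs[f] * v for f, v in scores.items())
            total += t * s
            weight += t
        ranked.append((item, total / weight if weight else 0.0))
    ranked.sort(key=lambda pair: pair[1], reverse=True)
    return ranked[:k]

prefs = {"location": 0.7, "price": 0.3}
trust = {"alice": 0.9, "bob": 0.1}
reports = {
    "hotel_a": [("alice", {"location": 5, "price": 4}),
                ("bob",   {"location": 1, "price": 1})],
    "hotel_b": [("alice", {"location": 2, "price": 2}),
                ("bob",   {"location": 5, "price": 5})],
}
top = rank_items(reports, prefs, trust, k=1)
```

Because the querying user trusts alice far more than bob, hotel_a (which alice rates highly) ranks first even though bob prefers hotel_b.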
Data Storage: Systems, Management and Security Issues begins with a
chapter comparing digital or electronic storage systems, such as
magnetic, optical, and flash, with biological data storage systems,
like DNA and human brain memory. In the main part of the chapter,
the following storage traits are discussed: data
organisation, functionality, data density, capacity, power
consumption, redundancy, integrity, access time, and data transfer
rate. Afterwards, various facets of data warehouses as well as the
necessity for security measures are reviewed. Because the
significance of security tools is greater than ever before, the
pertinent strategies and economics are discussed. The final chapter
supplements this by discussing the reliability and confidentiality
of media and storage systems in order to present a broader view of
storage security. Confidentiality, integrity and availability
are three aspects of security identified as ones that should be
preserved during data transmission, processing and storage.
Leverage the power of Redis 4.x to develop, optimize and administer
your Redis solutions with ease Key Features Build, deploy and
administer high performance and scalable applications in Redis
Covers a range of important tasks - including development and
administration of Redis A practical guide that takes your
understanding of Redis to the next level Book Description Redis is
considered the world's most popular key-value store database. Its
versatility and the wide variety of use cases it enables have made
it a popular choice of database for many enterprises. Based on the
latest version of Redis, this book provides both step-by-step
recipes and the relevant background information required to utilize
its features to the fullest. It covers everything from a basic
understanding of Redis data types to advanced aspects of Redis high
availability, clustering, administration, and troubleshooting. This
book will be a great companion as you master all aspects of Redis.
The book starts off by installing and configuring Redis for you to
get started with ease. Moving on, all the data types and features
of Redis are introduced in detail. Next, you will learn how to
develop applications with Redis in Java, Python, and the Spring
Boot web framework. You will also learn replication tasks, which
will help you to troubleshoot replication issues. Furthermore, you
will learn the steps that need to be undertaken to ensure high
availability on your cluster and during production deployment.
Toward the end of the book, you will learn the most important tasks
that will help you to troubleshoot your ecosystem efficiently, along
with extending Redis using different modules. What you will
learn Install and configure your Redis instance Explore various
data types and commands in Redis Build client-side applications as
well as a Big Data framework with Redis Manage data replication and
persistence in Redis Implement high availability and data sharding
in Redis Extend Redis with Redis Modules Benchmark, debug, fine-tune
and troubleshoot various issues in Redis Who this book is for This
book is for database administrators, developers and architects who
want to tackle the common and not so common problems associated
with the different development and administration-related tasks in
Redis. A fundamental understanding of Redis is expected to get the
best out of this book.
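The core Redis data types the book introduces can be illustrated with plain Python stand-ins. This is only a sketch of the semantics: with the redis-py client the equivalent commands would be SET/INCR, HSET, and LPUSH/RPOP issued against a running server, and all key names here are made up for the example.

```python
# Plain-Python stand-ins for three core Redis data types (strings,
# hashes, lists); not a Redis client, just the shape of the commands.
store = {}

# String used as a counter: SET page:views 10, then INCR page:views.
store["page:views"] = "10"
store["page:views"] = str(int(store["page:views"]) + 1)

# Hash holding object fields: HSET user:1 name Alice email ...
store["user:1"] = {"name": "Alice", "email": "alice@example.com"}

# List used as a FIFO queue: LPUSH jobs job-a, then RPOP jobs.
store.setdefault("jobs", []).insert(0, "job-a")
next_job = store["jobs"].pop()
```

In real Redis each of these types also carries dedicated commands (expiry, atomic increments, blocking pops) that the book covers in detail.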
Get up and running with the Pentaho Data Integration tool using
this hands-on, easy-to-read guide About This Book * Manipulate your
data by exploring, transforming, validating, and integrating it
using Pentaho Data Integration 8 CE * A comprehensive guide
exploring the features of Pentaho Data Integration 8 CE * Connect
to any database engine, explore the databases, and perform all kinds
of operations on relational databases Who This Book Is For This
book is a must-have for software developers, business intelligence
analysts, IT students, or anyone involved or interested in
developing ETL solutions. If you plan on using Pentaho Data
Integration for doing any data manipulation task, this book will
help you as well. This book is also a good starting point for data
warehouse designers, architects, or anyone who is responsible for
data warehouse projects and needs to load data into them. What You
Will Learn * Explore the features and capabilities of Pentaho Data
Integration 8 Community Edition * Install and get started with PDI
* Learn the ins and outs of Spoon, the graphical designer tool *
Learn to get data from all kinds of data sources, such as plain
files, Excel spreadsheets, databases, and XML files * Use Pentaho
Data Integration to perform CRUD (create, read, update, and delete)
operations on relational databases * Populate a data mart with
Pentaho Data Integration * Use Pentaho Data Integration to organize
files and folders, run daily processes, deal with errors, and more
In Detail Pentaho Data Integration (PDI) is an intuitive and
graphical environment packed with drag-and-drop design and powerful
Extract-Transform-Load (ETL) capabilities. This book shows and
explains the new interactive features of Spoon, the revamped look
and feel, and the newest features of the tool, including the
transformation and job Executors and the invaluable Metadata
Injection capability. We begin with the installation of the PDI
software and then move on to cover all the key PDI concepts. Each
chapter introduces new features, enabling you to gradually gain
practice with the tool. First, you will learn to do all kinds
of data manipulation and work with simple plain files. Then, the
book teaches you how you can work with relational databases inside
PDI. Moreover, you will be given a primer on data warehouse
concepts and you will learn how to load data in a data warehouse.
During the course of this book, you will be familiarized with its
intuitive, graphical and drag-and-drop design environment. By the
end of this book, you will have learned everything you need to know
in order to meet your data manipulation requirements. You will also
be given best practices and advice for designing and deploying your
projects. Style and approach A step-by-step guide filled with
practical, real-world scenarios and examples.
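The extract-transform-load flow that PDI expresses graphically can be sketched in a few lines of plain Python. The field names, sample data, and validation rule below are invented for the example; in Spoon each stage would be a drag-and-drop step in a transformation, with rejected rows routed to an error hop.

```python
# A minimal ETL pipeline: extract from a plain file, transform,
# validate (with error handling), and load into a target structure.
import csv
import io

raw = io.StringIO("name,amount\nAlice,10\nBob,-3\nCarol,5\n")

rows = list(csv.DictReader(raw))                     # extract
for row in rows:                                     # transform
    row["amount"] = int(row["amount"])
valid = [r for r in rows if r["amount"] >= 0]        # validate
rejected = [r for r in rows if r["amount"] < 0]      # error rows
warehouse = {r["name"]: r["amount"] for r in valid}  # load
```

A PDI transformation does the same work declaratively, which is what makes the graphical designer approachable for non-programmers.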
Harness the power of SQL Server 2017 Integration Services to build
your data integration solutions with ease About This Book *
Acquaint yourself with all the newly introduced features in SQL
Server 2017 Integration Services * Program and extend your packages
to enhance their functionality * This detailed, step-by-step guide
covers everything you need to develop efficient data integration
and data transformation solutions for your organization Who This
Book Is For This book is ideal for software engineers, DW/ETL
architects, and ETL developers who need to create a new, or enhance
an existing, ETL implementation with SQL Server 2017 Integration
Services. This book would also be good for individuals who develop
ETL solutions that use SSIS and are keen to learn the new features
and capabilities in SSIS 2017. What You Will Learn * Understand the
key components of an ETL solution using SQL Server 2016-2017
Integration Services * Design the architecture of a modern ETL
solution * Have a good knowledge of the new capabilities and
features added to Integration Services * Implement ETL solutions
using Integration Services for both on-premises and Azure data *
Improve the performance and scalability of an ETL solution *
Enhance the ETL solution using a custom framework * Be able to work
on the ETL solution with many other developers and have common
design paradigms or techniques * Effectively use scripting to solve
complex data issues In Detail SQL Server Integration Services is a
tool that facilitates data extraction, consolidation, and loading
options (ETL), SQL Server coding enhancements, data warehousing,
and customizations. With the help of the recipes in this book,
you'll gain complete hands-on experience of SSIS 2017 as well as
the 2016 new features, design and development improvements
including SCD, Tuning, and Customizations. At the start, you'll
learn to install and set up SSIS as well as other SQL Server
resources to make optimal use of these Business Intelligence tools. We'll
begin by taking you through the new features in SSIS 2016/2017 and
implementing the necessary features to get a modern scalable ETL
solution that fits the modern data warehouse. Through the course of
chapters, you will learn how to design and build SSIS data
warehouse packages using SQL Server Data Tools. Additionally,
you'll learn to develop SSIS packages designed to maintain a data
warehouse using the Data Flow and other control flow tasks. You'll
also be shown many recipes for cleansing data and obtaining the end
result after applying different transformations. Some real-world
scenarios are also covered, along with how to handle various issues
you might face when designing your packages. By the end of this
book, you'll get to know all the key
concepts to perform data integration and transformation. You'll
have explored on-premises Big Data integration processes to create
a classic data warehouse, and will know how to extend the toolbox
with custom tasks and transforms. Style and approach This cookbook
follows a problem-solution approach and tackles all kinds of data
integration scenarios by using the capabilities of SQL Server 2016
Integration Services. This book is well supplemented with
screenshots, tips, and tricks. Each recipe focuses on a particular
task and is written in a very easy-to-follow manner.
Unleash the power of serverless integration with Azure About This
Book * Build and support highly available and scalable API Apps by
learning powerful Azure-based cloud integration * Deploy and
deliver applications that integrate seamlessly in the cloud and
quickly adapt as per your integration needs * Deploy hybrid
applications that work and integrate on the cloud (using Logic Apps
and BizTalk Server) Who This Book Is For This book is for Microsoft
Enterprise developers, DevOps, and IT professionals who would like
to use Azure App Service and Microsoft Cloud Integration
technologies to create cloud-based web and mobile apps. What You
Will Learn * Explore new models of robust cloud integration in
Microsoft Azure * Create your own connector and learn how to
publish and manage it * Build reliable, scalable, and secure
business workflows using Azure Logic Apps * Simplify SaaS
connectivity with Azure using Logic Apps * Connect your on-premises
system to Azure securely * Get to know more about Logic Apps and
how to connect to on-premises "line-of-business" applications using
Microsoft BizTalk Server In Detail Microsoft is focusing heavily on
Enterprise connectivity so that developers can build scalable web
and mobile apps and services in the cloud. In short, Enterprise
connectivity from anywhere and to any device. These integration
services are being offered through powerful Azure-based services.
This book will teach you how to design and implement cloud
integration using Microsoft Azure. It starts by showing you how to
build, deploy, and secure the API app. Next, it introduces you to
Logic Apps and helps you quickly start building your integration
applications. We'll then go through the different connectors
available for Logic Apps to build your automated business process
workflow. Further on, you will see how to create a complex workflow
in Logic Apps using Azure Functions. You will then add a SaaS
application to your existing cloud applications and create Queues
and Topics in Service Bus on Azure using Azure Portal. Towards the
end, we'll explore event hubs and IoT hubs, and you'll get to know
more about how to tool and monitor the business workflow in Logic
Apps. Using this book, you will be able to support your apps that
connect to data anywhere-be it in the cloud or on-premises. Style
and approach This practical hands-on tutorial shows you the full
capability of App Service and other Azure-based integration
services to build scalable and highly available web and mobile
apps. It helps you successfully build and support your applications
in the cloud or on-premises. We'll debunk the popular myth that
switching to the cloud is risky: it's not!
Get the most out of the rich development capabilities of SQL Server
2016 to build efficient database applications for your organization
About This Book * Utilize the new enhancements in Transact-SQL and
security features in SQL Server 2016 to build efficient database
applications * Work with temporal tables to get information about
data stored in the table at any point in time * A detailed guide to
SQL Server 2016, introducing you to multiple new features and
enhancements to improve your overall development experience Who
This Book Is For This book is for database developers and solution
architects who plan to use the new SQL Server 2016 features for
developing efficient database applications. It is also ideal for
experienced SQL Server developers who want to switch to SQL Server
2016 for its rich development capabilities. Some understanding of
the basic database concepts and Transact-SQL language is assumed.
What You Will Learn * Explore the new development features
introduced in SQL Server 2016 * Identify opportunities for
In-Memory OLTP technology, significantly enhanced in SQL Server
2016 * Use columnstore indexes to get significant storage and
performance improvements * Extend database design solutions using
temporal tables * Exchange JSON data between applications and SQL
Server in a more efficient way * Migrate historical data
transparently and securely to Microsoft Azure by using Stretch
Database * Use the new security features to encrypt or to have more
granular control over access to rows in a table * Simplify
performance troubleshooting with Query Store * Discover the
potential of R's integration with SQL Server In Detail Microsoft
SQL Server 2016 is considered the biggest leap in the data platform
history of Microsoft, in the ongoing era of Big Data and data
science. Compared to its predecessors, SQL Server 2016 offers
developers a unique opportunity to leverage the advanced features
and build applications that are robust, scalable, and easy to
administer. This book introduces you to new features of SQL Server
2016 which will open a completely new set of possibilities for you
as a developer. It prepares you for the more advanced topics by
starting with a quick introduction to SQL Server 2016's new
features and a recapitulation of the possibilities you may have
already explored with previous versions of SQL Server. The next
part introduces you to small delights in the Transact-SQL language
and then switches to a completely new technology inside SQL Server
- JSON support. We also take a look at the Stretch database,
security enhancements, and temporal tables. The last chapters
concentrate on implementing advanced topics, including Query Store,
columnstore indexes, and In-Memory OLTP. You will finally be
introduced to R and how to use the R language with Transact-SQL for
data exploration and analysis. By the end of this book, you will
have the required information to design efficient, high-performance
database applications without any hassle. Style and approach This
book is a detailed guide to mastering the development features
offered by SQL Server 2016, with a unique learn-as-you-do approach.
All the concepts are explained in a very easy-to-understand manner
and are supplemented with examples to ensure that you-the
developer-are able to take that next step in building more
powerful, robust applications for your organization with ease.
Master the intricacies of Elasticsearch 5 and use it to create
flexible and scalable search solutions About This Book * Master the
searching, indexing, and aggregation features in Elasticsearch *
Improve users' search experience with Elasticsearch's
functionalities and develop your own Elasticsearch plugins * A
comprehensive, step-by-step guide to master the intricacies of
Elasticsearch with ease Who This Book Is For If you have some prior
working experience with Elasticsearch and want to take your
knowledge to the next level, this book will be the perfect resource
for you. If you are a developer who wants to implement scalable
search solutions with Elasticsearch, this book will also help you.
Some basic knowledge of the query DSL and data indexing is required
to make the best use of this book. What You Will Learn * Understand
Apache Lucene and Elasticsearch 5's design and architecture * Use
and configure the new and improved default text scoring mechanism
in Apache Lucene 6 * Know how to overcome the pitfalls while
handling relational data in Elasticsearch * Learn about choosing
the right queries according to the use cases and master the
scripting module, including the new default scripting language,
Painless * Explore the right way of scaling production clusters
to improve the performance of Elasticsearch * Master the searching,
indexing, and aggregation features in Elasticsearch * Develop your
own Elasticsearch plugins to extend the functionalities of
Elasticsearch In Detail Elasticsearch is a modern, fast,
distributed, scalable, fault tolerant, and open source search and
analytics engine. Elasticsearch leverages the capabilities of
Apache Lucene, and provides a new level of control over how you can
index and search even huge sets of data. This book will give you a
brief recap of the basics and also introduce you to the new
features of Elasticsearch 5. We will guide you through the
intermediate and advanced functionalities of Elasticsearch, such as
querying, indexing, searching, and modifying data. We'll also
explore advanced concepts, including aggregation, index control,
sharding, replication, and clustering. We'll show you the modules
of monitoring and administration available in Elasticsearch, and
will also cover backup and recovery. You will get an understanding
of how you can scale your Elasticsearch cluster to contextualize it
and improve its performance. We'll also show you how you can create
your own analysis plugin in Elasticsearch. By the end of the book,
you will have all the knowledge necessary to master Elasticsearch
and put it to efficient use. Style and approach This comprehensive
guide covers intermediate and advanced concepts in Elasticsearch as
well as their implementation. An easy-to-follow approach means
you'll be able to master even advanced querying, searching, and
administration tasks with ease.
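The query DSL that the book assumes familiarity with is plain JSON, so a typical bool query can be built as a Python dict and sent to a cluster with the official elasticsearch-py client or any HTTP client. The index-agnostic query below is a sketch; the field names and values are invented for the example.

```python
# Building an Elasticsearch bool query body as a plain dict: a full-text
# "must" clause combined with a non-scoring range filter.
import json

query = {
    "query": {
        "bool": {
            "must": [{"match": {"title": "search engine"}}],
            "filter": [{"range": {"year": {"gte": 2015}}}],
        }
    },
    "size": 10,  # cap the number of hits returned
}
body = json.dumps(query)
```

Putting the range clause under "filter" rather than "must" lets Elasticsearch cache it and skip scoring it, one of the query-tuning points the book develops.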
Technology has revolutionized the ways in which libraries store,
share, and access information. As digital resources and tools
continue to advance, so too do the opportunities for libraries to
become more efficient and house more information. E-Discovery Tools
and Applications in Modern Libraries presents critical research on
the digitization of data and how this shift has impacted knowledge
discovery, storage, and retrieval. This publication explores
several emerging trends and concepts essential to electronic
discovery, such as library portals, responsive websites, and
federated search technology. The timely research presented within
this publication is designed for use by librarians, graduate-level
students, technology developers, and researchers in the field of
library and information science.
This book introduces fundamentals and trade-offs of data
de-duplication techniques. It describes novel emerging
de-duplication techniques that remove duplicate data both in
storage and network in an efficient and effective manner. It
explains where duplicate data originates and provides solutions
that remove it. It classifies existing de-duplication techniques by
the size of the data unit to be compared, the place of
de-duplication, and the time of
de-duplication. Chapter 3 considers redundancies in email servers
and a de-duplication technique to increase reduction performance
with low overhead by switching chunk-based de-duplication and
file-based de-duplication. Chapter 4 develops a de-duplication
technique for cloud-storage services where the data units to be
compared are in a logical, structured format rather than a physical
format, reducing processing time efficiently. Chapter 5 presents a network
de-duplication where redundant data packets sent by clients are
encoded (shrunk to small-sized payload) and decoded (restored to
original size payload) in routers or switches on the way to remote
servers through the network. Chapter 6 introduces a mobile
de-duplication technique for image (JPEG) and video (MPEG) data,
considering the performance and overhead of encryption algorithms
for security on mobile devices.
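The chunk-based de-duplication idea classified above can be sketched simply: split incoming data into chunks, fingerprint each chunk, and store a chunk's bytes only the first time that fingerprint is seen, keeping a recipe of fingerprints for reassembly. Real systems typically use content-defined (variable-size) chunking and careful collision handling; the fixed 4-byte chunks below are an assumption made only to keep the example small.

```python
# Fixed-size chunk-based de-duplication with SHA-256 fingerprints.
import hashlib

CHUNK = 4                 # unrealistically small, for illustration
chunk_store = {}          # fingerprint -> chunk bytes (stored once)

def dedup_write(data, store):
    """Store unique chunks; return the recipe for reassembly."""
    recipe = []
    for i in range(0, len(data), CHUNK):
        chunk = data[i:i + CHUNK]
        fp = hashlib.sha256(chunk).hexdigest()
        store.setdefault(fp, chunk)   # skip if already present
        recipe.append(fp)
    return recipe

recipe = dedup_write(b"ABCDABCDABCDXYZ!", chunk_store)
restored = b"".join(chunk_store[fp] for fp in recipe)
```

Here sixteen bytes of input contain only two distinct chunks, so the store holds two entries while the recipe still reconstructs the original byte-for-byte, which is the reduction-performance trade-off the book quantifies.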
With this textbook, Vaisman and Zimanyi deliver excellent coverage
of data warehousing and business intelligence technologies ranging
from the most basic principles to recent findings and applications.
To this end, their work is structured into three parts. Part I
describes "Fundamental Concepts" including multi-dimensional
models; conceptual and logical data warehouse design; and MDX and
SQL/OLAP. Subsequently, Part II details "Implementation and
Deployment," which includes physical data warehouse design; data
extraction, transformation, and loading (ETL) and data analytics.
Lastly, Part III covers "Advanced Topics" such as spatial data
warehouses; trajectory data warehouses; semantic technologies in
data warehouses; and novel technologies like MapReduce,
column-store databases, and in-memory databases. As a key
characteristic of the book, most of the topics are presented and
illustrated using application tools. Specifically, a case study
based on the well-known Northwind database illustrates how the
concepts presented in the book can be implemented using Microsoft
Analysis Services and Pentaho Business Analytics. All chapters are
summarized using review questions and exercises to support
comprehensive student learning. Supplemental material to assist
instructors using this book as a course text is available at
http://cs.ulb.ac.be/DWSDIbook/, including electronic versions of
the figures, solutions to all exercises, and a set of slides
accompanying each chapter. Overall, students, practitioners and
researchers alike will find this book the most comprehensive
reference work on data warehouses, with key topics described in a
clear and educational style.
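The multidimensional model at the heart of the book can be illustrated by rolling a tiny fact table up along one dimension, the operation that MDX and SQL/OLAP express declaratively. The Northwind-like column names and values below are invented for this sketch.

```python
# Roll-up over a fact table: aggregate the sales measure over the
# product dimension, keeping only the customer-country dimension.
from collections import defaultdict

facts = [  # (customer_country, product_category, sales_amount)
    ("Germany", "Beverages", 120.0),
    ("Germany", "Seafood",    80.0),
    ("France",  "Beverages",  60.0),
]

by_country = defaultdict(float)
for country, _category, amount in facts:
    by_country[country] += amount
```

An OLAP engine such as Analysis Services precomputes and indexes aggregations like this across many dimensions at once, which is what makes interactive slicing and dicing feasible.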
Over 70 practical recipes to analyze multi-dimensional data in SQL
Server 2016 Analysis Services cubes About This Book * Updated for
SQL Server 2016, this book helps you take advantage of the new MDX
commands and the new features introduced in SSAS * Perform
time-related, context-aware, and business related-calculations with
ease to enrich your Business Intelligence solutions * Collection of
techniques to write flexible and high performing MDX queries in
SSAS with carefully structured examples Who This Book Is For This
book is for anyone who has been involved in working with
multidimensional data. If you are a multidimensional cube
developer, a multidimensional database administrator, or a report
developer who writes MDX queries to access multidimensional cubes,
this book will help you. If you are a power cube user or an
experienced business analyst, you will also find this book
invaluable in your data analysis. This book is also for you if you
are interested in doing more data analysis so that management can
make timely and accurate business decisions. What You Will Learn *
Grasp the fundamental MDX concepts, features, and techniques * Work
with sets * Work with Time dimension and create time-aware
calculations * Make analytical reports compact, concise, and
efficient * Navigate cubes * Master MDX for reporting with
Reporting Services (new) * Perform business analytics * Design
efficient cubes and efficient MDX queries * Create metadata-driven
calculations (new) * Capture MDX queries and many other techniques
In Detail If you're often faced with MDX challenges, this is a book
for you. It will teach you how to solve various real-world business
requirements using MDX queries and calculations. Examples in the
book introduce an idea or a problem and then guide you through the
process of implementing the solution in a step-by-step manner,
inform you about the best practices and offer a deep knowledge in
terms of how the solution works. Recipes are organized by chapters,
each covering a single topic. They start slowly and logically
progress to more advanced techniques. Where a topic is complex,
things are broken down: instead of one recipe, there is a series of
recipes built one on top of another. This way you are able to see
intermediate results and debug potential errors faster. Finally,
the cookbook format is here to help you quickly identify the topic
of interest and find in it a wide range of practical solutions,
that is, MDX recipes for your success. Style and approach This book is written
in a cookbook format, where you can browse through and look for
solutions to a particular problem in one place. Each recipe is
short, to the point and grouped by relevancy. All the recipes are
sequenced in a logical progression; you will be able to build up
your understanding of the topic incrementally.
A handy reference guide for data analysts and data scientists to
help to obtain value from big data analytics using Spark on Hadoop
clusters About This Book * This book is based on the latest version
2.0 of Apache Spark and version 2.7 of Hadoop, integrated with
most commonly used tools. * Learn all Spark stack components
including latest topics such as DataFrames, DataSets, GraphFrames,
Structured Streaming, DataFrame based ML Pipelines and SparkR. *
Integrations with frameworks such as HDFS, YARN and tools such as
Jupyter, Zeppelin, NiFi, Mahout, HBase Spark Connector,
GraphFrames, H2O and Hivemall. Who This Book Is For Though this
book is primarily aimed at data analysts and data scientists, it
will also help architects, programmers, and practitioners.
Knowledge of either Spark or Hadoop would be beneficial. It is
assumed that you have basic programming background in Scala,
Python, SQL, or R programming with basic Linux experience. Working
experience within big data environments is not mandatory. What You
Will Learn * Find out and implement the tools and techniques of big
data analytics using Spark on Hadoop clusters with wide variety of
tools used with Spark and Hadoop * Understand all the Hadoop and
Spark ecosystem components * Get to know all the Spark components:
Spark Core, Spark SQL, DataFrames, DataSets, Conventional and
Structured Streaming, MLlib, ML Pipelines, and GraphX * See batch
and real-time data analytics using Spark Core, Spark SQL, and
Conventional and Structured Streaming * Get to grips with data
science and machine learning using MLlib, ML Pipelines, H2O,
Hivemall, GraphX, and SparkR. In Detail This book aims at providing
the fundamentals of Apache Spark and Hadoop.
All Spark components - Spark Core, Spark SQL, DataFrames, Data
sets, Conventional Streaming, Structured Streaming, MLlib, GraphX
and Hadoop core components - HDFS, MapReduce and Yarn are explored
in greater depth with implementation examples on Spark + Hadoop
clusters. The industry is moving away from MapReduce to Spark, so
the advantages of Spark over MapReduce are explained in great depth
to help you reap the benefits of in-memory speeds. The DataFrames
API, Data Sources API, and
new Data set API are explained for building Big Data analytical
applications. Real-time data analytics using Spark Streaming with
Apache Kafka and HBase is covered to help you build streaming
applications. The new Structured Streaming concept is explained
with an IoT (Internet of Things) use case. Machine learning
techniques are covered using MLlib, ML Pipelines, and SparkR, and
graph analytics is covered with the GraphX and GraphFrames
components of Spark.
Readers will also get an opportunity to get started with web based
notebooks such as Jupyter, Apache Zeppelin and data flow tool
Apache NiFi to analyze and visualize data. Style and approach This
step-by-step pragmatic guide will make life easy no matter what
your level of experience. You will deep dive into Apache Spark on
Hadoop clusters through ample exciting real-life examples.
This practical tutorial explains data science in simple terms to
help programmers and data analysts get started with data science.
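The transformation-then-action dataflow style that Spark's RDD and DataFrame APIs use can be sketched with Python built-ins. This is not PySpark, just the shape of a word count expressed as flatMap, map, and reduceByKey steps over a couple of invented input lines.

```python
# The Spark-style dataflow for a word count, using lazy generators to
# mimic transformations and a final aggregation as the "action".
from collections import Counter

lines = ["spark on hadoop", "spark streaming"]

words = (w for line in lines for w in line.split())  # flatMap
pairs = ((w, 1) for w in words)                      # map to (key, 1)
counts = Counter()
for w, n in pairs:                                   # reduceByKey
    counts[w] += n
```

In Spark the same pipeline runs partitioned across a cluster, and keeping intermediate results in memory between stages is the source of its advantage over disk-bound MapReduce.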
This book presents Hyper-lattice, a new algebraic model for
partially ordered sets and an alternative to the lattice. The
authors analyze some of the shortcomings of the conventional
lattice structure and propose a novel algebraic structure, the
Hyper-lattice, to overcome them. They establish how a Hyper-lattice
supports dynamic insertion of elements into a partially ordered set
with a partial hierarchy among the set members. The authors present
its characteristics and properties, showing how propositions and
lemmas formalize the Hyper-lattice as a new algebraic structure.
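The lattice property that the book contrasts Hyper-lattice with is that every pair of elements has a unique least upper bound (join). The tiny poset below, divisibility on a few integers, is invented only to illustrate computing joins; it says nothing about the book's own constructions or proofs.

```python
# Least upper bounds in a small poset ordered by divisibility.

def leq(a, b):
    return b % a == 0  # partial order: a divides b

def least_upper_bounds(x, y, elems):
    """Minimal elements among the common upper bounds of x and y."""
    ubs = [z for z in elems if leq(x, z) and leq(y, z)]
    return [u for u in ubs if all(leq(u, v) for v in ubs)]

elems = [1, 2, 3, 6]
lub = least_upper_bounds(2, 3, elems)  # the unique join of 2 and 3
```

When such a join (and dually, a meet) exists and is unique for every pair, the poset is a lattice; structures where this fails are the motivation for generalizations like the Hyper-lattice.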