The Data Vault was invented by Dan Linstedt at the U.S. Department of Defense, and the standard has been successfully applied to data warehousing projects at organizations of all sizes, from small businesses to large corporations. Due to its simplified design, which is adapted from nature, the Data Vault 2.0 standard helps prevent typical data warehousing failures. "Building a Scalable Data Warehouse" covers everything one needs to know to create a scalable data warehouse end to end, including a presentation of the Data Vault modeling technique, which provides the foundations for the technical data warehouse layer. The book discusses how to build the data warehouse incrementally using the agile Data Vault 2.0 methodology. In addition, readers will learn how to create the input layer (the stage layer) and the presentation layer (data mart) of the Data Vault 2.0 architecture, including implementation best practices. Drawing upon years of practical experience and using numerous examples and an easy-to-understand framework, Dan Linstedt and Michael Olschimke discuss:
- How to load each layer using SQL Server Integration Services (SSIS), including automation of the Data Vault loading processes
- Important data warehouse technologies and practices
- Data Quality Services (DQS) and Master Data Services (MDS) in the context of the Data Vault architecture
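The modeling technique the blurb refers to splits entities into hubs (business keys), links (relationships between hubs), and satellites (descriptive, history-tracked attributes). As a minimal sketch only (the table and column names below are hypothetical, not the book's schema), a hub-and-satellite pair in Data Vault 2.0 style might look like this:

```python
# Illustrative Data Vault 2.0 DDL, carried as SQL strings in Python.
# Hypothetical names; DV 2.0 tables carry hash keys plus load-date
# and record-source metadata.
HUB_CUSTOMER = """
CREATE TABLE hub_customer (
    customer_hk   CHAR(32)    NOT NULL PRIMARY KEY, -- hash of business key
    customer_no   VARCHAR(20) NOT NULL,             -- the business key itself
    load_date     DATETIME    NOT NULL,
    record_source VARCHAR(50) NOT NULL
);
"""

SAT_CUSTOMER_DETAILS = """
CREATE TABLE sat_customer_details (
    customer_hk   CHAR(32)    NOT NULL,  -- references hub_customer
    load_date     DATETIME    NOT NULL,  -- new row per change = history
    name          VARCHAR(100),
    city          VARCHAR(100),
    record_source VARCHAR(50) NOT NULL,
    PRIMARY KEY (customer_hk, load_date)
);
"""
```

Because descriptive attributes live only in satellites, a change in requirements adds a new satellite rather than restructuring the hub, which is what makes the model incrementally extensible.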
This guide shows how to combine data science with social science to gain unprecedented insight into customer behavior, so you can change it. Joanne Rodrigues-Craig bridges the gap between predictive data science and statistical techniques that reveal why important things happen -- why customers buy more, or why they immediately leave your site -- so you can get more of the behaviors you want and fewer of the ones you don't. Drawing on extensive enterprise experience and deep knowledge of demographics and sociology, Rodrigues-Craig shows how to create better theories and metrics, so you can accelerate the process of gaining insight, altering behavior, and earning business value. You'll learn how to:
- Develop complex, testable theories for understanding individual and social behavior in web products
- Think like a social scientist and contextualize individual behavior in today's social environments
- Build more effective metrics and KPIs for any web product or system
- Conduct more informative and actionable A/B tests
- Explore causal effects, reflecting a deeper understanding of the differences between correlation and causation
- Alter user behavior in a complex web product
- Understand how relevant human behaviors develop, and the prerequisites for changing them
- Choose the right statistical techniques for common tasks such as multistate and uplift modeling
- Use advanced statistical techniques to model multidimensional systems
- Do all of this in R (with sample code available in a separate code manual)
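The book's sample code is in R, but the mechanics of a basic A/B test it discusses can be sketched briefly in Python; the conversion counts below are invented for illustration:

```python
# Two-proportion z-test for a simple A/B test (hypothetical counts).
from statsmodels.stats.proportion import proportions_ztest

conversions = [120, 154]    # conversions in control, treatment
visitors    = [2400, 2380]  # visitors in each arm

stat, p_value = proportions_ztest(count=conversions, nobs=visitors)
print(f"z = {stat:.2f}, p = {p_value:.4f}")
# A small p-value suggests the conversion rates differ; as the book
# stresses, it does not by itself explain why the behavior changed.
```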
A practical guide to making good decisions in a world of missing data. In the era of big data, it is easy to imagine that we have all the information we need to make good decisions. But in fact the data we have are never complete, and may be only the tip of the iceberg. Just as much of the universe is composed of dark matter, invisible to us but nonetheless present, the universe of information is full of dark data that we overlook at our peril. In Dark Data, data expert David Hand takes us on a fascinating and enlightening journey into the world of the data we don't see. Dark Data explores the many ways in which we can be blind to missing data and how that can lead us to conclusions and actions that are mistaken, dangerous, or even disastrous. Examining a wealth of real-life examples, from the Challenger shuttle explosion to complex financial frauds, Hand gives us a practical taxonomy of the types of dark data that exist and the situations in which they can arise, so that we can learn to recognize and control for them. In doing so, he teaches us not only to be alert to the problems presented by the things we don't know, but also shows how dark data can be used to our advantage, leading to greater understanding and better decisions. Today, we all make decisions using data. Dark Data shows us all how to reduce the risk of making bad ones.
Building upon his earlier book that detailed agile data warehousing programming techniques for the Scrum master, Ralph's latest work illustrates the agile interpretations of the remaining software engineering disciplines:
- Requirements management benefits from streamlined templates that not only define projects quickly, but ensure nothing essential is overlooked.
- Data engineering receives two new "hyper modeling" techniques, yielding data warehouses that can be easily adapted when requirements change without having to invest in ruinously expensive data-conversion programs.
- Quality assurance advances with not only a stereoscopic top-down and bottom-up planning method, but also the incorporation of the latest in automated test engines.
Use this step-by-step guide to deepen your own application development skills through self-study, show your teammates the world's fastest and most reliable techniques for creating business intelligence systems, or ensure that the IT department working for you is building your next decision support system the right way.
Publisher's Note: Products purchased from Third Party sellers are not guaranteed by the publisher for quality, authenticity, or access to any online entitlements included with the product.
Develop a custom, agile data warehousing and business intelligence architecture. Empower your users and drive better decision making across your enterprise with detailed instructions and best practices from an expert developer and trainer. The Data Warehouse Mentor: Practical Data Warehouse and Business Intelligence Insights shows how to plan, design, construct, and administer an integrated end-to-end DW/BI solution. Learn how to choose appropriate components, build an enterprise data model, configure data marts and data warehouses, establish data flow, and mitigate risk. Change management, data governance, and security are also covered in this comprehensive guide.
- Understand the components of BI and data warehouse systems
- Establish project goals and implement an effective deployment plan
- Build accurate logical and physical enterprise data models
- Gain insight into your company's transactions with data mining
- Input, cleanse, and normalize data using ETL (Extract, Transform, and Load) techniques
- Use structured input files to define data requirements
- Employ top-down, bottom-up, and hybrid design methodologies
- Handle security and optimize performance using data governance tools
Robert Laberge is the founder of several Internet ventures and a principal consultant for the IBM Industry Models and Assets Lab, which focuses on data warehousing and business intelligence solutions.
This two-volume set, LNCS 11229 and 11230, constitutes the refereed proceedings of the Confederated International Conferences: Cooperative Information Systems, CoopIS 2018, Ontologies, Databases, and Applications of Semantics, ODBASE 2018, and Cloud and Trusted Computing, C&TC, held as part of OTM 2018 in October 2018 in Valletta, Malta. The 64 full papers presented together with 22 short papers were carefully reviewed and selected from 173 submissions. The OTM program every year covers data and Web semantics, distributed objects, Web services, databases, information systems, enterprise workflow and collaboration, ubiquity, interoperability, mobility, and grid and high-performance computing.
This two-volume set LNCS 11196 and LNCS 11197 constitutes the refereed proceedings of the 7th International Conference on Digital Heritage, EuroMed 2018, held in Nicosia, Cyprus, in October/November 2018. The 21 full papers, 47 project papers, and 29 short papers presented were carefully reviewed and selected from 537 submissions. The papers are organized in topical sections on 3D Digitalization, Reconstruction, Modeling, and HBIM; Innovative Technologies in Digital Cultural Heritage; Digital Cultural Heritage - Smart Technologies; The New Era of Museums and Exhibitions; Digital Cultural Heritage Infrastructure; Non-Destructive Techniques in Cultural Heritage Conservation; E-Humanities; Reconstructing the Past; Visualization, VR and AR Methods and Applications; Digital Applications for Materials Preservation in Cultural Heritage; and Digital Cultural Heritage Learning and Experiences.
Hive makes life much easier for developers who work with data stored and managed in Hadoop clusters, such as data warehouses. With this example-driven guide, you'll learn how to use the Hive infrastructure to provide data summarization, query, and analysis - particularly with HiveQL, Hive's dialect of SQL. You'll learn how to set up Hive in your environment and optimize its use, and how it interoperates with other tools, such as HBase. You'll also learn how to extend Hive with custom code written in Java or scripting languages. Ideal for developers with prior SQL experience, this book shows you how Hive simplifies many tasks that would be much harder to implement in the lower-level MapReduce API provided by Hadoop.
Dive into the world of SQL on Hadoop and get the most out of your Hive data warehouses. This book is your go-to resource for using Hive: authors Scott Shaw, Ankur Gupta, David Kjerrumgaard, and Andreas Francois Vermeulen take you through learning HiveQL, the SQL-like language specific to Hive, to analyze, export, and massage the data stored across your Hadoop environment. From deploying Hive on your hardware or virtual machine and setting up its initial configuration to learning how Hive interacts with Hadoop, MapReduce, Tez, and other big data technologies, Practical Hive gives you a detailed treatment of the software. In addition, this book discusses the value of open source software, Hive performance tuning, and how to leverage semi-structured and unstructured data. What You Will Learn:
- Install and configure Hive for new and existing datasets
- Perform DDL operations
- Execute efficient DML operations
- Use tables, partitions, buckets, and user-defined functions
- Discover performance tuning tips and Hive best practices
Who This Book Is For: Developers, companies, and professionals who deal with large amounts of data and could use software that can efficiently manage large volumes of input. It is assumed that readers have the ability to work with SQL.
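Neither this blurb nor the previous one reproduces code from the books, but the flavor of HiveQL they teach is easy to suggest. Below is a hypothetical sketch (table and column names invented here) of the partitioned-table DDL and SQL-like querying both titles cover, carried as Python strings:

```python
# Hypothetical HiveQL statements (not examples from either book).
CREATE_LOGS = """
CREATE TABLE IF NOT EXISTS web_logs (
    user_id STRING,
    url     STRING,
    status  INT
)
PARTITIONED BY (log_date STRING)
STORED AS ORC;
"""

DAILY_SERVER_ERRORS = """
SELECT log_date, COUNT(*) AS errors
FROM web_logs
WHERE status >= 500
GROUP BY log_date
ORDER BY log_date;
"""
# These would be submitted through a Hive client such as beeline.
# Partitioning by log_date lets Hive prune whole partitions at query time.
```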
This SpringerBrief reviews the knowledge engineering problem of engineering objectivity in top-k query answering; essentially, answers must be computed taking into account the user's preferences and a collection of (subjective) reports provided by other users. Each report can be seen as a set of scores for a list of features, together with its author's preferences among the features and other information, as discussed in this brief. These pieces of information for every report are then combined, along with the querying user's preferences and their trust in each report, to rank the query results. Everyday examples of this setup are the online reviews found on sites like Amazon, TripAdvisor, and Yelp, among many others. Throughout this knowledge engineering effort the authors adopt the Datalog+/- family of ontology languages as the underlying knowledge representation and reasoning formalism, and investigate several alternative ways in which rankings can be derived, along with algorithms for top-k (atomic) query answering under these rankings. The brief also investigates assumptions under which these algorithms run in polynomial time in the data complexity. Since it contains a gentle introduction to the main building blocks (OBDA, Datalog+/-, and reasoning with preferences), it should be of value to students, researchers, and practitioners interested in the general problem of incorporating user preferences into related formalisms and tools; to practitioners interested in using Ontology-Based Data Access to leverage information contained in reviews of products and services for a better customer experience; and to researchers working in the areas of ontological languages, the Semantic Web, data provenance, and reasoning with preferences.
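To make the ranking idea concrete: combining per-report feature scores with the querying user's feature preferences and per-report trust can be caricatured as a weighted average. The following is a deliberately naive Python sketch with invented numbers, not the Datalog+/- machinery the brief actually develops:

```python
# Naive preference- and trust-weighted ranking sketch (hypothetical data).
reports = [
    {"trust": 0.9, "scores": {"cleanliness": 4.0, "location": 5.0}},
    {"trust": 0.4, "scores": {"cleanliness": 2.0, "location": 3.0}},
]
user_prefs = {"cleanliness": 0.7, "location": 0.3}  # weights sum to 1

def combined_score(reports, prefs):
    """Trust-weighted average of each report's preference-weighted score."""
    total_trust = sum(r["trust"] for r in reports)
    weighted = sum(
        r["trust"] * sum(prefs[f] * s for f, s in r["scores"].items())
        for r in reports
    )
    return weighted / total_trust

print(f"item score: {combined_score(reports, user_prefs):.2f}")  # ~3.68
```

Ranking the top-k answers then amounts to computing such a score for every candidate item and keeping the k largest.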
How to build and maintain strong data organizations -- the Dummies way. Data Governance For Dummies offers an accessible first step for decision makers into understanding how data governance works and how to apply it to an organization in a way that improves results and doesn't disrupt. Prep your organization to handle the data explosion (if you know, you know) and learn how to manage this valuable asset. Take full control of your organization's data with all the info and how-tos you need. This book walks you through making accurate data readily available and maintaining it in a secure environment. It serves as your step-by-step guide to extracting every ounce of value from your data.
- Identify the impact and value of data in your business
- Design governance programs that fit your organization
- Discover and adopt tools that measure performance and need
- Address data needs and build a more data-centric business culture
This is the perfect handbook for professionals in the world of data analysis and business intelligence, plus the people who interact with data on a daily basis. And, as always, Dummies explains things in terms anyone can understand, making it easy to learn everything you need to know.
This book presents Hyper-lattice, a new algebraic model for partially ordered sets and an alternative to the lattice. The authors analyze some of the shortcomings of the conventional lattice structure and propose a novel algebraic structure in the form of the Hyper-lattice to overcome them. They establish how Hyper-lattice supports dynamic insertion of elements in a partially ordered set with a partial hierarchy between the set members. The authors present its characteristics and properties, showing how propositions and lemmas formalize Hyper-lattice as a new algebraic structure.
There is growing recognition of the need to address the fragility of digital information, on which our society heavily depends for smooth operation in all aspects of daily life. This has been discussed in many books and articles on digital preservation, so why is there a need for yet one more? Because, for the most part, those other publications focus on documents, images and webpages - objects that are normally rendered to be simply displayed by software to a human viewer. Yet there are clearly many more types of digital objects that may need to be preserved, such as databases, scientific data and software itself. David Giaretta, Director of the Alliance for Permanent Access, and his contributors explain why the tools and techniques used for preserving rendered objects are inadequate for all these other types of digital objects, and they provide the concepts, techniques and tools that are needed. The book is structured in three parts. The first part is on theory, i.e., the concepts and techniques that are essential for preserving digitally encoded information. The second part then shows practice, i.e., the use and validation of these tools and techniques. Finally, the third part concludes by addressing how to judge whether money is being well spent, in terms of effectiveness and cost sharing. Various examples of digital objects from many sources are used to explain the tools and techniques presented. The presentation style mainly aims at practitioners in libraries, archives and industry who are either directly responsible for preservation or who need to prepare for audits of their archives. Researchers in digital preservation and developers of preservation tools and techniques will also find valuable practical information here. Researchers creating digitally encoded information of all kinds will also need to be aware of these topics so that they can help to ensure that their data is usable and can be valued by others now and in the future. To further assist the reader, the book is supported by many hours of videos and presentations from the CASPAR project and by a set of open source software.
Multi-Modal User Interactions in Controlled Environments investigates the capture and analysis of a user's multimodal behavior (mainly eye gaze, eye fixation, eye blink, and body movements) within a real controlled environment (controlled supermarket, personal environment) in order to adapt the response of the computer/environment to the user. Such data is captured using non-intrusive sensors (for example, cameras in the stands of a supermarket) installed in the environment. This multimodal, video-based behavioral data is analyzed to infer user intentions while assisting users in their day-to-day tasks by adapting the system's response to their requirements seamlessly. This book also focuses on the presentation of information to the user. Multi-Modal User Interactions in Controlled Environments is designed for professionals in industry, including professionals in the domains of security and interactive web television. This book is also suitable for graduate-level students in computer science and electrical engineering.
The two-volume set LNCS 7382 and 7383 constitutes the refereed proceedings of the 13th International Conference on Computers Helping People with Special Needs, ICCHP 2012, held in Linz, Austria, in July 2012. The 147 revised full papers and 42 short papers were carefully reviewed and selected from 364 submissions. The papers included in the second volume are organized in the following topical sections: portable and mobile systems in assistive technology; assistive technology, HCI and rehabilitation; sign 2.0: ICT for sign language users: information sharing, interoperability, user-centered design and collaboration; computer-assisted augmentative and alternative communication; easy to Web between science of education, information design and speech technology; smart and assistive environments: ambient assisted living; text entry for accessible computing; tactile graphics and models for blind people and recognition of shapes by touch; mobility for blind and partially sighted people; and human-computer interaction for blind and partially sighted people.
The two-volume set LNCS 7382 and 7383 constitutes the refereed proceedings of the 13th International Conference on Computers Helping People with Special Needs, ICCHP 2012, held in Linz, Austria, in July 2012. The 147 revised full papers and 42 short papers were carefully reviewed and selected from 364 submissions. The papers included in the first volume are organized in the following topical sections: universal learning design; putting the disabled student in charge: user focused technology in education; access to mathematics and science; policy and service provision; creative design for inclusion, virtual user models for designing and using inclusive products; web accessibility in advanced technologies, website accessibility metrics; entertainment software accessibility; document and media accessibility; inclusion by accessible social media; a new era for document accessibility: understanding, managing and implementing the ISO standard PDF/UA; and human-computer interaction and usability for elderly.
Details recent research in areas such as ontology design for information integration, metadata generation and management, and representation and management of distributed ontologies. Provides decision support on the use of novel technologies, information about potential problems, and guidelines for the successful application of existing technologies.
The Semantic Web is a vision - the idea of having data on the Web defined and linked in such a way that it can be used by machines not just for display purposes but for automation, integration and reuse of data across various applications. However, there is a widespread misconception that the Semantic Web is a rehash of existing AI and database work. Kashyap, Bussler, and Moran dispel this notion by presenting the multi-disciplinary technological underpinnings such as machine learning, information retrieval, service-oriented architectures, and grid computing. Thus they combine the informational and computational aspects needed to realize the full potential of the Semantic Web vision.
Service-oriented computing is an emerging factor in IT research and development. Organizations like W3C and the EU have begun research projects to develop industrial-strength applications. This book offers a thorough, practical introduction to one of the most promising approaches - the Web Service Modeling Ontology (WSMO). After a brief review of technologies and standards of the World Wide Web, the Semantic Web, and Web Services, the book examines WSMO from the fundamentals to applications in e-commerce, e-government and e-banking; it also describes its relation to OWL-S and WSDL-S and other applications. The book offers an up-to-date introduction, plus pointers to future applications.
This book provides a timely and first-of-its-kind collection of papers on anatomy ontologies. It is interdisciplinary in its approach, bringing together the relevant expertise from computing and biomedical studies. The book aims to provide readers with a comprehensive understanding of the foundations of anatomical ontologies and the state of the art in terms of existing tools and applications. It also highlights challenges that remain today.
Just like the industrial society of the last century depended on natural resources, today's society depends on information and its exchange. Staab and Stuckenschmidt structured the selected contributions into four parts: Part I, "Data Storage and Access," prepares the semantic foundation, i.e. data modelling and querying in a flexible and yet scalable manner. These foundations allow for dealing with the organization of information at the individual peers. Part II, "Querying the Network," considers the routing of queries, as well as continuous queries and personalized queries under the conditions of the permanently changing topological structure of a peer-to-peer network. Part III, "Semantic Integration," deals with the mapping of heterogeneous data representations. Finally Part IV, "Methodology and Systems," reports experiences from case studies and sample applications. The overall result is a state-of-the-art description of the potential of Semantic Web and peer-to-peer technologies for information sharing and knowledge management when applied jointly.
Manage and Automate Data Analysis with Pandas in Python. Today, analysts must manage data characterized by extraordinary variety, velocity, and volume. Using the open source Pandas library, you can use Python to rapidly automate and perform virtually any data analysis task, no matter how large or complex. Pandas can help you ensure the veracity of your data, visualize it for effective decision-making, and reliably reproduce analyses across multiple data sets. Pandas for Everyone, 2nd Edition, brings together practical knowledge and insight for solving real problems with Pandas, even if you're new to Python data analysis. Daniel Y. Chen introduces key concepts through simple but practical examples, incrementally building on them to solve more difficult, real-world data science problems such as using regularization to prevent data overfitting, or when to use unsupervised machine learning methods to find the underlying structure in a data set. New features in the second edition include:
- Extended coverage of plotting and the seaborn data visualization library
- Expanded examples and resources
- Updated Python 3.9 code and packages coverage, including the statsmodels and scikit-learn libraries
- Online bonus material on geopandas, Dask, and creating interactive graphics with Altair
Chen gives you a jumpstart on using Pandas with a realistic data set and covers combining data sets, handling missing data, and structuring data sets for easier analysis and visualization. He demonstrates powerful data cleaning techniques, from basic string manipulation to applying functions simultaneously across dataframes. Once your data is ready, Chen guides you through fitting models for prediction, clustering, inference, and exploration. He provides tips on performance and scalability and introduces you to the wider Python data analysis ecosystem.
- Work with DataFrames and Series, and import or export data
- Create plots with matplotlib, seaborn, and pandas
- Combine data sets and handle missing data
- Reshape, tidy, and clean data sets so they're easier to work with
- Convert data types and manipulate text strings
- Apply functions to scale data manipulations
- Aggregate, transform, and filter large data sets with groupby
- Leverage Pandas' advanced date and time capabilities
- Fit linear models using the statsmodels and scikit-learn libraries
- Use generalized linear modeling to fit models with different response variables
- Compare multiple models to select the "best" one
- Regularize to overcome overfitting and improve performance
- Use clustering in unsupervised machine learning
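In the spirit of the blurb, here is a tiny pandas sketch (invented data, not one of the book's examples) showing two of the listed tasks: imputing missing values and aggregating with groupby:

```python
# Minimal pandas sketch: handle missing data, then aggregate (hypothetical data).
import pandas as pd

df = pd.DataFrame({
    "region": ["east", "east", "west", "west"],
    "sales":  [100.0, None, 80.0, 120.0],
})

# Impute the missing value with the column mean, then summarize by region.
df["sales"] = df["sales"].fillna(df["sales"].mean())
print(df.groupby("region")["sales"].agg(["mean", "sum"]))
```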
Graph Databases in Action teaches readers everything they need to know to begin building and running applications powered by graph databases. Right off the bat, seasoned graph database experts introduce readers to just enough graph theory, the graph database ecosystem, and a variety of datastores. They also explore modelling basics in action with real-world examples, then go hands-on with querying, coding traversals, parsing results, and other essential tasks as readers build their own graph-backed social network app complete with a recommendation engine! Key Features:
* Graph database fundamentals
* An overview of the graph database ecosystem
* Relational vs. graph database modelling
* Querying graphs using Gremlin
* Real-world common graph use cases
For readers with basic Java and application development skills building in RDBMS systems such as Oracle, SQL Server, MySQL, and Postgres. No experience with graph databases is required.
About the technology: Graph databases store interconnected data in a more natural form, making them superior tools for representing data with rich relationships. Unlike in relational database management systems (RDBMS), where a more rigid view of data connections results in the loss of valuable insights, in graph databases, data connections are first priority.
Dave Bechberger has extensive experience using graph databases as a product architect and a consultant. He's spent his career leveraging cutting-edge technologies to build software in complex data domains such as bioinformatics, oil and gas, and supply chain management. He's an active member of the graph community and has presented on a wide variety of graph-related topics at national and international conferences. Josh Perryman is a technologist with over two decades of diverse experience building and maintaining complex systems, including high performance computing (HPC) environments. Since 2014 he has focused on graph databases, especially in distributed or big data environments, and he regularly blogs and speaks at conferences about graph databases.
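To make the contrast with relational modelling concrete: the recommendation-style traversal the book builds with Gremlin amounts to hopping along edges rather than joining tables. Here is a toy pure-Python sketch of that idea with invented data (the book itself expresses such queries in Gremlin against a real graph datastore):

```python
# Toy friend-of-friend recommendation: one traversal, no joins (hypothetical data).
follows = {
    "alice": {"bob", "carol"},
    "bob":   {"dave"},
    "carol": {"dave", "erin"},
    "dave":  set(),
    "erin":  set(),
}

def recommend(user: str) -> set:
    """People followed by those `user` follows, minus existing ties."""
    direct = follows[user]
    two_hops = set().union(*(follows[f] for f in direct)) if direct else set()
    return two_hops - direct - {user}

print(recommend("alice"))  # {'dave', 'erin'}
```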