Hive makes life much easier for developers who store and manage data in Hadoop clusters, such as data warehouses. With this example-driven guide, you'll learn how to use the Hive infrastructure to provide data summarization, query, and analysis - particularly with HiveQL, Hive's dialect of SQL. You'll learn how to set up Hive in your environment and optimize its use, and how it interoperates with other tools, such as HBase. You'll also learn how to extend Hive with custom code written in Java or scripting languages. Ideal for developers with prior SQL experience, this book shows you how Hive simplifies many tasks that would be much harder to implement in the lower-level MapReduce API provided by Hadoop.
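To give a feel for the kind of HiveQL summarization described above, here is a minimal sketch of creating a table and running an aggregate query; the table and column names are hypothetical and chosen only for illustration, not taken from the book.

```sql
-- Hypothetical table for illustration only.
CREATE TABLE IF NOT EXISTS page_views (
  user_id   STRING,
  page_url  STRING,
  view_time TIMESTAMP
)
STORED AS ORC;

-- A typical HiveQL summarization: count views per page and keep the top ten.
SELECT page_url, COUNT(*) AS views
FROM page_views
GROUP BY page_url
ORDER BY views DESC
LIMIT 10;
```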
This SpringerBrief reviews the knowledge engineering problem of engineering objectivity in top-k query answering: essentially, answers must be computed taking into account the user's preferences and a collection of (subjective) reports provided by other users. Each report is assumed to consist of a set of scores for a list of features, its author's preferences among those features, and other information discussed in this brief. These pieces of information are then combined for every report, along with the querying user's preferences and their trust in each report, to rank the query results. Everyday examples of this setup are the online reviews found on sites like Amazon, Trip Advisor, and Yelp, among many others. Throughout this knowledge engineering effort the authors adopt the Datalog+/- family of ontology languages as the underlying knowledge representation and reasoning formalism, and investigate several alternative ways in which rankings can be derived, along with algorithms for top-k (atomic) query answering under these rankings. The brief also investigates assumptions under which these algorithms run in polynomial time in the data complexity. Since it contains a gentle introduction to the main building blocks (OBDA, Datalog+/-, and reasoning with preferences), it should be of value to students, researchers, and practitioners interested in the general problem of incorporating user preferences into related formalisms and tools. Practitioners interested in using Ontology-Based Data Access to leverage the information contained in reviews of products and services for a better customer experience, as well as researchers working in the areas of ontological languages, the Semantic Web, data provenance, and reasoning with preferences, will also find it of interest.
Dive into the world of SQL on Hadoop and get the most out of your Hive data warehouses. This book is your go-to resource for using Hive: authors Scott Shaw, Ankur Gupta, David Kjerrumgaard, and Andreas Francois Vermeulen take you through learning HiveQL, the SQL-like language specific to Hive, to analyze, export, and massage the data stored across your Hadoop environment. From deploying Hive on your hardware or virtual machine and setting up its initial configuration to learning how Hive interacts with Hadoop, MapReduce, Tez and other big data technologies, Practical Hive gives you a detailed treatment of the software. In addition, this book discusses the value of open source software, Hive performance tuning, and how to leverage semi-structured and unstructured data.
What You Will Learn:
- Install and configure Hive for new and existing datasets
- Perform DDL operations
- Execute efficient DML operations
- Use tables, partitions, buckets, and user-defined functions
- Discover performance tuning tips and Hive best practices
Who This Book Is For: Developers, companies, and professionals who deal with large amounts of data and could use software that can efficiently manage large volumes of input. It is assumed that readers have the ability to work with SQL.
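As a companion to the DDL and partitioning topics listed above, the following HiveQL sketch shows a partitioned, bucketed table and a partition-level insert; the table, column, and partition names (including the staging table) are hypothetical and only illustrate the general pattern.

```sql
-- Hypothetical sales table; names and bucket count are illustrative only.
CREATE TABLE IF NOT EXISTS sales (
  order_id   BIGINT,
  product_id BIGINT,
  amount     DECIMAL(10,2)
)
PARTITIONED BY (sale_date STRING)
CLUSTERED BY (product_id) INTO 32 BUCKETS
STORED AS ORC;

-- DML: overwrite a single partition from a (hypothetical) staging table.
INSERT OVERWRITE TABLE sales PARTITION (sale_date = '2024-01-01')
SELECT order_id, product_id, amount
FROM staging_sales
WHERE sale_date = '2024-01-01';
```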
With this textbook, Vaisman and Zimanyi deliver excellent coverage of data warehousing and business intelligence technologies, ranging from the most basic principles to recent findings and applications. To this end, their work is structured into three parts. Part I describes "Fundamental Concepts," including multi-dimensional models; conceptual and logical data warehouse design; and MDX and SQL/OLAP. Subsequently, Part II details "Implementation and Deployment," which includes physical data warehouse design; data extraction, transformation, and loading (ETL); and data analytics. Lastly, Part III covers "Advanced Topics" such as spatial data warehouses; trajectory data warehouses; semantic technologies in data warehouses; and novel technologies like MapReduce, column-store databases and in-memory databases. As a key characteristic of the book, most of the topics are presented and illustrated using application tools. Specifically, a case study based on the well-known Northwind database illustrates how the concepts presented in the book can be implemented using Microsoft Analysis Services and Pentaho Business Analytics. All chapters close with review questions and exercises to support comprehensive student learning. Supplemental material to assist instructors using this book as a course text is available at http://cs.ulb.ac.be/DWSDIbook/, including electronic versions of the figures, solutions to all exercises, and a set of slides accompanying each chapter. Overall, students, practitioners and researchers alike will find this book the most comprehensive reference work on data warehouses, with key topics described in a clear and educational style.
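To illustrate the kind of SQL/OLAP querying such a course covers, here is a minimal sketch of a roll-up aggregation over a Northwind-style star schema; the fact and dimension table names are assumptions made for this example, not the book's own schema.

```sql
-- Hypothetical Northwind-style star schema; table and column names are illustrative.
SELECT
  c.country,
  p.category_name,
  SUM(f.sales_amount) AS total_sales
FROM fact_sales AS f
JOIN dim_customer AS c ON f.customer_key = c.customer_key
JOIN dim_product  AS p ON f.product_key  = p.product_key
GROUP BY ROLLUP (c.country, p.category_name);  -- per-category rows, country subtotals, grand total
```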
Dirty data is a problem that costs businesses thousands, if not millions, every year. In organisations large and small across the globe you will hear talk of data quality issues. What you will rarely hear about is the consequences or how to fix it. Between the Spreadsheets: Classifying and Fixing Dirty Data draws on classification expert Susan Walsh's decade of experience in data classification to present a fool-proof method for cleaning and classifying your data. The book covers everything from the very basics of data classification to normalisation and taxonomies, and presents the author's proven COAT methodology, helping ensure an organisation's data is Consistent, Organised, Accurate and Trustworthy. A series of data horror stories outlines what can go wrong in managing data, and if it does, how it can be fixed. After reading this book, regardless of your level of experience, not only will you be able to work with your data more efficiently, but you will also understand the impact the work you do with it has, and how it affects the rest of the organisation. Written in an engaging and highly practical manner, Between the Spreadsheets gives readers of all levels a deep understanding of the dangers of dirty data and the confidence and skills to work more efficiently and effectively with it.
This book presents Hyper-lattice, a new algebraic model for partially ordered sets and an alternative to the lattice. The authors analyze some of the shortcomings of the conventional lattice structure and propose a novel algebraic structure, in the form of Hyper-lattice, to overcome them. They establish how Hyper-lattice supports dynamic insertion of elements into a partially ordered set with a partial hierarchy between the set members. The authors present its characteristics and properties, showing how propositions and lemmas formalize Hyper-lattice as a new algebraic structure.
This book constitutes the refereed proceedings of the 6th International Conference on E-Technologies, MCETECH 2015, held in Montreal, Canada, in May 2015. The 18 papers presented in this volume were carefully reviewed and selected from 42 submissions. They have been organized in topical sections on process adaptation; legal issues; social computing; eHealth; and eBusiness, eEducation and eLogistics.
There is growing recognition of the need to address the fragility of digital information, on which our society heavily depends for smooth operation in all aspects of daily life. This has been discussed in many books and articles on digital preservation, so why is there a need for yet one more? Because, for the most part, those other publications focus on documents, images and webpages - objects that are normally rendered to be simply displayed by software to a human viewer. Yet there are clearly many more types of digital objects that may need to be preserved, such as databases, scientific data and software itself. David Giaretta, Director of the Alliance for Permanent Access, and his contributors explain why the tools and techniques used for preserving rendered objects are inadequate for all these other types of digital objects, and they provide the concepts, techniques and tools that are needed. The book is structured in three parts. The first part is on theory, i.e., the concepts and techniques that are essential for preserving digitally encoded information. The second part then shows practice, i.e., the use and validation of these tools and techniques. Finally, the third part concludes by addressing how to judge whether money is being well spent, in terms of effectiveness and cost sharing. Various examples of digital objects from many sources are used to explain the tools and techniques presented. The presentation style mainly aims at practitioners in libraries, archives and industry who are either directly responsible for preservation or who need to prepare for audits of their archives. Researchers in digital preservation and developers of preservation tools and techniques will also find valuable practical information here. Researchers creating digitally encoded information of all kinds will also need to be aware of these topics so that they can help to ensure that their data is usable and can be valued by others now and in the future. To further assist the reader, the book is supported by many hours of videos and presentations from the CASPAR project and by a set of open source software.
"Information Management: Gaining a Competitive Advantage with Data" is about making smart decisions to make the most of company information. Expert author William McKnight develops the value proposition for information in the enterprise and succinctly outlines the numerous forms of data storage. "Information Management" will enlighten you, challenge your preconceived notions, and help activate information in the enterprise. Get the big picture on managing data so that your team can make smart decisions by understanding how everything from workload allocation to data stores fits together. The practical, hands-on guidance in this book includes: Part 1: The importance of information management and analytics to business, and how data warehouses are used Part 2: The technologies and data that advance an organization, and extend data warehouses and related functionality Part 3: Big Data and NoSQL, and how technologies like Hadoop enable management of new forms of data Part 4: Pulls it all together, while addressing topics of agile development, modern business intelligence, and organizational change management Read the book cover-to-cover, or keep it within reach for a
quick and useful resource. Either way, this book will enable you to
master all of the possibilities for data or the broadest view
across the enterprise.
The definitive guide to dimensional design for your data warehouse. Learn the best practices of dimensional design. Star Schema: The Complete Reference offers in-depth coverage of design principles and their underlying rationales. Organized around design concepts and illustrated with detailed examples, this is a step-by-step guidebook for beginners and a comprehensive resource for experts. This all-inclusive volume begins with dimensional design fundamentals and shows how they fit into diverse data warehouse architectures, including those of W.H. Inmon and Ralph Kimball. The book progresses through a series of advanced techniques that help you address real-world complexity, maximize performance, and adapt to the requirements of BI and ETL software products. You are furnished with design tasks and deliverables that can be incorporated into any project, regardless of architecture or methodology.
- Master the fundamentals of star schema design and slow change processing
- Identify situations that call for multiple stars or cubes
- Ensure compatibility across subject areas as your data warehouse grows
- Accommodate repeating attributes, recursive hierarchies, and poor data quality
- Support conflicting requirements for historic data
- Handle variation within a business process and correlation of disparate activities
- Boost performance using derived schemas and aggregates
- Learn when it's appropriate to adjust designs for BI and ETL tools
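As a small illustration of the star schema pattern this book is about, here is a minimal sketch of one fact table joined to two dimension tables; the table and column names are hypothetical, not drawn from the book.

```sql
-- Hypothetical star schema: two dimensions and one fact table (illustrative names).
CREATE TABLE dim_product (
  product_key  INT PRIMARY KEY,
  product_name VARCHAR(100),
  category     VARCHAR(50)
);

CREATE TABLE dim_date (
  date_key   INT PRIMARY KEY,
  full_date  DATE,
  month_name VARCHAR(20)
);

-- The fact table holds measures plus foreign keys to the dimensions.
CREATE TABLE fact_orders (
  product_key  INT REFERENCES dim_product (product_key),
  date_key     INT REFERENCES dim_date (date_key),
  quantity     INT,
  order_amount DECIMAL(12,2)
);
```

Queries then join the fact table to whichever dimensions a report needs and aggregate the measures.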
Multi-Modal User Interactions in Controlled Environments investigates the capture and analysis of user's multimodal behavior (mainly eye gaze, eye fixation, eye blink and body movements) within a real controlled environment (controlled-supermarket, personal environment) in order to adapt the response of the computer/environment to the user. Such data is captured using non-intrusive sensors (for example, cameras in the stands of a supermarket) installed in the environment. This multi-modal video based behavioral data will be analyzed to infer user intentions while assisting users in their day-to-day tasks by adapting the system's response to their requirements seamlessly. This book also focuses on the presentation of information to the user. Multi-Modal User Interactions in Controlled Environments is designed for professionals in industry, including professionals in the domains of security and interactive web television. This book is also suitable for graduate-level students in computer science and electrical engineering.
The two-volume set LNCS 7382 and 7383 constitutes the refereed proceedings of the 13th International Conference on Computers Helping People with Special Needs, ICCHP 2012, held in Linz, Austria, in July 2012. The 147 revised full papers and 42 short papers were carefully reviewed and selected from 364 submissions. The papers included in the first volume are organized in the following topical sections: universal learning design; putting the disabled student in charge: user focused technology in education; access to mathematics and science; policy and service provision; creative design for inclusion; virtual user models for designing and using inclusive products; web accessibility in advanced technologies; website accessibility metrics; entertainment software accessibility; document and media accessibility; inclusion by accessible social media; a new era for document accessibility: understanding, managing and implementing the ISO standard PDF/UA; and human-computer interaction and usability for elderly.
The two-volume set LNCS 7382 and 7383 constitutes the refereed proceedings of the 13th International Conference on Computers Helping People with Special Needs, ICCHP 2012, held in Linz, Austria, in July 2012. The 147 revised full papers and 42 short papers were carefully reviewed and selected from 364 submissions. The papers included in the second volume are organized in the following topical sections: portable and mobile systems in assistive technology; assistive technology, HCI and rehabilitation; sign 2.0: ICT for sign language users: information sharing, interoperability, user-centered design and collaboration; computer-assisted augmentative and alternative communication; easy to Web between science of education, information design and speech technology; smart and assistive environments: ambient assisted living; text entry for accessible computing; tactile graphics and models for blind people and recognition of shapes by touch; mobility for blind and partially sighted people; and human-computer interaction for blind and partially sighted people.
The new edition of the classic bestseller that launched the data warehousing industry covers new approaches and technologies, many of which have been pioneered by Inmon himself. In addition to explaining the fundamentals of data warehouse systems, the book covers new topics such as methods for handling unstructured data in a data warehouse and storing data across multiple storage media, and discusses the pros and cons of relational versus multidimensional design and how to measure return on investment in planning data warehouse projects. It covers advanced topics, including data monitoring and testing. Although the book includes an extra 100 pages worth of valuable content, the price has actually been reduced from $65 to $55.
Details recent research in areas such as ontology design for information integration, metadata generation and management, and representation and management of distributed ontologies. Provides decision support on the use of novel technologies, information about potential problems, and guidelines for the successful application of existing technologies.
The Semantic Web is a vision - the idea of having data on the Web defined and linked in such a way that it can be used by machines not just for display purposes but for automation, integration and reuse of data across various applications. However, there is a widespread misconception that the Semantic Web is a rehash of existing AI and database work. Kashyap, Bussler, and Moran dispel this notion by presenting the multi-disciplinary technological underpinnings such as machine learning, information retrieval, service-oriented architectures, and grid computing. Thus they combine the informational and computational aspects needed to realize the full potential of the Semantic Web vision.
Service-oriented computing is an emerging factor in IT research and development. Organizations like W3C and the EU have begun research projects to develop industrial-strength applications. This book offers a thorough, practical introduction to one of the most promising approaches - the Web Service Modeling Ontology (WSMO). After a brief review of technologies and standards of the World Wide Web, the Semantic Web, and Web Services, the book examines WSMO from the fundamentals to applications in e-commerce, e-government and e-banking; it also describes its relation to OWL-S, WSDL-S and other approaches. The book offers an up-to-date introduction, plus pointers to future applications.
This book provides a timely and first-of-its-kind collection of papers on anatomy ontologies. It is interdisciplinary in its approach, bringing together the relevant expertise from computing and biomedical studies. The book aims to provide readers with a comprehensive understanding of the foundations of anatomical ontologies and the state of the art in terms of existing tools and applications. It also highlights challenges that remain today.
Questions of privacy, borders, and nationhood are increasingly shaping the way we think about all things digital. Data Centers brings together essays and photographic documentation that analyze recent and ongoing developments. Taking Switzerland as an example, the book takes a look at the country's data centers, law firms, corporations, and government institutions that are involved in the creation, maintenance, and regulation of digital infrastructures. Beneath the official storyline (Switzerland's moderate climate, political stability, and relatively clean energy mix), the book uncovers a much more varied and sometimes contradictory set of narratives.
Just as the industrial society of the last century depended on natural resources, today's society depends on information and its exchange. Staab and Stuckenschmidt structured the selected contributions into four parts: Part I, "Data Storage and Access," prepares the semantic foundation, i.e. data modelling and querying in a flexible and yet scalable manner. These foundations allow for dealing with the organization of information at the individual peers. Part II, "Querying the Network," considers the routing of queries, as well as continuous queries and personalized queries, under the conditions of the permanently changing topological structure of a peer-to-peer network. Part III, "Semantic Integration," deals with the mapping of heterogeneous data representations. Finally, Part IV, "Methodology and Systems," reports experiences from case studies and sample applications. The overall result is a state-of-the-art description of the potential of Semantic Web and peer-to-peer technologies for information sharing and knowledge management when applied jointly.
Graph Databases in Action teaches readers everything they need to know to begin building and running applications powered by graph databases. Right off the bat, seasoned graph database experts introduce readers to just enough graph theory, the graph database ecosystem, and a variety of datastores. They also explore modelling basics in action with real-world examples, then go hands-on with querying, coding traversals, parsing results, and other essential tasks as readers build their own graph-backed social network app complete with a recommendation engine!
Key Features:
- Graph database fundamentals
- An overview of the graph database ecosystem
- Relational vs. graph database modelling
- Querying graphs using Gremlin
- Real-world common graph use cases
For readers with basic Java and application development skills building in RDBMS systems such as Oracle, SQL Server, MySQL, and Postgres. No experience with graph databases is required.
About the technology: Graph databases store interconnected data in a more natural form, making them superior tools for representing data with rich relationships. Unlike in relational database management systems (RDBMS), where a more rigid view of data connections results in the loss of valuable insights, in graph databases, data connections are first priority. Dave Bechberger has extensive experience using graph databases as a product architect and a consultant. He's spent his career leveraging cutting-edge technologies to build software in complex data domains such as bioinformatics, oil and gas, and supply chain management. He's an active member of the graph community and has presented on a wide variety of graph-related topics at national and international conferences. Josh Perryman is a technologist with over two decades of diverse experience building and maintaining complex systems, including high performance computing (HPC) environments. Since 2014 he has focused on graph databases, especially in distributed or big data environments, and he regularly blogs and speaks at conferences about graph databases.
Get started with Azure Synapse Analytics, Microsoft's modern data analytics platform. This book covers core components such as Synapse SQL, Synapse Spark, Synapse Pipelines, and many more, along with their architecture and implementation. The book begins with an introduction to core data and analytics concepts, followed by an understanding of the traditional/legacy data warehouse, the modern data warehouse, and the most modern data lakehouse. You will go through the introduction and background of Azure Synapse Analytics along with its main features and key service capabilities. Core architecture is discussed, along with Synapse SQL: you will learn its main features, how to create a dedicated Synapse SQL pool, and how to analyze your big data using the serverless Synapse SQL pool. You will also learn Synapse Spark and Synapse Pipelines, with examples, and Synapse Workspace and Synapse Studio, followed by Synapse Link and its features. You will go through use cases in Azure Synapse and understand the reference architecture for Synapse Analytics. After reading this book, you will be able to work with Azure Synapse Analytics and understand its architecture, main components, features, and capabilities.
What You Will Learn:
- Understand core data and analytics concepts and data lakehouse concepts
- Be familiar with overall Azure Synapse architecture and its main components
- Be familiar with Synapse SQL and Synapse Spark architecture components
- Work with integrated Apache Spark (aka Synapse Spark) and Synapse SQL engines
- Understand Synapse Workspace, Synapse Studio, and Synapse Pipeline
- Study reference architecture and use cases
Who This Book Is For: Azure data analysts, data engineers, data scientists, and solutions architects
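For a flavor of the serverless Synapse SQL querying mentioned above, here is a minimal sketch that reads Parquet files directly from a data lake with OPENROWSET; the storage account, container, and folder path are placeholders rather than real endpoints.

```sql
-- Serverless Synapse SQL sketch: query Parquet files in place.
-- The storage URL below is a placeholder; substitute your own account, container, and path.
SELECT TOP 10
    result.*
FROM OPENROWSET(
    BULK 'https://<storage-account>.dfs.core.windows.net/<container>/sales/*.parquet',
    FORMAT = 'PARQUET'
) AS result;
```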
Managing data continues to grow as a necessity for modern organizations. There are seemingly infinite opportunities for organic growth, reduction of costs, and creation of new products and services. It has become apparent that none of these opportunities can happen smoothly without data governance. The cost of exponential data growth and privacy/security concerns are becoming burdensome, and organizations will encounter unexpected consequences from new sources of risk. The solution to these challenges is also data governance: ensuring a balance between risk and opportunity. Data Governance, Second Edition, is for any executive, manager or data professional who needs to understand or implement a data governance program, which is required to ensure consistent, accurate and reliable data across the organization. This book offers an overview of why data governance is needed, how to design, initiate, and execute a program, and how to keep the program sustainable. This valuable resource provides comprehensive guidance to beginning professionals, managers or analysts looking to improve their processes, and advanced students in Data Management and related courses. With the provided framework and case studies, all professionals in the data governance field will gain key insights into launching a successful and money-saving data governance program.
You may like...
E-Discovery Tools and Applications in… - Egbert de Smet, Sangeeta Dhamdhere (Hardcover, R5,151)
Data Deduplication for Data Optimization… - Daehee Kim, Sejun Song, … (Hardcover)
Innovations in XML Applications and… - Jose Carlos Ramalho, Alberto Simoes, … (Hardcover, R5,123)
Artificial Intelligence Applications and… - Ilias Maglogiannis, Lazaros Iliadis, … (Hardcover, R2,877)
Intro to Python for Computer Science and… - Paul Deitel (Paperback)
Data Warehouse Requirements Engineering… - Naveen Prakash, Deepika Prakash (Hardcover, R2,501)