Why a book about logs? That's easy: the humble log is an abstraction that lies at the heart of many systems, from NoSQL databases to cryptocurrencies. Even though most engineers don't think much about them, this short book shows you why logs are worthy of your attention. Based on his popular blog posts, LinkedIn principal engineer Jay Kreps shows you how logs work in distributed systems, and then delivers practical applications of these concepts in a variety of common uses - data integration, enterprise architecture, real-time stream processing, data system design, and abstract computing models. Go ahead and take the plunge with logs; you're going to love them.
- Learn how logs are used for programmatic access in databases and distributed systems
- Discover solutions to the huge data integration problem when more data of more varieties meet more systems
- Understand why logs are at the heart of real-time stream processing
- Learn the role of a log in the internals of online data systems
- Explore how Jay Kreps applies these ideas to his own work on data infrastructure systems at LinkedIn
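The abstraction the book is built around can be shown in a few lines: records are appended in order, each gets a sequential offset, and independent readers replay the history from any offset. This is an illustrative sketch, not code from the book.

```python
# A minimal append-only log: ordered records, sequential offsets,
# and replay from any position.
class Log:
    def __init__(self):
        self._records = []

    def append(self, record):
        """Append a record and return its offset."""
        self._records.append(record)
        return len(self._records) - 1

    def read(self, offset):
        """Replay all records from the given offset onward."""
        return self._records[offset:]

log = Log()
log.append({"user": "alice", "action": "login"})
log.append({"user": "bob", "action": "login"})
log.append({"user": "alice", "action": "logout"})

# Two consumers at different offsets see the same ordered history.
assert log.read(0)[0]["user"] == "alice"
assert len(log.read(1)) == 2
```

Because every reader sees the same ordering, the log doubles as a replication and integration mechanism - which is the thread the book follows through all its applications.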
With this practical book, you will learn proven methods for anonymizing health data to help your organization share meaningful datasets, without exposing patient identity. Leading experts Khaled El Emam and Luk Arbuckle walk you through a risk-based methodology, using case studies from their efforts to de-identify hundreds of datasets. Clinical data is valuable for research and other types of analytics, but making it anonymous without compromising data quality is tricky. This book demonstrates techniques for handling different data types, based on the authors' experiences with a maternal-child registry, inpatient discharge abstracts, health insurance claims, electronic medical record databases, and the World Trade Center disaster registry, among others.
- Understand different methods for working with cross-sectional and longitudinal datasets
- Assess the risk of adversaries who attempt to re-identify patients in anonymized datasets
- Reduce the size and complexity of massive datasets without losing key information or jeopardizing privacy
- Use methods to anonymize unstructured free-form text data
- Minimize the risks inherent in geospatial data, without omitting critical location-based health information
- Look at ways to anonymize coding information in health data
- Learn the challenge of anonymously linking related datasets
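One core idea behind such de-identification methods is generalization: coarsen quasi-identifiers (an exact age becomes an age band, a full ZIP code becomes a prefix) until every record shares its quasi-identifier values with at least k-1 others. The field names, grouping rules, and k value below are hypothetical illustrations, not the authors' methodology.

```python
# Toy k-anonymity check via generalization of quasi-identifiers.
from collections import Counter

def generalize(record):
    return (record["age"] // 10 * 10,   # age band, e.g. 34 -> 30
            record["zip"][:3])          # ZIP prefix

def is_k_anonymous(records, k):
    """True if every generalized group contains at least k records."""
    groups = Counter(generalize(r) for r in records)
    return all(count >= k for count in groups.values())

patients = [
    {"age": 34, "zip": "10001"},
    {"age": 36, "zip": "10002"},
    {"age": 52, "zip": "20010"},
    {"age": 58, "zip": "20011"},
]
assert is_k_anonymous(patients, 2)   # each generalized group has >= 2 members
```

Real risk-based de-identification, as the book stresses, also weighs the adversary's capabilities and the analytic value lost at each level of generalization.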
Over the course of its development, administrative language has been subject to linguistic standardization aimed at being generally binding and comprehensible for its addressees, with its historical codification recorded both in dictionaries and in other documents. This also applies to the binding use of gender-fair wording in Austrian administrative language: sentences are to be reworded so that the acting person is named unambiguously in audit reports. This study shows to what extent these goals - optimal comprehensibility and readability of administrative language and its texts for their addressees - can be achieved with the help of computer-based support. Additional material is included with the book on a CD.
This book is part of the project "Functional Transformation of Children's and Young Adult Literature in the Multimedia Society: Application of a Theoretical Model of Criticism to Spanish-Language Audiovisual Adaptations of English and German Children's Works" and has a twofold aim: first, to analyze how works of English and German literature were adapted to the audiovisual medium and how English and German films were transferred into peninsular Spanish; and second, to study the quality of the children's books - and of their Spanish translations - that arise from these audiovisual products. The analysis of the audiovisual adaptations applies both technical and translation criteria, while the study of the derived books follows literary criteria, and translation criteria in the case of the analyses of their translations.
If your organization is about to enter the world of big data, you not only need to decide whether Apache Hadoop is the right platform to use, but also which of its many components are best suited to your task. This field guide makes the exercise manageable by breaking down the Hadoop ecosystem into short, digestible sections. You'll quickly understand how Hadoop's projects, subprojects, and related technologies work together. Each chapter introduces a different topic - such as core technologies or data transfer - and explains why certain components may or may not be useful for particular needs. When it comes to data, Hadoop is a whole new ballgame, but with this handy reference, you'll have a good grasp of the playing field. Topics include:
- Core technologies - Hadoop Distributed File System (HDFS), MapReduce, YARN, and Spark
- Database and data management - Cassandra, HBase, MongoDB, and Hive
- Serialization - Avro, JSON, and Parquet
- Management and monitoring - Puppet, Chef, ZooKeeper, and Oozie
- Analytic helpers - Pig, Mahout, and MLlib
- Data transfer - Sqoop, Flume, distcp, and Storm
- Security, access control, and auditing - Sentry, Kerberos, and Knox
- Cloud computing and virtualization - Serengeti, Docker, and Whirr
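The MapReduce model at the core of the ecosystem is simple enough to sketch in plain Python: a map step emits (key, value) pairs, a shuffle groups them by key, and a reduce step aggregates each group. This is a conceptual sketch of the programming model, not Hadoop code.

```python
# Word count, the canonical MapReduce example, in miniature.
from collections import defaultdict

def map_phase(lines):
    for line in lines:
        for word in line.split():
            yield (word.lower(), 1)          # emit (key, value)

def shuffle(pairs):
    groups = defaultdict(list)               # group values by key
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    return {key: sum(values) for key, values in groups.items()}

lines = ["Hadoop stores data", "Hadoop processes data"]
counts = reduce_phase(shuffle(map_phase(lines)))
assert counts["hadoop"] == 2
```

In a real cluster the map and reduce steps run in parallel across machines, and HDFS supplies the input splits - which is where the rest of the ecosystem described above comes in.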
Construct, analyze, and visualize networks with networkx, a Python language module. Network analysis is a powerful tool you can apply to a multitude of datasets and situations. Discover how to work with all kinds of networks, including social, product, temporal, spatial, and semantic networks. Convert almost any real-world data into a complex network--such as recommendations on co-using cosmetic products, muddy hedge fund connections, and online friendships. Analyze and visualize the network, and make business decisions based on your analysis. If you're a curious Python programmer, a data scientist, or a CNA specialist interested in mechanizing mundane tasks, you'll increase your productivity exponentially. Complex network analysis used to be done by hand or with non-programmable network analysis tools, but not anymore! You can now automate and program these tasks in Python. Complex networks are collections of connected items, words, concepts, or people. By exploring their structure and individual elements, we can learn about their meaning, evolution, and resilience. Starting with simple networks, convert real-life and synthetic network graphs into networkx data structures. Look at more sophisticated networks and learn more powerful machinery to handle centrality calculation, blockmodeling, and clique and community detection. Get familiar with presentation-quality network visualization tools, both programmable and interactive--such as Gephi, a CNA explorer. Adapt the patterns from the case studies to your problems. Explore big networks with NetworKit, a high-performance networkx substitute. Each part in the book gives you an overview of a class of networks, includes a practical study of networkx functions and techniques, and concludes with case studies from various fields, including social networking, anthropology, marketing, and sports analytics. 
Combine your CNA and Python programming skills to become a better network analyst, a more accomplished data scientist, and a more versatile programmer. What you need: a Python 3.x installation with the following additional modules: Pandas (>=0.18), NumPy (>=1.10), matplotlib (>=1.5), networkx (>=1.11), python-louvain (>=0.5), NetworKit (>=3.6), and generalizedsimilarity. We recommend the Anaconda distribution, which comes with all of these modules except python-louvain, NetworKit, and generalizedsimilarity, and works on all major modern operating systems.
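A small taste of the networkx workflow the book teaches - build a graph, compute a centrality measure, and pick out the most central node. The toy friendship network here is invented for illustration; it requires the networkx module listed above.

```python
import networkx as nx

# A four-person friendship network.
G = nx.Graph()
G.add_edges_from([("Ann", "Bob"), ("Bob", "Carl"),
                  ("Ann", "Carl"), ("Carl", "Dee")])

# Degree centrality: a node's degree divided by (n - 1).
centrality = nx.degree_centrality(G)
most_central = max(centrality, key=centrality.get)
assert most_central == "Carl"    # Carl is connected to all 3 other nodes
```

The same pattern - construct, analyze, act on the result - scales from this toy graph to the book's case studies, with NetworKit substituted for very large networks.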
How can you get your data from frontend servers to Hadoop in near real time? With this complete reference guide, you'll learn Flume's rich set of features for collecting, aggregating, and writing large amounts of streaming data to the Hadoop Distributed File System (HDFS), Apache HBase, SolrCloud, Elasticsearch, and other systems. Using Flume shows operations engineers how to configure, deploy, and monitor a Flume cluster, and teaches developers how to write Flume plugins and custom components for their specific use cases. You'll learn about Flume's design and implementation, as well as various features that make it highly scalable, flexible, and reliable. Code examples and exercises are available on GitHub.
- Learn how Flume provides a steady rate of flow by acting as a buffer between data producers and consumers
- Dive into key Flume components, including sources that accept data and sinks that write and deliver it
- Write custom plugins to customize the way Flume receives, modifies, formats, and writes data
- Explore APIs for sending data to Flume agents from your own applications
- Plan and deploy Flume in a scalable and flexible way - and monitor your cluster once it's running
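Flume's core role - a bounded channel that decouples fast producers from slower consumers - can be illustrated with a tiny in-memory analogue. This is a conceptual sketch only; Flume itself is a JVM system configured declaratively, not written like this.

```python
from collections import deque

class Channel:
    """A bounded buffer between a source (producer) and a sink (consumer)."""
    def __init__(self, capacity):
        self.capacity = capacity
        self._events = deque()

    def put(self, event):
        if len(self._events) >= self.capacity:
            # A full channel signals the source to back off,
            # protecting the downstream system from overload.
            raise OverflowError("channel full; source should back off")
        self._events.append(event)

    def take(self):
        return self._events.popleft() if self._events else None

channel = Channel(capacity=2)
channel.put("log line 1")                # source writes at its own pace...
channel.put("log line 2")
assert channel.take() == "log line 1"    # ...sink drains at its own pace
```

Flume's durable channels add persistence and transactions on top of this idea, which is what makes the steady flow reliable across failures.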
This book collects research on data-driven medical diagnosis carried out with Artificial Intelligence-based solutions such as machine learning, deep learning, and intelligent optimization. Physical devices powered by Artificial Intelligence are gaining importance in diagnosis and healthcare. Medical data from different sources can also be analyzed with Artificial Intelligence techniques for more effective results.
This digital electronics text focuses on how to design, build, operate, and adapt data acquisition systems. The material begins with basic logic gates and ends with a 40 kHz voltage measurer, and the approach aims to cover a minimal number of topics in depth. The data acquisition circuits described communicate with a host computer through parallel I/O ports. The fundamental idea of the book is that parallel I/O ports (available for all popular computers) offer a superior balance of simplicity, low cost, speed, flexibility, and adaptability. All circuits and software are thoroughly tested, and construction details and troubleshooting guidelines are included. This book is intended to serve people who teach or study one or more of the following: digital electronics, circuit design, software that interacts with outside hardware, the process of computer-based data acquisition, and the design, adaptation, construction, and testing of measurement systems.
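The arithmetic at the heart of any such system is converting raw converter counts read from an I/O port into engineering units. The values below (a 10-bit converter with a 5 V reference) are illustrative assumptions, not the book's specific hardware.

```python
def counts_to_volts(counts, bits=10, vref=5.0):
    """Scale a raw ADC reading to volts for a given resolution and reference."""
    return counts * vref / (2 ** bits - 1)

assert counts_to_volts(0) == 0.0
assert counts_to_volts(1023) == 5.0               # full scale
assert abs(counts_to_volts(512) - 2.502) < 0.01   # roughly mid-scale
```

The same scaling appears, with different resolutions and references, in every measurement chain the book walks through.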
Introduction
I. Web 2.0 = Communication 2.0?
- Web 2.0: its influence on communication and brands
- More to gain than to lose
- Influence on employees and individual business units
- The trust factor
- Communication quality in social media as an element of corporate culture
- Opportunities and risks in corporate use
- Visual communication: visualizing brands
II. Social media in practice
- "Away from e-mail in three years": how companies will communicate in the future
- Social security: dangers on Facebook & Co.
- Social media @ IT governance
- Analyst relations in the Web 2.0 era
- Social media: "... we can't rewind, we've gone too far ..."
- In the Web 2.0 era, companies must become media themselves
- Social media measurement
Appendix 1: Industry examples
- Dependence on industry
- IT services / system integration
- Social media in pharmaceutical companies
Appendix 2: International developments
- In the social web of the Middle Kingdom
The authors
Index
Modern vehicles have electronic control units (ECUs) to control various subsystems such as the engine, brakes, steering, air conditioning, and infotainment. These ECUs (or simply 'controllers') are networked together to share information, and output directly measured and calculated data to each other. This in-vehicle network is a data goldmine for improved maintenance, measuring vehicle performance and its subsystems, fleet management, warranty and legal issues, reliability, durability, and accident reconstruction. The focus of Data Acquisition from HD Vehicles Using J1939 CAN Bus is to guide the reader on how to acquire and correctly interpret data from the in-vehicle network of heavy-duty (HD) vehicles. The reader will learn how to convert messages to scaled engineering parameters, and how to determine the available parameters on HD vehicles, along with their accuracy and update rate. Written by two specialists in this field, Richard (Rick) P. Walter and Eric P. Walter, principals at HEM Data, located in the United States, the book provides a unique road map for the data acquisition user. The authors give a clear and concise description of the CAN protocol plus a review of all 19 parts of the SAE International J1939 standard family. Pertinent standards are illuminated with tables, graphs, and examples. Practical applications covered include calculating fuel economy, duty cycle analysis, and capturing intermittent faults. A comparison is made of various diagnostic approaches, including OBD-II, HD-OBD, and World Wide Harmonized OBD (WWH-OBD). Data Acquisition from HD Vehicles Using J1939 CAN Bus is a must-have reference for anyone interested in acquiring data effectively from SAE J1939-equipped vehicles.
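The "convert messages to scaled engineering parameters" step the book teaches follows the pattern raw_value * resolution + offset. A sketch for one well-known parameter, engine speed (SPN 190) carried in the EEC1 message: byte positions and scaling here follow the commonly published J1939-71 definition (bytes 4-5, 0.125 rpm/bit, zero offset), but they should be verified against the standard itself before being relied upon.

```python
def decode_engine_speed(data: bytes) -> float:
    """Extract SPN 190 (engine speed, rpm) from an 8-byte EEC1 payload."""
    raw = data[3] | (data[4] << 8)   # bytes 4-5, little-endian ("Intel" order)
    return raw * 0.125               # resolution 0.125 rpm/bit, offset 0

# Raw value 0x1A00 = 6656  ->  6656 * 0.125 = 832.0 rpm
payload = bytes([0, 0, 0, 0x00, 0x1A, 0, 0, 0])
assert decode_engine_speed(payload) == 832.0
```

Every other J1939 parameter is decoded the same way, differing only in byte position, bit length, resolution, and offset - which is exactly the bookkeeping the book's tables help with.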
Most biologists use nonlinear regression more than any other statistical technique, but there are very few places to learn about curve-fitting. This book, by the author of the very successful Intuitive Biostatistics, addresses this relatively focused need of an extraordinarily broad range of scientists.
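Curve fitting in miniature: one classic textbook shortcut fits y = a * exp(b * x) by linearizing (take logs, then ordinary least squares on ln y = ln a + b x). This is a generic technique shown for illustration, not code from the book; as the book explains, true nonlinear regression minimizes error on the original scale instead.

```python
import math

# Noise-free data generated from y = 2 * e^x, so the fit should be exact.
xs = [0, 1, 2, 3]
ys = [2.0, 2.0 * math.e, 2.0 * math.e ** 2, 2.0 * math.e ** 3]

# Ordinary least squares on the log-transformed response.
lys = [math.log(y) for y in ys]
n = len(xs)
mx, my = sum(xs) / n, sum(lys) / n
b = sum((x - mx) * (ly - my) for x, ly in zip(xs, lys)) / \
    sum((x - mx) ** 2 for x in xs)
a = math.exp(my - b * mx)

assert abs(b - 1.0) < 1e-9   # recovered growth rate
assert abs(a - 2.0) < 1e-9   # recovered amplitude
```

With noisy data the two approaches give different answers, because log-transforming also transforms the error structure - one of the pitfalls a dedicated curve-fitting text exists to explain.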
All social and policy researchers need to synthesize data into a visual representation. Producing good visualizations combines creativity and technique. This book teaches the techniques and basics needed to produce a variety of visualizations, allowing readers to communicate data and analyses in a creative and effective way. Visuals for tables, time series, maps, text, and networks are carefully explained and organized, showing how to choose the right plot for the type of data being analyzed and displayed. Examples are drawn from public policy, public safety, education, political tweets, and public health. The presentation proceeds step by step, starting from the basics, in the programming languages R and Python, so that readers learn coding skills while simultaneously becoming familiar with the advantages and disadvantages of each visualization. No prior knowledge of either Python or R is required. Code for all the visualizations is available from the book's website.
Every day, more and more kinds of historical data become available, opening exciting new avenues of inquiry but also new challenges. This updated and expanded book describes and demonstrates the ways these data can be explored to construct cultural heritage knowledge, for research and in teaching and learning. It helps humanities scholars to grasp Big Data in order to do their work, whether that means understanding the underlying algorithms at work in search engines or designing and using their own tools to process large amounts of information. Demonstrating what digital tools have to offer and also what 'digital' does to how we understand the past, the authors introduce the many different tools and developing approaches in Big Data for historical and humanistic scholarship, show how to use them and what to be wary of, and discuss the kinds of questions and new perspectives this new macroscopic perspective opens up. Originally authored 'live' online with ongoing feedback from the wider digital history community, Exploring Big Historical Data breaks new ground and sets the direction for the conversation into the future. It should be the go-to resource for undergraduate and graduate students confronted by a vast corpus of data, and for researchers encountering these methods for the first time. It will also offer a helping hand to the interested individual seeking to make sense of genealogical data or digitized newspapers, and even to local historical societies trying to see the value in digitizing their holdings.
This book is a digital electronics text focused on how to design, build, operate, and adapt data acquisition systems. It is intended to serve people whose goals include teaching or learning one or more of the following: digital electronics, circuit design for computer expansion slots, software that interacts with outside hardware, the process of computer-based data acquisition, and the design, adaptation, construction, and testing of measurement systems. The fundamental idea of the book is that parallel I/O ports (available for all popular computers) offer a superior balance of simplicity, low cost, speed, flexibility, and adaptability.
To date, statistics has tended to be neatly divided into two theoretical approaches or frameworks: frequentist (or classical) and Bayesian. Scientists typically choose the statistical framework for analysing their data depending on the nature and complexity of the problem, and based on their personal views and prior training on probability and uncertainty. Although textbooks and courses should reflect and anticipate this dual reality, they rarely do so. This accessible textbook explains, discusses, and applies both the frequentist and Bayesian theoretical frameworks to fit the different types of statistical models that allow an analysis of the types of data most commonly gathered by life scientists. It presents the material in an informal, approachable, and progressive manner suitable for readers with only a basic knowledge of calculus and statistics. Statistical Modeling with R is aimed at senior undergraduate and graduate students, professional researchers, and practitioners throughout the life sciences seeking to strengthen their understanding of quantitative methods and to apply them successfully to real-world scenarios, whether in the fields of ecology, evolution, environmental studies, or computational biology.
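The book's dual framing can be seen in one toy example: estimating a survival probability from 7 successes in 10 trials. The frequentist point estimate is the maximum likelihood estimate k/n; a Bayesian with a uniform Beta(1, 1) prior instead reports the posterior Beta(1 + k, 1 + n - k). This is a generic textbook example, not one taken from the book.

```python
k, n = 7, 10

mle = k / n                                  # frequentist estimate
a, b = 1 + k, 1 + (n - k)                    # Beta posterior parameters
posterior_mean = a / (a + b)                 # one Bayesian point summary

assert mle == 0.7
assert abs(posterior_mean - 8 / 12) < 1e-12  # pulled toward the prior mean 0.5
```

With large n the two answers converge; with small or hierarchical data they can differ meaningfully, which is why a text covering both frameworks side by side is useful.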
The term "open-source software" was coined barely ten years ago and has since secured a firm position in the IT market. Today, open-source software is developed by practically every company active in the IT market. This book examines the following questions: How do the growing spread and use of open-source software and open standards affect competition? Can economic policy measures support the benefits of open competition and the exploitation of new innovation potential? The book contributes to the economic understanding of the concepts of open-source software and open standards, connecting theoretical analysis with economic policy practice. Drawing on an international comparison, it analyzes and evaluates various national open-source strategies, identifies further need for action, and offers proposals for making competition in software markets more open.
Data is fundamentally changing the nature of businesses and organisations and the mechanisms for delivering products and services. This book is a practical guide to developing strategy and policy for data governance, in line with the developing ISO 38505 governance-of-data standards. It will assist an organisation wanting to become a more data-driven business by explaining how to assess the value, risks, and constraints associated with collecting, using, and distributing data.
An accessible primer on how to create effective graphics from data This book provides students and researchers a hands-on introduction to the principles and practice of data visualization. It explains what makes some graphs succeed while others fail, how to make high-quality figures from data using powerful and reproducible methods, and how to think about data visualization in an honest and effective way. Data Visualization builds the reader's expertise in ggplot2, a versatile visualization library for the R programming language. Through a series of worked examples, this accessible primer then demonstrates how to create plots piece by piece, beginning with summaries of single variables and moving on to more complex graphics. Topics include plotting continuous and categorical variables; layering information on graphics; producing effective "small multiple" plots; grouping, summarizing, and transforming data for plotting; creating maps; working with the output of statistical models; and refining plots to make them more comprehensible. Effective graphics are essential to communicating ideas and a great way to better understand data. This book provides the practical skills students and practitioners need to visualize quantitative data and get the most out of their research findings. Provides hands-on instruction using R and ggplot2 Shows how the "tidyverse" of data analysis tools makes working with R easier and more consistent Includes a library of data sets, code, and functions
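ggplot2 is an R library, so it cannot be shown directly here, but the "small multiples" idea the book covers translates to any plotting toolkit. A minimal Python/matplotlib sketch with invented data (matplotlib is an assumption here, not the book's tool):

```python
import matplotlib
matplotlib.use("Agg")            # render off-screen, no display needed
import matplotlib.pyplot as plt

data = {"A": [1, 3, 2], "B": [2, 2, 4], "C": [5, 1, 3]}

# One panel per group, sharing a y-axis so the panels are comparable.
fig, axes = plt.subplots(1, len(data), sharey=True, figsize=(9, 3))
for ax, (group, ys) in zip(axes, data.items()):
    ax.plot(range(len(ys)), ys, marker="o")
    ax.set_title(group)

assert len(fig.axes) == 3
```

The shared axis is the point: by holding scales constant across panels, small multiples let the eye compare groups directly, which is the design principle the book builds on with ggplot2's faceting.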
This book offers the first well-founded, comprehensive account of the setup, operation, and inner workings of SAP on Linux. Long-time experts in the field present the fundamental concepts, the system architecture, and its implementation in a practical, thorough, and technically detailed manner. The experienced Linux professional is given the tools needed for a solid entry into the SAP world, and the SAP administrator will find reliable information for installing an SAP system on Linux securely and operating it optimally.
There has been intense excitement in recent years around activities labeled "data science," "big data," and "analytics." However, the lack of clarity around these terms and, particularly, around the skill sets and capabilities of their practitioners has led to inefficient communication between "data scientists" and the organizations requiring their services. This lack of clarity has frequently led to missed opportunities. To address this issue, we surveyed several hundred practitioners via the Web to explore the varieties of skills, experiences, and viewpoints in the emerging data science community. We used dimensionality reduction techniques to divide potential data scientists into five categories based on their self-ranked skill sets (Statistics, Math/Operations Research, Business, Programming, and Machine Learning/Big Data), and four categories based on their self-identification (Data Researchers, Data Businesspeople, Data Engineers, and Data Creatives). Further examining the respondents based on their division into these categories provided additional insights into the types of professional activities, educational background, and even scale of data used by different types of Data Scientists. In this report, we combine our results with insights and data from others to provide a better understanding of the diversity of practitioners, and to argue for the value of clearer communication around roles, teams, and careers.
With exponentially increasing amounts of data accumulating in real time, there is no reason not to turn data into a competitive advantage. While machine learning, driven by advancements in artificial intelligence, has made great strides, it has yet to overcome a number of challenges that stand in the way of greater success: progress is hindered by the lack of better methods, of deeper understanding of the problems, and of advanced tools. Challenges and Applications of Data Analytics in Social Perspectives provides innovative insights into the prevailing challenges in data analytics and its application to social media, and focuses on various machine learning and deep learning techniques for improving practice and research. The content within this publication examines topics that include collaborative filtering, data visualization, and edge computing. It provides research ideal for data scientists, data analysts, IT specialists, website designers, e-commerce professionals, government officials, software engineers, social media analysts, industry professionals, academicians, researchers, and students.
In a world in which we are constantly surrounded by data, figures, and statistics, it is imperative to understand and to be able to use quantitative methods. Statistical models and methods are among the most important tools in economic analysis, decision-making and business planning. This textbook, "Exploratory Data Analysis in Business and Economics", aims to familiarise students of economics and business as well as practitioners in firms with the basic principles, techniques, and applications of descriptive statistics and data analysis. Drawing on practical examples from business settings, it demonstrates the basic descriptive methods of univariate and bivariate analysis. The textbook covers a range of subject matter, from data collection and scaling to the presentation and univariate analysis of quantitative data, and also includes analytic procedures for assessing bivariate relationships. It does not confine itself to presenting descriptive statistics, but also addresses the use of computer programmes such as Excel, SPSS, and STATA, thus treating all of the topics typically covered in a university course on descriptive statistics. The German edition of this textbook is one of the "bestsellers" on the German market for literature in statistics.
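The univariate and bivariate building blocks such a textbook covers - mean, standard deviation, and the Pearson correlation between two series - can be computed from scratch. The sales figures below are made up for illustration.

```python
import math

ads =   [1, 2, 3, 4, 5]    # e.g. ad spend per period
sales = [2, 4, 6, 8, 10]   # e.g. units sold (perfectly linear here)

n = len(ads)
mean_x, mean_y = sum(ads) / n, sum(sales) / n

# Covariance and standard deviations (population form).
cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(ads, sales)) / n
sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in ads) / n)
sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in sales) / n)

r = cov / (sd_x * sd_y)    # Pearson correlation coefficient

assert mean_x == 3.0
assert abs(r - 1.0) < 1e-12   # a perfect positive linear relationship
```

In practice these same quantities are read off from Excel, SPSS, or STATA output, which is why the textbook pairs the formulas with the software.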
Quantitative Intelligence Analysis describes the model-based method of intelligence analysis, which represents the analyst's mental models of a subject as well as the analyst's reasoning process, exposing what the analyst believes about the subject and how they arrived at those beliefs and converged on analytic judgments. It includes:
- specific methods for explicitly representing the analyst's mental models as computational models;
- dynamic simulations and interactive analytic games;
- the structure of an analyst's mental model and the theoretical basis for capturing and representing the tacit knowledge of these models explicitly as computational models;
- a detailed description of the use of these models in rigorous, structured analysis of difficult targets;
- model illustrations and simulation descriptions;
- the role of models in support of collection and operations;
- case studies that illustrate a wide range of intelligence problems;
- and a recommended curriculum for technical analysts.