Even as big data is turning the world upside down, the next phase of the revolution is already taking shape: real-time data analysis. This hands-on guide introduces you to Storm, a distributed, JVM-based system for processing streaming data. Through simple tutorials, sample Java code, and a complete real-world scenario, you'll learn how to build fast, fault-tolerant solutions that process results as soon as the data arrives. Discover how easy it is to set up Storm clusters for solving various problems, including continuous data computation, distributed remote procedure calls, and data stream processing. Learn how to program Storm components: "spouts" for data input and "bolts" for data transformation. Discover how data is exchanged between spouts and bolts in a Storm "topology". Make spouts fault-tolerant with several commonly used design strategies. Explore bolts--their life cycle, strategies for design, and ways to implement them. Scale your solution by defining each component's level of parallelism. Study a real-time web analytics system built with Node.js, a Redis server, and a Storm topology. Write spouts and bolts with non-JVM languages such as Python, Ruby, and JavaScript.
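The spout/bolt dataflow the blurb describes can be pictured with a short, framework-free sketch. The classes and driver loop below are purely illustrative (they are not Storm's API, nor any real Python binding); they only show the shape of the pipeline: a spout emits tuples and a bolt transforms them inside a topology.

```python
import collections
import itertools

# Illustrative only: a stand-in "spout" (data input) and "bolt" (data
# transformation); a real Storm component would implement Storm's own API.
class SentenceSpout:
    SENTENCES = ["the cow jumped over the moon", "the man went to the store"]

    def __init__(self):
        self._source = itertools.cycle(self.SENTENCES)

    def next_tuple(self):
        """Emit the next tuple of the stream (here, one sentence)."""
        return next(self._source)


class WordCountBolt:
    def __init__(self):
        self.counts = collections.Counter()

    def process(self, sentence):
        """Transform an incoming tuple: split it and update running word counts."""
        self.counts.update(sentence.split())
        return self.counts


if __name__ == "__main__":
    spout, bolt = SentenceSpout(), WordCountBolt()
    for _ in range(4):                      # the "topology", reduced to a loop
        counts = bolt.process(spout.next_tuple())
    print(counts.most_common(3))
```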
This book constitutes revised selected papers from the third ECML PKDD Workshop on Data Analytics for Renewable Energy Integration, DARE 2015, held in Porto, Portugal, in September 2015. The 10 papers presented in this volume were carefully reviewed and selected for inclusion in this book.
Master the skills necessary to hire and manage a team of highly skilled individuals to design, build, and implement applications and systems based on advanced analytics and AI. Key Features: Learn to create an operationally effective advanced analytics team in a corporate environment. Select and undertake projects that have a high probability of success and deliver improved top- and bottom-line results. Understand how to create relationships with executives, senior managers, peers, and subject matter experts that lead to team collaboration, increased funding, and long-term success for you and your team. Book Description: In Building Analytics Teams, John K. Thompson, with his 30+ years of experience and expertise, illustrates the fundamental concepts of building and managing a high-performance analytics team, including what to do, who to hire, which projects to undertake, and what to avoid in the journey of building an analytically sound team. The core processes in creating an effective analytics team and the importance of the business decision-making life cycle are explored to help achieve initial and sustainable success. The book demonstrates the various traits of a successful and high-performing analytics team and then delineates the path to achieve this with insights on the mindset, advanced analytics models, and predictions based on data analytics. It also emphasizes the significance of the macro and micro processes required to evolve in response to rapidly changing business needs. The book dives into the methods and practices of managing, developing, and leading an analytics team. Once you've brought the team up to speed, the book explains how to govern executive expectations and select winning projects. By the end of this book, you will have acquired the knowledge to create an effective business analytics team and develop a production environment that delivers ongoing operational improvements for your organization. What you will learn: Avoid organizational and technological pitfalls of moving from a defined project to a production environment. Enable team members to focus on higher-value work and tasks. Build Advanced Analytics and Artificial Intelligence (AA&AI) functions in an organization. Outsource certain projects to competent and capable third parties. Support the operational areas that intend to invest in business intelligence, descriptive statistics, and small-scale predictive analytics. Analyze the operational area, the processes, the data, and the organizational resistance. Who this book is for: This book is for senior executives, senior and junior managers, and those working as part of a team that is accountable for designing, building, delivering, and ensuring business success through advanced analytics and artificial intelligence systems and applications. At least 5 to 10 years of experience in driving your organization to a higher level of efficiency will be helpful.
This book constitutes the proceedings of the Workshops held at the International Conference on Social Informatics, SocInfo 2014, which took place in Barcelona, Spain, in November 2014. This year SocInfo 2014 included nine satellite workshops: the City Labs Workshop, the Workshop on Criminal Network Analysis and Mining, CRIMENET, the Workshop on Interaction and Exchange in Social Media, DYAD, the Workshop on Exploration of Games and Gamers, EGG, the Workshop on HistoInformatics, the Workshop on Socio-Economic Dynamics, Networks and Agent-based Models, SEDNAM, the Workshop on Social Influence, SI, the Workshop on Social Scientists Working with Start-Ups and the Workshop on Social Media in Crowdsourcing and Human Computation, SoHuman.
The two-volume set LNCS 9014 and LNCS 9015 constitutes the refereed proceedings of the 12th International Conference on Theory of Cryptography, TCC 2015, held in Warsaw, Poland, in March 2015. The 52 revised full papers presented were carefully reviewed and selected from 137 submissions. The papers are organized in topical sections on foundations, symmetric key, multiparty computation, concurrent and resettable security, non-malleable codes and tampering, privacy amplification, encryption and key exchange, pseudorandom functions and applications, proofs and verifiable computation, differential privacy, functional encryption, and obfuscation.
This book constitutes the thoroughly refereed post-conference proceedings of the First and Second International Workshops on In-Memory Data Management and Analysis, held in Riva del Garda, Italy, in August 2013 and in Hangzhou, China, in September 2014. The 11 revised full papers were carefully reviewed and selected from 18 submissions and cover topics from main-memory graph analytics platforms to main-memory OLTP applications.
Why a book about logs? That's easy: the humble log is an abstraction that lies at the heart of many systems, from NoSQL databases to cryptocurrencies. Even though most engineers don't think much about them, this short book shows you why logs are worthy of your attention. Based on his popular blog posts, LinkedIn principal engineer Jay Kreps shows you how logs work in distributed systems, and then delivers practical applications of these concepts in a variety of common uses - data integration, enterprise architecture, real-time stream processing, data system design, and abstract computing models. Go ahead and take the plunge with logs; you're going to love them. Learn how logs are used for programmatic access in databases and distributed systems. Discover solutions to the huge data integration problem when more data of more varieties meet more systems. Understand why logs are at the heart of real-time stream processing. Learn the role of a log in the internals of online data systems. Explore how Jay Kreps applies these ideas to his own work on data infrastructure systems at LinkedIn.
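As a concrete picture of the abstraction the book centres on, here is a minimal, purely illustrative append-only log in Python: writers append records and get back offsets, and independent readers replay the same ordered history from any offset. It is a sketch of the idea only, not of Kafka or any system discussed in the book.

```python
# Minimal append-only log: records are only ever appended, and every reader
# sees the same records in the same order, starting from any offset.
class Log:
    def __init__(self):
        self._records = []

    def append(self, record) -> int:
        """Append a record and return its offset (its position in the log)."""
        self._records.append(record)
        return len(self._records) - 1

    def read(self, offset: int = 0):
        """Yield every record from `offset` onward, in write order."""
        yield from self._records[offset:]


log = Log()
for event in ({"user": 1, "action": "signup"}, {"user": 1, "action": "login"}):
    log.append(event)

# Two consumers (say, a search indexer and a cache) can replay the same history.
for record in log.read(0):
    print(record)
```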
This book focuses on research aspects of ensemble approaches to machine learning that can be applied to address big data problems. Various advancements of machine learning algorithms for extracting data-driven decisions from big data in diverse domains, such as the banking sector, healthcare, social media, and video surveillance, are presented across several chapters. Each of them has separate functionalities, which can be leveraged to solve a specific set of big data applications. This book is a potential resource for various advances in the field of machine learning and data science to solve big data problems with many objectives. It has been observed from the literature that several works have focused on the advancement of machine learning in fields such as biomedicine, stock prediction, and sentiment analysis. However, limited discussion has been devoted to the application of advanced machine learning techniques to solving big data problems.
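As a small, concrete illustration of the ensemble idea the book surveys (the library and synthetic dataset below are my choice, not the book's), this scikit-learn sketch compares a single decision tree against a soft-voting ensemble of three different learners by cross-validated accuracy.

```python
# Sketch of the ensemble idea: combine several different learners and compare
# against a single model. Uses scikit-learn and a synthetic dataset.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)

single = DecisionTreeClassifier(random_state=0)
ensemble = VotingClassifier(
    estimators=[
        ("lr", LogisticRegression(max_iter=1000)),
        ("tree", DecisionTreeClassifier(random_state=0)),
        ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
    ],
    voting="soft",  # average predicted probabilities across the base models
)

for name, model in [("single tree", single), ("voting ensemble", ensemble)]:
    score = cross_val_score(model, X, y, cv=5).mean()
    print(f"{name}: mean accuracy {score:.3f}")
```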
With this practical book, you will learn proven methods for anonymizing health data to help your organization share meaningful datasets, without exposing patient identity. Leading experts Khaled El Emam and Luk Arbuckle walk you through a risk-based methodology, using case studies from their efforts to de-identify hundreds of datasets. Clinical data is valuable for research and other types of analytics, but making it anonymous without compromising data quality is tricky. This book demonstrates techniques for handling different data types, based on the authors' experiences with a maternal-child registry, inpatient discharge abstracts, health insurance claims, electronic medical record databases, and the World Trade Center disaster registry, among others. Understand different methods for working with cross-sectional and longitudinal datasets. Assess the risk of adversaries who attempt to re-identify patients in anonymized datasets. Reduce the size and complexity of massive datasets without losing key information or jeopardizing privacy. Use methods to anonymize unstructured free-form text data. Minimize the risks inherent in geospatial data, without omitting critical location-based health information. Look at ways to anonymize coding information in health data. Learn the challenge of anonymously linking related datasets.
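For a feel of the most basic moves behind de-identification, the pandas sketch below shows suppression of a direct identifier and generalization of two quasi-identifiers. The column names and cut-offs are invented for illustration; the authors' risk-based methodology goes far beyond this toy step.

```python
# Toy de-identification step on made-up records: drop the direct identifier,
# then coarsen (generalize) the quasi-identifiers that could single someone out.
import pandas as pd

records = pd.DataFrame({
    "name":       ["A. Patel", "B. Jones", "C. Liu"],          # direct identifier
    "birth_date": ["1956-03-02", "1987-11-19", "1990-06-30"],
    "zip":        ["90210", "10001", "10003"],
    "diagnosis":  ["I10", "E11", "I10"],
})

anonymized = (
    records
    .drop(columns=["name"])                                     # suppression
    .assign(
        # generalization: exact birth date -> decade, 5-digit ZIP -> 3-digit prefix
        birth_decade=lambda d: pd.to_datetime(d["birth_date"]).dt.year // 10 * 10,
        zip3=lambda d: d["zip"].str[:3],
    )
    .drop(columns=["birth_date", "zip"])
)
print(anonymized)
```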
Discover the power of location data to build effective, intelligent data models with geospatial ecosystems. Key Features: Manipulate location-based data and create intelligent geospatial data models. Build effective location recommendation systems used by popular companies such as Uber. A hands-on guide to help you consume spatial data and parallelize GIS operations effectively. Book Description: Data scientists, who have access to vast data streams, are a bit myopic when it comes to intrinsic and extrinsic location-based data and are missing out on the intelligence it can provide to their models. This book demonstrates effective techniques for using the power of data science and geospatial intelligence to build effective, intelligent data models that make use of location-based data to give useful predictions and analyses. The book begins with a quick overview of the fundamentals of location-based data and how techniques such as Exploratory Data Analysis can be applied to it. We then delve into spatial operations such as computing distances, areas, extents, centroids, buffer polygons, intersecting geometries, geocoding, and more, which add context to location data. Moving ahead, you will learn how to quickly build and deploy a geo-fencing system using Python. Lastly, you will learn how to leverage geospatial analysis techniques in popular recommendation systems such as collaborative filtering and location-based recommendations, and more. By the end of the book, you will be a rockstar when it comes to performing geospatial analysis with ease. What you will learn: Learn how companies now use location data. Set up your Python environment and install Python geospatial packages. Visualize spatial data as graphs. Extract geometry from spatial data. Perform spatial regression from scratch. Build web applications that dynamically reference geospatial data. Who this book is for: Data scientists who would like to leverage location-based data and want to use location-based intelligence in their data models will find this book useful. This book is also for GIS developers who wish to incorporate data analysis in their projects. Knowledge of Python programming and some basic understanding of data analysis are all you need to get the most out of this book.
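A minimal sketch of two operations of the kind the blurb mentions: geodesic distance with geopy and a crude geo-fence with shapely. Both are common third-party packages chosen here for illustration (not necessarily the ones the book uses), and the coordinates are arbitrary. Buffering raw longitude/latitude degrees is only a toy; a real geo-fence would reproject to a metric CRS first.

```python
# Geodesic distance between two points, plus a naive point-in-buffer geo-fence.
from geopy.distance import geodesic
from shapely.geometry import Point

office = (40.7128, -74.0060)      # (lat, lon): lower Manhattan
warehouse = (40.6413, -73.7781)   # (lat, lon): JFK airport
print(f"distance: {geodesic(office, warehouse).km:.1f} km")

# Geo-fence: is a delivery within ~0.01 degrees (very roughly 1 km) of the warehouse?
fence = Point(warehouse[1], warehouse[0]).buffer(0.01)   # shapely expects (x=lon, y=lat)
delivery = Point(-73.7750, 40.6450)
print("inside fence:", fence.contains(delivery))
```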
Big data processing and analytics at speed and scale using command-line tools. Key Features: Perform string processing, numerical computations, and more using CLI tools. Understand the essential components of the data science development workflow. Automate data pipeline scripts and visualization with the command line. Book Description: The command line has been in existence on UNIX-based OSes, in the form of the Bash shell, for over three decades. However, few developers know how command-line tools can be OSEMN (pronounced "awesome" and standing for Obtaining, Scrubbing, Exploring, Modeling, and iNterpreting data) for carrying out simple to advanced data science tasks at speed. This book will start with the requisite concepts and installation steps for carrying out data science tasks using the command line. You will learn to create a data pipeline to solve the problem of working with small- to medium-sized files on a single machine. You will understand the power of the command line and learn how to edit files using text-based editors. You will not only learn how to automate jobs and scripts, but also how to visualize data using the command line. By the end of this book, you will know how to speed up the process and perform automated tasks using command-line tools. What you will learn: Understand how to set up the command line for data science. Use AWK programming language commands to search quickly in large datasets. Work with files and APIs using the command line. Share and collect data with CLI tools. Perform visualization with commands and functions. Uncover machine-level programming practices with a modern approach to data science. Who this book is for: This book is for data scientists and data analysts with little to no knowledge of the command line but an understanding of data science. Perform everyday data science tasks using the power of command-line tools.
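A hypothetical sketch of automating one tiny OSEMN-style step from Python: run an awk aggregation over a CSV and read the result back. The file name and column layout (region in column 2, sales in column 3) are invented for illustration; the same one-liner could just as well live directly in a shell script.

```python
# Run an awk group-by-and-sum over a (hypothetical) sales.csv and print the result.
import subprocess

pipeline = (
    r"awk -F, 'NR > 1 {totals[$2] += $3} "
    r"END {for (r in totals) print r, totals[r]}' sales.csv"
)

# check=True raises if the command fails (e.g. the example file does not exist).
result = subprocess.run(pipeline, shell=True, capture_output=True, text=True, check=True)
print(result.stdout)
```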
An introduction to deep learning and PyTorch, building convolutional and recurrent neural networks for real-world use cases such as image classification, transfer learning, and natural language processing. Key Features: Clear and concise explanations. Important insights into deep learning models. Practical demonstration of key concepts. Book Description: PyTorch is extremely powerful and yet easy to learn. It provides advanced features such as supporting multiprocessor, distributed, and parallel computation. This book is an excellent entry point for those wanting to explore deep learning with PyTorch to harness its power. This book will introduce you to the PyTorch deep learning library and teach you how to train deep learning models without any hassle. We will set up the deep learning environment using PyTorch, and then train and deploy different types of deep learning models, such as CNNs, RNNs, and autoencoders. You will learn how to optimize models by tuning hyperparameters and how to use PyTorch in multiprocessor and distributed environments. We will discuss long short-term memory (LSTM) networks and build a language model to predict text. By the end of this book, you will be familiar with PyTorch's capabilities and be able to utilize the library to train your neural networks with relative ease. What you will learn: Set up the deep learning environment using the PyTorch library. Learn to build a deep learning model for image classification. Use a convolutional neural network for transfer learning. Understand how to use PyTorch for natural language processing. Use a recurrent neural network to classify text. Understand how to optimize PyTorch in multiprocessor and distributed environments. Train, optimize, and deploy your neural networks for maximum accuracy and performance. Learn to deploy production-ready models. Who this book is for: Developers and data scientists familiar with machine learning but new to deep learning, or existing practitioners of deep learning who would like to use PyTorch to train their deep learning models, will find this book useful. Having knowledge of Python programming will be an added advantage, while previous exposure to PyTorch is not needed.
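A compact sketch (not taken from the book) of the kind of model such an introduction walks through: a small convolutional network for 28x28 grayscale images, with one training step run on random tensors standing in for a real DataLoader batch.

```python
# One training step of a tiny CNN classifier in PyTorch.
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Flatten(),
    nn.Linear(32 * 7 * 7, 10),                 # 28x28 -> 7x7 after two 2x2 pools
)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)  # a tunable hyperparameter
loss_fn = nn.CrossEntropyLoss()

images = torch.randn(8, 1, 28, 28)             # stand-in for a DataLoader batch
labels = torch.randint(0, 10, (8,))

optimizer.zero_grad()
loss = loss_fn(model(images), labels)
loss.backward()
optimizer.step()
print(f"one training step, loss = {loss.item():.3f}")
```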
Understand and build beautiful and advanced plots with Matplotlib and Python. Key Features: A practical guide with hands-on examples to design interactive plots. Advanced techniques for constructing complex plots. Explore 3D plotting and visualization using Jupyter Notebook. Book Description: In this book, you'll get hands-on with customizing your data plots with the help of Matplotlib. You'll start with customizing plots, making a handful of special-purpose plots, and building 3D plots. You'll explore non-trivial layouts, Pylab customization, and more about tile configuration. You'll be able to add text, put lines in plots, and also handle polygons, shapes, and annotations. Non-Cartesian and vector plots are exciting to construct, and you'll explore them further in this book. You'll delve into niche plots and visualize ordinal and tabular data. You'll also explore 3D plotting, one of the best features when it comes to 3D data visualization, along with Jupyter Notebook, widgets, and creating movies for enhanced data representation. Geospatial plotting will also be explored. Finally, you'll learn how to create interactive plots with the help of Jupyter. Learn expert techniques for effective data visualization using Matplotlib 3 and Python with our latest offering -- Matplotlib 3.0 Cookbook. What you will learn: Deal with non-trivial and unusual plots. Understand Basemap methods. Customize and represent data in 3D. Construct non-Cartesian and vector plots. Design interactive plots using Jupyter Notebook. Make movies for enhanced data representation. Who this book is for: This book is aimed at individuals who want to explore data visualization techniques. A basic knowledge of Matplotlib and Python is required.
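A small, self-contained example of the 3D plotting mentioned above: a surface plot drawn with Matplotlib's mplot3d toolkit (the surface itself is just a synthetic test function).

```python
# 3D surface plot with Matplotlib 3.x, which registers the "3d" projection automatically.
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(-3, 3, 100)
y = np.linspace(-3, 3, 100)
X, Y = np.meshgrid(x, y)
Z = np.sin(np.sqrt(X**2 + Y**2))          # a radially symmetric test surface

fig = plt.figure()
ax = fig.add_subplot(projection="3d")
surf = ax.plot_surface(X, Y, Z, cmap="viridis")
fig.colorbar(surf, shrink=0.6)
ax.set_xlabel("x")
ax.set_ylabel("y")
ax.set_zlabel("sin(r)")
plt.show()
```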
No need to spend hours ploughing through endless data - let Spark, one of the fastest big data processing engines available, do the hard work for you. Key Features: Get up and running with Apache Spark and Python. Integrate Spark with AWS for real-time analytics. Apply processed data streams to the machine learning APIs of Apache Spark. Book Description: Processing big data in real time is challenging due to scalability, information consistency, and fault-tolerance requirements. This book teaches you how to use Spark to make your overall analytical workflow faster and more efficient. You'll explore all core concepts and tools within the Spark ecosystem, such as Spark Streaming, the Spark Streaming API, the machine learning extension, and structured streaming. You'll begin by learning data processing fundamentals using the Resilient Distributed Dataset (RDD), SQL, Dataset, and DataFrame APIs. After grasping these fundamentals, you'll move on to using Spark Streaming APIs to consume data in real time from TCP sockets, and integrate Amazon Web Services (AWS) for stream consumption. By the end of this book, you'll not only have understood how to use machine learning extensions and structured streams, but you'll also be able to apply Spark in your own upcoming big data projects. What you will learn: Write your own Python programs that can interact with Spark. Implement data stream consumption using Apache Spark. Recognize common operations in Spark to process known data streams. Integrate Spark Streaming with Amazon Web Services (AWS). Create a collaborative filtering model with the MovieLens dataset. Apply processed data streams to Spark machine learning APIs. Who this book is for: Data Processing with Apache Spark is for you if you are a software engineer, architect, or IT professional who wants to explore distributed systems and big data analytics. Although you don't need any knowledge of Spark, prior experience of working with Python is recommended.
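A minimal PySpark sketch of the fundamentals the blurb lists: the same small, made-up dataset handled once through the RDD API and once through the DataFrame API, running in local mode with the pyspark package assumed to be installed.

```python
# The same tiny query expressed with Spark's RDD and DataFrame APIs (local mode).
from pyspark.sql import SparkSession

spark = SparkSession.builder.master("local[*]").appName("basics").getOrCreate()

rows = [("alice", 34), ("bob", 45), ("carol", 29)]

# RDD API: functional transformations on raw Python objects.
rdd = spark.sparkContext.parallelize(rows)
print(rdd.filter(lambda r: r[1] > 30).map(lambda r: r[0]).collect())

# DataFrame API: the same query with a schema and SQL-like expressions.
df = spark.createDataFrame(rows, ["name", "age"])
df.filter(df.age > 30).select("name").show()

spark.stop()
```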
If your organization is about to enter the world of big data, you not only need to decide whether Apache Hadoop is the right platform to use, but also which of its many components are best suited to your task. This field guide makes the exercise manageable by breaking down the Hadoop ecosystem into short, digestible sections. You'll quickly understand how Hadoop's projects, subprojects, and related technologies work together. Each chapter introduces a different topic - such as core technologies or data transfer - and explains why certain components may or may not be useful for particular needs. When it comes to data, Hadoop is a whole new ballgame, but with this handy reference, you'll have a good grasp of the playing field. Topics include: Core technologies - Hadoop Distributed File System (HDFS), MapReduce, YARN, and Spark. Database and data management - Cassandra, HBase, MongoDB, and Hive. Serialization - Avro, JSON, and Parquet. Management and monitoring - Puppet, Chef, ZooKeeper, and Oozie. Analytic helpers - Pig, Mahout, and MLlib. Data transfer - Sqoop, Flume, distcp, and Storm. Security, access control, auditing - Sentry, Kerberos, and Knox. Cloud computing and virtualization - Serengeti, Docker, and Whirr.
Leverage Splunk's operational intelligence capabilities to unlock new hidden business insights and drive success. Key Features: Tackle any problems related to searching and analyzing your data with Splunk. Get the latest information and business insights on Splunk 7.x. Explore the all-new machine learning toolkit in Splunk 7.x. Book Description: Splunk makes it easy for you to take control of your data, and with this operational intelligence cookbook, you can be confident that you are taking advantage of the big data revolution and driving your business with the cutting edge of operational intelligence and business analytics. With more than 80 recipes that demonstrate all of Splunk's features, not only will you find quick solutions to common problems, but you'll also learn a wide range of strategies and uncover new ideas that will make you rethink what operational intelligence means to you and your organization. You'll discover recipes on data processing, searching and reporting, dashboards, and visualizations to make data shareable, communicable, and most importantly meaningful. You'll also find step-by-step demonstrations that walk you through building an operational intelligence application containing vital features essential to understanding data and to help you successfully integrate a data-driven way of thinking in your organization. Throughout the book, you'll dive deeper into Splunk, explore data models and pivots to extend your intelligence capabilities, and perform advanced searching with machine learning to explore your data in even more sophisticated ways. Splunk is changing the business landscape, so make sure you're taking advantage of it. What you will learn: Learn how to use Splunk to gather, analyze, and report on data. Create dashboards and visualizations that make data meaningful. Build an intelligent application with extensive functionalities. Enrich operational data with lookups and workflows. Model and accelerate data and perform pivot-based reporting. Apply ML algorithms for forecasting and anomaly detection. Summarize data for long-term trending, reporting, and analysis. Integrate advanced JavaScript charts and leverage Splunk's API. Who this book is for: This book is intended for data professionals who are looking to leverage the Splunk Enterprise platform as a valuable operational intelligence tool. The recipes provided in this book will appeal to individuals from all facets of business, IT, security, product, marketing, and many more! Even existing users of Splunk who want to upgrade and get up and running with Splunk 7.x will find this book to be of great value.