Big Data Analytics with Spark is a step-by-step guide for learning Spark, an open-source, fast, general-purpose cluster computing framework for large-scale data analysis. You will learn how to use Spark for different types of big data analytics projects, including batch, interactive, graph, and stream data analysis as well as machine learning. In addition, this book will help you become a much sought-after Spark expert. Spark is one of the hottest Big Data technologies. The amount of data generated today by devices, applications and users is exploding. Therefore, there is a critical need for tools that can analyze large-scale data and unlock value from it. Spark is a powerful technology that meets that need. You can, for example, use Spark to perform low-latency computations through efficient caching and iterative algorithms; leverage its shell for easy, interactive data analysis; and employ its fast batch processing and low-latency features to process your real-time data streams. As a result, adoption of Spark is growing rapidly, and it is replacing Hadoop MapReduce as the technology of choice for big data analytics. This book provides an introduction to Spark and related big-data technologies. It covers Spark core and its add-on libraries, including Spark SQL, Spark Streaming, GraphX, and MLlib. Big Data Analytics with Spark is written for busy professionals who prefer learning a new technology from a consolidated source instead of spending countless hours on the Internet trying to pick bits and pieces from different sources. The book also provides a chapter on Scala, the hottest functional programming language and the language in which Spark is written. You'll learn the basics of functional programming in Scala, so that you can write Spark applications in it.
What's more, Big Data Analytics with Spark provides an introduction to other big data technologies that are commonly used along with Spark, such as Hive, Avro, and Kafka. The book is therefore self-sufficient: all the technologies that you need to know to use Spark are covered. The only thing that you are expected to know is programming in any language. There is a critical shortage of people with big data expertise, so companies are willing to pay top dollar for people with skills in areas like Spark and Scala. Reading this book and absorbing its principles will therefore provide a boost, possibly a big boost, to your career.
This book contains extended and revised versions of a set of selected papers from two events organized by the Euro Working Group on Decision Support Systems (EWG-DSS), which were held in Toulouse, France and Barcelona, Spain, in June and July 2014. Overall, 8 papers were accepted for publication in this edition after a rigorous review process by at least three internationally known experts from the EWG-DSS Program Committee and external invited reviewers. The selected papers focus on knowledge management and sharing, and on information models developed to support various decision processes.
This book constitutes the proceedings of the workshops held at the International Conference on Social Informatics, SocInfo 2014, which took place in Barcelona, Spain, in November 2014. SocInfo 2014 included nine satellite workshops: the City Labs Workshop; the Workshop on Criminal Network Analysis and Mining (CRIMENET); the Workshop on Interaction and Exchange in Social Media (DYAD); the Workshop on Exploration of Games and Gamers (EGG); the Workshop on HistoInformatics; the Workshop on Socio-Economic Dynamics, Networks and Agent-based Models (SEDNAM); the Workshop on Social Influence (SI); the Workshop on Social Scientists Working with Start-Ups; and the Workshop on Social Media in Crowdsourcing and Human Computation (SoHuman).
Construct and implement a data warehousing plan.
This book constitutes the thoroughly refereed post-conference proceedings of the First and Second International Workshops on In Memory Data Management and Analysis, held in Riva del Garda, Italy, in August 2013 and in Hangzhou, China, in September 2014. The 11 revised full papers were carefully reviewed and selected from 18 submissions and cover topics from main-memory graph analytics platforms to main-memory OLTP applications.
The two-volume set LNCS 9014 and LNCS 9015 constitutes the refereed proceedings of the 12th International Conference on Theory of Cryptography, TCC 2015, held in Warsaw, Poland in March 2015. The 52 revised full papers presented were carefully reviewed and selected from 137 submissions. The papers are organized in topical sections on foundations, symmetric key, multiparty computation, concurrent and resettable security, non-malleable codes and tampering, privacy amplification, encryption and key exchange, pseudorandom functions and applications, proofs and verifiable computation, differential privacy, functional encryption, and obfuscation.
This book presents methods for tapping overlooked IT potential within organizations. The author assumes that the knowledge is already present and only needs to be brought to the surface. Includes checklists and tips for implementation.
This book constitutes revised selected papers from the second ECML PKDD Workshop on Data Analytics for Renewable Energy Integration, DARE 2014, held in Nancy, France, in September 2014. The 11 papers presented in this volume were carefully reviewed and selected for inclusion in this book.
Data Scientists at Work is a collection of interviews with sixteen of the world's most influential and innovative data scientists from across the spectrum of this hot new profession. "Data scientist is the sexiest job in the 21st century," according to the Harvard Business Review. By 2018, the United States will experience a shortage of 190,000 skilled data scientists, according to a McKinsey report. Through incisive in-depth interviews, this book mines the what, how, and why of the practice of data science from the stories, ideas, shop talk, and forecasts of its preeminent practitioners across diverse industries: social network (Yann LeCun, Facebook); professional network (Daniel Tunkelang, LinkedIn); venture capital (Roger Ehrenberg, IA Ventures); enterprise cloud computing and neuroscience (Eric Jonas, formerly Salesforce.com); newspaper and media (Chris Wiggins, The New York Times); streaming television (Caitlin Smallwood, Netflix); music forecast (Victor Hu, Next Big Sound); strategic intelligence (Amy Heineike, Quid); environmental big data (Andre Karpištšenko, Planet OS); geospatial marketing intelligence (Jonathan Lenaghan, PlaceIQ); advertising (Claudia Perlich, Dstillery); fashion e-commerce (Anna Smith, Rent the Runway); specialty retail (Erin Shellman, Nordstrom); email marketing (John Foreman, MailChimp); predictive sales intelligence (Kira Radinsky, SalesPredict); and humanitarian nonprofit (Jake Porway, DataKind). The book features a stimulating foreword by Google's Director of Research, Peter Norvig. Each of these data scientists shares how he or she tailors the torrent-taming techniques of big data, data visualization, search, and statistics to specific jobs by dint of ingenuity, imagination, patience, and passion.
Data Scientists at Work parts the curtain on the interviewees' earliest data projects, how they became data scientists, their discoveries and surprises in working with data, their thoughts on the past, present, and future of the profession, their experiences of team collaboration within their organizations, and the insights they have gained as they get their hands dirty refining mountains of raw data into objects of commercial, scientific, and educational value for their organizations and clients.
Data assimilation is a hugely important mathematical technique, relevant in fields as diverse as geophysics, data science, and neuroscience. This modern book provides an authoritative treatment of the field as it relates to several scientific disciplines, with a particular emphasis on recent developments from machine learning and its role in the optimisation of data assimilation. Underlying theory from statistical physics, such as path integrals and Monte Carlo methods, is developed in the text as a basis for data assimilation, and the author then explores examples from current multidisciplinary research such as the modelling of shallow water systems, ocean dynamics, and neuronal dynamics in the avian brain. The theory of data assimilation and machine learning is introduced in an accessible and unified manner, and the book is suitable for undergraduate and graduate students from science and engineering without specialized experience of statistical physics.
Why a book about logs? That's easy: the humble log is an abstraction that lies at the heart of many systems, from NoSQL databases to cryptocurrencies. Even though most engineers don't think much about them, this short book shows you why logs are worthy of your attention. Based on his popular blog posts, LinkedIn principal engineer Jay Kreps shows you how logs work in distributed systems, and then delivers practical applications of these concepts in a variety of common uses - data integration, enterprise architecture, real-time stream processing, data system design, and abstract computing models. Go ahead and take the plunge with logs; you're going to love them. Learn how logs are used for programmatic access in databases and distributed systems. Discover solutions to the huge data integration problem when more data of more varieties meet more systems. Understand why logs are at the heart of real-time stream processing. Learn the role of a log in the internals of online data systems. Explore how Jay Kreps applies these ideas to his own work on data infrastructure systems at LinkedIn.
In the course of its development, administrative language has become subject to linguistic standardization intended to be generally binding for its addressees, and its historical codification has been recorded both in dictionaries and in other records. This also applies to mandatory compliance with gender-equitable wording in Austrian administrative language: by rephrasing the sentence, the person acting is to be named unambiguously in audit reports. This work shows to what extent these goals, namely optimal comprehensibility and readability of administrative language and its texts for the addressees, can be achieved with the help of a software support tool. Additional information material is included with the book on a CD.
Many applications depend on the effective acquisition of semantic metadata, and this state-of-the-art volume provides extensive coverage of the field of semantics acquisition games (SAGs). SAGs are part of the crowdsourcing family of approaches, and the authors analyze their role as tools for acquiring resource metadata and domain models. Three case studies of SAG-based semantics acquisition methods are shown, along with other existing SAGs: 1. the Little Search Game, a search query formulation game that uses negative search to acquire lightweight semantics; 2. PexAce, a card game that acquires annotations for images; 3. CityLights, a SAG used for validating music metadata. The authors also look at SAGs from a design perspective, covering SAG design issues and existing patterns, including several novel patterns. For solving cold start problems, a "helper artifact" scheme is presented, and for dealing with malicious player behavior, an a posteriori cheating detection scheme is given. The book also presents methods for assessing information about player expertise, which can be used to make SAGs more effective in terms of useful output.
This textbook grew out of notes for the ECE143 Programming for Data Analysis class that the author has been teaching at University of California, San Diego, which is a requirement for both graduate and undergraduate degrees in Machine Learning and Data Science. This book is ideal for readers with some Python programming experience. The book covers key language concepts that must be understood to program effectively, especially for data analysis applications. Certain low-level language features are discussed in detail, especially Python memory management and data structures. Using Python effectively means taking advantage of its vast ecosystem. The book discusses Python package management and how to use third-party modules as well as how to structure your own Python modules. The section on object-oriented programming explains features of the language that facilitate common programming patterns. After developing the key Python language features, the book moves on to third-party modules that are foundational for effective data analysis, starting with Numpy. The book develops key Numpy concepts and discusses internal Numpy array data structures and memory usage. Then, the author moves on to Pandas and details its many features for data processing and alignment. Because strong visualizations are important for communicating data analysis, key modules such as Matplotlib are covered in detail, along with web-based options such as Bokeh, Holoviews, Altair, and Plotly. The text is sprinkled with many tricks of the trade that help avoid common pitfalls. The author explains the internal logic embodied in the Python language so that readers can get into the Python mindset and make better design choices in their code, which is especially helpful for newcomers to both Python and data analysis. To get the most out of this book, open a Python interpreter and type along with the many code samples.
This book is part of the project "Transformacion funcional de la literatura infantil y juvenil en la sociedad multimedia. Aplicacion de un modelo teorico de critica a las adaptaciones audiovisuales en espanol de las obras infantiles inglesas y alemanas" (Functional transformation of children's and young adult literature in the multimedia society: applying a theoretical model of criticism to Spanish-language audiovisual adaptations of English and German children's works) and has a twofold aim: on the one hand, to analyze how works of English and German literature were adapted to the audiovisual medium and how English and German films were carried over into Peninsular Spanish; on the other, to study the quality of the children's books, and of their Spanish translations, that arise from these audiovisual products. The analysis of the audiovisual adaptations applies both technical and translation-studies criteria, while the derived books are studied using literary criteria and, where their translations are analyzed, translation-studies criteria.
Big Data Analytics Using Splunk is a hands-on book showing how to process and derive business value from big data in real time. Examples in the book draw from social media sources such as Twitter (tweets) and Foursquare (check-ins). You also learn to draw from machine data, enabling you to analyze, say, web server log files and patterns of user access in real time, as the access is occurring. Gone are the days when you need be caught out by shifting public opinion or sudden changes in customer behavior. Splunk's easy-to-use engine helps you recognize and react in real time, as events are occurring. Splunk is a powerful yet simple analytical tool that is fast gaining traction in the fields of big data and operational intelligence. Using Splunk, you can monitor data in real time, or mine your data after the fact. Splunk's stunning visualizations aid in locating the needle of value in a haystack of data. Geolocation support spreads your data across a map, allowing you to drill down to geographic areas of interest. Alerts can run in the background and trigger to warn you of shifts or events as they are taking place. With Splunk you can immediately recognize and react to changing trends and shifting public opinion as expressed through social media, and to new patterns of eCommerce and customer behavior. The ability to immediately recognize and react to changing trends provides a tremendous advantage in today's fast-paced world of Internet business.
Big Data Analytics Using Splunk opens the door to an exciting world of real-time operational intelligence. The book is built around hands-on projects, shows how to mine social media, and opens the door to real-time operational intelligence. What you'll learn: Monitor and mine social media for trends affecting your business. Know how you are perceived, and when that perception is rising or falling. Detect changing customer behavior from mining your operational data. Collect and analyze in real time, or from historical files. Apply basic analytical metrics to better understand your data. Create compelling visualizations and easily communicate your findings. Who this book is for: Big Data Analytics Using Splunk is for those who are interested in exploring the heaps of data they have available, but don't know where to start. It is for people who have knowledge of the data they want to analyze and are developers or SQL programmers at a level anywhere between beginner and intermediate. Expert developers also benefit from learning how to use such a simple and powerful tool as Splunk.
This book focuses on research into ensemble machine learning approaches that can be applied to big data problems. Its chapters present advances in machine learning algorithms for extracting data-driven decisions from big data in diverse domains such as the banking sector, healthcare, social media, and video surveillance. Each approach has distinct functionality that can be leveraged to solve a specific set of big data applications. The book is a potential resource on advances in machine learning and data science for solving big data problems with many objectives. The literature shows that several works have focused on the advancement of machine learning in fields such as biomedicine, stock prediction, and sentiment analysis. However, there has been limited discussion of applying advanced machine learning techniques to big data problems.
If your organization is about to enter the world of big data, you not only need to decide whether Apache Hadoop is the right platform to use, but also which of its many components are best suited to your task. This field guide makes the exercise manageable by breaking down the Hadoop ecosystem into short, digestible sections. You'll quickly understand how Hadoop's projects, subprojects, and related technologies work together. Each chapter introduces a different topic, such as core technologies or data transfer, and explains why certain components may or may not be useful for particular needs. When it comes to data, Hadoop is a whole new ballgame, but with this handy reference, you'll have a good grasp of the playing field. Topics include: core technologies (Hadoop Distributed File System (HDFS), MapReduce, YARN, and Spark); database and data management (Cassandra, HBase, MongoDB, and Hive); serialization (Avro, JSON, and Parquet); management and monitoring (Puppet, Chef, Zookeeper, and Oozie); analytic helpers (Pig, Mahout, and MLlib); data transfer (Sqoop, Flume, distcp, and Storm); security, access control, and auditing (Sentry, Kerberos, and Knox); and cloud computing and virtualization (Serengeti, Docker, and Whirr).
How can you get your data from frontend servers to Hadoop in near real time? With this complete reference guide, you'll learn Flume's rich set of features for collecting, aggregating, and writing large amounts of streaming data to the Hadoop Distributed File System (HDFS), Apache HBase, SolrCloud, Elastic Search, and other systems. Using Flume shows operations engineers how to configure, deploy, and monitor a Flume cluster, and teaches developers how to write Flume plugins and custom components for their specific use-cases. You'll learn about Flume's design and implementation, as well as various features that make it highly scalable, flexible, and reliable. Code examples and exercises are available on GitHub. Learn how Flume provides a steady rate of flow by acting as a buffer between data producers and consumers. Dive into key Flume components, including sources that accept data and sinks that write and deliver it. Write custom plugins to customize the way Flume receives, modifies, formats, and writes data. Explore APIs for sending data to Flume agents from your own applications. Plan and deploy Flume in a scalable and flexible way - and monitor your cluster once it's running.
What do you need to become a data-driven organization? Far more than having big data or a crack team of unicorn data scientists, it requires establishing an effective, deeply-ingrained data culture. This practical book shows you how true data-drivenness involves processes that require genuine buy-in across your company, from analysts and management to the C-Suite and the board. Through interviews and examples from data scientists and analytics leaders in a variety of industries, author Carl Anderson explains the analytics value chain you need to adopt when building predictive business models, from data collection and analysis to the insights and leadership that drive concrete actions. You'll learn what works and what doesn't, and why creating a data-driven culture throughout your organization is essential. Start from the bottom up: learn how to collect the right data the right way. Hire analysts with the right skills, and organize them into teams. Examine statistical and visualization tools, and fact-based story-telling methods. Collect and analyze data while respecting privacy and ethics. Understand how analysts and their managers can help spur a data-driven culture. Learn the importance of data leadership and C-level positions such as chief data officer and chief analytics officer.
There has been intense excitement in recent years around activities labeled "data science," "big data," and "analytics." However, the lack of clarity around these terms and, particularly, around the skill sets and capabilities of their practitioners has led to inefficient communication between "data scientists" and the organizations requiring their services. This lack of clarity has frequently led to missed opportunities. To address this issue, we surveyed several hundred practitioners via the Web to explore the varieties of skills, experiences, and viewpoints in the emerging data science community. We used dimensionality reduction techniques to divide potential data scientists into five categories based on their self-ranked skill sets (Statistics, Math/Operations Research, Business, Programming, and Machine Learning/Big Data), and four categories based on their self-identification (Data Researchers, Data Businesspeople, Data Engineers, and Data Creatives). Further examining the respondents based on their division into these categories provided additional insights into the types of professional activities, educational background, and even scale of data used by different types of Data Scientists. In this report, we combine our results with insights and data from others to provide a better understanding of the diversity of practitioners, and to argue for the value of clearer communication around roles, teams, and careers.
Even as big data is turning the world upside down, the next phase of the revolution is already taking shape: real-time data analysis. This hands-on guide introduces you to Storm, a distributed, JVM-based system for processing streaming data. Through simple tutorials, sample Java code, and a complete real-world scenario, you'll learn how to build fast, fault-tolerant solutions that process results as soon as the data arrives. Discover how easy it is to set up Storm clusters for solving various problems, including continuous data computation, distributed remote procedure calls, and data stream processing. Learn how to program Storm components: "spouts" for data input and "bolts" for data transformation. Discover how data is exchanged between spouts and bolts in a Storm "topology". Make spouts fault-tolerant with several commonly used design strategies. Explore bolts: their life cycle, strategies for design, and ways to implement them. Scale your solution by defining each component's level of parallelism. Study a real-time web analytics system built with Node.js, a Redis server, and a Storm topology. Write spouts and bolts with non-JVM languages such as Python, Ruby, and JavaScript.
Advanced Data Science and Analytics with Python enables data scientists to continue developing their skills and apply them in business as well as academic settings. The subjects discussed in this book are complementary and a follow-up to the topics discussed in Data Science and Analytics with Python. The aim is to cover important advanced areas in data science using tools developed in Python such as SciKit-learn, Pandas, Numpy, Beautiful Soup, NLTK, NetworkX and others. The model development is supported by the use of frameworks such as Keras, TensorFlow and Core ML, as well as Swift for the development of iOS and MacOS applications. Features: Targets readers with a background in programming who are interested in the tools used in data analytics and data science. Uses Python throughout. Presents tools, alongside solved examples, with steps that the reader can easily reproduce and adapt to their needs. Focuses on the practical use of the tools rather than on lengthy explanations. Provides the reader with the opportunity to use the book whenever needed rather than following a sequential path. The book can be read independently from the previous volume and each of the chapters in this volume is sufficiently independent from the others, providing flexibility for the reader. Each of the topics addressed in the book tackles the data science workflow from a practical perspective, concentrating on the process and results obtained. The implementation and deployment of trained models are central to the book. Time series analysis, natural language processing, topic modelling, social network analysis, neural networks and deep learning are comprehensively covered. The book discusses the need to develop data products and addresses the subject of bringing models to their intended audiences - in this case, literally to the users' fingertips in the form of an iPhone app. About the Author: Dr. Jesus Rogel-Salazar is a lead data scientist in the field, working for companies such as Tympa Health Technologies, Barclays, AKQA, IBM Data Science Studio and Dow Jones. He is a visiting researcher at the Department of Physics at Imperial College London, UK and a member of the School of Physics, Astronomy and Mathematics at the University of Hertfordshire, UK.