Building a simple but powerful recommendation system is much easier
than you think. Approachable for all levels of expertise, this
report explains innovations that make machine learning practical
for business production settings--and demonstrates how even a
small-scale development team can design an effective large-scale
recommendation system. Apache Mahout committers Ted Dunning and
Ellen Friedman walk you through a design that relies on careful
simplification. You'll learn how to collect the right data, analyze
it with an algorithm from the Mahout library, and then easily
deploy the recommender using search technology, such as Apache Solr
or Elasticsearch. Powerful and effective, this efficient
combination does learning offline and delivers rapid response
recommendations in real time. Understand the tradeoffs between
simple and complex recommenders; collect user data that tracks user
actions--rather than their ratings; predict what a user wants based
on behavior by others, using Mahout for co-occurrence analysis; use
search technology to offer recommendations in real time, complete
with item metadata; watch the recommender in action with a music
service example; improve your recommender with dithering, multimodal
recommendation, and other techniques.
If you're a business team leader, CIO, business analyst, or
developer interested in how Apache Hadoop and Apache HBase-related
technologies can address problems involving large-scale data in
cost-effective ways, this book is for you. Using real-world stories
and situations, authors Ted Dunning and Ellen Friedman show Hadoop
newcomers and seasoned users alike how NoSQL databases and Hadoop
can solve a variety of business and research issues. You'll learn
about early decisions and pre-planning that can make the process
easier and more productive. If you're already using these
technologies, you'll discover ways to gain the full range of
benefits possible with Hadoop. While you don't need a deep
technical background to get started, this book does provide expert
guidance to help managers, architects, and practitioners succeed
with their Hadoop projects. Examine a day in the life of big data:
India's ambitious Aadhaar project; review tools in the Hadoop
ecosystem such as Apache's Spark, Storm, and Drill to learn how
they can help you; pick up a collection of technical and strategic
tips that have helped others succeed with Hadoop; learn from
several prototypical Hadoop use cases, based on how organizations
have actually applied the technology. You can explore real-world
stories that reveal how MapR customers combine use cases when
putting Hadoop and NoSQL to work, including in production.
Anomaly detection is the detective work of machine learning:
finding the unusual, catching the fraud, discovering strange
activity in large and complex datasets. But, unlike Sherlock
Holmes, you may not know what the puzzle is, much less what
"suspects" you're looking for. This O'Reilly report uses practical
examples to explain how the underlying concepts of anomaly
detection work. From banking security to natural sciences,
medicine, and marketing, anomaly detection has many useful
applications in this age of big data. And the search for anomalies
will intensify once the Internet of Things spawns even more new
types of data. The concepts described in this report will help you
tackle anomaly detection in your own project. Use probabilistic
models to predict what's normal and contrast that to what you
observe; set an adaptive threshold to determine which data falls
outside of the normal range, using the t-digest algorithm; establish
normal fluctuations in complex systems and signals (such as an EKG)
with a more adaptive probabilistic model; use historical data to
discover anomalies in sporadic event streams, such as web traffic;
learn how to use deviations from expected behavior to trigger fraud
alerts.
More and more data-driven companies are looking to adopt stream
processing and streaming analytics. With this concise ebook, you'll
learn best practices for designing a reliable architecture that
supports this emerging big-data paradigm. Authors Ted Dunning and
Ellen Friedman (Real World Hadoop) help you explore some of the
best technologies to handle stream processing and analytics, with a
focus on the upstream queuing or message-passing layer. To
illustrate the effectiveness of these technologies, this book also
includes specific use cases. Ideal for developers and non-technical
people alike, this book describes: key elements in good design for
streaming analytics, focusing on the essential characteristics of
the messaging layer; new messaging technologies, including Apache
Kafka and MapR Streams, with links to sample code; technology
choices for streaming analytics (Apache Spark Streaming, Apache
Flink, Apache Storm, and Apache Apex); how stream-based
architectures are helpful to support microservices; specific use
cases such as fraud detection and geo-distributed data streams.
Ted Dunning is Chief
Applications Architect at MapR Technologies, and active in the open
source community. He currently serves as VP for Incubator at the
Apache Foundation, as a champion and mentor for a large number of
projects, and as committer and PMC member of the Apache ZooKeeper
and Drill projects. Ted is on Twitter as @ted_dunning. Ellen
Friedman, a committer for the Apache Drill and Apache Mahout
projects, is a solutions consultant and well-known speaker and
author, currently writing mainly about big data topics. With a PhD
in Biochemistry, she has years of experience as a research
scientist and has written about a variety of technical topics.
Ellen is on Twitter as @Ellen_Friedman.
Many big data-driven companies today are moving to protect certain
types of data against intrusion, leaks, or unauthorized eyes. But
how do you lock down data while granting access to people who need
to see it? In this practical book, authors Ted Dunning and Ellen
Friedman offer two novel and practical solutions that you can
implement right away. Ideal for both technical and non-technical
decision makers, group leaders, developers, and data scientists,
this book shows you how to: Share original data in a controlled way
so that different groups within your organization only see part of
the whole. You'll learn how to do this with the new open source SQL
query engine Apache Drill. Provide synthetic data that emulates the
behavior of sensitive data. This approach enables external advisors
to work with you on projects involving data that you can't show
them. If you're intrigued by the synthetic data solution, explore
the log-synth program that Ted Dunning developed as open source
code (available on GitHub), along with how-to instructions and tips
for best practice. You'll also get a collection of use
cases. Providing lock-down security while safely sharing data is a
significant challenge for a growing number of organizations. With
this book, you'll discover new options to share data safely without
sacrificing security.
Time series data is of growing importance, especially with the
rapid expansion of the Internet of Things. This concise guide shows
you effective ways to collect, persist, and access large-scale time
series data for analysis. You'll explore the theory behind time
series databases and learn practical methods for implementing them.
Authors Ted Dunning and Ellen Friedman provide a detailed
examination of open source tools such as OpenTSDB and new
modifications that greatly speed up data ingestion. You'll learn: a
variety of time series use cases; the advantages of NoSQL databases
for large-scale time series data; NoSQL table design for
high-performance time series databases; the benefits and limitations
of OpenTSDB; how to access data in OpenTSDB using R, Go, and Ruby;
how time series databases contribute to practical machine learning
projects; how to handle the added complexity of geo-temporal data.
For advice on analyzing time series data, check out Practical
Machine Learning: A New Look at Anomaly Detection, also from Ted
Dunning and Ellen Friedman.