The last decade has seen a huge and growing interest in processing
large data sets on large distributed clusters. This trend began
with the MapReduce framework and has since been adopted by
several other systems, including Pig Latin, Hive, Scope, Dremel,
Spark, and Myria, to name a few. While the applications of such
systems are diverse (for example, machine learning, data
analytics), most involve relatively standard data processing tasks
like identifying relevant data, cleaning, filtering, joining,
grouping, transforming, extracting features, and evaluating
results. This has generated great interest in the study of
algorithms for data processing on large distributed clusters.
Algorithmic Aspects of Parallel Data Processing discusses recent
algorithmic developments for distributed data processing. It uses a
theoretical model of parallel processing called the Massively
Parallel Computation (MPC) model, which is a simplification of the
BSP model where the only cost is given by the amount of
communication and the number of communication rounds. The survey
studies several algorithms for multi-join queries, sorting, and
matrix multiplication. It discusses their relationships and common
techniques applied across the different data processing tasks.
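
To make the MPC cost measure concrete, here is a minimal sketch (not taken from the survey) that simulates a single communication round of a hash-partitioned join of two relations R(a, b) and S(b, c) on p servers and reports the maximum number of tuples any one server receives. The function name hash_join_one_round, the tuple layout, and the use of Python's built-in hash as the partitioning function are assumptions made for illustration only.

```python
from collections import defaultdict


def hash_join_one_round(R, S, p):
    """Simulate a one-round hash join of R(a, b) and S(b, c) on p servers.

    Returns (output, max_load): the sorted join result and the largest
    number of tuples received by any single server in the round.
    Hypothetical helper for illustration; not an API from the survey.
    """
    # Communication round: route each tuple to the server selected by
    # hashing its join key b. Tuples that agree on b meet on one server.
    received_R = defaultdict(list)
    received_S = defaultdict(list)
    for a, b in R:
        received_R[hash(b) % p].append((a, b))
    for b, c in S:
        received_S[hash(b) % p].append((b, c))

    # MPC-style cost of the round: the maximum load over all servers.
    max_load = max(len(received_R[i]) + len(received_S[i]) for i in range(p))

    # Local computation: each server joins only the tuples it received.
    output = []
    for i in range(p):
        by_key = defaultdict(list)
        for b, c in received_S[i]:
            by_key[b].append(c)
        for a, b in received_R[i]:
            for c in by_key[b]:
                output.append((a, b, c))
    return sorted(output), max_load


if __name__ == "__main__":
    R = [(1, 10), (2, 20), (3, 10)]
    S = [(10, "x"), (20, "y"), (30, "z")]
    result, load = hash_join_one_round(R, S, p=3)
    print(result, load)
```

In this sketch the whole join finishes in one round, so the only quantity left to compare is the maximum load. A join key that appears in many tuples sends all of them to the same server, so the maximum load grows with the skew in the data.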