Welcome to Loot.co.za!
This timely text presents a comprehensive overview of fault tolerance techniques for high-performance computing (HPC). The text opens with a detailed introduction to checkpoint protocols and scheduling algorithms, prediction, replication, and silent error detection and correction, together with some application-specific techniques such as algorithm-based fault tolerance (ABFT). Emphasis is placed on analytical performance models. This is followed by a review of general-purpose techniques, including several checkpoint and rollback recovery protocols. Relevant execution scenarios are also evaluated and compared through quantitative models. Features: provides a survey of resilience methods and performance models; examines the various sources of errors and faults in large-scale systems; reviews the spectrum of techniques that can be applied to design a fault-tolerant MPI; investigates different approaches to replication; discusses the challenge of energy consumption of fault-tolerance methods in extreme-scale systems.
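As a flavour of the analytical performance models used to reason about checkpointing, the sketch below implements the classic first-order Young/Daly approximation. It is an illustration, not taken from the book. It assumes failures with mean time between failures mu that is much larger than the checkpoint cost C, and it ignores downtime and recovery time. Under these assumptions the fraction of time wasted with checkpoint period T is roughly C/T + T/(2*mu), which is minimised at T = sqrt(2*mu*C). The function names and the one-day / ten-minute parameters are hypothetical.

```python
import math


def young_daly_period(mtbf: float, ckpt_cost: float) -> float:
    """First-order optimal checkpoint period T = sqrt(2 * mu * C) (Young/Daly).

    Assumes the mean time between failures `mtbf` (mu) is much larger than
    the checkpoint cost `ckpt_cost` (C); downtime and recovery are ignored.
    """
    if mtbf <= 0 or ckpt_cost <= 0:
        raise ValueError("mtbf and ckpt_cost must be positive")
    return math.sqrt(2.0 * mtbf * ckpt_cost)


def first_order_waste(period: float, mtbf: float, ckpt_cost: float) -> float:
    """Approximate fraction of time lost.

    C/T is the time spent writing checkpoints; T/(2*mu) is the expected
    re-executed work after a failure (on average half a period is lost).
    """
    return ckpt_cost / period + period / (2.0 * mtbf)


if __name__ == "__main__":
    mu = 24 * 3600.0  # hypothetical platform MTBF: one day, in seconds
    c = 600.0         # hypothetical checkpoint cost: ten minutes, in seconds
    t_opt = young_daly_period(mu, c)
    print(f"optimal period ~ {t_opt:.0f} s, "
          f"waste ~ {first_order_waste(t_opt, mu, c):.3f}")
```

Halving or doubling the period around this value increases the first-order waste, which is exactly the trade-off such models quantify; fuller analyses add downtime, recovery cost, and other failure distributions.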
This book constitutes the refereed proceedings of the 14th European PVM/MPI Users' Group Meeting held in Paris, France, September 30 - October 3, 2007. The 40 revised full papers presented together with abstracts of 6 invited contributions, 3 tutorial papers and 6 poster papers were carefully reviewed and selected from 68 submissions. The papers are organized in topical sections on collective communication, communication protocols, debugging and verification, fault tolerance, metacomputing and grid, parallel I/O, implementation issues, object-oriented message passing, limitations and extensions, performance, and are completed with 6 contributions to the special ParSim session on current trends in numerical simulation for parallel engineering environments.