Synopses for Massive Data - Samples, Histograms, Wavelets, Sketches (Paperback)
Loot Price: R2,257
Series: Foundations and Trends® in Databases
Expected to ship within 10 - 15 working days
Synopses for Massive Data: Samples, Histograms, Wavelets, Sketches
describes basic principles and recent developments in building
approximate synopses (i.e., lossy, compressed representations) of
massive data. Such synopses enable approximate query processing, in
which the user's query is executed against the synopsis instead of
the original data. The monograph focuses on the four main families
of synopses: random samples, histograms, wavelets, and sketches. A
random sample comprises a "representative" subset of the data
values of interest, obtained via a stochastic mechanism. Samples
can be quick to obtain, and can be used to approximately answer a
wide range of queries. A histogram summarizes a data set by
grouping the data values into subsets, or "buckets," and then, for
each bucket, computing a small set of summary statistics that can
be used to approximately reconstruct the data in the bucket.
Histograms have been extensively studied and have been incorporated
into the query optimizers of virtually all commercial relational
DBMSs. Wavelet-based synopses were originally developed in the
context of image and signal processing. The data set is viewed as a
set of M elements in a vector - i.e., as a function defined on the
set {0, 1, 2, ..., M-1} - and the wavelet transform expresses this
function as a weighted sum of wavelet "basis functions."
The weights, or coefficients, can then be "thresholded," for
example, by eliminating coefficients that are close to zero in
magnitude. The remaining small set of coefficients serves as the
synopsis. Wavelets are good at capturing features of the data set
at various scales. Sketch summaries are particularly well suited to
streaming data. Linear sketches, for example, view a numerical data
set as a vector or matrix, and multiply the data by a fixed matrix.
Such sketches are massively parallelizable. They can accommodate
streams of transactions in which data is both inserted and removed.
Sketches have also been used successfully to estimate the answer to
COUNT DISTINCT queries, a notoriously hard problem. Synopses for
Massive Data describes and compares the different synopsis methods.
It also covers the use of approximate query processing within
research systems, along with open challenges and future directions.
It is essential reading for anyone working with, or doing research
on, massive data.
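
To make the wavelet thresholding idea in the description concrete, here is a minimal Python sketch. It is an illustration only and is not taken from the monograph: it assumes the orthonormal Haar wavelet, an input whose length is a power of two, and a "keep the largest coefficients" rule as the thresholding step. The function names (haar_transform, inverse_haar_transform, threshold, reconstruct) are hypothetical.

```python
import math

SQRT2 = math.sqrt(2.0)

def haar_transform(values):
    """Orthonormal Haar transform; assumes len(values) is a power of two."""
    n = len(values)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    coeffs = [float(v) for v in values]
    length = n
    while length > 1:
        half = length // 2
        # Pairwise scaled averages (coarse view) and differences (detail).
        averages = [(coeffs[2 * i] + coeffs[2 * i + 1]) / SQRT2 for i in range(half)]
        details = [(coeffs[2 * i] - coeffs[2 * i + 1]) / SQRT2 for i in range(half)]
        coeffs[:length] = averages + details
        length = half
    return coeffs

def inverse_haar_transform(coeffs):
    """Invert haar_transform, rebuilding the data from coarse to fine."""
    n = len(coeffs)
    values = list(coeffs)
    length = 1
    while length < n:
        merged = []
        for a, d in zip(values[:length], values[length:2 * length]):
            merged.append((a + d) / SQRT2)
            merged.append((a - d) / SQRT2)
        values[:2 * length] = merged
        length *= 2
    return values

def threshold(coeffs, keep):
    """Keep the `keep` largest-magnitude coefficients as a sparse synopsis."""
    order = sorted(range(len(coeffs)), key=lambda i: abs(coeffs[i]), reverse=True)
    return {i: coeffs[i] for i in order[:keep]}

def reconstruct(synopsis, n):
    """Approximate the original data from the sparse synopsis."""
    dense = [0.0] * n
    for i, c in synopsis.items():
        dense[i] = c
    return inverse_haar_transform(dense)

if __name__ == "__main__":
    data = [2, 2, 2, 2, 8, 8, 8, 8]
    synopsis = threshold(haar_transform(data), keep=2)
    print(synopsis)
    print(reconstruct(synopsis, len(data)))
```

For a piecewise-constant input such as the one in the usage lines, only a couple of coefficients are nonzero, so a very small synopsis reconstructs the data almost exactly. On less regular data the reconstruction is approximate, and the number of coefficients kept trades space for accuracy.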