Approximation Spaces in Data Mining

Dr. Ole Nielsen
Computer Sciences Laboratory, Research School of Information Sciences and Engineering (RSISE)

Darran Edmundson
ANU Supercomputer Facility Vizlab

The left-hand image illustrates a "scale decomposition" of a function defined on the unit cube: The large blocks correspond to fine scales and the small blocks to coarse scales. The largest block, for example, represents fine scales in all three dimensions, whereas the rectangular blocks represent scales that are captured differently in different dimensions. The small block in the front lower left corner captures the coarsest scales and constitutes a very crude approximation on which all the other blocks add successive refinements.

The entire collection of blocks contains all information needed to recover the original function. However, for most realistic functions, it turns out that blocks that capture the finest scale in more than one dimension contribute only very little to the original function. These blocks are also the largest, so by discarding them one can obtain a much more compact representation, which approximates the original function well using only a fraction of the original information. This is illustrated in the right-hand image where discarded blocks have been peeled away.

This process, known as a-priori or data-independent compression, generalizes to higher dimensions where it is the basis of several algorithms that would not have been feasible otherwise.