Coresets
A coreset is a subset Q of input points P that well approximates P. Unlike \eps-net, \eps-approximation that approximate the input set combinatorially, a coreset approximates the input set under geometric measures. Since a coreset usually has a small size, 1/\eps^{O(1)}, it can be used to speed up many geometric algorithms.
The basic idea is to find a small subset whose convex hull approximates the convex hull of P. A simple algorithm is to put a grid and select one point in each extremal cell. A better algorithm is to put a large ball enclosing all the points, find a nice (uniform) sampling of points on the surface of the ball, and find the (approximate) nearest neighbor in P of each sample point. The collection of (approximate) nearest neighbors is a coreset. The size of the coreset is 1/\eps^{(d-1)/2} (The big enclosing ball is a (d-1)-dim manifold, the sampling rate is about \sqrt{\eps}).
The above algorithm, however, has an exponential dependence on the dimension. For high dimensional data, special techniques are used to find a coreset whose size is only polynomial dependent on dimension.
The basic idea is to find a small subset whose convex hull approximates the convex hull of P. A simple algorithm is to put a grid and select one point in each extremal cell. A better algorithm is to put a large ball enclosing all the points, find a nice (uniform) sampling of points on the surface of the ball, and find the (approximate) nearest neighbor in P of each sample point. The collection of (approximate) nearest neighbors is a coreset. The size of the coreset is 1/\eps^{(d-1)/2} (The big enclosing ball is a (d-1)-dim manifold, the sampling rate is about \sqrt{\eps}).
The above algorithm, however, has an exponential dependence on the dimension. For high dimensional data, special techniques are used to find a coreset whose size is only polynomial dependent on dimension.

0 Comments:
Post a Comment
<< Home