Accounting for boundary effects in nearest-neighbor searching

TitleAccounting for boundary effects in nearest-neighbor searching
Publication TypeJournal Articles
Year of Publication1996
AuthorsArya S, Mount D, Narayan O
JournalDiscrete & Computational Geometry
Pagination155 - 176
Date Published1996///

Given n data points ind-dimensional space, nearest-neighbor searching involves determining the nearest of these data points to a given query point. Most averagecase analyses of nearest-neighbor searching algorithms are made under the simplifying assumption thatd is fixed and thatn is so large relative tod thatboundary effects can be ignored. This means that for any query point the statistical distribution of the data points surrounding it is independent of the location of the query point. However, in many applications of nearest-neighbor searching (such as data compression by vector quantization) this assumption is not met, since the number of data pointsn grows roughly as 2 d .Largely for this reason, the actual performances of many nearest-neighbor algorithms tend to be much better than their theoretical analyses would suggest. We present evidence of why this is the case. We provide an accurate analysis of the number of cells visited in nearest-neighbor searching by the bucketing andk-d tree algorithms. We assumem dpoints uniformly distributed in dimensiond, wherem is a fixed integer ≥2. Further, we assume that distances are measured in theL ∞ metric. Our analysis is tight in the limit asd approaches infinity. Empirical evidence is presented showing that the analysis applies even in low dimensions.