Dimensionality Reduction Techniques
The goal of dimensionality reduction is to capture the most important structure in the data using fewer dimensions than the original data has. Reducing the dimensionality is desirable mainly for three reasons: improving the performance of downstream models, filtering out noise in the data, and enabling visualization. Dimensionality reduction can improve performance because it counteracts the curse of dimensionality.
Observation 3 (Curse of Dimensionality)
The curse of dimensionality refers to the phenomenon that in (very) high dimensional spaces, data points tend to be approximately equidistant.
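The observation can be checked empirically with a small sketch. The setup is an assumption chosen purely for illustration: points drawn uniformly from the unit hypercube, Euclidean distance, and a hypothetical helper `relative_contrast` that compares the farthest and the nearest point to a random query. A value close to zero means that all points are at roughly the same distance from the query.

```python
import numpy as np

def relative_contrast(n_points, n_dims, seed=0):
    """Return (d_max - d_min) / d_min for distances from one random query
    point to n_points uniform random points in the unit hypercube."""
    rng = np.random.default_rng(seed)
    points = rng.uniform(size=(n_points, n_dims))
    query = rng.uniform(size=n_dims)
    # Euclidean distance from the query to every point
    dists = np.linalg.norm(points - query, axis=1)
    return (dists.max() - dists.min()) / dists.min()

# The contrast shrinks as the number of dimensions grows
for d in (2, 10, 100, 1000):
    print(f"d = {d:4d}   relative contrast = {relative_contrast(500, d):.3f}")
```

In low dimensions the nearest point is much closer than the farthest one, so the contrast is large. In high dimensions the two distances become almost indistinguishable, which is what "approximately equidistant" means here.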
The reason for the curse of dimensionality is that we can generally assume that the data contains noise, and every additional feature is another opportunity for noise to enter. With many features, the accumulated noise dominates the distances between points and drowns out the differences that actually matter. Since all geometric machine learning models rely on a notion of similarity (nearest neighbor similarity, inner product similarity, kernel similarity), the loss of meaning of similarity measures in high-dimensional spaces is detrimental to their performance. Dimensionality reduction methods try to filter out as much of the noise as possible, which makes subsequent data analyses more powerful.
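The effect of noisy features on a similarity-based method can be illustrated with a minimal sketch. Everything in it is an assumption made for the example: two Gaussian classes that are well separated in two informative features, additional features that contain only standard normal noise, and a hypothetical helper `nn_label_agreement` that reports how often the nearest neighbor of a point (by Euclidean distance) belongs to the same class.

```python
import numpy as np

def nn_label_agreement(n_noise_dims, n_per_class=100, seed=0):
    """Fraction of points whose nearest neighbor has the same class label,
    after appending n_noise_dims pure-noise features to 2 informative ones."""
    rng = np.random.default_rng(seed)
    centers = np.array([[0.0, 0.0], [3.0, 3.0]])   # class means (assumed)
    labels = np.repeat([0, 1], n_per_class)
    signal = centers[labels] + rng.normal(size=(2 * n_per_class, 2))
    noise = rng.normal(size=(2 * n_per_class, n_noise_dims))
    X = np.hstack([signal, noise])
    # Squared Euclidean distances between all pairs of points
    sq = (X ** 2).sum(axis=1)
    d2 = sq[:, None] + sq[None, :] - 2 * X @ X.T
    np.fill_diagonal(d2, np.inf)                   # a point is not its own neighbor
    nearest = d2.argmin(axis=1)
    return (labels[nearest] == labels).mean()

# Agreement drops as more noise features are added
for k in (0, 10, 100, 2000):
    print(f"noise features = {k:4d}   agreement = {nn_label_agreement(k):.2f}")
```

The informative features are identical in every run; only the number of noise features changes. Removing the noise features again, which is what a dimensionality reduction method aims to approximate, restores the usefulness of the nearest neighbor similarity.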