Circumstances in which to apply Dimensionality Reduction
- Vusi Kubheka
- Nov 19, 2024
- 2 min read
1. High-Dimensional Data (Curse of Dimensionality)
When datasets have a large number of features (e.g., genomic data, text data, or image data), models can struggle because the data become sparse in feature space, making it harder to generalise to unseen data. Dimensionality reduction mitigates the risk of overfitting and poor performance caused by the curse of dimensionality.
Example: Reducing the number of predictors in a genomic study to avoid overfitting and improve model performance.
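A minimal sketch of this idea in Python with scikit-learn. The wide "genomics-like" matrix is synthetic and the 95% variance threshold is an illustrative assumption, not a recommendation from the post:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
# Synthetic wide data: 200 samples, 5000 features, driven by 20 latent factors.
latent = rng.normal(size=(200, 20))
loadings = rng.normal(size=(20, 5000))
X = latent @ loadings + 0.1 * rng.normal(size=(200, 5000))

# Keep the smallest number of components that explains ~95% of the variance.
pca = PCA(n_components=0.95)
X_reduced = pca.fit_transform(X)
print(X.shape, "->", X_reduced.shape)   # e.g. (200, 5000) -> (200, ~20)
```

Because the synthetic data only have 20 underlying factors, a handful of components captures almost all of the variance; real genomic data would rarely compress this cleanly.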

2. Correlated Features
If features are highly correlated (e.g., humidity and rainfall in weather prediction), dimensionality reduction techniques like PCA can combine the correlated variables into a smaller set of uncorrelated principal components, eliminating redundancy and simplifying the dataset.
Example: Combining measurements of body length, weight, and diameter in snake classification to create a unified feature.
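As a rough illustration of the snake example, the sketch below fabricates three correlated body measurements and collapses them into a single "body size" component; the data and coefficients are made up:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

rng = np.random.default_rng(1)
size = rng.normal(loc=1.0, scale=0.3, size=300)               # latent "overall body size"
length = 120 * size + rng.normal(scale=2.0, size=300)         # cm
weight = 1.5 * size + rng.normal(scale=0.05, size=300)        # kg
diameter = 3.0 * size + rng.normal(scale=0.1, size=300)       # cm
X = np.column_stack([length, weight, diameter])               # three highly correlated features

# Standardise (the units differ), then collapse the correlated trio into one component.
pca = PCA(n_components=1)
body_size = pca.fit_transform(StandardScaler().fit_transform(X))
print("variance explained by the single component:", round(pca.explained_variance_ratio_[0], 3))
```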
3. Reducing Model Complexity
High-dimensional models increase computational costs and complexity. Dimensionality reduction simplifies the model by reducing the number of input features without significantly sacrificing information, improving interpretability and efficiency.
Example: Compressing data for real-time applications, such as facial recognition or fraud detection.
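One common way to do this (a sketch, assuming scikit-learn and its built-in digits dataset; the choice of 20 components is arbitrary) is to place the reduction step inside the modelling pipeline:

```python
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import load_digits
from sklearn.model_selection import cross_val_score

X, y = load_digits(return_X_y=True)      # 64 pixel features per image

# Scale, compress 64 features to 20 components, then fit the classifier.
model = make_pipeline(StandardScaler(), PCA(n_components=20), LogisticRegression(max_iter=1000))
print("mean CV accuracy:", cross_val_score(model, X, y, cv=5).mean())
```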
4. Visualisation of High-Dimensional Data
Visualising datasets with more than three dimensions is difficult. Dimensionality reduction techniques like PCA can project data into 2D or 3D for easier interpretation.
Example: Reducing a dataset with hundreds of features to two principal components for a scatterplot to identify clusters.
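A short sketch of such a projection, using scikit-learn's digits dataset (64 features) as a stand-in for a dataset with hundreds of features:

```python
import matplotlib.pyplot as plt
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA

X, y = load_digits(return_X_y=True)                 # 64 features per sample
X_2d = PCA(n_components=2).fit_transform(X)         # project onto two components

plt.scatter(X_2d[:, 0], X_2d[:, 1], c=y, cmap="tab10", s=10)
plt.xlabel("PC1")
plt.ylabel("PC2")
plt.title("Digits projected onto two principal components")
plt.show()
```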
5. Feature Selection or Feature Extraction as Preprocessing
Dimensionality reduction can serve as a preprocessing step before applying machine learning models to improve performance and accuracy.
Example: In email spam detection, dimensionality reduction can filter out irrelevant words/features, retaining only the most relevant ones.
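A toy sketch of this preprocessing step using TF-IDF features and truncated SVD (latent semantic analysis); the four example e-mails and the two-component choice are purely illustrative:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD

# Tiny made-up corpus; a real spam filter would use thousands of labelled e-mails.
emails = [
    "win a free prize now",
    "cheap meds limited offer",
    "meeting agenda for monday",
    "please review the attached report",
]
tfidf = TfidfVectorizer().fit_transform(emails)       # one sparse column per word
lsa = TruncatedSVD(n_components=2, random_state=0)
X_topics = lsa.fit_transform(tfidf)                   # dense, low-dimensional representation
print(tfidf.shape, "->", X_topics.shape)
```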
6. Overfitting Prevention
With too many features, models may fit noise in the training data rather than the underlying pattern. Dimensionality reduction removes irrelevant or noisy features, helping the model generalise better.
Example: Simplifying data for a predictive model in stock price forecasting, where some predictors are irrelevant or only weakly related to the target.
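A hedged illustration: on synthetic data with many uninformative features, comparing cross-validated scores with and without a PCA step shows whether the reduction actually helps generalisation (the feature counts below are arbitrary):

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

# 500 features, only 10 informative; the rest are noise a flexible model could fit to.
X, y = make_classification(n_samples=300, n_features=500, n_informative=10,
                           n_redundant=0, random_state=0)

raw = LogisticRegression(max_iter=2000)
reduced = make_pipeline(PCA(n_components=20), LogisticRegression(max_iter=2000))
print("all 500 features:", cross_val_score(raw, X, y, cv=5).mean())
print("20 components  :", cross_val_score(reduced, X, y, cv=5).mean())
```

Whether the reduced model wins depends on the data; the point is to measure it rather than assume it.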
7. Data Compression
When storage space or bandwidth is a concern, dimensionality reduction compresses data without substantial loss of information.
Example: Reducing the dimensionality of high-resolution images for storage or transmission while retaining visual quality.
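A small sketch of lossy compression with PCA on the digits images; the 16-component budget is an arbitrary choice:

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA

X, _ = load_digits(return_X_y=True)             # 1797 images, 64 pixel values each

pca = PCA(n_components=16)                      # store 16 numbers per image instead of 64
codes = pca.fit_transform(X)                    # compressed representation
X_approx = pca.inverse_transform(codes)         # approximate reconstruction

mse = np.mean((X - X_approx) ** 2)
print(f"compression 64 -> 16 values per image, reconstruction MSE: {mse:.2f}")
```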
8. Improving Signal-to-Noise Ratio
Dimensionality reduction helps remove noisy, irrelevant, or redundant features, improving the signal-to-noise ratio in the dataset.
Example: Processing EEG signals for brain activity studies by isolating meaningful components from background noise.
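A simplified sketch of PCA-based denoising on a fabricated multichannel signal (real EEG pipelines typically use ICA and far more careful preprocessing):

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(2)
t = np.linspace(0, 1, 500)
source = np.sin(2 * np.pi * 10 * t)                 # one shared 10 Hz rhythm
mixing = rng.normal(size=(1, 32))
channels = source[:, None] @ mixing                 # 32 channels driven by the same source
noisy = channels + 0.5 * rng.normal(size=channels.shape)

# Keep only the leading components and project back: noise spread across
# all channels is discarded while the shared signal is retained.
pca = PCA(n_components=2)
denoised = pca.inverse_transform(pca.fit_transform(noisy))
print("noise power before:", np.mean((noisy - channels) ** 2))
print("noise power after :", np.mean((denoised - channels) ** 2))
```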
9. Improving Interpretability
A smaller number of variables or components is easier to interpret and analyse, especially in exploratory data analysis.
Example: Understanding socioeconomic factors influencing health outcomes by reducing multiple survey items into composite indices.
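A sketch of building such a composite index as the first principal component of standardised survey items; the items and their relationships are invented for illustration:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

rng = np.random.default_rng(3)
wealth = rng.normal(size=400)                       # latent socioeconomic status
items = np.column_stack([
    wealth + 0.3 * rng.normal(size=400),            # income
    wealth + 0.4 * rng.normal(size=400),            # years of education
    wealth + 0.5 * rng.normal(size=400),            # housing quality score
    0.8 * wealth + 0.5 * rng.normal(size=400),      # asset ownership score
])

# The first principal component of the standardised items serves as a composite index.
pca = PCA(n_components=1)
ses_index = pca.fit_transform(StandardScaler().fit_transform(items)).ravel()
print("item loadings on the index:", np.round(pca.components_[0], 2))
```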
10. Reducing Computational Costs
Reducing the number of input features with algorithms like PCA lowers the computational cost of training and prediction, especially in scenarios involving large-scale data and limited resources.
Example: Reducing the complexity of data for a machine learning pipeline in real-time fraud detection.
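A rough way to see the effect: time the same classifier on raw features versus PCA-reduced features (the dataset sizes below are arbitrary, and the PCA fit itself is not timed here):

```python
import time
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.svm import SVC

X, y = make_classification(n_samples=3000, n_features=300, random_state=0)

def timed_fit(model, features, labels):
    """Fit the model and return the elapsed wall-clock time in seconds."""
    start = time.perf_counter()
    model.fit(features, labels)
    return round(time.perf_counter() - start, 2)

print("SVC on 300 raw features  :", timed_fit(SVC(), X, y), "s")
print("SVC on 20 PCA components :", timed_fit(SVC(), PCA(n_components=20).fit_transform(X), y), "s")
```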
Situations to Avoid or Be Cautious:
- When interpretability of individual features is critical (e.g., in regulatory domains).
- If dimensionality reduction discards features containing vital information.