Abstract
Principal Component Analysis (PCA) compresses high-dimensional data into a lower-dimensional representation that preserves the dominant structure of the original data. PCA is a matrix decomposition technique based on eigendecomposition: it quantifies relationships between variables through the covariance matrix, captures the shape of the data distribution, and ranks the importance of directions by their eigenvalues. The accuracy of the variance-covariance estimate is therefore crucial for reliable PCA. In high-dimensional settings where the number of observations (n) is much smaller than the number of variables (p), i.e. n ≪ p, the conventional Maximum Likelihood Estimator (MLE) of the covariance matrix becomes poorly conditioned and yields unreliable principal components. To address this limitation, we propose a novel estimation framework called Pairwise Differences Covariance (PDC), along with four regularized extensions: Standardized PDC (SPDC), Local Scaled PDC (LSPDC), Maximum Absolute Scaled PDC (MAXPDC), and Range Scaled PDC (RPDC). These estimators increase the effective sample size by using all pairwise differences within the data, thereby improving estimation stability without requiring additional data collection. Extensive experiments on synthetic and real datasets show that the proposed estimators, particularly SPDC, significantly reduce the over-dispersion of the first principal component and improve its directional accuracy. On average, in high-dimensional, low-sample-size (HDLSS) scenarios with n ≪ p, SPDC reduced the cosine similarity error by approximately 10–30% and narrowed the eigenvalue spread by 10–20% compared with the MLE and Ledoit-Wolf estimators. Real-world applications confirm the practical utility and robustness of these methods for analyzing high-dimensional data.
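The abstract does not give the PDC formula, so the following is only a minimal sketch of the general idea under stated assumptions: a covariance estimate is built from all n(n−1)/2 pairwise differences d_ij = x_i − x_j, and PCA is then an eigendecomposition of that estimate. The normalization (dividing by twice the number of pairs, since each d_ij d_ijᵀ has expectation 2Σ for i.i.d. rows), the function names, and the "cosine error" metric 1 − |cos θ| are all assumptions for illustration, not the thesis's definitions. With this particular normalization the result coincides with the unbiased sample covariance (a standard identity, checked in the usage below); the scaled variants SPDC, LSPDC, MAXPDC and RPDC are not reproduced here.

```python
import numpy as np


def pairwise_difference_covariance(X: np.ndarray) -> np.ndarray:
    """Hypothetical covariance estimate from all pairwise differences d_ij = x_i - x_j (i < j).

    Assumed normalization: E[d_ij d_ij^T] = 2 * Sigma for i.i.d. rows,
    so the summed outer products are divided by 2 * (number of pairs).
    """
    n, _ = X.shape
    if n < 2:
        raise ValueError("need at least two observations")
    i, j = np.triu_indices(n, k=1)   # all n(n-1)/2 index pairs with i < j
    D = X[i] - X[j]                  # one row per pairwise difference
    return D.T @ D / (2.0 * len(i))


def principal_components(cov: np.ndarray):
    """Eigendecomposition of a symmetric covariance estimate, largest eigenvalue first."""
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigh returns eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1]
    return eigvals[order], eigvecs[:, order]


def cosine_error(v_hat: np.ndarray, v_true: np.ndarray) -> float:
    """Assumed metric: 1 - |cos angle| (an eigenvector's sign is arbitrary)."""
    cos = abs(v_hat @ v_true) / (np.linalg.norm(v_hat) * np.linalg.norm(v_true))
    return 1.0 - float(cos)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, p = 20, 200                  # illustrative n << p setting
    v_true = np.zeros(p)
    v_true[0] = 1.0                 # true leading direction
    X = rng.standard_normal((n, p))
    X[:, 0] *= 5.0                  # spiked variance along v_true
    cov = pairwise_difference_covariance(X)
    print(np.allclose(cov, np.cov(X, rowvar=False)))  # same as unbiased sample covariance
    vals, vecs = principal_components(cov)
    print("cosine error of first PC:", cosine_error(vecs[:, 0], v_true))
```

`np.linalg.eigh` is used rather than `np.linalg.eig` because covariance estimates are symmetric, so the real-valued symmetric solver is both faster and numerically safer. Note that materializing every difference needs O(n²p) memory, which is acceptable only because n is small in the HDLSS setting.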
Type
Thesis
Date
2025
Publisher
The University of Waikato
Rights
All items in Research Commons are provided for private study and research purposes and are protected by copyright with all rights reserved unless otherwise indicated.