Subspace clustering is the problem of clustering data drawn from a union of multiple subspaces. The most popular subspace clustering framework in recent years is the spectral clustering-based approach, which first computes an affinity matrix and then applies spectral clustering to it. One of the representative methods for computing an affinity matrix is the least squares regression (LSR) model, which is based on the idea of self-representation. Although its efficiency and effectiveness have been empirically validated, LSR lacks theoretical analysis and practicality in several respects: it has no clear interpretation, no theoretical analysis of its robustness, no guideline for choosing its hyper-parameter, and limited scalability. This paper aims to provide novel insights for a better understanding of LSR and to improve its practicality. For this purpose, we present four contributions. First, we present a novel interpretation of LSR from a random-sampling perspective. Second, we provide a novel theoretical analysis of LSR's robustness to outliers. Third, we demonstrate, both theoretically and empirically, that choosing a larger value of the hyper-parameter tends to yield good clustering results. Finally, we derive an equivalent form of the LSR solution that can be computed with lower time complexity in the data size than the original form.
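As background, the standard LSR objective is min_C ||X - XC||_F^2 + λ||C||_F^2, which has the closed-form solution C = (X^T X + λI)^{-1} X^T X. The sketch below (assuming data points are the columns of a d×n matrix X; the function names are illustrative, not from the paper) shows this closed form, an equivalent form obtained via the push-through identity that inverts a d×d rather than n×n matrix (cheaper when d ≪ n; it may or may not coincide with the exact form derived in the paper), and the usual symmetrized affinity fed to spectral clustering:

```python
import numpy as np

def lsr_coefficients(X, lam=1.0):
    """Closed-form LSR solution C = (X^T X + lam*I)^{-1} X^T X.

    X is d x n with data points as columns (an assumed convention).
    Solving the n x n system makes this O(n^3) in the number of samples.
    """
    n = X.shape[1]
    G = X.T @ X
    return np.linalg.solve(G + lam * np.eye(n), G)

def lsr_coefficients_fast(X, lam=1.0):
    """Equivalent form C = X^T (X X^T + lam*I)^{-1} X.

    By the push-through identity X^T (X X^T + lam*I)^{-1}
    = (X^T X + lam*I)^{-1} X^T, this matches lsr_coefficients but
    solves a d x d system instead, which is cheaper when d << n.
    """
    d = X.shape[0]
    return X.T @ np.linalg.solve(X @ X.T + lam * np.eye(d), X)

def lsr_affinity(C):
    """Symmetrized affinity W = (|C| + |C|^T) / 2 for spectral clustering."""
    return 0.5 * (np.abs(C) + np.abs(C).T)
```

The resulting affinity W can be passed to any spectral clustering routine (e.g. one that accepts a precomputed affinity matrix) to obtain the final segmentation.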