In particular, we will summarize, compare, and discuss the following references:
SVD and PCA
[Baldi & Hornik 1989] Baldi, P., & Hornik, K. (1989). Neural networks and principal component analysis: Learning from examples without local minima. Neural Networks, 2(1), 53–58.
[Refinetti & Goldt 2022] Refinetti, M., & Goldt, S. (2022). The dynamics of representation learning in shallow, non-linear autoencoders. arXiv preprint arXiv:2201.02115.
[Tzeng et al. 2022] Tzeng, R., Wang, P., Adriaens, F., Gionis, A., & Lu, C. (2022). Improved analysis of randomized SVD for top-eigenvector approximation. International Conference on Artificial Intelligence and Statistics (AISTATS).
[Baglama & Reichel 2005] Baglama, J., & Reichel, L. (2005). Augmented implicitly restarted Lanczos bidiagonalization methods. SIAM Journal on Scientific Computing.
[Bai et al. 2021] Bai, X., Buccini, A., & Reichel, L. (2021). Golub–Kahan vs. Monte Carlo: a comparison of bidiagonalization and a randomized SVD method for the solution of linear discrete ill-posed problems. BIT Numerical Mathematics.
[Cavazza et al. 2018] Cavazza, J., Morerio, P., Haeffele, B.D., Lane, C., Murino, V., & Vidal, R. (2018). Dropout as a Low-Rank Regularizer for Matrix Factorization. International Conference on Artificial Intelligence and Statistics (AISTATS).
[Fan 1949] Fan, K. (1949). On a Theorem of Weyl Concerning Eigenvalues of Linear Transformations I. Proceedings of the National Academy of Sciences of the United States of America, 35(11), 652–655.
NMF
[Aggarwal 2018] Aggarwal, C.C. (2018). Matrix Factorization and Topic Modeling. In Machine Learning for Text. Springer.
[Bolte et al. 2014] Bolte, J., Sabach, S., & Teboulle, M. (2014). Proximal alternating linearized minimization for nonconvex and nonsmooth problems. Mathematical Programming.
[Copar et al. 2019] Copar, A., Zupan, B., & Zitnik, M. (2019). Fast optimization of non-negative matrix tri-factorization. PLoS ONE.
[Driggs et al. 2020] Driggs, D., Tang, J., Liang, J., Davies, M., & Schönlieb, C.B. (2020). Spring: A fast stochastic proximal alternating method for non-smooth non-convex optimization. arXiv preprint arXiv:2002.12266.
[Gaussier & Goutte 2005] Gaussier, É., & Goutte, C. (2005). Relation between PLSA and NMF and implications. Proceedings of the ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR).
[Gillis & Glineur 2012] Gillis, N., & Glineur, F. (2012). Accelerated multiplicative updates and hierarchical ALS algorithms for nonnegative matrix factorization. Neural Computation.
[Grippo & Sciandrone 2000] Grippo, L., & Sciandrone, M. (2000). On the convergence of the block nonlinear Gauss–Seidel method under convex constraints. Operations Research Letters.
[Kim et al. 2014] Kim, J., He, Y., & Park, H. (2014). Algorithms for nonnegative matrix and tensor factorizations: A unified view based on block coordinate descent framework. Journal of Global Optimization.
[Le et al. 2020] Le, H., Gillis, N., & Patrinos, P. (2020). Inertial block proximal methods for non-convex non-smooth optimization. International Conference on Machine Learning (ICML).
[Lee & Seung 1999] Lee, D., & Seung, H. (1999). Learning the parts of objects by non-negative matrix factorization. Nature.
[Lee & Seung 2001] Lee, D., & Seung, H. (2001). Algorithms for non-negative matrix factorization. Advances in Neural Information Processing Systems (NeurIPS).
[Lin 2007] Lin, C.J. (2007). Projected gradient methods for nonnegative matrix factorization. Neural Computation.
[Li & Ding 2006] Li, T., & Ding, C. (2006). The relationships among various nonnegative matrix factorization methods for clustering. IEEE International Conference on Data Mining (ICDM).
[Mair & Brefeld 2019] Mair, S., & Brefeld, U. (2019). Coresets for Archetypal Analysis. Advances in Neural Information Processing Systems (NeurIPS).
[Parikh & Boyd 2014] Parikh, N., & Boyd, S. (2014). Proximal algorithms. Foundations and Trends in Optimization.
[Peharz et al. 2010] Peharz, R., Stark, M., & Pernkopf, F. (2010). Sparse nonnegative matrix factorization using ℓ0-constraints. IEEE International Workshop on Machine Learning for Signal Processing.
[Pock & Sabach 2016] Pock, T., & Sabach, S. (2016). Inertial proximal alternating linearized minimization (iPALM) for nonconvex and nonsmooth problems. SIAM Journal on Imaging Sciences.
[Sivaprasad et al. 2021] Sivaprasad, S., Singh, A., Manwani, N., & Gandhi, V. (2021). The Curious Case of Convex Neural Networks. Joint European Conference on Machine Learning and Knowledge Discovery in Databases (ECML PKDD).
[Wang & Zhang 2013] Wang, Y.X., & Zhang, Y.J. (2013). Nonnegative matrix factorization: A comprehensive review. IEEE Transactions on Knowledge and Data Engineering (TKDE).
k-means
[Bauckhage 2015] Bauckhage, C. (2015). K-means clustering is matrix factorization. arXiv preprint arXiv:1512.07548.
[Fernsel 2021] Fernsel, P. (2021). Spatially Coherent Clustering Based on Orthogonal Nonnegative Matrix Factorization. Journal of Imaging.
[Lloyd 1982] Lloyd, S. (1982). Least squares quantization in PCM. IEEE Transactions on Information Theory.
[Pompili et al. 2014] Pompili, F., Gillis, N., Absil, P.A., & Glineur, F. (2014). Two algorithms for orthogonal nonnegative matrix factorization with application to clustering. Neurocomputing.
[Whang et al. 2015] Whang, J., Dhillon, I., & Gleich, D. (2015). Non-exhaustive, overlapping k-means. Proceedings of the SIAM International Conference on Data Mining (SDM).
[Zha et al. 2002] Zha, H., He, X., Ding, C., Gu, M., & Simon, H. (2002). Spectral relaxation for k-means clustering. Advances in Neural Information Processing Systems (NeurIPS).
Nonconvex Clustering
[Dhillon et al. 2004] Dhillon, I., Guan, Y., & Kulis, B. (2004). Kernel k-means: spectral clustering and normalized cuts. Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (SIGKDD).
[Ding et al. 2005] Ding, C., He, X., & Simon, H. (2005). On the equivalence of nonnegative matrix factorization and spectral clustering. Proceedings of the SIAM International Conference on Data Mining (SDM).
[Ding et al. 2008] Ding, C., Li, T., & Jordan, M. (2008). Nonnegative matrix factorization for combinatorial optimization: Spectral clustering, graph matching, and clique finding. IEEE International Conference on Data Mining (ICDM).
[Hess et al. 2019] Hess, S., Duivesteijn, W., Honysz, P., & Morik, K. (2019). The SpectACl of Nonconvex Clustering: A Spectral Approach to Density-Based Clustering. Proceedings of the AAAI Conference on Artificial Intelligence (AAAI).
[Schölkopf et al. 1998] Schölkopf, B., Smola, A., & Müller, K.R. (1998). Nonlinear component analysis as a kernel eigenvalue problem. Neural Computation.
[Von Luxburg 2007] Von Luxburg, U. (2007). A tutorial on spectral clustering. Statistics and Computing.
Biclustering
[Del Buono & Pio 2015] Del Buono, N., & Pio, G. (2015). Non-negative matrix tri-factorization for co-clustering: an analysis of the block matrix. Information Sciences.
[Dhillon 2001] Dhillon, I. (2001). Co-clustering documents and words using bipartite spectral graph partitioning. Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (SIGKDD).
[Ding et al. 2006] Ding, C., Li, T., Peng, W., & Park, H. (2006). Orthogonal nonnegative matrix t-factorizations for clustering. Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (SIGKDD).
[Fischer & Vreeken 2021] Fischer, J., & Vreeken, J. (2021). Differentiable Pattern Set Mining. Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (SIGKDD).
[Han et al. 2017] Han, J., Song, K., Nie, F., & Li, X. (2017). Bilateral k-means algorithm for fast co-clustering. Proceedings of the AAAI Conference on Artificial Intelligence (AAAI).
[Henriques et al. 2015] Henriques, R., Antunes, C., & Madeira, S. (2015). A structured view on pattern mining-based biclustering. Pattern Recognition.
[Hess et al. 2017] Hess, S., Morik, K., & Piatkowski, N. (2017). The PRIMPING routine: tiling through proximal alternating linearized minimization. Data Mining and Knowledge Discovery.
[Hess et al. 2018] Hess, S., Piatkowski, N., & Morik, K. (2018). The Trustworthy Pal: Controlling the False Discovery Rate in Boolean Matrix Factorization. Proceedings of the SIAM International Conference on Data Mining (SDM).
[Hess et al. 2021] Hess, S., Pio, G., Hochstenbach, M.E., & Ceci, M. (2021). BROCCOLI: overlapping and outlier-robust biclustering through proximal stochastic gradient descent. Data Mining and Knowledge Discovery, 35, 2542–2576.
[Kriegel et al. 2009] Kriegel, H.P., Kröger, P., & Zimek, A. (2009). Clustering high-dimensional data: A survey on subspace clustering, pattern-based clustering, and correlation clustering. ACM Transactions on Knowledge Discovery from Data (TKDD).
[Madeira & Oliveira 2004] Madeira, S., & Oliveira, A. (2004). Biclustering algorithms for biological data analysis: a survey. IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB).
[Miettinen et al. 2008] Miettinen, P., Mielikäinen, T., Gionis, A., Das, G., & Mannila, H. (2008). The discrete basis problem. IEEE Transactions on Knowledge and Data Engineering (TKDE).
[Miettinen & Neumann 2020] Miettinen, P., & Neumann, S. (2020). Recent Developments in Boolean Matrix Factorization. Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI).
[Nie et al. 2017] Nie, F., Wang, X., Deng, C., & Huang, H. (2017). Learning a structured optimal bipartite graph for co-clustering. Advances in Neural Information Processing Systems (NeurIPS).
[Song et al. 2021] Song, K., Yao, X., Nie, F., Li, X., & Xu, M. (2021). Weighted bilateral K-means algorithm for fast co-clustering and fast spectral clustering. Pattern Recognition.
[Wang et al. 2011] Wang, H., Nie, F., Huang, H., & Makedon, F. (2011). Fast nonnegative matrix tri-factorization for large-scale data co-clustering. Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI).
[Zha et al. 2001] Zha, H., He, X., Ding, C., Simon, H., & Gu, M. (2001). Bipartite graph partitioning and data clustering. Proceedings of the ACM International Conference on Information and Knowledge Management (CIKM).
[Zhang et al. 2010] Zhang, Z.Y., Li, T., Ding, C., Ren, X.W., & Zhang, X.S. (2010). Binary matrix factorization for analyzing gene expression data. Data Mining and Knowledge Discovery.
Deep Learning as Kernel k-means
[Hess et al. 2020] Hess, S., Duivesteijn, W., & Mocanu, D.C. (2020). Softmax-based Classification is k-means Clustering: Formal Proof, Consequences for Adversarial Attacks, and Improvement through Centroid Based Tailoring. arXiv preprint arXiv:2001.01987.
[Niepert et al. 2021] Niepert, M., Minervini, P., & Franceschi, L. (2021). Implicit MLE: Backpropagating Through Discrete Exponential Family Distributions. arXiv preprint arXiv:2106.01798.
[Sinhamahapatra et al. 2022] Sinhamahapatra, P., Koner, R., Roscher, K., & Günnemann, S. (2022). Is it all a cluster game? - Exploring Out-of-Distribution Detection based on Clustering in the Embedding Space. arXiv preprint arXiv:2203.08549.