As part of a school assignment, I was writing RobustPCA to separate the background (low-rank, still) from the player (sparse, moving) in a simple basketball video. As one would expect, the SVD was the most time-consuming part of an iteration. The video matrix (M, m pixels x n frames) was thin (I'm jealous) and long. So I decided to write this routine to see if I could speed up the process. In the end, I decided to simply throw more CPUs at scipy.linalg.svd.
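To show where the SVD sits in an iteration, here is a minimal sketch of the singular value thresholding step that typical RobustPCA solvers use for the low-rank update. It is not my actual routine: the function name `svd_threshold`, the threshold `tau`, and the synthetic "video" are all made up for illustration. The only real library call is scipy.linalg.svd with `full_matrices=False`, which keeps U at m x n instead of m x m for a thin matrix.

```python
import numpy as np
from scipy.linalg import svd


def svd_threshold(X, tau):
    """Shrink the singular values of X by tau (hypothetical helper)."""
    # Economy SVD: for an m x n matrix with m >> n, U is m x n, not m x m.
    U, s, Vt = svd(X, full_matrices=False)
    # Soft-threshold: small singular values become exactly zero.
    s_shrunk = np.maximum(s - tau, 0.0)
    # Scale the columns of U by the shrunk values, then recombine.
    return (U * s_shrunk) @ Vt


# Synthetic stand-in for a video: m pixels x n frames, static background plus noise.
rng = np.random.default_rng(0)
m, n = 2000, 50
background = np.outer(rng.standard_normal(m), np.ones(n))  # rank 1, same in every frame
M = background + 0.01 * rng.standard_normal((m, n))

L = svd_threshold(M, tau=5.0)  # low-rank estimate of the background
```

With m much larger than n, this one call dominates the cost of an iteration, which is why it is the part worth speeding up.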
I implement this research paper here.
Speed comparison and some graphs maybe.