The actual Word2Vec paper is mostly a list of interesting experimental results, plus two novel (and somewhat hard to understand) simplifications that adapt noise-contrastive estimation (NCE) to their setting.
Wasserstein GAN + Vincent Herrmann's expository article - WGAN is hot news in GAN training right now. The paper points out a huge theoretical flaw in the original GAN formulation, derives a principled alternative, and uses it to achieve SOTA experimental results. The second resource is a fantastic expository article about WGAN.
Compressive Sensing: this is what I spend most of my actual time on. Learn more about image compression, JPEG, MRI machines, and signal processing!
A Compressed Sensing view of Unsupervised Representation Learning - Arora, another wonderful author whose work is fun to read, connects compressive sensing back to unsupervised representation learning: both share the goal of extracting minimal representations of meaningful information.