Getting started on Machine Learning
A little bit about me
Before going into any details on machine learning, I want to give you some insight into my background: who I am, where I am going, and how I plan to get there. My name is Brian, and I recently graduated from the University of California, Santa Cruz (UCSC) with a Master’s in Scientific Computing and Applied Mathematics, after earning a Bachelor’s in Pure Mathematics there. Throughout my coursework, I primarily studied control systems, dynamics, and optimization. Now, I want to apply what I know to Deep Learning, especially as it pertains to video and image time-series analysis. On the surface, this may seem like a fairly significant change in direction, but the more I immerse myself in the literature, the more ways I find to leverage my background to accelerate my development. Over the next few months, I will take you along on my journey of getting up to speed on Deep Learning.
Developing a learning roadmap:
This week was my first introduction to GANs (generative adversarial networks) and CNNs (convolutional neural networks) for computer vision. Prior to this week, my experience with neural networks was limited to fully connected MLPs (multi-layer perceptrons) for input classification and RBF (radial basis function) networks for differential system identification and control. I decided to orient myself with a two-pronged approach: (1) understand GANs and unravel the connection between [1] and [2], and (2) understand image feature detection and classification through CNNs, starting with the highly successful deep networks ResNet [3], VGG [4], and U-Net [5]. My goal throughout this week has been to develop a mental roadmap connecting the original GANs paper [1] to the deep-fake generation methods of [2]. Likewise, I want to develop a modern understanding of computer vision techniques that leverage deep CNNs. The latter will be invaluable for developing an understanding of latent spaces and their applications. Moreover, many GANs for image processing and generation heavily leverage deep CNNs, so there is a natural connection between the two approaches.
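To ground prong (1), it helps to keep the core idea of [1] in view: a generator G and a discriminator D play a minimax game in which D learns to tell real data from G’s samples while G learns to fool D. Below is a deliberately minimal PyTorch sketch of one adversarial training step. The tiny MLPs, hyperparameters, and toy data are placeholders of my own, not anything from the paper; only the alternating two-player update mirrors [1].

```python
import torch
import torch.nn as nn

latent_dim, data_dim, batch = 16, 2, 32

# Placeholder networks; [1] defines the framework, not these specific layers.
G = nn.Sequential(nn.Linear(latent_dim, 64), nn.ReLU(), nn.Linear(64, data_dim))
D = nn.Sequential(nn.Linear(data_dim, 64), nn.ReLU(), nn.Linear(64, 1), nn.Sigmoid())

opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
bce = nn.BCELoss()
ones, zeros = torch.ones(batch, 1), torch.zeros(batch, 1)

for step in range(1000):
    real = torch.randn(batch, data_dim) * 0.5 + 2.0  # toy stand-in for real data
    fake = G(torch.randn(batch, latent_dim))

    # Discriminator step: push D(real) toward 1 and D(fake) toward 0.
    d_loss = bce(D(real), ones) + bce(D(fake.detach()), zeros)
    opt_d.zero_grad()
    d_loss.backward()
    opt_d.step()

    # Generator step: the non-saturating heuristic from [1],
    # maximizing log D(G(z)) rather than minimizing log(1 - D(G(z))).
    g_loss = bce(D(fake), ones)
    opt_g.zero_grad()
    g_loss.backward()
    opt_g.step()
```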
As with any learning process, it is necessary to start by developing a working understanding of where I want to be. In this case, I started by reading what I consider to be the original “Deep Fake” paper, [2], and in the process identified the key concepts and frameworks needed to understand that work. I quickly came to understand that the key idea motivating [2] was Siarohin et al.’s prior work developing Monkey-Net [7], the spiritual predecessor to the work in [2]. Of course, “Deep Fake” [2] leverages other research and many concepts not introduced in the development of Monkey-Net [7], but developing a rich understanding of Monkey-Net puts me in a much stronger position for taking on “Deep Fake” [2] later. Monkey-Net is fundamental to understanding [2] because of the architecture it proposes. It builds on U-Net’s architecture [5] and utilizes conditional GANs within three main modules: a keypoint detector, a dense motion prediction network, and a motion transfer network. Right now, the working details of this architecture are not my focus; rather, it is the primary source of inspiration for my learning map:
[Figure: learning roadmap]
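Although the working details are out of scope for now, a toy sketch helps me keep Monkey-Net’s three-module decomposition straight. Everything below is my own placeholder simplification in PyTorch; the actual networks in [7] are U-Net-style CNNs trained with a conditional GAN, and here the predicted “flow” is passed straight to grid_sample as a sampling grid rather than converted from a displacement field. Only the data flow mirrors the paper: keypoints from the source and driving frames feed a dense motion estimate, which is used to warp the source appearance.

```python
import torch
import torch.nn as nn

class KeypointDetector(nn.Module):
    """Predicts K keypoint coordinates from a frame (stand-in: conv + pooling + linear)."""
    def __init__(self, k=10):
        super().__init__()
        self.conv = nn.Conv2d(3, 32, 3, padding=1)
        self.head = nn.Linear(32, 2 * k)
        self.k = k

    def forward(self, frame):
        h = torch.relu(self.conv(frame)).mean(dim=(2, 3))  # (B, 32) pooled features
        return self.head(h).view(-1, self.k, 2)            # (B, K, 2) keypoints

class DenseMotionNetwork(nn.Module):
    """Maps source/driving keypoint differences to a dense motion field (stand-in)."""
    def __init__(self, k=10):
        super().__init__()
        self.net = nn.Linear(2 * k, 2 * 64 * 64)

    def forward(self, kp_source, kp_driving):
        delta = (kp_driving - kp_source).flatten(1)   # (B, 2K) keypoint displacements
        return self.net(delta).view(-1, 64, 64, 2)    # (B, H, W, 2) motion field

class MotionTransferNetwork(nn.Module):
    """Warps the source frame with the predicted motion and refines it."""
    def __init__(self):
        super().__init__()
        self.refine = nn.Conv2d(3, 3, 3, padding=1)

    def forward(self, source, flow):
        # Simplification: treat the motion field directly as a sampling grid.
        warped = nn.functional.grid_sample(source, flow, align_corners=False)
        return self.refine(warped)

kp_net, motion_net, transfer_net = KeypointDetector(), DenseMotionNetwork(), MotionTransferNetwork()
source = torch.randn(1, 3, 64, 64)   # appearance comes from the source frame
driving = torch.randn(1, 3, 64, 64)  # motion comes from the driving frame
flow = motion_net(kp_net(source), kp_net(driving))
generated = transfer_net(source, flow)  # (1, 3, 64, 64) animated output frame
```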
While I recognize that the roadmap above is far from a full picture, it provides a framework to build on for understanding modern deep learning for image animation. I have intentionally omitted RNNs (recurrent neural networks) and variational autoencoders (VAEs) with the idea that doing so will help me stay on track toward the present goal. In the future, I plan to include more code and technical details, but I wanted to keep this post focused on laying the groundwork for what is to come.
A resource I found after writing this, which is very useful for framing the deep learning landscape in particular, is “Deep learning research landscape & roadmap in a nutshell: past, present and future — Towards deep cortical learning” [8].
Sources:
[1] Ian J. Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, Yoshua Bengio: “Generative Adversarial Networks”, 2014; arXiv:1406.2661, http://arxiv.org/abs/1406.2661.
[2] Aliaksandr Siarohin, Stéphane Lathuilière, Sergey Tulyakov, Elisa Ricci, Nicu Sebe: “First Order Motion Model for Image Animation”, 2020; arXiv:2003.00196, http://arxiv.org/abs/2003.00196.
[3] Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun: “Deep Residual Learning for Image Recognition”, 2015; arXiv:1512.03385, http://arxiv.org/abs/1512.03385.
[4] Karen Simonyan, Andrew Zisserman: “Very Deep Convolutional Networks for Large-Scale Image Recognition”, 2014; arXiv:1409.1556, http://arxiv.org/abs/1409.1556.
[5] Olaf Ronneberger, Philipp Fischer, Thomas Brox: “U-Net: Convolutional Networks for Biomedical Image Segmentation”, 2015; arXiv:1505.04597, http://arxiv.org/abs/1505.04597.
[7] Aliaksandr Siarohin, Stéphane Lathuilière, Sergey Tulyakov, Elisa Ricci, Nicu Sebe: “Animating Arbitrary Objects via Deep Motion Transfer”, 2018; arXiv:1812.08861, http://arxiv.org/abs/1812.08861.
[8] Aras R. Dargazany: “Deep learning research landscape & roadmap in a nutshell: past, present and future — Towards deep cortical learning”, 2019; arXiv:1908.02130, http://arxiv.org/abs/1908.02130.
[9] Alec Radford, Luke Metz, Soumith Chintala: “Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks”, 2015; arXiv:1511.06434, http://arxiv.org/abs/1511.06434.
— Brian Loos