Introduction to Deep Learning for Facial and Gesture Understanding


Raymond W. Ptucha


Deep learning has been revolutionizing the machine learning community. This tutorial will first review Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs). After understanding what CNNs and RNNs are and how they work, participants will cover techniques such as fully convolutional networks, sequence models, and latent vector representations in preparation for understanding the latest methods for face detection, people detection, body pose estimation, facial recognition, super resolution, generative networks, and video processing. Sample code will be reviewed and distributed so that, upon completion, participants can run the code on their own data. The final third of this tutorial is an NVIDIA Deep Learning Institute hands-on session in which participants will practice building and testing their own deep models using provided cloud resources.


Raymond Ptucha is an Assistant Professor in Computer Engineering and Director of the Machine Intelligence Laboratory at Rochester Institute of Technology. His research specializes in machine learning, computer vision, and robotics. Ray was a research scientist with Eastman Kodak Company, where he worked on computational imaging algorithms, and has been awarded 31 U.S. patents. He graduated from SUNY/Buffalo with a B.S. in Computer Science and a B.S. in Electrical Engineering. He earned an M.S. in Image Science and, in 2013, a Ph.D. in Computer Science, both from RIT. Ray was awarded an NSF Graduate Research Fellowship in 2010, and his Ph.D. research earned the 2014 Best RIT Doctoral Dissertation Award. Ray is an NVIDIA Deep Learning Institute Certified Instructor and University Ambassador. Ray is a passionate supporter of STEM education and is an active member of his local IEEE chapter and FIRST robotics organizations.