<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<HTML><HEAD><TITLE>E0 306</TITLE>
<META http-equiv=Content-Type content="text/html; charset=windows-1252">
</HEAD>
<BODY text=black bgColor=white><FONT face=arial>
<center><h1><FONT color=#3366ff size=+2>
Deep Learning: Theory and Practice (E0 306)</FONT></h1></center>
<p>
<br>
<br>
<strong>Time:</strong> Tuesdays and Thursdays, 3:30 PM - 5:00 PM
<br>
<strong>Place: </strong> CSA (252 or 254), Indian Institute of Science
<br><br>
<strong>Instructors:</strong> <br>
<A href="https://www.microsoft.com/en-us/research/people/amitdesh/">Amit Deshpande</A><br>
<A href="https://www.microsoft.com/en-us/research/people/navingo/">Navin Goyal</A>,
email: navin001 followed by @gmail.com, office hours: right after the class <br>
<A href="https://drona.csa.iisc.ac.in/~anand/">Anand Louis</A><br>
<br><br>
<HR width="100%">
<a name="lectures">
Notes below are only lightly proof-read or not proof-read at all. <br><br>
<strong>Lecture 1</strong> (Jan 8) <A href="https://github.com/dltnp/dltnp.github.io/blob/master/L1.pdf"> Notes</A> <br>
Introduction to the course and recap of statistical learning theory. <br>
For the most part we followed the first few chapters of [SSBD]<br><br>
<strong>Lecture 2</strong> (Jan 10) Rademacher complexity (Chapter 26 of [SSBD]) <br><br>
<strong>Lecture 3</strong> (Jan 15) Complete the discussion of Rademacher complexity; sample compression bounds on generalization <br>
While our coverage was somewhat different from that in [SSBD], the topics we covered are all there; there will be no notes
for Lectures 2 and 3<br><br>
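As a small aside (ours, not from the lectures): the empirical Rademacher complexity of a finite hypothesis class
can be estimated by Monte Carlo over random sign vectors. The sketch below assumes each hypothesis is given as a
vector of ±1 predictions on a fixed sample; the function name and setup are ours, for illustration only.
<pre>
import numpy as np

def empirical_rademacher(predictions, num_trials=1000, seed=0):
    """Monte Carlo estimate of the empirical Rademacher complexity.

    predictions: array of shape (num_hypotheses, m), entries in {-1, +1},
    giving each hypothesis's predictions on a fixed sample of size m.
    """
    rng = np.random.default_rng(seed)
    _, m = predictions.shape
    total = 0.0
    for _ in range(num_trials):
        sigma = rng.choice([-1.0, 1.0], size=m)        # Rademacher signs
        total += np.max(predictions @ sigma) / m       # sup over the class
    return total / num_trials

# Example: two constant hypotheses (all +1 and all -1) on a sample of size 100;
# the estimate is close to E|sum_i sigma_i| / m, roughly 0.08 here.
preds = np.vstack([np.ones(100), -np.ones(100)])
print(empirical_rademacher(preds))
</pre>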
<strong>Lecture 4</strong> (Jan 17) <A href="https://github.com/dltnp/dltnp.github.io/blob/master/Scribe_Notes_4.pdf">Notes</A> <br>
Begin brief review of neural networks <br>
<ul>
<li><i>Suggested reading:</i> For a quick, self-contained mathematical introduction to neural networks we recommend Chapters 5 and 7 of the
<A href="https://web.stanford.edu/~jurafsky/slp3/">NLP book of Jurafsky and Martin</A> (ignore Section 7.5);
this only covers feedforward fully-connected networks but should suffice for our needs, at least in the first part of the course </li>
<li><i>Optional reading:</i> <A href="http://cs231n.github.io/">These notes</A> are an excellent hands-on introduction to neural nets
but are not as succinct as the previous reference </li>
<li>You can train simple neural networks on two-dimensional data
at the <A href="https://playground.tensorflow.org/">TensorFlow Playground</A>. Great visualizations </li>
</ul>
<strong>Lecture 5</strong> (Jan 22) <A href="https://github.com/dltnp/dltnp.github.io/blob/master/Scribe_Notes_5.pdf">Notes</A> <br>
Backpropagation algorithm
<ul>
<li><A href="https://arxiv.org/pdf/1502.05767.pdf">Automatic Differentiation in Machine Learning: a Survey </A>by Baydin et al. This is an excellent source on AD, also mentions some of the recent AD systems but not <A href="https://github.com/clab/dynet"> all </A> </li>
<li><A href="https://www.researchgate.net/publication/51992325_A_mathematical_view_of_automatic_differentiation">A mathematical view of automatic differentiation </A> by Grienwank</li>
<li><A href="http://www.offconvex.org/2016/12/20/backprop/">Exposition</A> by Arora and Ma</li>
<li><A href="http://neuralnetworksanddeeplearning.com/chap2.html">Exposition</A> by Nielsen </li>
</ul>
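A minimal sketch (ours, not from the scribe notes) of the backward pass for a two-layer ReLU network with squared loss;
it illustrates the chain-rule bookkeeping that reverse-mode automatic differentiation performs. All names are ours.
<pre>
import numpy as np

def two_layer_backprop(x, y, W1, b1, W2, b2):
    """Forward and backward pass for a 2-layer ReLU network with squared loss."""
    # Forward pass, caching intermediate values needed by the backward pass
    z1 = W1 @ x + b1            # hidden pre-activation
    h = np.maximum(z1, 0.0)     # ReLU
    yhat = W2 @ h + b2          # output
    loss = 0.5 * np.sum((yhat - y) ** 2)

    # Backward pass: apply the chain rule layer by layer, in reverse order
    dyhat = yhat - y                # dL/dyhat
    dW2 = np.outer(dyhat, h)
    db2 = dyhat
    dh = W2.T @ dyhat
    dz1 = dh * (z1 > 0)             # derivative of ReLU
    dW1 = np.outer(dz1, x)
    db1 = dz1
    return loss, (dW1, db1, dW2, db2)
</pre>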
<strong>Lecture 6</strong> (Jan 24) <A href="https://github.com/dltnp/dltnp.github.io/blob/master/Scribe_Notes_6.pdf">Notes</A> <br>
Complete backpropagation; initialization of weights (He and Xavier initialization); begin
discussion of limitations of gradient-based learning <br><br>
The heuristic justification for He initialization comes from <A href="https://arxiv.org/pdf/1502.01852.pdf"> Delving Deep into Rectifiers</A> by He et al.;
some more recent papers along this line will be part of the project list <br><br>
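For quick reference, a sketch (ours) of the two initializations mentioned above: the weight variance is scaled by the
fan-in (He, for ReLU) or by fan-in plus fan-out (Xavier/Glorot) so that activation variances stay roughly constant across layers.
<pre>
import numpy as np

def he_init(fan_in, fan_out, rng=None):
    """He initialization for ReLU layers: Var(W_ij) = 2 / fan_in."""
    rng = rng or np.random.default_rng()
    return rng.normal(0.0, np.sqrt(2.0 / fan_in), size=(fan_out, fan_in))

def xavier_init(fan_in, fan_out, rng=None):
    """Xavier/Glorot initialization (tanh/sigmoid): Var(W_ij) = 2 / (fan_in + fan_out)."""
    rng = rng or np.random.default_rng()
    return rng.normal(0.0, np.sqrt(2.0 / (fan_in + fan_out)), size=(fan_out, fan_in))
</pre>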
Papers on the limitations of gradient-based methods:<ul>
<li><A href="https://arxiv.org/abs/1703.07950"> Failures of Gradient-Based Deep Learning </A> by Shalev-Shwartz et al. </li>
<li><A href="https://arxiv.org/abs/1707.04615"> On the Complexity of Learning Neural Networks </A> by Song et al. </li>
<li> <A href="https://arxiv.org/abs/1812.06369"> Provable limitations of deep learning </A> by Abbe and Sandon </li>
</ul>
<br>
<strong>Lecture 7</strong> (Jan 29) <br>
Complete the discussion of limitations of gradient-based methods; begin decomposition of the test error into approximation, optimization, and generalization errors
<br><br>
<strong>Lecture 8</strong> (Jan 31) <A href="https://github.com/dltnp/dltnp.github.io/blob/master/Scribe_Notes_8.pdf">Notes</A> <br>
Discussion of approximation, optimization and generalization error <br>
Begin discussion of <A href="https://arxiv.org/abs/1811.03962">A Convergence Theory for Deep Learning via Over-Parameterization</A> by Allen-Zhu et al.
<br><br>
<strong>Lecture 9</strong> (Feb 5) <br>
Outline of the proof of <A href="https://arxiv.org/abs/1811.03962">A Convergence Theory for Deep Learning via Over-Parameterization</A> by Allen-Zhu et al.<br>
Statement of the two-layer result of <A href="https://arxiv.org/abs/1811.04918">this paper</A> by Allen-Zhu et al. on generalization in DL
<br><br>
<strong>Lecture 10</strong> (Feb 12) <br>
Expressive power of neural networks, universal approximation theorem
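As an illustration of the universal approximation theorem (ours, not from the lecture): a one-hidden-layer ReLU network
can represent any piecewise-linear interpolant of a continuous one-dimensional function. The construction below picks the
hidden weights by hand; names are chosen for this sketch only.
<pre>
import numpy as np

def relu_interpolant(f, knots):
    """One-hidden-layer ReLU network that interpolates f at the given knots.

    Returns x -> b + sum_j c_j * relu(x - knots[j]), a piecewise-linear
    function equal to f at every knot (for x at or beyond knots[0]).
    """
    vals = np.array([f(t) for t in knots])
    slopes = np.diff(vals) / np.diff(knots)                   # slope on each interval
    coeffs = np.concatenate(([slopes[0]], np.diff(slopes)))   # one ReLU unit per knot
    def net(x):
        return vals[0] + np.sum(coeffs * np.maximum(x - knots[:-1], 0.0))
    return net

# Example: approximate sin on [0, pi] with 50 hidden ReLU units
knots = np.linspace(0.0, np.pi, 51)
net = relu_interpolant(np.sin, knots)
print(max(abs(net(x) - np.sin(x)) for x in np.linspace(0.0, np.pi, 1000)))  # small error
</pre>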
<br><br>
<strong>Lecture 11</strong> (Feb 14) <br>
Low-degree polynomial approximations to sigmoid and ReLU neurons and to feedforward neural networks;
discussion of depth separation and depth-width-weight trade-off results
<br><br>
<strong>Lenaic Chizat's talk </strong> (Feb 19) <br>
<br><br>
<strong>Midterm Exam</strong> (Feb 21) <br>
<br><br>
<strong>Lecture 13</strong> (Feb 26) <br>
Adversarial examples, <a href="https://homes.cs.washington.edu/~pedrod/papers/kdd04.pdf">adversarial classification</a> by Dalvi et al.
(KDD'04), <a href="https://arxiv.org/abs/1312.6199">intriguing properties of neural networks</a> by Szegedy et al.,
Fast Gradient Sign Method (FGSM) attack
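A minimal sketch of the FGSM attack (ours; it assumes a PyTorch classifier and inputs scaled to [0, 1]):
perturb the input by eps times the sign of the gradient of the loss with respect to the input.
<pre>
import torch
import torch.nn.functional as F

def fgsm_attack(model, x, y, eps):
    """Fast Gradient Sign Method: x_adv = x + eps * sign(grad_x loss(model(x), y))."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    loss.backward()                          # populates x.grad
    x_adv = x + eps * x.grad.sign()          # one signed gradient step on the input
    return x_adv.detach().clamp(0.0, 1.0)    # keep pixels in the valid range
</pre>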
<br><br>
<strong>Lecture 14</strong> (Feb 28) <br>
Adversarial examples, <a href="https://arxiv.org/abs/1511.04599">DeepFool attack</a>,
<a href="https://arxiv.org/abs/1610.08401">universal adversarial perturbations</a> by Moosavi-Dezfooli et al.
<br><br>
<strong>Lecture 15</strong> (March 5) <A href="https://github.com/dltnp/dltnp.github.io/blob/master/Scribe_Notes_15.pdf">Notes</A> <br>
Gradient descent on convex functions (Chapter 1 of <A href="http://www.cs.yale.edu/homes/vishnoi/Nisheeth-VishnoiFall2014-ConvexOptimization.pdf">this</A>).
<br><br>
<strong>Lecture 16</strong> (March 7) <A href="https://github.com/dltnp/dltnp.github.io/blob/master/Scribe_Notes_16.pdf">Notes</A> <br>
Gradient descent for strongly convex functions, and for smooth convex functions (Chapter 1 of <A href="http://www.cs.yale.edu/homes/vishnoi/Nisheeth-VishnoiFall2014-ConvexOptimization.pdf">this</A>).
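A minimal sketch (ours) of gradient descent with the constant step size 1/L used in the analysis for L-smooth convex functions.
<pre>
import numpy as np

def gradient_descent(grad, x0, L, num_steps):
    """Gradient descent with step size 1/L, the choice used for L-smooth convex f."""
    x = np.array(x0, dtype=float)
    for _ in range(num_steps):
        x = x - grad(x) / L       # x_{t+1} = x_t - (1/L) * grad f(x_t)
    return x

# Example: f(x) = 0.5 * ||A x - b||^2 is L-smooth with L = ||A^T A||, the spectral norm
A = np.array([[2.0, 0.0], [0.0, 1.0]])
b = np.array([1.0, 1.0])
grad = lambda x: A.T @ (A @ x - b)
L = np.linalg.norm(A.T @ A, 2)
print(gradient_descent(grad, np.zeros(2), L, 200))   # approaches the solution [0.5, 1.0]
</pre>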
<br><br>
<strong>Lecture 17</strong> (March 12) <br>
Discussion of preconditioners, AdaGrad, etc.
<ul>
<li> See Chapter 3.2 of <A href="http://www.cs.yale.edu/homes/vishnoi/Nisheeth-VishnoiFall2014-ConvexOptimization.pdf">this</A> for a discussion of Newton's method. </li>
<li> See Chapter 5.6 of <A href="http://ocobook.cs.princeton.edu/">this</A> for a discussion of AdaGrad. </li>
<li> See also notes from <A href="https://people.csail.mit.edu/madry/6.883/files/lecture_4.pdf">Madry's course</A> and <A href="https://www.cs.princeton.edu/courses/archive/fall18/cos597G/lecnotes/lecture5.pdf">Arora's course</A>. </li>
</ul>
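A minimal sketch (ours) of the diagonal AdaGrad update discussed above: each coordinate's step size is divided by the
square root of its accumulated squared gradients.
<pre>
import numpy as np

def adagrad(grad, x0, eta, num_steps, eps=1e-8):
    """Diagonal AdaGrad: x_{t+1} = x_t - eta * g_t / sqrt(G_t), where G_t accumulates g_s**2."""
    x = np.array(x0, dtype=float)
    accum = np.zeros_like(x)                      # per-coordinate sum of squared gradients
    for _ in range(num_steps):
        g = grad(x)
        accum += g ** 2
        x = x - eta * g / (np.sqrt(accum) + eps)  # eps avoids division by zero
    return x
</pre>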
<br><br>
<strong>Lecture 18</strong> (March 14) <br>
Escaping from saddle points, discussion of <A href="http://proceedings.mlr.press/v49/lee16.html">this paper</A> by Lee et al., and <A href="https://arxiv.org/abs/1703.00887">this paper</A> by Jin et al.
<br><br>
<strong>Lecture 19</strong> (March 21) <br>
<a href="https://github.com/dltnp/dltnp.github.io/blob/master/gans-lecture-1.pptx">Generative Adversarial Networks, Part 1</a>
<br><br>
<strong>Lecture 20</strong> (March 26) <br>
<a href="https://github.com/dltnp/dltnp.github.io/blob/master/gans-lecture-2.pptx">Generative Adversarial Networks, Part 2</a>
<br><br>
<strong>Lecture 21</strong> (March 28) <br>
Telgarsky's theorem from his lecture notes <a href="https://github.com/dltnp/dltnp.github.io/blob/master/telgarsky-lecture-1.pdf">part 1</a>
and <a href="https://github.com/dltnp/dltnp.github.io/blob/master/telgarsky-lecture-2.pdf">part 2</a>,
and a detour into <a href="https://openreview.net/forum?id=Bklr3j0cKX"> Representation learning</a>
<br><br>
<HR width="100%">
<strong>Assignments</strong> <br><br>
<A href="https://github.com/dltnp/dltnp.github.io/blob/master/A1.pdf"> Assignment 1</A> due Jan 31, beginning of class<br>
<A href="https://github.com/dltnp/dltnp.github.io/blob/master/A2.pdf"> Assignment 2</A> due March 21, beginning of class
<br><br>
<HR width="100%">
<strong>Scribe Notes</strong> <br><br>
Scribe notes are not merely a reproduction of what was written on the board in class:
you are expected to fill in some of the details that were skimmed over in class, and to write so that a reader,
just by reading the notes, can reach almost the same understanding as by attending the lecture. Please turn in the notes
within a week after the lecture. Please turn in all the files, not just the PDF.
Please use the style from Lecture 4; here are the <A href="https://github.com/dltnp/dltnp.github.io/blob/master/L4.zip">source files</A>.
<br><br>
<HR width="100%">
<br><br>
<a name="description">
<strong>Course Description: </strong>
The area of deep learning has been making rapid empirical advances; however, this success is largely guided
by intuition and trial and error, and the field remains more of an art than a science. We lack theory that applies "end-to-end."
While the traditional theory of machine learning leaves much to be desired here, current research to remedy this is very active.
Besides being of interest in its own right, progress on theory has the potential to further
improve current deep learning methods. This course will bring students up to date with this
fast-moving frontier. Our primary focus will be on theoretical aspects.
<br><br>
Brief tentative list of topics: <br><br>
<ul>
<li>Recap of statistical learning theory: Rademacher complexity and other generalization bounds</li>
<li>Quick introduction to the basics of neural networks</li>
<li>Generalization in deep learning</li>
<li>Expressive power of neural networks</li>
<li>Adversarial examples</li>
<li>Optimization for deep learning</li>
<li>Generative models</li>
</ul>
<br><br>
<strong>Prerequisites:</strong> Probability, linear algebra and optimization. Previous exposure to machine learning and deep learning
will be helpful.
<br><br>
<strong>Grading:</strong>
bi-weekly homework assignments (30%), scribe notes (10%), midterm (35%), paper presentation (25%) (a list of topics will be shared on Piazza)
<br><br>
<strong>Text:</strong> There is no required text for the class. However, the following references will be useful.
<ul>
<li> [SSBD] <A href="http://www.cs.huji.ac.il/~shais/UnderstandingMachineLearning/index.html">Understanding Machine Learning: From Theory to Algorithms by Shalev-Shwartz and Ben-David</A> </li>
<li> [DL] <A href="https://www.deeplearningbook.org/">Deep Learning</A> by Goodfellow, Bengio and Courville </li>
<li> <A href="https://www.math.uci.edu/~rvershyn/papers/HDP-book/HDP-book.html#">High-Dimensional Probability</A> by Vershynin</li>
<li> For optimization you can look <A href="http://www.cs.yale.edu/homes/vishnoi/Nisheeth-VishnoiFall2014-ConvexOptimization.pdf">here</A>
and <A href="http://ocobook.cs.princeton.edu/">here</A> </li>
</ul>
</FONT>
</BODY>
</HTML>