<!doctype html>
<html lang="en">
<head>
<meta charset="utf-8">
<title>math189r</title>
<meta name="description" content="Harvey Mudd Mathematics of Big Data I">
<link rel="stylesheet" href="css/styles.css">
</head>
<body>
<div class="header">
<h2> <a href="index.html">math189r</a> </h2>
<h3>
<a href="info/">Info</a>
</h3>
</div>
<div class="body">
<h3>Mathematics of Big Data I</h3>
<p>
Professor Weiqing Gu<br>
Harvey Mudd College<br>
Fall 2016
</p>
<p class="left-align">
<b>Readings</b> should be done <i>before</i> class. Reading summaries (see the Info section for the specification) are due for all non-Murphy readings at the start of class.
</p>
<table>
<thead>
<tr>
<th></th>
<th>Monday</th>
<th>Thursday</th>
</tr>
</thead>
<tbody>
<tr>
<td>Aug. 29</td>
<td></td>
<td>
<a href="materials/section/linear.pdf">Linear Algebra and Matrix Calculus Review</a><br><br>
<a href="materials/section/convex.pdf">Convex Optimization<br>Overview I</a><br><br>
<a href="notes/week_0/linear_convex_review.pdf">(notes from review)</a>
</td>
</tr>
<!-- Supervised Learning -->
<tr>
<td><b>Supervised Learning</b><br><a href="notes/week_1/lecture.pdf">Sept. 5</a></td>
<td>
Introduction to Big Data. Linear Regression. Normal
Equations and Optimization Techniques. Solving the Normal
Equations efficiently (Cholesky Decomposition).
Various forms of Linear Regression.
<br><br>
<b>Read:</b> Murphy 1.{all} (DEFINITELY READ BEFORE FIRST CLASS)<br>
<b>Read:</b> Murphy 7.{1,...,5}
</td>
<td>
<a href="http://cs229.stanford.edu/section/cs229-prob.pdf">Probability Review</a><br><br>
<a href="notes/week_1/probability_review.pdf">(notes from review)</a><br><br>
(Murphy 2.{all} is also a great resource.)
</td>
</tr>
<tr>
<td><a href="notes/week_2/lecture.pdf">Sept. 12</a></td>
<td>
Classification. K-Nearest Neighbors.
Logistic Regression. Exponential
Family and Generalized Linear Models. Logistic
Regression as a GLM.<br><br>
<b>Read:</b> Murphy 8.{1,2,3,5} \ 8.{3.4,3.5}, 9.{1,2.2,2.4,3}<br>
<b>Due:</b> <a href="hw/pset/sept_19.pdf">Homework 1 (first third)</a> <a href="hw/pset/sept_19_sol.pdf">(solutions)</a>
</td>
<td>
<a href="materials/section/convex.pdf">Convex Optimization<br>Overview II</a><br><br>
Tutorial on Scientific Python
</td>
</tr>
<tr>
<td><a href="notes/week_2/lecture.pdf">Sept. 19</a></td>
<td>
Generalized Linear Models continued.
Poisson Regression. Softmax Regression.
Covariance matrix; Multivariate Gaussian
Distribution. Marginalized Gaussian and the Schur
Complement.
Regularization continued. Big Data as
a Regularizer.
<br><br>
<b>Read:</b> Murphy 9.7, 4.{1,2,3,4,5,6} (important background)<br>
<b>Due:</b> <a href="hw/pset/sept_19.pdf">Homework 1 (second third)</a>
</td>
<td>
</td>
</tr>
<tr>
<td><a href="notes/week_3/lecture.pdf">Sept. 26</a></td>
<td>
Dimensionality Reduction; Spectral Decomposition, Singular Value Decomposition,
and Principal Component Analysis.
<br><br>
Generative Learning Algorithms.
Gaussian Discriminant Analysis.
<br><br>
<b>Due:</b> Final Project Proposal<br>
<b>Due:</b> <a href="hw/pset/sept_19.pdf">Homework 1 (last third)</a>
</td>
<td>
Scientific Computing Review / Help Session
</td>
</tr>
<tr>
<td><a href="notes/week_4/lecture.pdf">Oct. 3</a></td>
<td>
Naive Bayes. L1 Regularization and Sparsity.
Lasso. Support Vector Machines. Kernels.<br><br>
<b>Read:</b> Murphy 14.{1,2,3,4} \ 14.{4.4}<br>
<b>Read:</b> <a href="http://static.googleusercontent.com/media/research.google.com/en//archive/mapreduce-osdi04.pdf">MapReduce: Simplified Data Processing on Large Clusters</a><br>
<b>Due:</b> <a href="hw/pset/oct_10.pdf">Homework 2 (first half)</a> <a href="hw/pset/oct_10_sol.pdf">(solutions)</a>
</td>
<td>
Midterm Review
</td>
</tr>
<!-- Unsupervised Learning -->
<tr>
<td><b>Unsupervised Learning</b><br><a href="notes/week_5/lecture.pdf">Oct. 10</a></td>
<td>
Introduction to Unsupervised Learning.
Clustering. K-Means. Mixture of Gaussians.
Expectation-Maximization (EM) Algorithm.<br><br>
<b>Read:</b> Murphy 11.{1,2,3,4} \ 11.{4.6,4.9}<br>
<b>Read:</b> <a href="http://www.ee.oulu.fi/research/imag/courses/Vedaldi/ShalevSiSr07.pdf">Pegasos: Primal Estimated sub-GrAdient SOlver for SVM</a><br>
<b>Read:</b> <a href="https://people.eecs.berkeley.edu/~brecht/papers/07.rah.rec.nips.pdf">Random Features for Large-Scale Kernel Machines</a> (deep insight)<br>
<b>Due:</b> <a href="hw/pset/oct_10.pdf">Homework 2 (second half)</a>
</td>
<td>
<a href="materials/section/constrained_optimization.pdf">Constrained Optimization</a>
</td>
</tr>
<tr>
<td><a href="">Oct. 17</a></td>
<td>Fall Break</td>
<td>
<a href="materials/section/mapreduce.pdf">MapReduce and<br>Distributed Computation<br>and Learning</a>
</td>
</tr>
<tr>
<td><a href="notes/week_7/lecture.pdf">Oct. 24</a></td>
<td>
Principal Component Analysis (PCA) Review.
Kernel PCA. One-Class Support Vector
Machines.<br><br>
<b>Read:</b> Murphy 12.2.{0,1,2,3}, 14.4.4<br>
<b>Read:</b> <a href="http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.675.575&rep=rep1&type=pdf">Support Vector Method for Novelty Detection</a><br>
<b>Due:</b> <a href="exam/math189r_fall_2016_midterm.pdf">Midterm</a> (<a href="exam/math189r_fall_2016_midterm_solution.pdf">solutions</a>) (<a href="exam/results.txt">results</a>)<br>
<b>Due:</b> Final Project Progress Report
</td>
<td>
<a href="materials/section/deployment.pdf">Evaluating Models; Deployment</a>
</td>
</tr>
<tr>
<td><b>Learning Theory</b><br><a href="notes/week_8/lecture.pdf">Oct. 31</a></td>
<td>
Learning Theory. VC Dimension. Bias/Variance
Trade-off. Union and Chernoff/Hoeffding
Bounds.<br><br>
<b>Read:</b> <a href="http://papers.nips.cc/paper/4337-large-scale-sparse-principal-component-analysis-with-application-to-text-data.pdf">Large-Scale Sparse Principal Component Analysis with Application to Text Data</a><br>
<b>Read:</b> <a href="http://projecteuclid.org/download/pdf_1/euclid.aos/1176346060">On the Convergence Properties of the EM Algorithm</a><br>
<b>Due:</b> <a href="hw/pset/nov_7.pdf">Homework 3 (first half)</a> (<a href="hw/pset/nov_7_sol.pdf">solutions</a>)
</td>
<td>
<a href="materials/section/hmm.pdf">Applications of EM: Hidden Markov Models</a>
</td>
</tr>
<!-- Recommender Systems -->
<tr>
<td><b>Recommender Systems</b><br><a href="">Nov. 7</a></td>
<td>
Introduction to Recommender Systems. Collaborative
Filtering. Non-Negative Matrix Factorization.
Using Non-Negative Matrix Factorization for Topic
Modelling.<br><br>
<b>Read:</b> Murphy 27.6.2<br>
<b>Read:</b> <a href="http://sifter.org/~simon/journal/20061211.html">Netflix Update: Try This at Home</a><br>
<b>Due:</b> <a href="hw/pset/nov_7.pdf">Homework 3 (second half)</a>
</td>
<td>
</td>
</tr>
<!-- Graph Methods -->
<tr>
<td><b>Graph Methods</b><br><a href="">Nov. 14</a></td>
<td>
Graphs. Graph representations as data. The Graph
Laplacian and the use of spectral (eigenvalue-eigenvector) information.
<br><br>
Directed Graphical Models (Bayesian Networks). Conditional
Independence. Naive Bayes
as a Graphical Model. Plate Notation.
<br><br>
<b>Read:</b> Murphy 10.{1,2,3,4,5,6}<br>
<b>Due:</b> <a href="hw/pset/nov_21.pdf">Homework 4 (first half)</a> (<a href="hw/pset/nov_21_sol.pdf">solutions</a>)
</td>
<td>
<a href="materials/section/xsede.pdf">XSEDE Demo</a>
</td>
</tr>
<!-- Bayesian Learning -->
<tr>
<td><b>Bayesian Learning</b><br><a href="">Nov. 21</a></td>
<td>
Recap of Bayesian Reasoning.
Bayesian Linear Regression (which we've already seen).
Bayesian Logistic Regression. Intractable Integrals and
Motivation for Approximate Methods.<br><br>
<b>Read:</b> Murphy 5.{1,2,3.0,3.2}, 7.6, 8.4<br>
<b>Read:</b> <a href="https://hips.seas.harvard.edu/files/adams-changepoint-tr-2007.pdf">Bayesian Online Changepoint Detection</a><br>
<b>Due:</b> <a href="hw/pset/nov_21.pdf">Homework 4 (second half)</a>
</td>
<td>Thanksgiving</td>
</tr>
<tr>
<td><a href="">Nov. 28</a></td>
<td>
Monte-Carlo Methods. Rejection Sampling.
Importance Sampling. Intro to Markov-Chain
Monte-Carlo.
Gibbs Sampling. The Metropolis-Hastings
Algorithm.<br><br>
<b>Read:</b> Murphy 23.{1,2,3,4} \ 23.4.3,
24.{1,2.(1,2,3,4),3,4} \ 24.{3.7}<br>
<b>Optional Read:</b> Murphy 24.{5,6}<br>
<b>Due:</b> <a href="hw/pset/nov_28.pdf">Homework 5 (all)</a> (<a href="hw/pset/nov_28_sol.pdf">solutions</a>)
</td>
<td>
<a href="materials/section/gp.pdf">Application: Identifying Rapidly Deteriorating Water Quality Locations with Gaussian Processes</a><br><br>
Some (~10) Final Project Presentations will be given
on one day this week.
</td>
</tr>
<tr>
<td><a href="">Dec. 5</a></td>
<td>
Latent Dirichlet Allocation.
Nonparametric Models. K-Nearest-Neighbors as
a Nonparametric Model. Gaussian Processes.
Dirichlet Processes and the infinite mixture of Gaussians.<br><br>
<b>Read:</b> Murphy 15.{1,2,3,4}, 25.2, 27.{1,2,3}<br>
<b>Read:</b> <a href="http://jmlr.org/proceedings/papers/v28/wilson13.pdf">Gaussian Process Kernels for Pattern Discovery and Extrapolation</a><br>
<b>Due:</b> Final Project!
</td>
<td>
Final Project Presentations. These will be spread
across two days; on the final day we will watch
presentations of extraordinary projects chosen by the
class and the instructor.
</td>
</tr>
<tr>
<td>Dec. 12</td>
<td>Finals</td>
<td>Finals</td>
</tr>
</tbody>
</table>
</div>
</body>
</html>