This repository accompanies the “Machine Learning in Physics” course that I am teaching at Sharif University of Technology in the winter-20 semester. For more information, see the course page.
References:
- Mehta, Pankaj, et al. "A high-bias, low-variance introduction to machine learning for physicists." *Physics Reports* (2019).
- Nielsen, Michael A. *Neural Networks and Deep Learning*. Vol. 25. San Francisco, CA, USA: Determination Press, 2015. (Available online.)
- Chollet, François. *Deep Learning with Python*. 2018.
The course materials are posted here. If you come across a mistake or problem, please let me know.
Videos of most of the lectures are also posted here. These videos are in Farsi.
| Topic | Lecture notes | Notebook(s) |
|---|---|---|
| Introduction: Dipping a toe in the water | Introduction to ML | |
| Clustering | Clustering | |
| Regression and Classification | Regression and Classification | |

| Topic | Lecture notes | Notebook(s) |
|---|---|---|
| Data: the basics | Data | |

| Topic | Lecture notes | Notebook(s) |
|---|---|---|
| Model Evaluation | Metrics | |
| Model Selection | Statistical Learning | |

| Topic | Lecture notes | Notebook(s) |
|---|---|---|
| Introduction | Intro | |
| Feedforward | Feedforward | |
| Training: Back-propagation | Back-propagation | |

| Milestone | Due date | Submission link |
|---|---|---|
| Data | March 20th | Submit here. |
| Data | April 20th | Submit here. |
Prerequisites:
- A decent understanding of programming in Python and of the following libraries:
  - NumPy
  - Pandas
  - Plotting and graphical presentation tools in Python
- Git and GitHub (if you are not familiar with them, let me know)
- Basic understanding of quantum mechanics and statistics
- Basic understanding of machine learning
Grading (this is a tentative plan and we may change it as we move on):
- Course project: 40%
- Assignments: 30%
- In-class exercises: 10%
- Final exam (set for Thursday, June 20th, 9 AM): 30%

These add up to 110%; the extra 10% is a bonus.
The course project is a group project and counts for 40% of the final grade.
The idea is that each group decides on a project at the beginning of the course and applies everything we cover to it. Here are some of the expectations for the course project:
- Initial proposal: a clear statement of the problem and a preliminary assessment of why ML could help answer it. (Due Feb 28th)
- Data collection/generation and preparation (due March 15th, extended to March 20th):
  - Create a folder for this part.
  - Include a description (README file) of the data.
  - Describe your data: where it comes from, the different features and their physical significance, and your target value(s).
  - Create a notebook and implement the following in separate sections:
    - Clean up the data (remove missing data and convert everything to numerical values).
    - Scale your data.
    - Analyze the features and the target (histograms, etc.).
    - Feature selection (try different techniques and assess how well they work on your data).
    - Feature extraction (try different techniques and assess how well they work on your data).
- Application of the basic ML techniques (due April 15th):
  - A table of assessment (I will give an example later).
  - Investigation of the bias and variance of the techniques used.
  - Learning and validation curves.
- Application of neural networks (NNs) and setting the hyperparameters (due April 30th)
- Oral presentation (see me to set up a time; it should be before June 24th)
- Written term paper (due by July 5th)
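The data-preparation steps above can be sketched with pandas and NumPy as follows. This is only a minimal illustration under stated assumptions: the toy DataFrame and its columns (`temperature`, `field`, `phase`, `energy`) are hypothetical stand-ins for your project data, and ranking features by their correlation with the target and PCA are just one example each of feature selection and feature extraction. You are still expected to try and compare several techniques (scikit-learn's `StandardScaler`, `SelectKBest`, and `PCA` are common choices).

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for your own project data:
# two numerical features, one categorical feature, and a target ("energy"),
# with a couple of missing entries.
df = pd.DataFrame({
    "temperature": [1.0, 2.0, np.nan, 4.0, 5.0, 6.0],
    "field": [0.1, 0.3, 0.2, np.nan, 0.5, 0.4],
    "phase": ["ordered", "ordered", "disordered", "ordered", "disordered", "disordered"],
    "energy": [-2.0, -1.8, -0.5, -1.5, -0.2, -0.3],
})

# 1. Clean up: drop rows with missing values, then one-hot encode the
#    categorical column so that every column is numerical.
clean = pd.get_dummies(df.dropna(), columns=["phase"], dtype=float)

# Separate the target y from the feature matrix X.
y = clean.pop("energy").to_numpy()
feature_names = list(clean.columns)
X = clean.to_numpy(dtype=float)

# 2. Scale: standardize each feature to zero mean and unit variance.
mean, std = X.mean(axis=0), X.std(axis=0)
std[std == 0] = 1.0  # avoid dividing by zero; constant columns become all zeros
X_scaled = (X - mean) / std

# 3. Feature selection (one simple option): rank features by the absolute
#    Pearson correlation between each feature and the target.
corr = np.array([abs(np.corrcoef(X_scaled[:, j], y)[0, 1])
                 for j in range(X_scaled.shape[1])])
ranking = np.argsort(corr)[::-1]  # feature indices, most correlated first

# 4. Feature extraction (one simple option): PCA via the SVD of the
#    scaled (already centered) data.
_, S, Vt = np.linalg.svd(X_scaled, full_matrices=False)
explained = S**2 / np.sum(S**2)  # fraction of variance per principal component
X_pca = X_scaled @ Vt[:2].T      # projection onto the first two components

print("features ranked by |corr| with target:", [feature_names[j] for j in ranking])
print("explained variance ratio:", np.round(explained, 3))
```

Dropping rows is the simplest way to handle missing values; depending on your data, imputing them (for example with the column mean) may be a better choice.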
Some notes:
- Make sure you cite all the resources you use!
- Submit your work as a group rather than as separate individual submissions.
- Scripts, notebooks, and figures without a description will not count toward your grade.
- Your code should include enough comments and information to be easily followed.
- It is essential that all group members contribute (make commits) to the group's repository; this is the only way I can make sure that everyone participated in the project.
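For the learning and validation curves in the basic-ML milestone, the NumPy-only sketch below may help. It assumes a hypothetical 1D regression problem (noisy $\sin(2\pi x)$), a simple hold-out validation set, and polynomial degree as the model-complexity knob; the helper names `mse` and `fit_and_score` are illustrative, not part of any library. The validation curve tracks training and validation error versus degree, and the learning curve tracks them versus training-set size.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 1D regression problem: y = sin(2*pi*x) plus Gaussian noise.
x = rng.uniform(0.0, 1.0, 200)
y = np.sin(2 * np.pi * x) + rng.normal(0.0, 0.2, x.size)

# Simple hold-out split: first 150 points for training, last 50 for validation.
x_train, y_train = x[:150], y[:150]
x_val, y_val = x[150:], y[150:]


def mse(coeffs, xs, ys):
    """Mean squared error of the polynomial with coefficients `coeffs`."""
    return np.mean((np.polyval(coeffs, xs) - ys) ** 2)


def fit_and_score(degree, n_train):
    """Least-squares fit of a polynomial of the given degree to the first
    `n_train` training points; returns (training MSE, validation MSE)."""
    coeffs = np.polyfit(x_train[:n_train], y_train[:n_train], degree)
    return (mse(coeffs, x_train[:n_train], y_train[:n_train]),
            mse(coeffs, x_val, y_val))


# Validation curve: vary model complexity (degree) using all training data.
degrees = range(1, 10)
val_curve = [fit_and_score(d, 150) for d in degrees]

# Learning curve: vary the training-set size at a fixed degree of 3.
sizes = [10, 20, 40, 80, 150]
learn_curve = [fit_and_score(3, n) for n in sizes]

for d, (train_err, val_err) in zip(degrees, val_curve):
    print(f"degree={d}  train MSE={train_err:.3f}  val MSE={val_err:.3f}")
for n, (train_err, val_err) in zip(sizes, learn_curve):
    print(f"n_train={n:3d}  train MSE={train_err:.3f}  val MSE={val_err:.3f}")
```

In this setup a degree-1 fit underfits (high bias: both errors stay large), while raising the degree keeps lowering the training error; a validation error that stops improving or starts rising signals growing variance. Plotting these lists against `degrees` and `sizes` with Matplotlib gives the usual curves, and `sklearn.model_selection.learning_curve` and `validation_curve` automate the same idea with cross-validation.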
See the files in the CheatSheet folder.
| Item | Description |
|---|---|
| Jupyter | Jupyter provides an interactive environment for programming. We will mostly use the Python 3 kernel. |
| Git and GitHub | Git provides a strong infrastructure for version control. GitHub is a web-based hosting service for Git repositories that also provides tools for collaboration. |
| Python | The programming language we will mostly use in this course. |
| NumPy | A Python library that provides powerful and efficient tools for manipulating high-dimensional arrays. |
| SciPy | A Python library, built on NumPy, for mathematical and scientific computing. |
| Pandas (cheat sheets: Pandas_basics, Pandas 2, Importing data) | A Python library, built on NumPy, that provides efficient tools for handling and analyzing data. |
| Matplotlib, Seaborn | Two of the most common Python libraries for visualization. |
| Scikit-Learn | A Python library that provides a clean and fairly efficient implementation of most machine learning techniques and ideas. |
| Keras | A Python library that provides a high-level, easy-to-use interface to TensorFlow and some other deep learning libraries. |