5 changes: 2 additions & 3 deletions README.md
```diff
@@ -25,11 +25,10 @@ See [dev branch](https://github.com/Exabyte-io/rewotes/tree/dev) also.
 
 ## Notes
 
-Examples listed here are only meant as guidelines and do not necessarily reflect on the type of work to be performed at the company.
+Examples listed here are only meant as guidelines and do not necessarily reflect on the type of work to be performed at the company. Modifications to the individual assignments with an advance notice are encouraged.
 
-Modifications to the individual assignments with an advance notice are encouraged. Candidates are free to share the results.
+We will screen for the ability to (1) pick up new concepts quickly, (2) implement a working proof-of-concept solution, and (3) outline how the PoC can become more mature. We value attention to details and modularity.
 
-We will screen for the ability to pick up new concepts quickly and implement a working solution. We value attention to details and modularity.
 
 ## Hiring process
 
```
7,053 changes: 7,053 additions & 0 deletions vitalypro/.ipynb_checkpoints/code_implementation_example-checkpoint.ipynb

Large diffs are not rendered by default.

43 changes: 43 additions & 0 deletions vitalypro/README.md
@@ -0,0 +1,43 @@
**Prediction of Electronic Band-structure in Ultra-thin SiGe Superlattices with Tree-Based Machine Learning Models**

**Overview**

The goal of this task is to train an ML model that predicts the electronic band structure of a SiGe superlattice (SL) with any kind of interfacial disorder, external strain, and composition\*.
(\* *Please see the list of assumptions used in this work below.*)

Superlattices are periodic structures made of alternating layers of different materials. Their interfaces strongly affect electron and phonon transport, which makes superlattices attractive candidates for thermoelectric applications. The efficiency of a thermoelectric material is characterized by its figure of merit:

$$zT = \frac{S^2}{\rho k}\,T,$$

where *zT* is the dimensionless figure of merit, *S*, *ρ*, and *k* are the Seebeck coefficient, electrical resistivity, and thermal conductivity, respectively, and *T* is the absolute temperature. To improve the efficiency of a thermoelectric material, we need to increase the Seebeck coefficient and the electrical conductivity (1/*ρ*) while decreasing the thermal conductivity. The first two quantities can be computed easily and quickly from the energy bands using the Boltzmann transport equations (see https://arxiv.org/pdf/cond-mat/0602203.pdf, eqs. 12-16). The most time-consuming part of computing *S* is the band-structure calculation itself, which can be significantly sped up with ML modeling.
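A minimal sketch of the two ingredients above, assuming SI units for *zT* and a deliberately simplified transport model for *S*. `figure_of_merit` evaluates the formula directly. `seebeck_crta_1d` is a hypothetical helper (not the author's code) that applies the constant relaxation-time Boltzmann expression to bands sampled along a single k-path, using Gaussian-broadened states and band velocities along that path only; it does not reproduce the full 3D tensors of eqs. 12-16, and the smearing width and grid size are arbitrary assumptions.

```python
import numpy as np

K_B = 8.617333262e-5  # Boltzmann constant in eV/K


def figure_of_merit(S, rho, k, T):
    """zT = S^2 T / (rho k); S in V/K, rho in Ohm*m, k in W/(m*K), T in K."""
    return S**2 * T / (rho * k)


def minus_dfermi_de(energy, mu, T):
    """-df/dE of the Fermi-Dirac distribution (1/eV), written to avoid overflow."""
    x = np.abs(energy - mu) / (K_B * T)
    ex = np.exp(-x)
    return ex / (K_B * T * (1.0 + ex) ** 2)


def seebeck_crta_1d(kz, bands, mu, T, smear=0.02, n_grid=4000):
    """Toy Seebeck coefficient (V/K) from bands sampled along one k-path.

    Constant relaxation-time approximation: the transport distribution
    sigma(E) ~ sum_k v_k^2 delta(E - E_k) uses a Gaussian-broadened delta and
    v = dE/dk along the path only; tau and other constant prefactors cancel.
    """
    bands = np.asarray(bands, dtype=float)             # shape (nks, nbands), eV
    v = np.gradient(bands, kz, axis=0)                 # dE/dk along the path
    grid = np.linspace(bands.min() - 0.5, bands.max() + 0.5, n_grid)
    gauss = np.exp(-0.5 * ((grid[:, None, None] - bands[None]) / smear) ** 2)
    tdf = (gauss * v[None] ** 2).sum(axis=(1, 2))      # sigma(E) up to a constant
    w = tdf * minus_dfermi_de(grid, mu, T)
    # Uniform grid: the dE factors cancel between numerator and denominator.
    mean_excess = np.sum(w * (grid - mu)) / np.sum(w)  # <E - mu> in eV
    # Carriers are electrons (charge -e), so S = -<E - mu> / (e T) in V/K.
    return -mean_excess / T
```

For example, `figure_of_merit(200e-6, 1e-5, 1.5, 300.0)` gives about 0.8 (illustrative values, not results from this work), and a parabolic conduction band above `mu` yields a negative *S*, while its mirror-image valence band yields a positive *S* of the same magnitude.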


**Assumptions**

The full task is complex and would take a significant amount of time to complete, so I make the following simplifying assumptions:
1. I will only study superlattices with a constant number of atoms, to avoid band-structure folding. Band-structure unfolding can be done, for example, with GPAW (https://wiki.fysik.dtu.dk/gpaw/tutorialsexercises/electronic/unfold/unfold.html).
2. I will use a small number of atoms per cell (8-atom cells) to speed up the calculations.
3. For transport, only the bands close to the Fermi level matter. Therefore, I will predict only 2 valence and 2 conduction bands.
4. I will only consider superlattices with ideal interfaces. In future work, it would be interesting to investigate disordered interfaces and to introduce defects far away from the interface. This would require larger cells and Voronoi tessellations to correctly describe the neighboring atoms. In this work, however, I study only the effect of composition and external strain.
5. I will predict the band structure along a short path (e.g., Γ-Z) with a small number of k-points.
6. I will use the PBE exchange-correlation functional for the DFT calculations. The band gap will be underestimated, but the band shapes should not be affected significantly.
7. A good ML model needs both global and local properties of the system as input. Global properties include lattice constants, composition, and the number of atoms (if not constant); local properties include local strain, the nearest-neighbor environment, etc. In this work, I will only use global properties to build the ML model.
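Assumption 3 can be implemented in several ways. One possible sketch, assuming the band energies are already available as an array of shape (nks, nbnd) in eV together with the Fermi level, keeps at every k-point the two highest states at or below E_F and the two lowest states above it. The function name `bands_near_fermi` is hypothetical; selecting bands by index from the number of occupied bands would be an equally valid alternative.

```python
import numpy as np


def bands_near_fermi(energies, e_fermi, n_val=2, n_cond=2):
    """Keep, at each k-point, the n_val highest states at or below E_F and the
    n_cond lowest states above it. Returns shape (nks, n_val + n_cond), ordered
    from the deepest kept valence state to the highest kept conduction state.
    """
    energies = np.sort(np.asarray(energies, dtype=float), axis=1)
    selected = np.empty((energies.shape[0], n_val + n_cond))
    for ik, row in enumerate(energies):
        below, above = row[row <= e_fermi], row[row > e_fermi]
        if below.size < n_val or above.size < n_cond:
            raise ValueError(f"not enough states around E_F at k-point {ik}")
        selected[ik, :n_val] = below[-n_val:]   # top of the valence manifold
        selected[ik, n_val:] = above[:n_cond]   # bottom of the conduction manifold
    return selected
```

Because the selection is done by energy at each k-point, band crossings are resolved by energy ordering; the flattened (nks × 4) result is one natural multioutput regression target.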


**Project details**

1. **Step 1: DFT calculations**

In the first step, I performed DFT calculations of ultra-thin SixGe1-x superlattices under different external strains (x = 1, 2, 3 is the number of monolayers). I will provide more details on this step during our Monday call.

2. **Step 2: Develop a module for ML modeling**

This module consists of three classes: *read_data*, *data_preparation*, and *modeling*. The *read_data* class collects all the data needed for modeling from the QE output files: lattice constants, composition, the Fermi level (read from the XML file), and band energies (read from the bands.x output). The *data_preparation* class splits the data into training and testing sets, with either manual or random selection. The *modeling* class contains two tree-based ML models: random forest and XGBoost. Random forest supports multioutput regression natively; for XGBoost, I had to use the sklearn MultiOutputRegressor wrapper to make predictions (see https://github.com/dmlc/xgboost/issues/2087). Hyperparameters are tuned automatically with Bayesian optimization, and an n-fold cross-validation option is included to reduce the risk of overfitting. More details will be provided on Monday.
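The band-energy part of *read_data* has to parse files like `bands/si1ge3_0.dat` in this PR. A minimal sketch, assuming the layout visible in those files (a `&plot nbnd= N, nks= M /` header, then for each k-point one line of 3 k-coordinates followed by N whitespace-separated energies spread over several lines). `read_bands_plot` is a hypothetical name, not the author's implementation.

```python
import re

import numpy as np


def read_bands_plot(text):
    """Parse a bands.x plot file; return (kpoints (M, 3), energies (M, N))."""
    lines = text.strip().splitlines()
    header = re.search(r"nbnd\s*=\s*(\d+)\s*,\s*nks\s*=\s*(\d+)", lines[0])
    if header is None:
        raise ValueError("missing '&plot nbnd=..., nks=... /' header")
    nbnd, nks = int(header.group(1)), int(header.group(2))
    # Assumes numbers are separated by whitespace (true for the files shown here).
    values = np.array([float(tok) for line in lines[1:] for tok in line.split()])
    if values.size != nks * (3 + nbnd):
        raise ValueError(f"expected {nks * (3 + nbnd)} numbers, got {values.size}")
    # Each k-point contributes 3 coordinates followed by nbnd band energies.
    blocks = values.reshape(nks, 3 + nbnd)
    return blocks[:, :3], blocks[:, 3:]
```

For the files in this PR (`nbnd= 20, nks= 11`), `read_bands_plot(open("bands/si1ge3_0.dat").read())` would return arrays of shape (11, 3) and (11, 20).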

3. **Step 3: Predict an SL band structure with the ML model**

Finally, once the structures are calculated and the data are extracted and prepared, we test the model performance. The figure below shows the final results for two SiGe SL band structures predicted with the tree-based ML models. The black curves show the bands calculated with DFT; the random forest and XGBoost predictions are shown as red and blue dots, respectively. In general, both methods give good predictions, but the random forest model outperforms the XGBoost one. I believe this is mostly because XGBoost cannot handle multiple outputs directly: the MultiOutputRegressor wrapper fits an independent model for each output. We will discuss this on Monday as well.
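The comparison above can be sketched with scikit-learn alone. This is an illustrative setup, not the author's pipeline: the data are synthetic, the four global features are hypothetical, `GradientBoostingRegressor` stands in for XGBoost (assumed not installed), and the Bayesian hyperparameter search is omitted. It shows the structural difference discussed above: a random forest predicts all band energies jointly, while `MultiOutputRegressor` trains one independent booster per output.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import KFold, cross_val_score, train_test_split
from sklearn.multioutput import MultiOutputRegressor

rng = np.random.default_rng(0)

# Synthetic stand-in data (NOT the DFT results of this work). Hypothetical global
# features per structure: [Si monolayers, in-plane strain (%), a (Angstrom), c (Angstrom)].
n_samples, n_targets = 60, 4 * 11  # 4 bands x 11 k-points, flattened per structure
X = np.column_stack([
    rng.integers(1, 4, n_samples),
    rng.uniform(-2.0, 2.0, n_samples),
    rng.uniform(5.4, 5.7, n_samples),
    rng.uniform(10.8, 11.4, n_samples),
])
W = rng.normal(size=(X.shape[1], n_targets))
Y = X @ W + 0.05 * rng.normal(size=(n_samples, n_targets))  # fake band energies

X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.2, random_state=0)

models = {
    # RandomForestRegressor accepts a 2D target: each tree predicts all outputs jointly.
    "random_forest": RandomForestRegressor(n_estimators=100, random_state=0),
    # GradientBoostingRegressor (stand-in for XGBoost) is single-output, so
    # MultiOutputRegressor fits one independent booster per target column.
    "boosting": MultiOutputRegressor(GradientBoostingRegressor(n_estimators=50, random_state=0)),
}

cv = KFold(n_splits=5, shuffle=True, random_state=0)
scores = {}
for name, model in models.items():
    cv_mae = -cross_val_score(model, X_train, Y_train, cv=cv,
                              scoring="neg_mean_absolute_error").mean()
    model.fit(X_train, Y_train)
    test_mae = mean_absolute_error(Y_test, model.predict(X_test))
    scores[name] = (cv_mae, test_mae)
    print(f"{name}: 5-fold CV MAE = {cv_mae:.3f}, test MAE = {test_mae:.3f}")
```

With real data, `Y` would hold the selected valence and conduction band energies at each k-point, and a hyperparameter search (e.g., Bayesian optimization) would wrap each model before the cross-validation step.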

![ml model performance](https://user-images.githubusercontent.com/64281595/139505384-c190b36a-62a3-48ee-9360-b1d56e0efeea.png)


Binary file not shown.
34 changes: 34 additions & 0 deletions vitalypro/bands/si1ge3_0.dat
@@ -0,0 +1,34 @@
&plot nbnd= 20, nks= 11 /
0.000000 0.000000 0.000000
-6.556 -3.007 -2.690 -2.689 -2.613 -2.610 -1.918 2.801 2.872 2.873
2.951 3.318 3.320 5.665 6.219 6.227 6.330 6.538 6.540 6.886
0.000001 0.000001 0.046906
-6.547 -3.210 -2.688 -2.687 -2.608 -2.395 -1.915 2.782 2.849 2.850
2.923 3.327 3.329 5.496 6.168 6.175 6.387 6.588 6.591 6.927
0.000003 0.000003 0.093811
-6.518 -3.521 -2.683 -2.681 -2.601 -2.042 -1.908 2.726 2.784 2.785
2.844 3.356 3.358 5.125 6.029 6.036 6.551 6.736 6.738 6.931
0.000004 0.000004 0.140717
-6.470 -3.831 -2.675 -2.673 -2.591 -1.897 -1.661 2.643 2.690 2.691
2.730 3.404 3.406 4.686 5.835 5.841 6.806 6.929 6.968 6.971
0.000005 0.000005 0.187622
-6.403 -4.130 -2.664 -2.663 -2.579 -1.883 -1.264 2.544 2.579 2.580
2.599 3.470 3.472 4.223 5.613 5.618 6.940 7.136 7.271 7.274
0.000007 0.000007 0.234528
-6.317 -4.413 -2.653 -2.651 -2.564 -1.867 -0.854 2.441 2.466 2.467
2.467 3.555 3.557 3.751 5.379 5.384 6.967 7.525 7.634 7.636
0.000008 0.000008 0.281433
-6.213 -4.680 -2.641 -2.639 -2.550 -1.851 -0.434 2.342 2.345 2.361
2.362 3.279 3.657 3.659 5.146 5.150 7.011 7.920 7.963 8.044
0.000009 0.000009 0.328339
-6.091 -4.929 -2.630 -2.628 -2.536 -1.836 -0.007 2.242 2.258 2.271
2.272 2.809 3.776 3.778 4.922 4.926 7.070 7.908 8.382 8.441
0.000011 0.000011 0.375244
-5.953 -5.158 -2.621 -2.619 -2.525 -1.824 0.423 2.164 2.193 2.204
2.204 2.349 3.908 3.910 4.713 4.716 7.138 7.807 8.577 8.953
0.000012 0.000012 0.422150
-5.807 -5.360 -2.615 -2.614 -2.518 -1.816 0.838 1.915 2.116 2.152
2.161 2.162 4.041 4.042 4.532 4.536 7.201 7.711 8.800 9.230
0.000013 0.000013 0.469056
-5.710 -5.474 -2.613 -2.612 -2.516 -1.814 1.096 1.651 2.100 2.139
2.147 2.147 4.117 4.118 4.441 4.445 7.230 7.670 8.970 9.099
34 changes: 34 additions & 0 deletions vitalypro/bands/si1ge3_1.dat
@@ -0,0 +1,34 @@
&plot nbnd= 20, nks= 11 /
0.000000 0.000000 0.000000
-6.607 -3.052 -2.828 -2.827 -2.751 -2.654 -2.057 2.726 2.799 2.799
2.876 3.115 3.116 5.605 6.013 6.013 6.247 6.448 6.449 6.620
0.000001 0.000001 0.047724
-6.598 -3.254 -2.826 -2.826 -2.748 -2.438 -2.055 2.705 2.775 2.775
2.847 3.125 3.125 5.417 5.963 5.963 6.305 6.500 6.500 6.683
0.000003 0.000003 0.095449
-6.569 -3.564 -2.820 -2.820 -2.742 -2.087 -2.048 2.648 2.708 2.708
2.766 3.154 3.154 5.028 5.827 5.827 6.471 6.649 6.650 6.691
0.000004 0.000004 0.143173
-6.521 -3.874 -2.812 -2.812 -2.732 -2.036 -1.708 2.562 2.611 2.611
2.649 3.202 3.202 4.583 5.635 5.635 6.686 6.730 6.886 6.886
0.000005 0.000005 0.190897
-6.453 -4.172 -2.801 -2.801 -2.719 -2.022 -1.313 2.460 2.497 2.497
2.515 3.269 3.269 4.119 5.414 5.414 6.695 7.065 7.194 7.195
0.000007 0.000007 0.238622
-6.367 -4.456 -2.789 -2.789 -2.704 -2.006 -0.907 2.353 2.380 2.381
2.381 3.354 3.354 3.651 5.182 5.182 6.720 7.458 7.561 7.562
0.000008 0.000008 0.286346
-6.263 -4.724 -2.777 -2.777 -2.689 -1.989 -0.490 2.252 2.256 2.272
2.272 3.183 3.457 3.458 4.949 4.949 6.764 7.701 7.901 7.977
0.000009 0.000009 0.334070
-6.140 -4.973 -2.766 -2.765 -2.676 -1.974 -0.067 2.151 2.166 2.181
2.181 2.720 3.577 3.577 4.725 4.725 6.823 7.671 8.166 8.383
0.000011 0.000011 0.381795
-6.002 -5.203 -2.757 -2.756 -2.665 -1.962 0.357 2.072 2.099 2.111
2.111 2.266 3.710 3.710 4.515 4.515 6.892 7.559 8.371 8.900
0.000012 0.000012 0.429519
-5.854 -5.405 -2.751 -2.751 -2.658 -1.954 0.766 1.839 2.023 2.057
2.068 2.068 3.844 3.844 4.333 4.333 6.957 7.457 8.599 9.019
0.000013 0.000013 0.477243
-5.758 -5.520 -2.749 -2.749 -2.655 -1.951 1.017 1.583 2.006 2.043
2.053 2.053 3.921 3.921 4.240 4.241 6.988 7.414 8.768 8.895
34 changes: 34 additions & 0 deletions vitalypro/bands/si1ge3_2.dat
@@ -0,0 +1,34 @@
&plot nbnd= 20, nks= 11 /
0.000000 0.000000 0.000000
-6.651 -3.088 -2.948 -2.948 -2.871 -2.686 -2.182 2.664 2.741 2.741
2.817 2.935 2.935 5.556 5.829 5.830 6.175 6.373 6.373 6.382
0.000001 0.000001 0.048528
-6.641 -3.288 -2.947 -2.946 -2.869 -2.472 -2.179 2.643 2.716 2.716
2.787 2.944 2.944 5.345 5.780 5.781 6.234 6.425 6.426 6.466
0.000003 0.000003 0.097055
-6.612 -3.597 -2.941 -2.941 -2.862 -2.172 -2.123 2.583 2.647 2.647
2.703 2.973 2.974 4.940 5.647 5.647 6.403 6.476 6.578 6.579
0.000004 0.000004 0.145583
-6.564 -3.908 -2.932 -2.932 -2.852 -2.160 -1.746 2.494 2.546 2.546
2.583 3.022 3.022 4.489 5.457 5.457 6.468 6.667 6.819 6.819
0.000005 0.000005 0.194110
-6.496 -4.206 -2.921 -2.921 -2.839 -2.146 -1.353 2.389 2.429 2.429
2.446 3.089 3.089 4.027 5.238 5.238 6.474 7.006 7.133 7.133
0.000007 0.000007 0.242638
-6.409 -4.491 -2.909 -2.909 -2.824 -2.129 -0.950 2.280 2.307 2.309
2.309 3.175 3.175 3.562 5.006 5.007 6.498 7.390 7.405 7.506
0.000008 0.000008 0.291166
-6.304 -4.759 -2.896 -2.896 -2.808 -2.112 -0.537 2.176 2.180 2.197
2.197 3.099 3.279 3.279 4.775 4.775 6.540 7.507 7.852 7.863
0.000009 0.000009 0.339693
-6.181 -5.009 -2.885 -2.885 -2.795 -2.097 -0.118 2.073 2.087 2.103
2.103 2.641 3.399 3.399 4.550 4.550 6.599 7.458 7.982 8.340
0.000010 0.000010 0.388221
-6.041 -5.240 -2.875 -2.875 -2.783 -2.084 0.301 1.992 2.019 2.032
2.032 2.193 3.532 3.532 4.340 4.340 6.669 7.338 8.197 8.861
0.000012 0.000012 0.436748
-5.893 -5.443 -2.869 -2.869 -2.776 -2.076 0.703 1.774 1.942 1.976
1.987 1.987 3.667 3.667 4.158 4.158 6.737 7.231 8.427 8.840
0.000013 0.000013 0.485276
-5.795 -5.559 -2.867 -2.867 -2.773 -2.073 0.947 1.525 1.925 1.961
1.972 1.972 3.745 3.745 4.065 4.065 6.769 7.185 8.593 8.724
34 changes: 34 additions & 0 deletions vitalypro/bands/si1ge3_3.dat
@@ -0,0 +1,34 @@
&plot nbnd= 20, nks= 11 /
0.000000 0.000000 0.000000
-6.691 -3.124 -3.061 -3.061 -2.984 -2.719 -2.297 2.606 2.686 2.686
2.762 2.768 2.768 5.506 5.658 5.659 6.107 6.150 6.301 6.301
0.000001 0.000001 0.049295
-6.682 -3.323 -3.059 -3.059 -2.982 -2.507 -2.294 2.584 2.661 2.661
2.731 2.777 2.778 5.266 5.610 5.611 6.167 6.263 6.354 6.355
0.000003 0.000003 0.098591
-6.652 -3.631 -3.053 -3.053 -2.975 -2.287 -2.159 2.523 2.589 2.589
2.645 2.807 2.807 4.844 5.479 5.479 6.275 6.339 6.510 6.510
0.000004 0.000004 0.147886
-6.604 -3.941 -3.044 -3.044 -2.964 -2.275 -1.785 2.432 2.485 2.485
2.522 2.855 2.856 4.390 5.292 5.292 6.264 6.607 6.755 6.755
0.000005 0.000005 0.197182
-6.536 -4.239 -3.033 -3.033 -2.951 -2.260 -1.395 2.324 2.365 2.365
2.381 2.923 2.923 3.929 5.074 5.074 6.267 6.950 7.043 7.073
0.000007 0.000007 0.246477
-6.448 -4.524 -3.020 -3.020 -2.935 -2.243 -0.995 2.211 2.240 2.241
2.241 3.009 3.010 3.468 4.844 4.844 6.290 7.216 7.354 7.451
0.000008 0.000008 0.295773
-6.343 -4.792 -3.007 -3.007 -2.920 -2.226 -0.586 2.105 2.110 2.127
2.127 3.010 3.114 3.114 4.613 4.613 6.331 7.324 7.674 7.806
0.000009 0.000009 0.345068
-6.219 -5.043 -2.995 -2.995 -2.906 -2.210 -0.172 2.000 2.013 2.030
2.030 2.558 3.235 3.235 4.388 4.389 6.390 7.258 7.813 8.297
0.000010 0.000010 0.394363
-6.078 -5.275 -2.985 -2.985 -2.894 -2.197 0.242 1.918 1.943 1.957
1.957 2.116 3.369 3.369 4.178 4.178 6.461 7.130 8.036 8.728
0.000012 0.000012 0.443659
-5.929 -5.479 -2.979 -2.979 -2.887 -2.189 0.637 1.705 1.867 1.899
1.912 1.912 3.505 3.505 3.995 3.995 6.530 7.018 8.269 8.672
0.000013 0.000013 0.492954
-5.830 -5.596 -2.977 -2.977 -2.884 -2.186 0.874 1.463 1.850 1.885
1.896 1.896 3.583 3.583 3.901 3.902 6.563 6.970 8.433 8.564
34 changes: 34 additions & 0 deletions vitalypro/bands/si1ge3_4.dat
@@ -0,0 +1,34 @@
&plot nbnd= 20, nks= 11 /
0.000000 0.000000 0.000000
-6.731 -3.172 -3.170 -3.163 -3.095 -2.756 -2.410 2.548 2.606 2.606
2.631 2.631 2.706 5.447 5.490 5.491 5.909 6.036 6.218 6.226
0.000001 0.000001 0.050050
-6.722 -3.361 -3.170 -3.168 -3.092 -2.546 -2.407 2.526 2.604 2.605
2.616 2.616 2.674 5.169 5.442 5.443 6.062 6.097 6.279 6.282
0.000003 0.000003 0.100100
-6.692 -3.667 -3.164 -3.162 -3.086 -2.400 -2.201 2.462 2.531 2.531
2.586 2.645 2.646 4.733 5.313 5.314 6.078 6.272 6.438 6.440
0.000004 0.000004 0.150151
-6.643 -3.976 -3.155 -3.153 -3.075 -2.388 -1.830 2.368 2.424 2.424
2.460 2.694 2.694 4.277 5.129 5.130 6.064 6.543 6.686 6.689
0.000005 0.000005 0.200201
-6.575 -4.275 -3.143 -3.141 -3.061 -2.373 -1.443 2.257 2.300 2.300
2.316 2.762 2.763 3.819 4.914 4.914 6.065 6.869 6.891 7.009
0.000006 0.000006 0.250251
-6.487 -4.559 -3.130 -3.128 -3.045 -2.355 -1.047 2.142 2.172 2.174
2.174 2.849 2.849 3.362 4.685 4.686 6.085 7.046 7.298 7.391
0.000008 0.000008 0.300301
-6.381 -4.828 -3.117 -3.115 -3.029 -2.338 -0.643 2.033 2.039 2.057
2.057 2.909 2.954 2.954 4.455 4.455 6.126 7.143 7.492 7.754
0.000009 0.000009 0.350351
-6.257 -5.079 -3.104 -3.103 -3.015 -2.321 -0.234 1.928 1.940 1.958
1.958 2.463 3.075 3.076 4.230 4.231 6.184 7.059 7.649 8.249
0.000010 0.000010 0.400402
-6.116 -5.311 -3.095 -3.093 -3.003 -2.308 0.174 1.844 1.868 1.883
1.883 2.029 3.210 3.211 4.020 4.020 6.256 6.925 7.879 8.547
0.000012 0.000012 0.450452
-5.966 -5.515 -3.088 -3.087 -2.996 -2.300 0.562 1.625 1.792 1.823
1.837 1.837 3.346 3.347 3.837 3.838 6.327 6.809 8.114 8.503
0.000013 0.000013 0.500502
-5.866 -5.633 -3.086 -3.085 -2.993 -2.297 0.792 1.391 1.775 1.808
1.821 1.821 3.425 3.427 3.742 3.744 6.361 6.760 8.274 8.404