#double machine learning and targeted maximum likelihood estimation

## No.:
## Title:
## https links:


1.
CV-TMLE and double machine learning
https://vanderlaan-lab.org/2019/12/24/cv-tmle-and-double-machine-learning/

Thanks for this interesting question. In the past several years,
the interest in these machine learning-based estimators has become more widespread,
since they allow for the statistical answer to a question to be framed
in terms of scientifically meaningful parameters (e.g., defined through causal inference),
incorporating machine learning in the estimation process,
while providing formal statistical inference.


2.
Cross-validated Targeted Maximum Likelihood Estimation (CV-TMLE)
Double machine learning for added robustness
Alan Hubbard
Division of Biostatistics
University of California at Berkeley
October 22 2020
https://putnamds.com/FDA_Hubbard_CVTMLE.pdf


3.
Causal Inference using Targeted Maximum Likelihood Estimation
Duc Hieu Do
http://arno.uvt.nl/show.cgi?fid=156285


4.
Targeted Maximum Likelihood for ATE, why not just GLM for ATE?
https://stats.stackexchange.com/questions/550318/targeted-maximum-likelihood-for-ate-why-not-just-glm-for-ate

The answer is double robustness.
Like all doubly-robust methods, with TMLE you get two chances to get the model correct
and the treatment effect estimate is consistent if either is correct (or approaches the truth at a given rate).
With g-computation using a GLM, you only get one chance to get the model right.
If that model is wrong (and it almost certainly is), then your treatment effect estimate is not consistent.
You can use flexible machine learning methods with TMLE that are more likely to approximate the true form of either the treatment or the outcome model.
Indeed, given the inability of an outcome model GLM to even plausibly capture the true outcome model,
one should almost never use it to estimate treatment effect,
whereas TMLE should be among your first choices.
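
The "two chances" argument above can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not the method from the linked answer: the data-generating process, the function names, and the choice of AIPW (a simpler doubly robust estimator standing in for TMLE) are all illustrative. The outcome model is deliberately wrong (arm means that ignore the confounder, equivalent to regressing Y on A alone), while the treatment model is correct.

```python
import random

def simulate(n, seed=0):
    """Binary confounder W, binary treatment A, continuous outcome Y. True ATE is 1."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        w = int(rng.random() < 0.5)
        a = int(rng.random() < (0.8 if w else 0.2))   # W drives treatment
        y = 1.0 * a + 2.0 * w + rng.gauss(0.0, 1.0)   # W also drives the outcome
        rows.append((w, a, y))
    return rows

def mean(values):
    values = list(values)
    return sum(values) / len(values)

def fit_wrong_outcome_model(rows):
    """Misspecified outcome model: arm means that ignore the confounder W."""
    arm_mean = {a0: mean(y for _, a, y in rows if a == a0) for a0 in (0, 1)}
    return lambda a, w: arm_mean[a]

def fit_propensity(rows):
    """Correct (saturated) treatment model: P(A = 1 | W = w) per stratum."""
    return {w0: mean(a for w, a, _ in rows if w == w0) for w0 in (0, 1)}

def g_computation(rows, q):
    """Plug-in estimate: average predicted difference between A = 1 and A = 0."""
    return mean(q(1, w) - q(0, w) for w, _, _ in rows)

def aipw(rows, q, g):
    """Doubly robust estimate: plug-in term plus an inverse-probability-weighted residual."""
    return mean(
        q(1, w) - q(0, w)
        + (a / g[w] - (1 - a) / (1 - g[w])) * (y - q(a, w))
        for w, a, y in rows
    )

if __name__ == "__main__":
    rows = simulate(20000, seed=1)
    q = fit_wrong_outcome_model(rows)
    g = fit_propensity(rows)
    print("g-computation, wrong outcome model:", round(g_computation(rows, q), 3))
    print("AIPW, same outcome model + correct propensity:", round(aipw(rows, q, g), 3))
```

In this setup the g-computation estimate absorbs the confounding and lands far from the true effect of 1, while the doubly robust estimate is rescued by the correct treatment model.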


5.
An Illustrated Guide to TMLE, Part I: Introduction and Motivation
https://www.khstats.com/blog/tmle/tutorial/


https://github.com/kathoffman

https://twitter.com/kat_hoffman_


Targeted Maximum Likelihood Estimation (TMLE) is a semiparametric estimation
framework to estimate a statistical quantity of interest.
TMLE allows the use of machine learning (ML) models which place
minimal assumptions on the distribution of the data.
Unlike estimates normally obtained from ML,
the final TMLE estimate will still have valid standard errors for
statistical inference.
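
The description above can be made concrete with a hand-rolled TMLE for the average treatment effect on a binary outcome. This is a sketch only: the simulated data, the function names, and the use of stratum means in place of machine learning fits are assumptions for illustration, and the tmle R package covered in Part II of the series does considerably more. The initial outcome estimate is deliberately crude so that the targeting step has something to correct.

```python
import math
import random

def expit(x):
    return 1.0 / (1.0 + math.exp(-x))

def logit(p):
    return math.log(p / (1.0 - p))

def simulate(n, seed=0):
    """Binary W, A, Y with P(Y = 1 | A, W) = expit(-1 + A + 1.5 W)."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        w = int(rng.random() < 0.5)
        a = int(rng.random() < (0.7 if w else 0.3))
        y = int(rng.random() < expit(-1.0 + 1.0 * a + 1.5 * w))
        data.append((w, a, y))
    return data

def true_ate():
    """ATE implied by the simulation above, averaging over P(W = 1) = 0.5."""
    return sum(0.5 * (expit(1.5 * w) - expit(-1.0 + 1.5 * w)) for w in (0, 1))

def tmle_ate(data):
    n = len(data)
    # Step 1: initial outcome estimate Q(a). Deliberately crude: it ignores W.
    q_init = {}
    for a0 in (0, 1):
        ys = [y for _, a, y in data if a == a0]
        q_init[a0] = sum(ys) / len(ys)
    # Step 2: propensity score g(w) = P(A = 1 | W = w).
    g = {}
    for w0 in (0, 1):
        treated = [a for w, a, _ in data if w == w0]
        g[w0] = sum(treated) / len(treated)
    # Step 3: the "clever covariate" H(a, w).
    def h(a, w):
        return a / g[w] - (1 - a) / (1 - g[w])
    # Step 4: fit the fluctuation parameter eps by Newton's method on the
    # logistic log-likelihood of Y with offset logit(Q) and covariate H.
    eps = 0.0
    for _ in range(50):
        score, info = 0.0, 0.0
        for w, a, y in data:
            p = expit(logit(q_init[a]) + eps * h(a, w))
            score += h(a, w) * (y - p)
            info += h(a, w) ** 2 * p * (1 - p)
        step = score / info
        eps += step
        if abs(step) < 1e-10:
            break
    # Step 5: updated outcome predictions, then the plug-in estimate.
    def q_star(a, w):
        return expit(logit(q_init[a]) + eps * h(a, w))
    ate = sum(q_star(1, w) - q_star(0, w) for w, _, _ in data) / n
    # Step 6: standard error from the estimated influence curve.
    ic = [h(a, w) * (y - q_star(a, w)) + q_star(1, w) - q_star(0, w) - ate
          for w, a, y in data]
    se = math.sqrt(sum(v * v for v in ic) / n) / math.sqrt(n)
    return ate, se

if __name__ == "__main__":
    ate, se = tmle_ate(simulate(20000, seed=2))
    print("TMLE ATE:", round(ate, 3), "SE:", round(se, 4), "truth:", round(true_ate(), 3))
```

The standard error in the last step is what the passage means by valid inference: it comes from the influence curve of the targeted estimate, not from the initial fits.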


Table of Contents
This blog post series has three parts:

Part I: Motivation
    TMLE in three sentences 🎯
    An Analyst’s Motivation for Learning TMLE 👩🏼‍💻
    Is TMLE Causal Inference? 🤔

Part II: Algorithm
    Why the Visual Guide? 🎨
    TMLE, Step-by-Step 🚶🏽
    Using the tmle package 📦

Part III: Evaluation
    Properties of TMLE 📈
    Why does TMLE work? ✨
    Resources to learn more 🤓


https://github.com/kathoffman/causal-inference-visual-guides

https://github.com/kathoffman/lmtp


6.
Targeted Maximum Likelihood Estimation for a Binary Outcome: Tutorial and Guided Implementation
https://migariane.github.io/TMLE.nb.html


Miguel Angel Luque Fernandez, MA, MPH, MSc, Ph.D
https://maluque.netlify.com/


Last update: 5/5/2019
https://scholar.harvard.edu/malf/home


7.
Targeted Maximum Likelihood Learning
University of California, Berkeley
U.C. Berkeley Division of Biostatistics Working Paper Series
Mark J. van der Laan and Daniel Rubin
https://www.stat.cmu.edu/~ryantibs/journalclub/vanderlaan_2006.pdf


8.
R Package: tmle
https://cran.r-project.org/web/packages/tmle/tmle.pdf


9.
TargetedLearning.jl
https://lendle.github.io/TargetedLearning.jl/

Targeted minimum loss-based estimation (TMLE) (sometimes called targeted maximum likelihood estimation) is
a general framework for constructing regular, asymptotically linear estimators for pathwise differentiable parameters
with additional properties such as asymptotic efficiency and double robustness.
For background and details, see Targeted Learning by van der Laan and Rose, or articles on TMLE.
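
For orientation, the two technical terms above can be written out for the average treatment effect. The notation is an assumption chosen for this note and is not taken from the package documentation: O = (W, A, Y) is one observation, \bar Q is the outcome regression, g is the propensity score, and \psi_0 is the true parameter value.

```latex
% Asymptotic linearity: the estimator behaves like a sample mean of its
% influence function D, which has mean zero under the true distribution.
\sqrt{n}\,\bigl(\hat\psi_n - \psi_0\bigr)
  = \frac{1}{\sqrt{n}} \sum_{i=1}^{n} D(O_i) + o_P(1),
\qquad E_0\,[D(O)] = 0 .

% Efficient influence function for the ATE, with
% \bar Q(a, w) = E[Y \mid A = a, W = w] and g(w) = P(A = 1 \mid W = w).
D^*(O) = \left( \frac{A}{g(W)} - \frac{1 - A}{1 - g(W)} \right)
         \bigl( Y - \bar Q(A, W) \bigr)
       + \bar Q(1, W) - \bar Q(0, W) - \psi_0 .
```

Double robustness shows up in the structure of D^*: its mean is zero if either \bar Q or g is the true function, even when the other is not.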


https://github.com/lendle/TargetedLearning.jl/


10.
On doubly robust inference for double machine learning
https://arxiv.org/abs/2107.06124


11.
Handling missing data when estimating causal effects with Targeted Maximum Likelihood Estimation
https://arxiv.org/abs/2112.05274


12.
Statistics in Julia - Maximum Likelihood Estimation.ipynb
https://github.com/johnmyleswhite/julia_tutorials/blob/master/Statistics%20in%20Julia%20-%20Maximum%20Likelihood%20Estimation.ipynb


https://github.com/johnmyleswhite/julia_tutorials


13.
MAXIMUM LIKELIHOOD ESTIMATION (MLE) IN JULIA: THE OLS EXAMPLE
https://juliaeconomics.com/2014/06/16/numerical-maximum-likelihood-the-ols-example/


14.
Scalable collaborative targeted learning for high-dimensional data
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6086775/

Stat Methods Med Res. Author manuscript; available in PMC 2018 Aug 11.
Published in final edited form as:
Stat Methods Med Res. 2019 Feb; 28(2): 532–554.
Published online 2017 Sep 22. doi: 10.1177/0962280217729845


15.
A fast algorithm for maximum likelihood estimation of mixture proportions using sequential quadratic programming
https://arxiv.org/abs/1806.01412


https://arxiv.org/pdf/1806.01412.pdf


https://github.com/stephenslab/mixsqp-paper


https://github.com/stephenslab/mixSQP


16.
Maximum likelihood estimation
https://jump.dev/JuMP.jl/stable/tutorials/nonlinear/mle/

Use nonlinear optimization to compute the maximum likelihood estimate (MLE) of
the parameters of a normal distribution, a.k.a., the sample mean and variance.
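
The linked tutorial solves this with JuMP in Julia. The same point can be sketched in plain Python: the closed-form MLE is the sample mean and the variance with divisor n, and a numerical optimizer should land on the same values. The function names, the simulated data, and the hand-picked step size below are assumptions for illustration only.

```python
import math
import random

def normal_mle_closed_form(xs):
    """Closed-form MLE: sample mean and variance with divisor n (not n - 1)."""
    n = len(xs)
    mu = sum(xs) / n
    var = sum((x - mu) ** 2 for x in xs) / n
    return mu, var

def avg_loglik(xs, mu, var):
    """Average normal log-likelihood of the sample at (mu, var)."""
    n = len(xs)
    return (-0.5 * math.log(2 * math.pi * var)
            - sum((x - mu) ** 2 for x in xs) / (2 * var * n))

def normal_mle_gradient_ascent(xs, lr=0.1, iters=5000):
    """Maximize the average log-likelihood over (mu, t) with t = log(variance).

    The step size is tuned by hand for data on roughly this scale.
    """
    n = len(xs)
    m1 = sum(xs) / n                 # mean of x
    m2 = sum(x * x for x in xs) / n  # mean of x squared
    mu, t = 0.0, 0.0
    for _ in range(iters):
        var = math.exp(t)
        msq = m2 - 2 * mu * m1 + mu * mu   # mean of (x - mu)^2
        mu += lr * (m1 - mu) / var         # gradient with respect to mu
        t += lr * 0.5 * (msq / var - 1.0)  # gradient with respect to t
    return mu, math.exp(t)

if __name__ == "__main__":
    rng = random.Random(0)
    xs = [rng.gauss(3.0, 2.0) for _ in range(1000)]
    print("closed form:    ", normal_mle_closed_form(xs))
    print("gradient ascent:", normal_mle_gradient_ascent(xs))
```

Optimizing over log(variance) keeps the variance positive without an explicit constraint, which is one common way to set up this problem for an unconstrained solver.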


17.
1. Targeted Machine Learning for Causal Inference based on Real World Data
https://www.youtube.com/watch?v=PrPNP5RVcLg


2. An Introduction to Targeted Maximum Likelihood Estimation of Causal Effects
https://www.youtube.com/watch?v=8Q9dfW3oOi4


Cross-validated Targeted Maximum Likelihood Estimation (CV-TMLE)
https://www.youtube.com/watch?v=MDmddX267Ys


July 22nd 3 Tutorial Towards Causal Reinforcement Learning
https://www.youtube.com/watch?v=W20GWMzME5w


18.
[SER 2021 Workshop] Targeted Learning in the tlverse: Causal Inference Meets Machine Learning
https://tlverse.org/ser2021-workshop/index.html

https://github.com/tlverse/ser2021-workshop


19.
Teaching Courses taught by Mark van der Laan
https://vanderlaan-lab.org/teaching/


The Research Group of Mark van der Laan Computational Biology and Causality
https://vanderlaan-lab.org


Targeted Learning in Biomedical Big Data, offered in the Spring 2018 semester at UC Berkeley.
https://vanderlaan-group.github.io/tlbbd-sp2018/

Spring 2018 Syllabus Targeted Learning in Biomedical Big Data
https://github.com/vanderLaan-Group/tlbbd-sp2018/blob/sources/static/materials/syllabus.pdf

Course Description:
This course teaches students how to construct efficient estimators and obtain robust inference for parameters
that utilize data-adaptive estimation strategies (i.e., machine learning).

Requirements and Materials:
Targeted Learning: Causal Inference for Observational and Experimental Data by Mark J. van der Laan and Sherri Rose (2011)
vdL&R (2011)

Targeted Learning in Data Science: Causal Inference for Complex Longitudinal Studies by Mark J. van der Laan
vdL&R (2017)


https://github.com/orgs/tlbbd-sp2018/repositories

https://github.com/vanderLaan-Group/tlbbd-sp2018/blob/sources/content/onepage/03-causal_mod-part.md

https://github.com/tlbbd-spring2018/lab_03

http://nbviewer.jupyter.org/github/tlbbd-spring2018/lab_03/blob/master/lab_03.ipynb

https://github.com/tlbbd-sp2018/lab_01


20.
Machine learning in the estimation of causal effects: targeted minimum loss-based estimation and double/debiased machine learning
https://academic.oup.com/biostatistics/article/21/2/353/5631845?login=false

Biostatistics, Volume 21, Issue 2, April 2020, Pages 353–358, https://doi.org/10.1093/biostatistics/kxz042

A limitation of the original TMLE formulation is that it imposed an assumption on the statistical model for the nuisance parameters.
The “uniform” central limit theorem necessary to derive the asymptotic distribution for the TMLE
requires that the nuisance functions are elements of Donsker classes, which are classes of functions with limited entropy.
This assumption is problematic in the context of machine learning because
it is not satisfied by many estimation methods, e.g., those which yield functions with unbounded variation.
The solution to this problem, named cross-validated TMLE (Zheng and van der Laan, 2011), uses sample-splitting to obtain out-of-sample estimates
of the nuisance parameters, an idea that dates back at least to Bickel (1982).
Because the nuisance estimates are constructed using out-of-sample data, asymptotic arguments
for the validation sample can be made conditional on the training sample, and then averaged across validation samples, thus avoiding the Donsker condition.
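
The sample-splitting idea can be sketched as follows. This is an illustration under assumptions, not CV-TMLE itself: it uses a cross-fitted AIPW estimator (the form usually associated with double/debiased machine learning), stratum means stand in for a machine learning fit, and all names and the simulated data are hypothetical. The part that matters is that each observation's nuisance predictions come from a fit that never saw that observation.

```python
import random

def simulate(n, seed=0):
    """Binary confounder W, binary treatment A, continuous outcome Y. True ATE is 1."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        w = int(rng.random() < 0.5)
        a = int(rng.random() < (0.7 if w else 0.3))
        y = 1.0 * a + 2.0 * w + rng.gauss(0.0, 1.0)
        data.append((w, a, y))
    return data

def fit_nuisances(train):
    """Stand-in learner: cell means for Q(a, w), stratum frequencies for g(w)."""
    q, g = {}, {}
    for w0 in (0, 1):
        stratum = [(a, y) for w, a, y in train if w == w0]
        g[w0] = sum(a for a, _ in stratum) / len(stratum)
        for a0 in (0, 1):
            ys = [y for a, y in stratum if a == a0]
            q[(a0, w0)] = sum(ys) / len(ys)
    return q, g

def cross_fitted_aipw(data, k=5, seed=0):
    """K-fold cross-fitting: fit nuisances on k - 1 folds, evaluate on the held-out fold."""
    idx = list(range(len(data)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    terms = []
    for fold in folds:
        held_out = set(fold)
        train = [data[i] for i in idx if i not in held_out]
        q, g = fit_nuisances(train)  # fitted without the held-out fold
        for i in fold:
            w, a, y = data[i]
            h = a / g[w] - (1 - a) / (1 - g[w])
            terms.append(q[(1, w)] - q[(0, w)] + h * (y - q[(a, w)]))
    n = len(terms)
    ate = sum(terms) / n
    se = (sum((t - ate) ** 2 for t in terms) / n / n) ** 0.5
    return ate, se

if __name__ == "__main__":
    ate, se = cross_fitted_aipw(simulate(5000, seed=3))
    print("cross-fitted ATE:", round(ate, 3), "SE:", round(se, 4))
```

With a learner this simple the Donsker condition is not actually at risk; the split only starts to matter when `fit_nuisances` is replaced by a flexible method that can overfit its training data.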

Zheng, W. and van der Laan, M. J. (2011). Cross-validated targeted minimum-loss-based estimation.
In: Targeted Learning. New York, NY: Springer, pp. 459–474.


The TMLE Framework
https://tlverse.org/tlverse-handbook/tmle3.html


Collaborative Targeted Maximum Likelihood Estimation
https://cran.r-project.org/web/packages/ctmle/readme/README.html


An Easy Implementation of CV-TMLE
https://arxiv.org/abs/1811.04573


21.
R Guide for TMLE in Medical Research - Ehsan Karim & Hanna Frank
2021-08-24
https://ehsanx.github.io/TMLEworkshop/