TfRotationPca/ProgramLayout.Rmd at master · pourzanj/TfRotationPca · GitHub

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
---
title: "Program Layout"
output: html_document
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```

Radford's HMC takes in U, Grad_U, epsilon, L, and current_q and outputs
a new sample.

U should just be a R function wrapper calling a TF op called U that takes in
some q as a tf.placeholder. This way we can get the gradient of U w.r.t. q

the op U will be the likelihood of data thus it needs to take a data placeholder.
This place holder will be the $N \times p$ matrix of data, $X$.
It will consist of the log likelihood for the sum of Gaussian PDFs evaluated at each
of our data points:

$$l(W, \Lambda, \sigma^2) := -\frac{N}{2} \ln |C| - \frac{N}{2} \mathrm{tr}(C^{-1} \hat{\Sigma})$$

where $C := W \Lambda^2 W^T + \sigma^2 I$ and $\hat{\Sigma} := (1/N) X^T X$. We cannot just evaluate
the density at the W, that corresponds to some $\Theta$. We also have to multiply by the appropiate
Stiefel area form. Giving us finally

$$U(W(\Theta), \Lambda, \sigma^2) := \frac{N}{2} \ln |C| + \frac{N}{2} \mathrm{tr}(C^{-1} \hat{\Sigma}) - \ln |G^T d\Theta|$$

# Constraint of Angles

Note that the angles need to be constrained to lie within some interval. For a given $i$, the first angle $\theta_{i,i+1}$ e.g. $\theta_{01}$ or $\theta_{34}$ should lie in the interval $[-\pi,\pi)$. For that same $i$, all angles after that (e.g. $\theta_{02}, \theta_{03}, \cdots$ and $\theta_{35}, \theta_{36}, \cdots$) should lie in the interval $[-\pi/2, \pi/2)$. That is if we are sampling on the entire Stiefel manifold (except for a set of measure zero); if we want to do upper orthant, we simply set all intervals to $[0, \pi/2)$.

Because the angles lie in an interval, we must transform them to an unconstrained space using the log-odds transform, see Stan manual section and transformations, specifically of a Lower and Upper bounded scalar. For each angle we define a, unconstrained variable

$$\theta' := \mathrm{logit}\left( \frac{\theta -a}{b -a} \right)$$

where $logit(u) : = \ln \frac{u}{1-u}$. The inverse of the transform is

$$\theta = a + (b-a)\,\mathrm{logit}^{-1}(\theta')$$

and the absolute value of the derivative of the inverse is

$$\left|\frac{d\theta}{d\theta'} \right| = (b-a)\cdot \mathrm{logit}^{-1}(\theta')\cdot (1 - \mathrm{logit}^{-1}(\theta'))$$

Thus when we sample angles on the logit scale, we can get back the constrained angles and plug them in where they would normally go, but our potential must include extra terms for each angle and thus becomes

$$U(W(\mathrm{logit}(\Theta)), \Lambda, \sigma^2) := \frac{N}{2} \ln |C| + \frac{N}{2} \mathrm{tr}(C^{-1} \hat{\Sigma}) - \ln |G^T d\Theta| - \sum_{i = 0}^{p-1} \sum_{j = i+1}^{n-1} \left|\frac{d\theta}{d\theta'} \right|_{\theta' = \theta_{ij}}$$

# U op Specification

U op needs to take in angles and within it

1. Compute G
2. From G compute W (just mult by I_np) and plug in to likelihood
3. From G compute stiefel area form

# Todo

1. Finish Timing Results. Try to get up to n = 100
2. Add support for Unknown Variance
3. Sample Theta on unconstrained space via sigmoid
4. Try upper orthant
5. Try true subspace that is close to end of interval

# Todo From Atzberger

1. using multiple charts
2. test datasets/applications
  + Atzberger's X
  + hierarchical
  + Brian AR
3. Compare to Byrne (TF doesn't have mexp, would require R implementation)
4. compare to QR sampling (rejection method)