ChessAI/notes.txt at master · MaxDillon/ChessAI · GitHub

The IntelliJ protobuf plugin does not *compile* protobufs. It just does syntax highlighting.
We will need to invoke protoc from the command line, or configure the build tools to do it.

The generated .java classes are not self-contained; they still depend on the
Google protobuf distribution. We will need to build this separately and include it as an
external library in the project.

It's convenient to get the protobuf dependencies with maven, and it may be convenient
to run the proto compiler this way too.

We set up the initial maven pom.xml like so:

  mvn -B archetype:generate -DgroupId=max.dillon -DartifactId=quest

This also creates src/main/java/... and src/test/java/... directories.

To generate GameGrammar.java from game_grammar.proto:

${anacondadir}/bin/protoc src/main/proto/game_grammar.proto --java_out=src/main/java


Initialize board:
  1. create a 2d array of given board size.
  2. for each piece type, for each placement, initialize piece.
  3. create opponent's pieces based on symmetry class.
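
A minimal sketch of the three initialization steps above. Everything here is an
assumption made for illustration: the int encoding of pieces (+type for the first
player, -type for the opponent, 0 for empty), the {type, row, col} placement triples,
and the Symmetry enum are hypothetical and are not taken from game_grammar.proto.

```java
class BoardInit {
    // Hypothetical symmetry classes: how the opponent's setup mirrors the player's.
    enum Symmetry { NONE, REFLECT, ROTATE }

    /**
     * Builds the starting board.
     * placements[i] = {pieceType, row, col} for the first player (assumed layout).
     */
    static int[][] initialBoard(int size, int[][] placements, Symmetry symmetry) {
        int[][] board = new int[size][size];            // step 1: empty 2d array
        for (int[] p : placements) {
            int type = p[0], row = p[1], col = p[2];
            board[row][col] = type;                     // step 2: place the player's piece
            // step 3: derive the opponent's piece from the symmetry class
            if (symmetry == Symmetry.REFLECT) {
                board[size - 1 - row][col] = -type;               // mirror top to bottom
            } else if (symmetry == Symmetry.ROTATE) {
                board[size - 1 - row][size - 1 - col] = -type;    // rotate by 180 degrees
            }
        }
        return board;
    }
}
```

With ROTATE, a piece placed at (0, 0) on a 4x4 board gives the opponent a piece at (3, 3);
with REFLECT it lands at (3, 0).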

Find legal moves given board:
  0. create empty list of moves
  1. for each of player's pieces
     1. apply templates in order to determine set of position independent moves
     2. restrict to moves still on the board
     3. restrict to moves satisfying jump and land restrictions.
     4. add to set of moves
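
The move-generation steps above could look roughly like this. It is a sketch under
simplifying assumptions, not the project's implementation: templates are plain
{dRow, dCol} offsets shared by all pieces, the landing restriction is "not on your own
piece", and the jump restriction is a single mayJump flag. All names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class MoveGen {
    /** Returns moves as {fromRow, fromCol, toRow, toCol}; player is +1 or -1. */
    static List<int[]> legalMoves(int[][] board, int player, int[][] templates, boolean mayJump) {
        List<int[]> moves = new ArrayList<>();                         // step 0
        int rows = board.length, cols = board[0].length;
        for (int r = 0; r < rows; r++) {
            for (int c = 0; c < cols; c++) {
                if (Integer.signum(board[r][c]) != player) continue;   // step 1: player's pieces only
                for (int[] t : templates) {                            // 1.1: position independent moves
                    int tr = r + t[0], tc = c + t[1];
                    if (tr < 0 || tr >= rows || tc < 0 || tc >= cols) continue;     // 1.2: on the board
                    if (Integer.signum(board[tr][tc]) == player) continue;          // 1.3: land restriction
                    if (!mayJump && !pathClear(board, r, c, t[0], t[1])) continue;  // 1.3: jump restriction
                    moves.add(new int[] {r, c, tr, tc});               // 1.4
                }
            }
        }
        return moves;
    }

    /** True if every square strictly between the start and the target is empty. */
    static boolean pathClear(int[][] board, int r, int c, int dr, int dc) {
        // Offsets that are neither straight nor diagonal have no squares in between.
        if (dr != 0 && dc != 0 && Math.abs(dr) != Math.abs(dc)) return true;
        int steps = Math.max(Math.abs(dr), Math.abs(dc));
        int sr = Integer.signum(dr), sc = Integer.signum(dc);
        for (int i = 1; i < steps; i++) {
            if (board[r + i * sr][c + i * sc] != 0) return false;
        }
        return true;
    }
}
```

A real grammar would attach templates and restrictions to each piece type rather than
passing one set for the whole board.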

Question of how to represent moves:
  could be piece x moves to y
  or could just be a new board state as an array.
  (the second might be easier since you do things in just one place)
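
To make the two options concrete, here is a small sketch that converts the first
representation into the second. The {fromRow, fromCol, toRow, toCol} move layout and
the int board encoding are assumptions made for this example.

```java
class MoveRepr {
    /**
     * Applies a "piece x moves to y" move and returns the resulting board state.
     * The input board is copied, so the caller's board is left unchanged.
     */
    static int[][] apply(int[][] board, int[] move) {
        int[][] next = new int[board.length][];
        for (int i = 0; i < board.length; i++) {
            next[i] = board[i].clone();                   // copy each row
        }
        next[move[2]][move[3]] = next[move[0]][move[1]];  // the piece lands on the target square
        next[move[0]][move[1]] = 0;                       // the source square becomes empty
        return next;
    }
}
```

If the move generator returned these board states directly, captures, promotions and
similar side effects would all be handled in this one place.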


Grammar considerations
  -- representation of pieces off the board and moves for such pieces
  -- representation of adjacency for where you're allowed to land (e.g., connect 4)
  -- representation of winning conditions
  -- representation of move chaining (to help with pawn moves, or queening, or checkers)
  -- representation of special rows


Bottleneck seems to be generation of training data with self-play games.
MCTS is expensive on its own. Especially when we are calling out to predict() a lot.
-- make sure we don't do this more than necessary (e.g., duplication, or for nodes we won't expand).
-- make sure we aren't re-expanding things or recomputing possible moves
-- profile.

But due to the cost of data generation, we should consider all possible augmentations.
-- exploit game symmetries to add flips, rotations, etc. where possible.  specify what is possible in grammar.
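
A sketch of what symmetry augmentation might look like for a board stored as an int
grid. The allowFlip and allowRotate flags stand in for whatever the grammar ends up
specifying and are hypothetical. A game with gravity such as connect 4 would only
allow the left-right flip. The policy target of each training example has to be
transformed the same way as the board, which is not shown here.

```java
import java.util.ArrayList;
import java.util.List;

class Augment {
    /** Mirrors the board left to right. */
    static int[][] flipLeftRight(int[][] b) {
        int rows = b.length, cols = b[0].length;
        int[][] out = new int[rows][cols];
        for (int r = 0; r < rows; r++)
            for (int c = 0; c < cols; c++)
                out[r][cols - 1 - c] = b[r][c];
        return out;
    }

    /** Rotates the board 90 degrees clockwise. */
    static int[][] rotate90(int[][] b) {
        int rows = b.length, cols = b[0].length;
        int[][] out = new int[cols][rows];
        for (int r = 0; r < rows; r++)
            for (int c = 0; c < cols; c++)
                out[c][rows - 1 - r] = b[r][c];
        return out;
    }

    /** Returns the original board plus every variant the game allows (not deduplicated). */
    static List<int[][]> augment(int[][] b, boolean allowFlip, boolean allowRotate) {
        List<int[][]> out = new ArrayList<>();
        out.add(b);
        if (allowRotate) {
            int[][] r = b;
            for (int i = 0; i < 3; i++) {       // 90, 180 and 270 degrees
                r = rotate90(r);
                out.add(r);
            }
        }
        if (allowFlip) {
            int n = out.size();
            for (int i = 0; i < n; i++) out.add(flipLeftRight(out.get(i)));
        }
        return out;
    }
}
```

With both flags set this yields 8 boards per example, with only the flip it yields 2.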

Consider whether we can do with simpler models. Not sure how to estimate how large a model we need other than
by experimenting. But if the model can be smaller it will train faster and we'll be able to use more
parallel instances due to using less memory on the GPU.

Diagnostics:
-- standard deviation of activations or weights. is this a likely indicator of overfitting?

regularization
-- weight decay / l2
-- dropout

possible that optimizers newer than stochastic gradient descent will converge faster.
this may help give a quick idea of the capability of a given network topology

early stopping ... monitor performance on test set and stop training when perf degrades.
likely that we've overtrained connect4 model.
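
One way to make the early stopping rule concrete is a small tracker that is fed the
held-out loss after every epoch. This is a sketch: the patience parameter (how many
epochs without improvement to tolerate) is an addition for illustration and does not
come from the notes above, and the class name is hypothetical.

```java
class EarlyStopping {
    private final int patience;
    private double bestLoss = Double.POSITIVE_INFINITY;
    private int bestEpoch = -1;
    private int epochsSinceBest = 0;
    private int epoch = -1;

    EarlyStopping(int patience) {
        this.patience = patience;
    }

    /** Records the held-out loss for the next epoch; returns true when training should stop. */
    boolean shouldStop(double heldOutLoss) {
        epoch++;
        if (heldOutLoss < bestLoss) {
            bestLoss = heldOutLoss;      // new best: remember it and reset the counter
            bestEpoch = epoch;
            epochsSinceBest = 0;
        } else {
            epochsSinceBest++;           // no improvement this epoch
        }
        return epochsSinceBest >= patience;
    }

    /** Zero-based index of the epoch with the lowest held-out loss so far. */
    int bestEpoch() {
        return bestEpoch;
    }
}
```

The model saved at bestEpoch() is the one to keep, so checkpoints would need to be
written whenever a new best loss is seen.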


Diagnostics.
Weight Initialization.
Learning Rate.
Activation Functions.
Network Topology.
Batches and Epochs.
Regularization.
Optimization and Loss.
Early Stopping.

How To Improve Deep Learning Performance
by Jason Brownlee on September 21, 2016 in Deep Learning
20 Tips, Tricks and Techniques That You Can Use To
Fight Overfitting and Get Better Generalization
How can you get better performance from your deep learning model?


How many layers and how many neurons do you need?
No one knows. No one. Don’t ask.
You must discover a good configuration for your problem. Experiment.

Try one hidden layer with a lot of neurons (wide).
Try a deep network with few neurons per layer (deep).
Try combinations of the above.
Try architectures from recent papers on problems similar to yours.
Try topology patterns (fan out then in) and rules of thumb from books and papers (see links below).

It’s hard. Larger networks have a greater representational capability, and maybe you need it.
More layers offer more opportunity for hierarchical re-composition of abstract features learned from the data. Maybe you need that.

Small batch sizes with large epoch size and a large number of training epochs are common in modern deep learning implementations.


// TODO: we are unnecessarily reloading the model and restoring the graph with every new game.
// TODO: we are unnecessarily printing the board when in data generation mode.
// TODO: that said, 90% of the time is spent in predict().  (i.e., computationgraph.output)
// TODO: we do *not* appear to be hitting max GPU usage, though we are using all of its memory.
// TODO: possible that we could re-use a computationgraph for multiple threads and get higher
//       CPU/GPU utilization at same GPU memory footprint.

// TODO: reduce calls to predict()
//       -- cache results by state hash?
//       -- change number of MCTS calls?
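
A sketch of the "cache results by state hash" idea. Assumptions, all hypothetical: the
board is an int grid, the Function passed in stands in for the real predict() call,
and the cache is a bounded least-recently-used map. The key here is the full board
string instead of a hash code, so two different positions can never collide. If the
board does not encode whose turn it is, the key would have to include that as well.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

class PredictionCache {
    private final Function<int[][], float[]> predict;   // stand-in for the model's predict()
    private final Map<String, float[]> cache;
    int misses = 0;                                      // number of real predict() calls made

    PredictionCache(Function<int[][], float[]> predict, final int capacity) {
        this.predict = predict;
        // Access-ordered LinkedHashMap that evicts the least recently used entry
        // once the cache grows past its capacity.
        this.cache = new LinkedHashMap<String, float[]>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, float[]> eldest) {
                return size() > capacity;
            }
        };
    }

    /** Returns the cached prediction for this board, calling predict() only on a miss. */
    float[] get(int[][] board) {
        String key = Arrays.deepToString(board);
        float[] cached = cache.get(key);
        if (cached == null) {
            misses++;
            cached = predict.apply(board);
            cache.put(key, cached);
        }
        return cached;
    }
}
```

The cache has to be cleared whenever a new model is loaded, since old predictions no
longer match the new weights.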

// TODO: we go to much greater expense rolling out simulations than training. cheap to
//       train on an example. much more expensive in model evals to produce the example.
// TODO: leverage the training examples we *do* generate by rotating and reflecting them.