This README describes the latest model. Please pay close attention to the details that are new.
The model architecture is in quant_models.py.
The checkpoint currently in use is ./work/DS_CNN/training/current/ds_cnn_9507.ckpt-24200, accuracy = 95.42 95.07 95.40 -> 95.07
example_weight shows the weights reshaped into their multi-dimensional form, making it easy to tell which axis is which. The weights themselves are in weight.h (different from the old version).
example.txt gives sample outputs for the different layers; 1.txt, 2.txt, 3.txt are provided for reference (different from the old version). lots_of_case.txt provides a large number of cases for finding the maximum values.
!! Changes in this update (7/9)
- Changed the model architecture
- Except for the last (fc) layer, no layer has a bias, and accuracy did not drop. The bias is kept in the last layer so that it can help during training.
- The first convolution layer has no padding; the second convolution layer does. Changing the second one affects accuracy.
- Moved the maxpool into the depthwise convolution, reducing the amount of computation; accuracy did not drop.
- There is no longer an int8 quantization after every layer; the workaround is described below.
- The quantization strategy changed. The previous approach could not be realized in hardware, so it was rebuilt from scratch; the only constraint kept is that all arithmetic is integer.
- The input feature map is still int, in the range (-32 to 31); it cannot be made any smaller.
- The weights are all in -32 to 31, so they only need 6 bits; storing them as int8 in memory is fine, just convert at compute time.
- The convolution output is not clamped to a fixed range. This matters, but there is a workaround.
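To make the 6-bit weight range concrete, here is a minimal sketch of scaling float weights into [-32, 31] and storing them as int8. The scale-then-round scheme is an assumption for illustration, not necessarily the exact quantizer in quant_models.py.

```python
import numpy as np

def quantize_weights(w, n_bits=6):
    """Scale float weights into the signed n-bit range and clamp.

    Illustrative sketch; the repository's quant_models.py may differ.
    """
    qmin, qmax = -(2 ** (n_bits - 1)), 2 ** (n_bits - 1) - 1  # -32 .. 31
    scale = np.max(np.abs(w)) / qmax
    q = np.clip(np.round(w / scale), qmin, qmax)
    # Only 6 bits are needed, but int8 storage is fine; the compute path
    # can sign-extend the 6-bit value when it is loaded.
    return q.astype(np.int8), scale

w = np.random.randn(4, 4, 1, 48).astype(np.float32)
qw, s = quantize_weights(w)
assert qw.min() >= -32 and qw.max() <= 31
```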
Key ideas of this update !!!!
- The trained parameters are actually fractional. This was not noticed before because they print as integers; the real computation uses the fractional values, so weight.h does not match what is actually computed.
- To predict the speech class from [a, b, c, d], we pick the largest value. Under this view it does not matter whether a, b, c, d are fractions or huge numbers.
- To exploit this, the model architecture was changed first: the biases were removed, so that multiplying all the weights by the same number does not change the result. The fc bias is kept for the sake of training. The whole change has little effect on accuracy.
- Think of the model as a pile of multiplications and additions: if all the weights are multiplied by 10, the result after one layer is 10x larger, and after two layers 100x larger.
- To avoid needing too many bits, and because the result is almost linear, tf.floor(/2, /4, /8, ...) is inserted between layers where appropriate. Like a maxpool, it may even improve accuracy, and it keeps the bit widths under control. For models with few layers, this can also be handled by multiplying the biases of different layers by different factors.
- There is a limit to how far this can be pushed; the bit widths cannot keep shrinking.
- The current pipeline is: input -> conv -> /8 (floor) -> depthwise_conv -> /16 -> maxpool -> pointwise_conv -> /256 -> fc
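The two ideas above (scale invariance of the argmax without biases, and floor divisions between layers) can be demonstrated with a toy integer network. Plain matrix multiplies stand in for the real conv layers, and the shapes here are made up:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.integers(-32, 32, size=16).astype(np.int64)       # toy input
w1 = rng.integers(-32, 32, size=(16, 16)).astype(np.int64)
w2 = rng.integers(-32, 32, size=(16, 4)).astype(np.int64)  # 4 classes

def forward(x, w1, w2, shift=0):
    h = x @ w1
    # Arithmetic right shift == floor division by 2**shift, i.e. the
    # tf.floor(/2, /4, /8, ...) step inserted between layers.
    h = h >> shift
    return h @ w2

base = forward(x, w1, w2)
scaled = forward(x, 10 * w1, w2)            # scale one bias-free layer by 10
assert np.array_equal(scaled, 10 * base)    # output is exactly 10x larger
assert np.argmax(scaled) == np.argmax(base) # the prediction is unchanged

shifted = forward(x, w1, w2, shift=3)       # /8 between the layers
```

The shift keeps the intermediate values small at the cost of dropped low bits, which is why the bit widths cannot shrink indefinitely.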
Possible next steps
- Verify that the results are correct
- Analyze the distribution of the values: where a maximum must be enforced, where further compression is possible, and how many bits each operation needs
- Consider the adder bit width separately from the multiplier bit width (the adders are not a big concern; an 8x8 multiply originally needs 16 bits). Some layers produce values that look large, but they are just accumulated sums, so no need to worry too much.
- Think about a new pointwise conv dataflow that finishes in 768 cycles, so the 2D design can be skipped; that could be a good option.
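For the value-distribution analysis above, a small helper like the following could estimate the signed bit width a layer's outputs need (e.g. from activations collected via lots_of_case.txt). The helper name and the sample numbers are hypothetical:

```python
import numpy as np

def bits_needed(values):
    """Signed two's-complement bits needed to hold every value exactly."""
    lo, hi = int(np.min(values)), int(np.max(values))
    neg_bits = (-lo - 1).bit_length() if lo < 0 else 0
    return max(hi.bit_length(), neg_bits) + 1  # +1 for the sign bit

acts = np.array([-1900, 250, -37, 1023])  # made-up layer outputs
print(bits_needed(acts))  # -> 12 (12-bit signed covers -2048..2047)
```

Comparing this per-layer bit count before and after a /2**k shift shows how much each shift buys.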
Model parameters: test accuracy (the leading 2 means one conv layer plus one depthwise convolution layer)
DS-CNN/conv_1/weights:0 (4, 4, 1, 48)
DS-CNN/conv_ds_1/dw_conv/depthwise_weights:0 (3, 3, 48, 1)
DS-CNN/conv_ds_1/pw_conv/weights:0 (1, 1, 48, 16)
DS-CNN/fc1/weights:0 (16, 4)
DS-CNN/fc1/biases:0 (4,)
Per-layer results (the first two values were verified by hand and are correct); for reference only, since the weights changed afterwards. Every 16 values form one channel, row-major.
channel 0: 2 -5 -4 -7 1 -2 -7 -12 0 1 0 -6 0 -1 -8 -9 channel 1: 0 0 -13 4 -2 5 -6 1 -3 3 4 -3 -2 1 12 2
fingerprint_input 10*30 (input feature map) [[-18 0 1 1 1 0 1 1 0 0 19 0 -3 -1 -2 1 0 -1 -2 1 20 -2 -3 0 -1 0 0 -1 -2 1 20 -2 -3 0 -1 0 0 -1 -3 1 23 -1 -4 -1 -1 0 0 0 -2 1 8 0 -4 -2 0 0 2 0 -3 0 -2 0 -2 0 1 1 1 0 -1 1 -8 0 -2 0 2 0 1 1 0 1 -14 0 -1 1 2 1 1 -1 -2 1 -8 4 0 1 0 1 2 -1 0 1 -11 2 0 1 1 0 2 0 0 0 -15 1 -1 0 0 0 1 -1 0 1 -18 0 0 2 -1 0 2 -1 1 0 -18 0 1 0 0 1 1 0 -1 0 -20 0 1 1 1 1 1 0 -1 0 -19 1 2 1 2 1 1 0 0 0 -19 1 0 0 1 0 1 0 0 0 -18 0 1 0 0 1 1 0 0 0 -16 2 1 1 2 1 0 -1 -1 1 -17 2 1 1 1 0 0 0 -1 0 -19 1 0 1 2 1 0 -1 -1 0 -19 0 1 0 2 1 0 -1 0 1 -19 0 1 1 1 0 2 0 0 0 -17 1 0 1 1 0 1 0 0 1 -19 1 2 0 1 1 1 -1 -1 1 -19 0 1 1 1 0 1 0 -1 0 -19 -1 -1 1 2 1 1 0 -1 1 -18 1 1 2 2 0 1 0 -1 0 -19 0 1 1 1 1 1 0 -1 0 -18 0 0 1 1 1 1 -1 -1 0]]
-> arranged like this; the larger value comes first in each group: -18 0 1 1 1 0 1 1 0 0 19 0 -3 -1 -2 1 0 -1 -2 1 20 -2 -3 0 -1 0 0 -1 -2 1 20 -2 -3 0 -1 0 0 -1 -3 1
result of first convolution [[[[29 -186 ......
29 is the inner product of channel 0 with the 4x4 patch above; -186 is the inner product of channel 1 with the same patch.
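The hand check above can be reproduced in a few lines: with no padding in the first convolution, the first output value for a channel is just the inner product of that channel's 4x4 filter with the top-left 4x4 input patch. The numbers below are random placeholders, not the real weight.h values:

```python
import numpy as np

rng = np.random.default_rng(1)
fmap = rng.integers(-32, 32, size=(10, 30))   # stand-in input feature map
filt0 = rng.integers(-32, 32, size=(4, 4))    # stand-in channel-0 filter

patch = fmap[:4, :4]                          # top-left 4x4 patch
out00 = int(np.sum(patch * filt0))            # inner product
# out00 should match element [0, 0, channel 0] of the conv output.
```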
./work/DS_CNN/training/current/ds_cnn_9507.ckpt-24200 --model_size_info 2 48 4 4 2 2 16 3 3 1 1
(No overfitting; results from different points in training were checked.)
python train.py --model_size_info 2 48 4 4 2 2 16 3 3 1 1 --learning_rate 0.00001,0.00005,0.0006,0.0003,0.0001 --how_many_training_steps 1000,1000,12000,10000,1000
python quant_test.py --checkpoint ./work/DS_CNN/training/current/ds_cnn_9507.ckpt-24200 --model_size_info 2 48 4 4 2 2 16 3 3 1 1
94.89 -> 95.07: applying the shift actually improved accuracy (an alternative kind of pooling); range 32 -31
This repository consists of the TensorFlow models and training scripts used in the paper: Hello Edge: Keyword Spotting on Microcontrollers. The scripts are adapted from the TensorFlow examples, and some are repeated here for the sake of making these scripts self-contained.
To train a DNN with 3 fully-connected layers with 128 neurons in each layer, run:
python train.py --model_architecture dnn --model_size_info 128 128 128
The command line argument --model_size_info passes the neural network layer dimensions, such as the number of layers and the convolution filter size/stride, as a list to models.py, which builds the TensorFlow graph based on the provided model architecture and layer dimensions. For more info on model_size_info for each network architecture, see models.py. The training commands with all the hyperparameters to reproduce the models shown in the paper are given here.
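For the DNN case, each value in --model_size_info is simply the neuron count of one fully-connected layer, so "128 128 128" yields three hidden layers. A hedged sketch of the resulting weight shapes; the input dimension and label count below are placeholders, and models.py holds the real graph construction:

```python
def dnn_weight_shapes(model_size_info, input_dim=250, num_labels=12):
    """Map --model_size_info values to per-layer weight matrix shapes.

    input_dim and num_labels are assumed placeholders for illustration.
    """
    dims = [input_dim] + [int(n) for n in model_size_info] + [num_labels]
    return [(dims[i], dims[i + 1]) for i in range(len(dims) - 1)]

print(dnn_weight_shapes([128, 128, 128]))
# [(250, 128), (128, 128), (128, 128), (128, 12)]
```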
To run inference on the trained model from a checkpoint on train/val/test set, run:
python test.py --model_architecture dnn --model_size_info 128 128 128 --checkpoint
<checkpoint path>
To freeze the trained model checkpoint into a .pb file, run:
python freeze.py --model_architecture dnn --model_size_info 128 128 128 --checkpoint
<checkpoint path> --output_file dnn.pb
Trained models (.pb files) for different neural network architectures such as DNN, CNN, Basic LSTM, LSTM, GRU, CRNN and DS-CNN shown in this arXiv paper are added in Pretrained_models. Accuracy of the models on validation set, their memory requirements and operations per inference are also summarized in the following table.
To run an audio file through the trained model (e.g. a DNN) and get top prediction, run:
python label_wav.py --wav <audio file> --graph Pretrained_models/DNN/DNN_S.pb
--labels Pretrained_models/labels.txt --how_many_labels 1
A quick guide on quantizing the KWS neural network models is here. The example code for running a DNN model on a Cortex-M development board is also provided here.
