<?xml version="1.0" encoding="utf-8"?>
<search>
<entry>
<title>Effective Modern C++: A List of Habits</title>
<url>/2019/05/07/Effective-Modern-C-%E8%A1%8C%E4%B8%BA%E4%B9%A0%E6%83%AF%E5%88%97%E4%B8%BE/</url>
<content><![CDATA[<h2 id="前言"><a href="#前言" class="headerlink" title="前言"></a>Preface</h2><blockquote>
<p>Written for readers who already know pre-C++11 C++; this post lists only a few basic items.</p>
<p>ref: Effective Modern C++</p>
</blockquote><h2 id="用auto-代替显示声明"><a href="#用auto-代替显示声明" class="headerlink" title="用auto 代替显示声明."></a>Prefer <strong>auto</strong> to explicit type declarations.</h2><p>✘ Bad example:</p><figure class="highlight c++"><table><tr><td class="code"><pre><span class="line"><span class="keyword">typename</span> <span class="built_in">std</span>::iterator_traits<It>::value_type currValue = *b;</span><br></pre></td></tr></table></figure><a id="more"></a>
<p>✔ Good example:</p>
<figure class="highlight c++"><table><tr><td class="code"><pre><span class="line"><span class="keyword">auto</span> currValue = *b;</span><br></pre></td></tr></table></figure>
<h2 id="使用nullptr-替代NULL-和0"><a href="#使用nullptr-替代NULL-和0" class="headerlink" title="使用nullptr 替代NULL 和0."></a>Prefer <strong>nullptr</strong> to <strong>NULL</strong> and <strong>0</strong>.</h2><p>✘ Bad example:</p>
<figure class="highlight c++"><table><tr><td class="code"><pre><span class="line"><span class="keyword">if</span> (result == <span class="number">0</span>){</span><br><span class="line"> ...</span><br><span class="line">}</span><br></pre></td></tr></table></figure>
<figure class="highlight c++"><table><tr><td class="code"><pre><span class="line"><span class="keyword">if</span> (result == <span class="literal">NULL</span>){</span><br><span class="line"> ...</span><br><span class="line">}</span><br></pre></td></tr></table></figure>
<p>✔ Good example:</p>
<figure class="highlight c++"><table><tr><td class="code"><pre><span class="line"><span class="keyword">if</span> (result == <span class="literal">nullptr</span>){</span><br><span class="line"> ...</span><br><span class="line">}</span><br></pre></td></tr></table></figure>
<h2 id="使用using-替代typedef"><a href="#使用using-替代typedef" class="headerlink" title="使用using 替代typedef."></a>Prefer <strong>using</strong> to <strong>typedef</strong>.</h2><p>✘ Bad example:</p>
<figure class="highlight c++"><table><tr><td class="code"><pre><span class="line"><span class="keyword">typedef</span> <span class="built_in">std</span>::<span class="built_in">unique_ptr</span><<span class="built_in">std</span>::<span class="built_in">unordered_map</span><<span class="built_in">std</span>::<span class="built_in">string</span>, <span class="built_in">std</span>::<span class="built_in">string</span>>> UPtrMapSS;</span><br></pre></td></tr></table></figure>
<p>✔ Good example:</p>
<figure class="highlight c++"><table><tr><td class="code"><pre><span class="line"><span class="keyword">using</span> UPtrMapSS = <span class="built_in">std</span>::<span class="built_in">unique_ptr</span><<span class="built_in">std</span>::<span class="built_in">unordered_map</span><<span class="built_in">std</span>::<span class="built_in">string</span>, <span class="built_in">std</span>::<span class="built_in">string</span>>>;</span><br></pre></td></tr></table></figure>
<h2 id="有范围的enum-替代无范围的enum"><a href="#有范围的enum-替代无范围的enum" class="headerlink" title="有范围的enum 替代无范围的enum."></a>Prefer scoped <strong>enum</strong>s to unscoped <strong>enum</strong>s.</h2><p>✘ Bad example:</p>
<figure class="highlight c++"><table><tr><td class="code"><pre><span class="line"><span class="keyword">enum</span> Color { black, white, red };</span><br><span class="line"><span class="keyword">auto</span> white = <span class="literal">false</span>; <span class="comment">// error: white is already declared in this scope</span></span><br></pre></td></tr></table></figure>
<p>✔ Good example:</p>
<figure class="highlight c++"><table><tr><td class="code"><pre><span class="line"><span class="keyword">enum</span> <span class="class"><span class="keyword">class</span> <span class="title">Color</span> {</span> black, white, red };</span><br><span class="line"><span class="keyword">auto</span> white = <span class="literal">false</span>; <span class="comment">// fine: the enumerators do not leak into this scope</span></span><br><span class="line">Color c = Color::white; <span class="comment">// the proper way to declare</span></span><br><span class="line"><span class="keyword">auto</span> c2 = Color::white; <span class="comment">// also fine, using auto</span></span><br></pre></td></tr></table></figure>
<h2 id="禁用函数时-用delete-替代private"><a href="#禁用函数时-用delete-替代private" class="headerlink" title="禁用函数时, 用delete 替代private."></a>To disable a function, prefer <strong>delete</strong> to <strong>private</strong>.</h2><p>✘ Bad example:</p>
<figure class="highlight c++"><table><tr><td class="code"><pre><span class="line"><span class="class"><span class="keyword">class</span> <span class="title">basic_ios</span> :</span> <span class="keyword">public</span> ios_base { </span><br><span class="line"><span class="keyword">public</span>:</span><br><span class="line"> ...</span><br><span class="line"><span class="keyword">private</span>:</span><br><span class="line"> basic_ios(<span class="keyword">const</span> basic_ios& ); </span><br><span class="line"> basic_ios& <span class="keyword">operator</span>=(<span class="keyword">const</span> basic_ios&); </span><br><span class="line"> ...</span><br><span class="line">};</span><br></pre></td></tr></table></figure>
<p>✔ Good example:</p>
<figure class="highlight c++"><table><tr><td class="code"><pre><span class="line"><span class="class"><span class="keyword">class</span> <span class="title">basic_ios</span> :</span> <span class="keyword">public</span> ios_base { </span><br><span class="line"><span class="keyword">public</span>:</span><br><span class="line"> basic_ios(<span class="keyword">const</span> basic_ios& ) = <span class="keyword">delete</span>; </span><br><span class="line"> basic_ios& <span class="keyword">operator</span>=(<span class="keyword">const</span> basic_ios&) = <span class="keyword">delete</span>; </span><br><span class="line"> ...</span><br><span class="line">};</span><br></pre></td></tr></table></figure>
<h2 id="使用override-关键字标注-override函数"><a href="#使用override-关键字标注-override函数" class="headerlink" title="使用override 关键字标注 override函数."></a>Mark overriding functions with the <strong>override</strong> keyword.</h2><p>None of the four <code>Derived</code> declarations below actually overrides a <code>Base</code> function (<code>mf1</code> differs in constness, <code>mf2</code> in parameter type, <code>mf3</code> in reference qualifier, and <code>mf4</code> is not virtual in <code>Base</code>). Because each is marked <code>override</code>, the compiler reports every mismatch instead of silently accepting them:</p><figure class="highlight c++"><table><tr><td class="code"><pre><span class="line"><span class="class"><span class="keyword">class</span> <span class="title">Base</span> {</span> </span><br><span class="line"><span class="keyword">public</span>:</span><br><span class="line"> <span class="function"><span class="keyword">virtual</span> <span class="keyword">void</span> <span class="title">mf1</span><span class="params">()</span> <span class="keyword">const</span></span>; </span><br><span class="line"> <span class="function"><span class="keyword">virtual</span> <span class="keyword">void</span> <span class="title">mf2</span><span class="params">(<span class="keyword">int</span> x)</span></span>; </span><br><span class="line"> <span class="function"><span class="keyword">virtual</span> <span class="keyword">void</span> <span class="title">mf3</span><span class="params">()</span> &</span>; </span><br><span class="line"> <span class="function"><span class="keyword">void</span> <span class="title">mf4</span><span class="params">()</span> <span class="keyword">const</span></span>; </span><br><span class="line">};</span><br><span class="line"></span><br><span class="line"><span class="class"><span class="keyword">class</span> <span class="title">Derived</span>:</span> <span class="keyword">public</span> Base { </span><br><span class="line"><span class="keyword">public</span>:</span><br><span class="line"> <span class="function"><span class="keyword">virtual</span> <span class="keyword">void</span> <span class="title">mf1</span><span class="params">()</span> override</span>; </span><br><span class="line"> <span class="function"><span class="keyword">virtual</span> <span class="keyword">void</span> <span class="title">mf2</span><span class="params">(<span class="keyword">unsigned</span> <span 
class="keyword">int</span> x)</span> override</span>; </span><br><span class="line"> <span class="function"><span class="keyword">virtual</span> <span class="keyword">void</span> <span class="title">mf3</span><span class="params">()</span> && override</span>; </span><br><span class="line"> <span class="function"><span class="keyword">virtual</span> <span class="keyword">void</span> <span class="title">mf4</span><span class="params">()</span> <span class="keyword">const</span> override</span>;</span><br><span class="line">};</span><br></pre></td></tr></table></figure>
<h2 id="使用const-iterators-替代iterators"><a href="#使用const-iterators-替代iterators" class="headerlink" title="使用const_iterators 替代iterators."></a>Prefer const_iterators to iterators.</h2><p>✘ Bad example:</p>
<figure class="highlight c++"><table><tr><td class="code"><pre><span class="line"><span class="built_in">std</span>::<span class="built_in">vector</span><<span class="keyword">int</span>> values;</span><br><span class="line">…</span><br><span class="line"><span class="keyword">auto</span> it = <span class="built_in">std</span>::find(values.begin(),values.end(), <span class="number">1983</span>); <span class="comment">// uses begin() and end()</span></span><br><span class="line">values.insert(it, <span class="number">1998</span>);</span><br></pre></td></tr></table></figure>
<p>✔ Good example:</p>
<figure class="highlight c++"><table><tr><td class="code"><pre><span class="line"><span class="built_in">std</span>::<span class="built_in">vector</span><<span class="keyword">int</span>> values; </span><br><span class="line">…</span><br><span class="line"><span class="keyword">auto</span> it = <span class="built_in">std</span>::find(values.cbegin(),values.cend(), <span class="number">1983</span>);<span class="comment">// uses cbegin() and cend()</span></span><br><span class="line">values.insert(it, <span class="number">1998</span>);</span><br></pre></td></tr></table></figure>
<h2 id="如果函数不会抛出异常-使用noexcept进行声明"><a href="#如果函数不会抛出异常-使用noexcept进行声明" class="headerlink" title="如果函数不会抛出异常, 使用noexcept进行声明."></a>If a function will not throw, declare it <strong>noexcept</strong>.</h2><p>✘ C++98:</p>
<figure class="highlight c++"><table><tr><td class="code"><pre><span class="line"><span class="function"><span class="keyword">int</span> <span class="title">f</span><span class="params">(<span class="keyword">int</span> x)</span> <span class="title">throw</span><span class="params">()</span></span>;</span><br></pre></td></tr></table></figure>
<p>✔ C++11:</p>
<figure class="highlight c++"><table><tr><td class="code"><pre><span class="line"><span class="function"><span class="keyword">int</span> <span class="title">f</span><span class="params">(<span class="keyword">int</span> x)</span> <span class="keyword">noexcept</span></span>;</span><br></pre></td></tr></table></figure>
<h2 id="使用智能指针-std-unique-ptr-std-shared-ptr-std-weak-ptr替代传统指针-std-auto-淘汰了别用了"><a href="#使用智能指针-std-unique-ptr-std-shared-ptr-std-weak-ptr替代传统指针-std-auto-淘汰了别用了" class="headerlink" title="使用智能指针 std::unique_ptr, std::shared_ptr, std::weak_ptr替代传统指针 (std::auto 淘汰了别用了)."></a>Use the smart pointers <code>std::unique_ptr</code>, <code>std::shared_ptr</code>, and <code>std::weak_ptr</code> instead of raw pointers (<code>std::auto_ptr</code> is deprecated; do not use it).</h2><h2 id="能用constexpr就用constexpr"><a href="#能用constexpr就用constexpr" class="headerlink" title="能用constexpr就用constexpr."></a>Use constexpr whenever possible.</h2><h2 id="让常成员函数线程安全-使用std-mutex-或std-atomic-等"><a href="#让常成员函数线程安全-使用std-mutex-或std-atomic-等" class="headerlink" title="让常成员函数线程安全: 使用std::mutex 或std::atomic 等."></a>Make <em>const member functions</em> thread safe: use <code>std::mutex</code>, <code>std::atomic</code>, and the like.</h2><h2 id="善用右值-Rvalue-语义转移-Move-Semantics-完美转发-Perfect-Forwarding"><a href="#善用右值-Rvalue-语义转移-Move-Semantics-完美转发-Perfect-Forwarding" class="headerlink" title="善用右值[Rvalue], 语义转移[Move Semantics], 完美转发[Perfect Forwarding]"></a>Make good use of <em>rvalues</em>, <em>move semantics</em>, and <em>perfect forwarding</em></h2><h2 id="善用Lambda-表达式"><a href="#善用Lambda-表达式" class="headerlink" title="善用Lambda 表达式"></a>Make good use of lambda expressions</h2><h2 id="善用并发编程API"><a href="#善用并发编程API" class="headerlink" title="善用并发编程API"></a>Make good use of the concurrency API</h2><h2 id="容器中使用emplace-back-替代push-back"><a href="#容器中使用emplace-back-替代push-back" class="headerlink" title="容器中使用emplace_back(), 替代push_back()"></a>In containers, consider <code>emplace_back()</code> instead of <code>push_back()</code></h2>]]></content>
<categories>
<category>c++</category>
</categories>
<tags>
<tag>c++</tag>
</tags>
</entry>
<entry>
<title>Hello World</title>
<url>/2018/12/23/hello-world/</url>
<content><![CDATA[<p>Hello World!</p>
<p>I'm back~❕❕❕❕<br>Time to start writing and updating the blog 🎉🎉🎉🎉</p>
]]></content>
<categories>
<category>life</category>
</categories>
<tags>
<tag>life</tag>
</tags>
</entry>
<entry>
<title>MySQL Note</title>
<url>/2020/07/31/MySQL-Note/</url>
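<content><![CDATA[<h1 id="高性能MySQL-笔记"><a href="#高性能MySQL-笔记" class="headerlink" title="高性能MySQL 笔记"></a>Notes on High Performance MySQL</h1><h2 id="表"><a href="#表" class="headerlink" title="表"></a>Tables</h2><h3 id="列"><a href="#列" class="headerlink" title="列"></a>Columns</h3><h4 id="过多的列"><a href="#过多的列" class="headerlink" title="过多的列"></a>Too many columns</h4><p>MySQL copies rows between the storage engine and the server layer in a row-buffer format, and the server then decodes that buffer into columns. With a very large number of columns, this conversion becomes expensive.</p><h3 id="范式"><a href="#范式" class="headerlink" title="范式"></a>Normal forms</h3><ul>
<li>1NF: every column is atomic and cannot be split further.</li>
<li>2NF: on top of first normal form (1NF), every non-prime attribute (a column that is not part of a key) must be <strong>fully functionally dependent on</strong> the <em>candidate key</em>. If a non-prime attribute depends on only part of a candidate key, the table violates 2NF.</li>
<li>3NF: on top of second normal form (2NF), no non-prime attribute depends on another non-prime attribute; that is, transitive functional dependencies of non-prime attributes on candidate keys are eliminated. (Every non-prime attribute depends on the key directly, never transitively.)</li>
<li>BCNF: on top of 3NF, every determinant must be a candidate key, so prime attributes also have no partial or transitive dependency on a candidate key.</li>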
</ul><a id="more"></a>
<h3 id="数据类型"><a href="#数据类型" class="headerlink" title="数据类型"></a>Data types</h3><h4 id="VARCHAR-vs-CHAR"><a href="#VARCHAR-vs-CHAR" class="headerlink" title="VARCHAR vs. CHAR"></a>VARCHAR vs. CHAR</h4><ul>
<li>VARCHAR preserves trailing spaces; CHAR does not.</li>
<li>VARCHAR suits columns whose string lengths vary widely.</li>
<li>VARCHAR saves space, but updates cost more.</li>
</ul>
<h4 id="BLOB-amp-TEXT"><a href="#BLOB-amp-TEXT" class="headerlink" title="BLOB & TEXT"></a>BLOB & TEXT</h4><ul>
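<li>Both are stored as pointers to the actual data, so reading a value needs a second access.</li>
<li>They can be indexed, but a prefix length must be specified.</li>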
</ul>
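<h2 id="事务"><a href="#事务" class="headerlink" title="事务"></a>Transactions</h2><h3 id="A-C-I-D"><a href="#A-C-I-D" class="headerlink" title="A.C.I.D."></a>A.C.I.D.</h3><ul>
<li>Atomicity: all operations in a transaction either complete entirely or do not happen at all; a transaction never ends halfway. If an error occurs during execution, the transaction is rolled back to the state before it started, as if it had never run. In other words, a transaction is indivisible and irreducible.</li>
<li>Consistency: the integrity of the database is intact both before the transaction starts and after it ends. The written data must satisfy all predefined rules, including constraints, triggers, and cascading rollbacks.</li>
<li>Isolation: the database's ability to let multiple concurrent transactions read, write, and modify its data at the same time. Isolation prevents the inconsistencies that interleaved execution of concurrent transactions would otherwise cause. There are several isolation levels: read uncommitted, read committed, repeatable read, and serializable.</li>
<li>Durability: once a transaction finishes, its changes are permanent and are not lost even if the system fails.</li>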
</ul>
<table>
<thead>
<tr>
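<th>Problem</th>
<th>Cause</th>
<th>Fix</th>
</tr>
</thead>
<tbody>
<tr>
<td>Dirty read</td>
<td>A transaction reads data that another transaction has not yet committed, so it may see wrong data.</td>
<td>Switch to read committed: only data from committed transactions can be read.</td>
</tr>
<tr>
<td>Non-repeatable read</td>
<td>Between two reads of the same data within one transaction, another transaction modifies it, so the two results differ.</td>
<td>Switch to repeatable read, which uses X and S locks.</td>
</tr>
<tr>
<td>Phantom read</td>
<td>When a query returns multiple rows, new rows are inserted between two runs of the same query, so the two results differ.</td>
<td>Serializable, or next-key locks</td>
</tr>
<tr>
<td>Lost update</td>
<td>Two transactions update the same data at the same time, each doing a read followed by a write; even with S and X locks the result can be wrong.</td>
<td>Serializable.</td>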
</tr>
</tbody>
</table>
<blockquote>
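<p>A lost update is not necessarily something transaction isolation alone can prevent.</p>
<p>For example, two clients withdraw from the same account at the same time. The balance starts at 1000; transaction 1 withdraws 100 and transaction 2 withdraws 1.</p>
<p>In total 101 was withdrawn, so the balance should be 899, yet it ends up as 999: both transactions read 1000, and transaction 2's write of 999 overwrites transaction 1's write of 900.</p>
<p>Serialization solves this kind of problem.</p>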
</blockquote>
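<p>The withdrawal example can be replayed in a few lines. This is only a sketch of the interleaving, with no database involved; the function names <code>withdraw_interleaved</code> and <code>withdraw_serialized</code> are made up for illustration.</p>

```python
def withdraw_interleaved(balance, amounts):
    """Every transaction reads the balance first; the writes happen afterwards."""
    snapshots = [balance for _ in amounts]       # each transaction sees 1000
    for snapshot, amount in zip(snapshots, amounts):
        balance = snapshot - amount              # a later write overwrites the earlier one
    return balance


def withdraw_serialized(balance, amounts):
    """Transactions run one after another, each reading the latest balance."""
    for amount in amounts:
        balance = balance - amount
    return balance


lost = withdraw_interleaved(1000, [100, 1])
correct = withdraw_serialized(1000, [100, 1])
print(lost, correct)
```

<p>In the interleaved run the write of 900 is overwritten by 999, which is the lost update described above; in the serialized run the second withdrawal starts from 900.</p>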
<p><img src="/images/Mysql_Note/0.png" alt="事务"></p>
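<h4 id="事务分类"><a href="#事务分类" class="headerlink" title="事务分类"></a>Transaction types</h4><ul>
<li><p>Flat transactions</p>
</li>
<li><p>Flat transactions with savepoints</p>
</li>
<li><p>Chained transactions</p>
</li>
<li><p>Nested transactions</p>
</li>
</ul>
<h4 id="事务陷阱"><a href="#事务陷阱" class="headerlink" title="事务陷阱"></a>Transaction pitfalls</h4><ul>
<li>Automatic rollback: an automatic rollback raises no exception, so the caller cannot tell that something went wrong. (SQL Server does raise one; this drawback is specific to MySQL.)</li>
<li>Autocommit.</li>
<li>Committing inside a loop: if the loop fails unexpectedly, it stops at an unknown position and the work already committed cannot be undone.</li>
</ul>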
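<h3 id="锁"><a href="#锁" class="headerlink" title="锁"></a>Locks</h3><h4 id="行锁"><a href="#行锁" class="headerlink" title="行锁"></a>Row locks</h4><h5 id="共享锁-独占锁"><a href="#共享锁-独占锁" class="headerlink" title="共享锁, 独占锁"></a>Shared and exclusive locks</h5><ul>
<li>A shared (S) lock lets multiple transactions read the same row.</li>
<li>When a transaction writes a row, its exclusive (X) lock blocks reads and writes of that row by all other transactions.</li>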
</ul>
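<p>The two rules amount to a small compatibility matrix. The sketch below encodes it directly; <code>COMPATIBLE</code> and <code>can_grant</code> are illustrative names, not part of any MySQL API.</p>

```python
# (held, requested) -> may both locks exist on the same row at once?
COMPATIBLE = {
    ("S", "S"): True,    # many readers may share a row
    ("S", "X"): False,   # a writer must wait for readers
    ("X", "S"): False,   # readers must wait for the writer
    ("X", "X"): False,   # writers exclude each other
}


def can_grant(requested, held_locks):
    """True when the requested lock is compatible with every lock already held."""
    return all(COMPATIBLE[(held, requested)] for held in held_locks)


print(can_grant("S", ["S", "S"]), can_grant("X", ["S"]), can_grant("S", ["X"]))
```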
<h4 id="行锁算法"><a href="#行锁算法" class="headerlink" title="行锁算法"></a>Row-lock algorithms</h4><h5 id="Record-Lock"><a href="#Record-Lock" class="headerlink" title="Record Lock"></a>Record Lock</h5><p>A lock on a single row.</p>
<h5 id="Gap-Lock"><a href="#Gap-Lock" class="headerlink" title="Gap Lock"></a>Gap Lock</h5><p>Locks a range, but not the record itself.</p>
<h5 id="Next-Key-Lock"><a href="#Next-Key-Lock" class="headerlink" title="Next-Key Lock"></a>Next-Key Lock</h5><p>Locks the record itself together with a range that contains it (a record lock plus a gap lock).</p>
<h4 id="粒度锁"><a href="#粒度锁" class="headerlink" title="粒度锁"></a>Lock granularity</h4><h5 id="意向锁-树状锁"><a href="#意向锁-树状锁" class="headerlink" title="意向锁 : 树状锁"></a>Intention locks: a lock tree</h5><p>Locks form a tree: a coarse-grained lock owns several finer-grained locks.</p>
<p>Leaves are fine-grained; parent nodes are coarse-grained.</p>
<h2 id="索引"><a href="#索引" class="headerlink" title="索引"></a>Indexes</h2><h3 id="索引的相关理论"><a href="#索引的相关理论" class="headerlink" title="索引的相关理论"></a>Index theory</h3><h4 id="最左前缀匹配"><a href="#最左前缀匹配" class="headerlink" title="最左前缀匹配"></a>Leftmost-prefix matching</h4><p>MySQL matches indexes by leftmost prefix, so lookups that match from the left are efficient.</p>
<p>So when the pattern starts with a wildcard (for example <code>LIKE '%abc'</code>), leftmost-prefix matching cannot be used and performance drops.</p>
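<p>A sorted list is a rough stand-in for a B-Tree index and shows why only the left end helps. The sketch uses Python's <code>bisect</code> module; the sample names and the helpers <code>prefix_search</code> and <code>suffix_search</code> are assumptions made for illustration.</p>

```python
import bisect

names = sorted(["alice", "alina", "bob", "carol", "malice", "palace"])


def prefix_search(sorted_keys, prefix):
    """Binary-search the start and end of the prefix range (like LIKE 'prefix%')."""
    lo = bisect.bisect_left(sorted_keys, prefix)
    hi = bisect.bisect_left(sorted_keys, prefix + "\uffff")
    return sorted_keys[lo:hi]


def suffix_search(keys, suffix):
    """Sort order does not help for LIKE '%suffix': every key is inspected."""
    return [key for key in keys if key.endswith(suffix)]


by_prefix = prefix_search(names, "ali")
by_suffix = suffix_search(names, "lice")
print(by_prefix, by_suffix)
```

<p>The prefix lookup touches only the matching slice of the sorted keys, while the suffix lookup has to scan all of them.</p>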
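<h4 id="可选择性"><a href="#可选择性" class="headerlink" title="可选择性"></a>Selectivity</h4><p>Selectivity describes how well the values of a column distinguish rows, that is, how little the values in that column repeat.</p>
<blockquote>
<p>For example, gender has very low selectivity: a query on one gender value returns a large share of the table. </p>
<p>Age and city distinguish rows slightly better than gender.</p>
<p>Fields such as an ID distinguish rows best. An ID is unique, so its selectivity is extremely high.</p>
</blockquote>
<p>The way to deal with low selectivity is to combine several low-selectivity fields into one multi-column index, which raises the overall selectivity.</p>
<h4 id="聚集索引-vs-辅助索引"><a href="#聚集索引-vs-辅助索引" class="headerlink" title="聚集索引 vs. 辅助索引"></a>Clustered index vs. secondary index</h4><ul>
<li>Clustered index: also called the primary-key index. It is built on the primary key and leads directly to the row data.</li>
<li>Secondary index: a non-primary-key index whose entries point to the primary key. The index is first used to find the primary key, and the primary key is then used to find the data.</li>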
</ul>
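<h3 id="聚簇索引"><a href="#聚簇索引" class="headerlink" title="聚簇索引"></a>Clustered indexes</h3><p>Rows whose values in a given column are equal or close are stored next to each other, and the index is built on that order.</p>
<blockquote>
<p>This improves performance for I/O-bound workloads because related data is stored together.</p>
</blockquote>
<h4 id="聚簇-vs-可选择性-——-UUID-vs-自增ID的选择及UUID的优化"><a href="#聚簇-vs-可选择性-——-UUID-vs-自增ID的选择及UUID的优化" class="headerlink" title="聚簇 vs 可选择性 —— UUID vs. 自增ID的选择及UUID的优化"></a>Clustering vs. selectivity: choosing between UUIDs and auto-increment IDs, and optimizing UUIDs</h4><ul>
<li>Store a UUID in a <code>BINARY(16)</code> column via <code>UNHEX()</code> (with the dashes removed) rather than as a string.</li>
<li>Using UUIDs makes index entries far more random and makes the index larger.</li>
<li>Because UUIDs are random, every page is filled in random order, which increases fragmentation inside pages. Because writes arrive out of order, page splits are also frequent.</li>
<li>Random UUIDs cause a large amount of random I/O, and performance drops sharply.</li>
<li>Auto-increment IDs need an auto-increment lock, and frequent locking can also cause blocking. </li>
<li>An auto-increment primary key becomes a hot spot, and concurrent inserts cause <strong>gap lock contention</strong>. Consider tuning <code>innodb_autoinc_lock_mode</code> to mitigate this.</li>
</ul>
<blockquote>
<p> A compromise:</p>
<p>A composite ID with an ordered prefix and a random suffix combines the strengths of these two extremes.</p>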
</blockquote>
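<p>The storage saving and the compromise ID can be sketched with the standard library. The 6-byte timestamp plus 10 random bytes layout in <code>ordered_id</code> is a hypothetical choice for illustration, not a scheme taken from the notes above.</p>

```python
import os
import time
import uuid

text_id = "6ccd780c-baba-1026-9564-5b8c656024db"    # 36-character string form
packed = uuid.UUID(text_id).bytes                    # 16 raw bytes, what UNHEX on the dash-free hex yields
same_bytes = bytes.fromhex(text_id.replace("-", ""))


def ordered_id(now_ms=None):
    """Hypothetical 16-byte ID: ordered 6-byte millisecond prefix + 10 random bytes."""
    if now_ms is None:
        now_ms = int(time.time() * 1000)
    return now_ms.to_bytes(6, "big") + os.urandom(10)


earlier = ordered_id(1_000)
later = ordered_id(2_000)
print(len(text_id), len(packed), earlier < later)
```

<p>IDs generated later sort after earlier ones, so inserts land near the end of the index, while the random suffix keeps IDs created in the same millisecond distinct with high probability.</p>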
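<h3 id="多列索引"><a href="#多列索引" class="headerlink" title="多列索引"></a>Multi-column indexes</h3><p>To deal with low selectivity, several low-selectivity fields can be combined into one index, which raises the selectivity. That is a multi-column index.</p>
<p>Combining fields that are already highly selective, however, only adds overhead.</p>
<blockquote>
<p>For example, suppose the queried columns are (gender, age, city):</p>
<ul>
<li>Gender has the lowest selectivity: there are only a few possible values, so the results for this column are highly repetitive.</li>
<li>Next, ages are mostly concentrated between 0 and 99. The repetition is lower than for gender, but with hundreds of thousands of rows packed into the range 0 to 99 it is still considerable.</li>
<li>Likewise, city distinguishes rows better than the first two, but its selectivity is still only moderate.</li>
</ul>
<p>In this case, a multi-column index that ties these three low-selectivity columns together can raise the selectivity considerably.</p>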
</blockquote>
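<p>Selectivity is commonly measured as the number of distinct values divided by the number of rows. The sketch below computes it for single columns and for the combined (gender, age, city) key; the six sample rows are invented for illustration.</p>

```python
rows = [
    {"id": 1, "gender": "F", "age": 30, "city": "Shenzhen"},
    {"id": 2, "gender": "M", "age": 30, "city": "Shenzhen"},
    {"id": 3, "gender": "F", "age": 25, "city": "Beijing"},
    {"id": 4, "gender": "M", "age": 25, "city": "Shenzhen"},
    {"id": 5, "gender": "F", "age": 30, "city": "Beijing"},
    {"id": 6, "gender": "M", "age": 25, "city": "Beijing"},
]


def selectivity(rows, columns):
    """Distinct value combinations divided by the number of rows (1.0 means unique)."""
    distinct = {tuple(row[c] for c in columns) for row in rows}
    return len(distinct) / len(rows)


for cols in (["gender"], ["age"], ["city"], ["gender", "age", "city"], ["id"]):
    print(cols, round(selectivity(rows, cols), 2))
```

<p>Each single column distinguishes only two groups in this toy data, while the three columns together identify every row, which is the effect a multi-column index relies on.</p>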
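<h3 id="覆盖索引"><a href="#覆盖索引" class="headerlink" title="覆盖索引***"></a>Covering indexes***</h3><blockquote>
<p>If an index contains every column a query needs, it is called a covering index.</p>
</blockquote>
<p>Advantages:</p>
<ul>
<li>Index entries are small.</li>
<li>The index is stored in column-value order, so it is read with sequential disk I/O; a few sequential reads outperform many random reads.</li>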
</ul>
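<p>The idea can be observed with Python's built-in <code>sqlite3</code> module. Note the assumption: this runs on SQLite, not MySQL, so the plan text differs from MySQL's <code>EXPLAIN</code> output; the table <code>person</code> and index <code>idx_city_age</code> are made up for the sketch.</p>

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE person (id INTEGER PRIMARY KEY, city TEXT, age INTEGER, bio TEXT)"
)
conn.execute("CREATE INDEX idx_city_age ON person (city, age)")


def plan(sql):
    """Return SQLite's query-plan description for a statement."""
    plan_rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " ".join(row[-1] for row in plan_rows)


# Every selected column lives in the index, so the table is never touched.
covered = plan("SELECT city, age FROM person WHERE city = 'Beijing'")
# bio is not in the index, so each match needs an extra lookup in the table.
not_covered = plan("SELECT city, bio FROM person WHERE city = 'Beijing'")
print(covered)
print(not_covered)
```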
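<p><strong>When the index cannot be used</strong></p>
<ul>
<li>The index does not cover the queried columns. <strong>Fix: use a deferred join.</strong></li>
<li>Since MySQL 5.5, LIKE queries that match a leftmost prefix can use the index, but wildcard queries that do not match a leftmost prefix cannot.</li>
<li>The columns used by ORDER BY or GROUP BY are not covered by the index, so the index cannot be used for sorting.</li>
</ul>
<h3 id="索引排序"><a href="#索引排序" class="headerlink" title="索引排序***"></a>Sorting with indexes***</h3><blockquote>
<p>Use an index to return the data in sorted order.</p>
</blockquote>
<p>This applies to ORDER BY and GROUP BY:</p>
<ul>
<li>MySQL uses the index for sorting only when the index's column order matches the column order of the sort and all columns are sorted in the same direction (all ascending or all descending).</li>
<li>In a multi-table join, the columns referenced by ORDER BY or GROUP BY must all belong to the first table for the index sort to be used.</li>
<li>The leftmost-prefix rule must be satisfied.</li>
<li>The column in a range condition must be the same as the columns in ORDER BY or GROUP BY for the index sort to be used.</li>
</ul>
<h3 id="冗余索引"><a href="#冗余索引" class="headerlink" title="冗余索引"></a>Redundant indexes</h3><blockquote>
<p>If an index $(A,B)$ exists and an index $(A)$ is then added, $(A)$ is redundant with respect to $(A,B)$.</p>
<p>But $(A)$ is not redundant with respect to $(B,A)$.</p>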
</blockquote>
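<p>More indexes make queries faster but inserts slower, because the cost of maintaining the indexes rises. Redundant and duplicate indexes reduce performance.</p>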
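<p>The rule for spotting a redundant index is a leftmost-prefix test. A minimal sketch, where <code>is_redundant</code> is a hypothetical helper and indexes are given as tuples of column names:</p>

```python
def is_redundant(new_index, existing_index):
    """True when new_index is a leftmost prefix of existing_index."""
    new_index = tuple(new_index)
    existing_index = tuple(existing_index)
    return (
        len(new_index) <= len(existing_index)
        and existing_index[: len(new_index)] == new_index
    )


print(is_redundant(("A",), ("A", "B")))   # (A) is a leftmost prefix of (A, B)
print(is_redundant(("A",), ("B", "A")))   # (A) is not a leftmost prefix of (B, A)
```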
<h3 id="索引类型"><a href="#索引类型" class="headerlink" title="索引类型"></a>Index usage types</h3><blockquote>
<p>Shown in the Extra column of EXPLAIN. It describes how the data is retrieved.</p>
</blockquote>
<table>
<thead>
<tr>
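<th>Extra</th>
<th>Meaning</th>
<th>When it appears</th>
</tr>
</thead>
<tbody>
<tr>
<td>using index</td>
<td>Uses an index</td>
<td>Covering index; efficient.</td>
</tr>
<tr>
<td>using where</td>
<td>The server filters rows after the storage engine returns them</td>
<td>The index does not cover the query, or covers it only partly.</td>
</tr>
<tr>
<td>using index condition</td>
<td>Uses an index condition</td>
<td>The query first filters on the index, finds all rows that satisfy the index conditions,<br>and then filters those rows with the remaining conditions of the WHERE clause.</td>
</tr>
<tr>
<td>using filesort</td>
<td>Uses a filesort</td>
<td>A sort that cannot be done with an index is called a "filesort".</td>
</tr>
<tr>
<td>using temporary</td>
<td>Uses a temporary table</td>
<td>A temporary table is needed to hold the result set; common in sorting and grouping queries.</td>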
</tr>
</tbody>
</table>
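<h3 id="三星索引"><a href="#三星索引" class="headerlink" title="三星索引"></a>Three-star indexes</h3><table>
<thead>
<tr>
<th>Stars</th>
<th>Definition</th>
<th>Rationale</th>
<th>How to achieve it</th>
</tr>
</thead>
<tbody>
<tr>
<td>🌟</td>
<td>The index rows relevant to a query are adjacent, or at least close to each other</td>
<td>Minimizes the width of the index slice that must be scanned</td>
<td>Put the equality-condition columns of the WHERE clause at the front of the index</td>
</tr>
<tr>
<td>🌟🌟</td>
<td>The order of the index rows matches what the query needs</td>
<td>Eliminates the sort operation</td>
<td>All ORDER BY columns are in the index, in the same order as in the query</td>
</tr>
<tr>
<td>🌟🌟🌟</td>
<td>The index rows contain every column the query references</td>
<td>Covering index</td>
<td>Make the index a covering index</td>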
</tr>
</tbody>
</table>
<h2 id="查询"><a href="#查询" class="headerlink" title="查询"></a>Queries</h2><blockquote>
<p>Reasons a query becomes slow include, but are not limited to:</p>
<ul>
<li>It returns too much data.</li>
<li>It returns columns that are not needed.</li>
</ul>
</blockquote>
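<h3 id="扫描类型"><a href="#扫描类型" class="headerlink" title="扫描类型"></a>Scan types</h3><blockquote>
<p>The EXPLAIN statement shows a query's scan type in the type column. It describes the set of data that is scanned to find the rows.</p>
<p>From fastest to slowest:</p>
<p>constant lookup > unique index lookup > range scan > index scan > full table scan</p>
</blockquote>
<p>Scan types</p>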
<table>
<thead>
<tr>
<th style="text-align:center">Type</th>
<th style="text-align:center">Name</th>
<th>Cause</th>
</tr>
</thead>
<tbody>
<tr>
<td style="text-align:center">ALL</td>
<td style="text-align:center">Full table scan</td>
<td>The query scans the entire table; the slowest type</td>
</tr>
<tr>
<td style="text-align:center">index</td>
<td style="text-align:center">Full index scan</td>
<td>The query scans the entire index</td>
</tr>
<tr>
<td style="text-align:center">range</td>
<td style="text-align:center">Index range scan</td>
<td>The query is a range query</td>
</tr>
<tr>
<td style="text-align:center">ref</td>
<td style="text-align:center">Non-unique index scan</td>
<td>A non-unique index is used, or only a leftmost prefix is matched</td>
</tr>
<tr>
<td style="text-align:center">eq_ref</td>
<td style="text-align:center">Unique index scan</td>
<td>Usually appears in multi-table joins that use a primary key or unique index as the join condition</td>
</tr>
<tr>
<td style="text-align:center">const,system</td>
<td style="text-align:center">At most one matching row in the table</td>
<td>Appears in queries by primary key or by unique index</td>
</tr>
<tr>
<td style="text-align:center">NULL</td>
<td style="text-align:center">No table or index scan</td>
<td>The result is available directly, without accessing any table or index</td>
</tr>
</tbody>
</table>
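<h3 id="简单查询-vs-复杂查询"><a href="#简单查询-vs-复杂查询" class="headerlink" title="简单查询 vs. 复杂查询"></a>Simple queries vs. complex queries</h3><p>Where conditions allow, use several simple queries instead of one complex query. </p>
<p>Such queries are efficient, and their results can also be cached.</p>
<h3 id="关联查询"><a href="#关联查询" class="headerlink" title="关联查询"></a>Join queries</h3><p>MySQL executes a join as a left-deep binary tree rather than a balanced tree such as an AVL tree. With many joined tables the left subtree degenerates into a chain, which makes the query inefficient.</p>
<p>Avoid joining too many tables, or restructure the JOINs by hand so that the query tree becomes shallower.</p>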
<pre>
            JOIN
           /    \
        JOIN    table4
       /    \
    JOIN    table3
   /    \
table1  table2
</pre>
<h3 id="延迟关联"><a href="#延迟关联" class="headerlink" title="延迟关联"></a>Deferred joins</h3><p>If a query needs columns that the index cannot cover, split the index-covered part into a subquery that selects only the primary key, then join it back (for example with a left outer or inner join) to fetch the remaining columns.</p>
<blockquote>
<p>For example:</p>
<figure class="highlight plain"><table><tr><td class="code"><pre><span class="line">SELECT id, description FROM tableA ORDER BY title LIMIT 50,5; -- the index on title does not cover description</span><br><span class="line">-- becomes</span><br><span class="line">SELECT id, description FROM tableA INNER JOIN (SELECT id FROM tableA ORDER BY title LIMIT 50,5) AS tmp USING(id) ORDER BY title; -- narrows the non-covered lookups: the subquery reads only the primary key</span><br><span class="line"></span><br><span class="line">SELECT id, cu_id, name, info, biz_type, gmt_create, gmt_modified,start_time, end_time, market_type, back_leaf_category,item_status,picuture_url FROM relation where biz_type ='0' AND end_time >='2014-05-29' ORDER BY id asc LIMIT 149420 ,20;</span><br><span class="line">-- becomes</span><br><span class="line">SELECT a.* FROM relation a INNER JOIN (SELECT id FROM relation WHERE biz_type ='0' AND end_time >='2014-05-29' ORDER BY id ASC LIMIT 149420 ,20) b USING(id);</span><br></pre></td></tr></table></figure>
</blockquote>
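<p>That the rewritten query returns the same page can be checked with a small experiment. This sketch runs on SQLite through Python's <code>sqlite3</code> module as a stand-in for MySQL; the table contents and the index name <code>idx_title</code> are invented, and only the equivalence of the results is shown, not the speed difference.</p>

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tableA (id INTEGER PRIMARY KEY, title TEXT, description TEXT)")
conn.execute("CREATE INDEX idx_title ON tableA (title)")
conn.executemany(
    "INSERT INTO tableA VALUES (?, ?, ?)",
    # titles are unique and deliberately not in id order
    [(i, f"t{(i * 37) % 100:03d}", f"row {i}") for i in range(1, 101)],
)

direct = conn.execute(
    "SELECT id, description FROM tableA ORDER BY title LIMIT 5 OFFSET 50"
).fetchall()

deferred = conn.execute(
    """
    SELECT tableA.id, tableA.description
    FROM tableA
    INNER JOIN (SELECT id FROM tableA ORDER BY title LIMIT 5 OFFSET 50) AS tmp
        ON tmp.id = tableA.id
    ORDER BY tableA.title
    """
).fetchall()

print(direct == deferred)
```

<p>The subquery pages through the index on <code>title</code> and yields five primary keys; only those five rows are then read in full.</p>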
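<h3 id="可选择性-amp-WHERE的优化"><a href="#可选择性-amp-WHERE的优化" class="headerlink" title="可选择性 & WHERE的优化"></a>Selectivity & WHERE optimization</h3><ul>
<li><p>Filter on highly selective columns first: the cardinality of the intermediate result drops, which improves performance.</p>
</li>
<li><p>So order WHERE as: <strong>equality conditions</strong> {high selectivity, …, low selectivity} + <strong>range conditions</strong> {high selectivity (for example an IN clause), >, <, !=, …, low selectivity (columns that distinguish rows poorly)}</p>
</li>
<li><p>Columns not covered by the index should be split off and fetched with a deferred join.</p>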
</li>
</ul>
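<h3 id="范围查询"><a href="#范围查询" class="headerlink" title="范围查询"></a>Range queries</h3><p>Avoid multiple range conditions: MySQL cannot use the index for two range conditions at the same time.</p>
<p>Replacing a range condition with IN() can improve efficiency.</p>
<h3 id="IN-amp-EXIST"><a href="#IN-amp-EXIST" class="headerlink" title="IN() & EXIST()"></a>IN() & EXISTS()</h3><ul>
<li>EXISTS lets a query stop as soon as a match is found, which improves efficiency.</li>
<li>IN is more efficient than comparison operators, but it has a cost, so keep the value lists short. When several IN() lists appear in one WHERE clause, the optimizer considers every combination of their values: lists of 4, 3, and 2 values give $4 \times 3 \times 2 = 24$ combinations. The cost grows as the product of the list sizes, so IN() is no silver bullet either.</li>
<li>Efficiency of OUTER JOIN > efficiency of EXISTS > efficiency of INNER JOIN.</li>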
</ul>
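<p>How several IN() lists multiply into lookup combinations can be counted directly. In this sketch the column names and values are hypothetical, and <code>itertools.product</code> only enumerates the combinations; it does not model what the MySQL optimizer actually does with them.</p>

```python
from itertools import product
from math import prod

# Hypothetical WHERE clause: city IN (...) AND age_group IN (...) AND gender IN (...)
in_lists = {
    "city": ["Beijing", "Shanghai", "Shenzhen", "Guangzhou"],
    "age_group": ["18-25", "26-35", "36-45"],
    "gender": ["F", "M"],
}

combinations = list(product(*in_lists.values()))
expected = prod(len(values) for values in in_lists.values())
print(len(combinations), expected)
```

<p>Adding one more value to any list, or one more IN() list, multiplies the total, which is why short lists are preferable.</p>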
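<h3 id="子查询"><a href="#子查询" class="headerlink" title="子查询"></a>Subqueries</h3><ul>
<li>Because of how the MySQL engine works, subqueries perform poorly; replacing them with a left outer join can improve performance.</li>
<li>A subquery inside IN() is much slower than a subquery inside EXISTS(). Avoid IN subqueries and prefer EXISTS subqueries. (This is caused by a weakness of the MySQL query engine.)</li>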
</ul>
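<h3 id="UNION优化"><a href="#UNION优化" class="headerlink" title="UNION优化"></a>UNION optimization</h3><ul>
<li>Put ORDER BY / LIMIT inside each subquery of the UNION instead of applying them only at the end, so that each branch returns fewer rows. A final ORDER BY / LIMIT on the combined result is still needed to get a globally ordered answer.</li>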
</ul>
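<p>Why limiting each branch is safe can be checked on plain lists. In this sketch <code>top_k</code> plays the role of <code>ORDER BY ... LIMIT k</code>, and the two branches are made-up numbers.</p>

```python
import heapq


def top_k(rows, k):
    """Smallest k values in ascending order, like ORDER BY value LIMIT k."""
    return heapq.nsmallest(k, rows)


branch_a = [17, 3, 42, 8, 25, 1]
branch_b = [9, 30, 2, 11, 5, 40]
k = 3

# LIMIT applied only after the UNION: all 12 rows are combined and sorted.
global_only = top_k(branch_a + branch_b, k)
# LIMIT pushed into each branch first: only 2 * k rows reach the final sort.
pushed_down = top_k(top_k(branch_a, k) + top_k(branch_b, k), k)

print(global_only, pushed_down)
```

<p>Both approaches give the same answer, but the pushed-down version passes far fewer rows to the final sort, which is the point of limiting inside each branch.</p>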
<h3 id="LIMIT优化"><a href="#LIMIT优化" class="headerlink" title="LIMIT优化"></a>LIMIT optimization</h3><ul>
<li>Deferred join</li>
<li><del>Use BETWEEN ... AND instead of comparison operators.</del></li>
</ul>
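<h2 id="分区"><a href="#分区" class="headerlink" title="分区"></a>Partitioning</h2><blockquote>
<p>Partitioning is no silver bullet either.</p>
</blockquote>
<ul>
<li>Every operation on a partitioned table first locks all the underlying tables of the partitions, then determines which partitions are affected, unlocks the partitions outside that range, and only then does the actual work.</li>
<li>B-Tree indexes stop being effective.</li>
<li>Index maintenance cost is very high.</li>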
</ul>
]]></content>
</entry>
<entry>
<title>PM2.5 Prediction</title>
<url>/2018/12/24/PM2-5/</url>
<content><![CDATA[<h1 id="使用Linear-Regression-对PM2-5进行预测"><a href="#使用Linear-Regression-对PM2-5进行预测" class="headerlink" title="使用Linear Regression 对PM2.5进行预测"></a>Predicting PM2.5 with Linear Regression</h1><h2 id="数据集"><a href="#数据集" class="headerlink" title="数据集"></a>Dataset</h2><blockquote>
<p><a href="https://github.com/VIXNESS/machine-learning-course/blob/master/pm25_predict/train.csv" target="_blank" rel="noopener">training data</a></p>
<p><a href="https://github.com/VIXNESS/machine-learning-course/blob/master/pm25_predict/test.csv" target="_blank" rel="noopener">testing data: samples</a></p>
<p><a href="https://github.com/VIXNESS/machine-learning-course/blob/master/pm25_predict/ans.csv" target="_blank" rel="noopener">testing data: label</a></p>
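<p>Layout of the training data and the public testing data:</p>
<p>Each day consists of 18 rows, one row per indicator, for 18 indicators in total. Starting from the 4th column, each row records that indicator's values over the 24 hours of the day. For each month, the first 20 consecutive days form the training set and the last 10 days form the testing set; 240 hours were recorded in total.</p>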
</blockquote><a id="more"></a>
<table>
<thead>
<tr>
<th>Date</th>
<th>Station</th>
<th>Indicator</th>
<th>Hour 0</th>
<th>…</th>
<th>Hour 23</th>
</tr>
</thead>
<tbody>
<tr>
<td>day 1</td>
<td>xxx</td>
<td>PM2.5</td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>day 1</td>
<td>xxx</td>
<td>PM10</td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>day 1</td>
<td>xxx</td>
<td>SO2</td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>day 1</td>
<td>xxx</td>
<td>…</td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>day 2</td>
<td>xxx</td>
<td>PM2.5</td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>day 2</td>
<td>xxx</td>
<td>PM10</td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>day 2</td>
<td>xxx</td>
<td>SO2</td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>day 2</td>
<td>xxx</td>
<td>…</td>
<td></td>
<td></td>
<td></td>
</tr>
</tbody>
</table>
<h2 id="数据处理"><a href="#数据处理" class="headerlink" title="数据处理"></a>Data processing</h2><h3 id="必要类库"><a href="#必要类库" class="headerlink" title="必要类库"></a>Required libraries</h3><figure class="highlight python"><table><tr><td class="code"><pre><span class="line"><span class="keyword">import</span> os</span><br><span class="line"><span class="keyword">import</span> tensorflow <span class="keyword">as</span> tf</span><br><span class="line"><span class="keyword">from</span> tensorflow <span class="keyword">import</span> keras</span><br><span class="line"><span class="keyword">import</span> csv</span><br><span class="line"><span class="keyword">import</span> sys</span><br><span class="line"><span class="keyword">import</span> numpy <span class="keyword">as</span> np</span><br><span class="line"><span class="keyword">import</span> matplotlib.pyplot <span class="keyword">as</span> plt</span><br></pre></td></tr></table></figure>
<h3 id="载入数据"><a href="#载入数据" class="headerlink" title="载入数据"></a>Loading the data</h3><figure class="highlight python"><table><tr><td class="code"><pre><span class="line">data = []</span><br><span class="line"><span class="keyword">for</span> i <span class="keyword">in</span> range(<span class="number">18</span>):</span><br><span class="line">    data.append([]) <span class="comment"># one list per indicator, 18 in total</span></span><br><span class="line">n_row = <span class="number">0</span></span><br><span class="line"><span class="keyword">with</span> open(<span class="string">'train.csv'</span>,<span class="string">'r'</span>,encoding = <span class="string">'big5'</span>) <span class="keyword">as</span> text: <span class="comment"># the CSV file is Big5-encoded</span></span><br><span class="line">    row = csv.reader(text, delimiter = <span class="string">","</span>)</span><br><span class="line">    <span class="keyword">for</span> r <span class="keyword">in</span> row:</span><br><span class="line">        <span class="keyword">if</span> n_row != <span class="number">0</span>:</span><br><span class="line">            <span class="keyword">for</span> i <span class="keyword">in</span> range(<span class="number">3</span>,<span class="number">27</span>):</span><br><span class="line">                <span class="keyword">if</span> r[i] != <span class="string">"NR"</span>: <span class="comment"># "NR" marks no rainfall; any other entry is a numeric reading</span></span><br><span class="line">                    data[(n_row - <span class="number">1</span>) % <span class="number">18</span>].append(float(r[i]))</span><br><span class="line">                <span class="keyword">else</span>:</span><br><span class="line">                    <span class="comment"># record "no rainfall" as a small value close to 0;</span></span><br><span class="line">                    <span class="comment"># an exact 0 would risk a division by zero in the later gradient computation</span></span><br><span class="line">                    data[(n_row - <span class="number">1</span>) % <span class="number">18</span>].append(float(<span class="number">0.0001</span>))</span><br><span class="line">        n_row += <span class="number">1</span></span><br></pre></td></tr></table></figure>
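<h3 id="重新组织数据"><a href="#重新组织数据" class="headerlink" title="重新组织数据"></a>Reorganizing the data</h3><blockquote>
<p>Reorganize the loaded data by concatenating the hours of consecutive days into one continuous hourly sequence per feature.</p>
</blockquote>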
<table>
<thead>
<tr>
<th>Features</th>
<th>00:00</th>
<th>…</th>
<th>23:00</th>
<th>00:00 (next day)</th>
<th>…</th>
<th>23:00</th>
</tr>
</thead>
<tbody>
<tr>
<td>PM2.5</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>…</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>..</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>PM10</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
</tbody>
</table>
<figure class="highlight python"><table><tr><td class="code"><pre><span class="line">x = [] <span class="comment"># sample matrix</span></span><br><span class="line">y = [] <span class="comment"># target values</span></span><br><span class="line"><span class="keyword">for</span> i <span class="keyword">in</span> range(<span class="number">12</span>): <span class="comment"># 12 months</span></span><br><span class="line">    <span class="keyword">for</span> j <span class="keyword">in</span> range(<span class="number">471</span>):</span><br><span class="line">        <span class="comment"># 9 consecutive hours of readings are used to predict the PM2.5 value of the 10th hour;</span></span><br><span class="line">        <span class="comment"># each 480-hour month contains 471 such 10-hour windows</span></span><br><span class="line">        x.append([])</span><br><span class="line">        <span class="keyword">for</span> w <span class="keyword">in</span> range(<span class="number">18</span>): <span class="comment"># iterate over the 18 features</span></span><br><span class="line">            <span class="keyword">for</span> t <span class="keyword">in</span> range(<span class="number">9</span>): <span class="comment"># iterate over the first 9 hours</span></span><br><span class="line">                x[<span class="number">471</span> * i + j].append(data[w][<span class="number">480</span> * i + j + t])</span><br><span class="line">        <span class="comment"># the 10th hour's PM2.5 reading is the target</span></span><br><span class="line">        y.append(data[<span class="number">9</span>][<span class="number">480</span> * i + j + <span class="number">9</span>])</span><br><span class="line">x = np.array(x)</span><br><span class="line">y = np.array(y)</span><br><span class="line"></span><br><span class="line"><span class="comment"># prepend a column of ones as the bias term</span></span><br><span class="line">x = np.concatenate((np.ones((x.shape[<span class="number">0</span>],<span class="number">1</span>)),x),axis = <span class="number">1</span>)</span><br><span class="line">w = np.zeros(x.shape[<span class="number">1</span>]) <span class="comment"># weights</span></span><br></pre></td></tr></table></figure>
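<p>The windowing step above can be checked on a toy series. The following is a minimal, dependency-free sketch rather than the post's own code: the helper name <code>make_windows</code> and the toy data are assumptions made for illustration. It shows why a 480-hour month yields 480 - 9 = 471 samples, each holding 9 values per feature.</p>

```python
def make_windows(series_by_feature, target_index, window=9):
    """Build (features, target) pairs from equally long hourly series.

    series_by_feature: list of lists, one list of hourly values per feature.
    target_index: which feature supplies the value to predict.
    """
    n_hours = len(series_by_feature[0])
    samples, targets = [], []
    for start in range(n_hours - window):
        row = []
        for series in series_by_feature:
            # 9 consecutive hours of this feature
            row.extend(series[start:start + window])
        samples.append(row)
        # the hour right after the window is the value to predict
        targets.append(series_by_feature[target_index][start + window])
    return samples, targets


# Toy month: 2 features, 480 hourly values each (hypothetical data).
toy = [[float(h) for h in range(480)], [float(2 * h) for h in range(480)]]
X, Y = make_windows(toy, target_index=0)
print(len(X), len(X[0]), Y[0])
```

<p>With 18 features instead of 2, each sample would hold 18 × 9 = 162 values, which matches the row length built by the nested loops in the post before the bias column is added.</p>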
<h2 id="训练"><a href="#训练" class="headerlink" title="训练"></a>Training</h2><h3 id="定义loss-function"><a href="#定义loss-function" class="headerlink" title="定义loss function"></a>Defining the loss function</h3><blockquote>
<p>The loss is the sum of squared errors.<br><figure class="highlight python"><table><tr><td class="code"><pre><span class="line"><span class="function"><span class="keyword">def</span> <span class="title">lossFunction</span><span class="params">(target,weight,samples)</span>:</span></span><br><span class="line">    M = target - np.dot(weight,samples.T) <span class="comment"># residuals: targets minus predictions</span></span><br><span class="line">    loss = <span class="number">0</span></span><br><span class="line">    <span class="keyword">for</span> m <span class="keyword">in</span> M:</span><br><span class="line">        loss += m**<span class="number">2</span></span><br><span class="line">    <span class="keyword">return</span> loss</span><br></pre></td></tr></table></figure></p>
</blockquote>
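<p>Written out, the loss above and its gradient look as follows. The symbols are assumptions introduced here for illustration: <code>x_i</code> is one sample row (including the leading 1 for the bias), <code>y_i</code> its target, <code>w</code> the weight vector and <code>N</code> the number of samples.</p>

```latex
L(w) = \sum_{i=1}^{N} \bigl(y_i - w^{\top} x_i\bigr)^2,
\qquad
\nabla_w L(w) = 2 \sum_{i=1}^{N} \bigl(w^{\top} x_i - y_i\bigr)\, x_i
```

<p>The training loop in the next section uses one term of this sum at a time, that is, the per-sample gradient <code>2 (w·x_i - y_i) x_i</code>.</p>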
<h3 id="Gradient-Descent"><a href="#Gradient-Descent" class="headerlink" title="Gradient Descent"></a>Gradient Descent</h3><blockquote>
<p>Adagrad is used to adapt the learning rate.</p>
</blockquote>
<figure class="highlight python"><table><tr><td class="code"><pre><span class="line">lr = <span class="number">8</span> <span class="comment"># base learning rate</span></span><br><span class="line">pre_grad = np.ones(x.shape[<span class="number">1</span>]) <span class="comment"># accumulated squared gradients: each feature gets its own effective learning rate</span></span><br><span class="line"><span class="keyword">for</span> r <span class="keyword">in</span> range(<span class="number">10000</span>):</span><br><span class="line">    temp_loss = <span class="number">0</span></span><br><span class="line">    <span class="keyword">for</span> m <span class="keyword">in</span> range(<span class="number">36</span>):</span><br><span class="line">        <span class="keyword">for</span> s <span class="keyword">in</span> range(<span class="number">156</span>):</span><br><span class="line">            L = np.dot(w,x[<span class="number">157</span> * m + s].T) - y[<span class="number">157</span> * m + s]</span><br><span class="line">            grad = np.dot(x[<span class="number">157</span> * m + s].T,L)*(<span class="number">2</span>)</span><br><span class="line">            pre_grad += grad**<span class="number">2</span></span><br><span class="line">            ada = np.sqrt(pre_grad)</span><br><span class="line">            w = w - lr * grad/ada</span><br><span class="line">        <span class="comment"># the 157th sample of each group is not trained on; it is used to monitor the absolute error</span></span><br><span class="line">        temp_loss += abs(np.dot(w,x[<span class="number">157</span> * m + <span class="number">156</span>].T) - y[<span class="number">157</span> * m + <span class="number">156</span>])</span><br><span class="line">    print(<span class="string">"%.2f"</span> % (r * <span class="number">100</span> / <span class="number">10000</span>),<span class="string">'% loss:'</span>,<span class="string">"%.4f"</span> % (temp_loss / <span class="number">36</span>))</span><br></pre></td></tr></table></figure>
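<p>To see the Adagrad update in isolation, here is a small self-contained sketch on synthetic data. It is an illustration under stated assumptions, not the post's training code: the function name <code>adagrad_fit</code>, the toy data generated from y = 3 + 2x, and the values of <code>lr</code> and <code>epochs</code> are all chosen for this example. As in the loop above, the accumulator of squared gradients starts at 1 and each weight is updated by <code>lr * grad / sqrt(accumulator)</code>.</p>

```python
import math


def adagrad_fit(samples, targets, lr=1.0, epochs=500):
    """Per-sample Adagrad for least-squares linear regression.

    Each sample is a list of features that already includes a leading 1.0
    for the bias term.
    """
    n_features = len(samples[0])
    w = [0.0] * n_features
    # Accumulated squared gradients; starting at 1 mirrors the post's
    # initialization and avoids dividing by zero on the first step.
    accum = [1.0] * n_features
    for _ in range(epochs):
        for x, y in zip(samples, targets):
            # prediction error for this sample, computed before any update
            error = sum(wi * xi for wi, xi in zip(w, x)) - y
            for k in range(n_features):
                grad = 2.0 * error * x[k]
                accum[k] += grad * grad
                w[k] -= lr * grad / math.sqrt(accum[k])
    return w


# Hypothetical toy data generated from y = 3 + 2 * x.
xs = [[1.0, float(v)] for v in range(10)]
ys = [3.0 + 2.0 * v for v in range(10)]
w = adagrad_fit(xs, ys)
print(w)
```

<p>Because each step is divided by the root of the accumulated squared gradients, no single update can move a weight by more than <code>lr</code>, which is one reason a base learning rate as large as the one in the post remains usable.</p>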
<blockquote>
<p>Save the weights<br><figure class="highlight python"><table><tr><td class="code"><pre><span class="line">np.save(<span class="string">'model.npy'</span>,w)</span><br></pre></td></tr></table></figure></p>
</blockquote>
<h2 id="测试"><a href="#测试" class="headerlink" title="测试"></a>Testing</h2><ul>
<li>Load the test feature set (omitted)</li>
<li>Load the labels</li>
</ul>
<figure class="highlight python"><table><tr><td class="code"><pre><span class="line">y = []</span><br><span class="line">rr = <span class="number">0</span></span><br><span class="line"><span class="keyword">with</span> open(<span class="string">'ans.csv'</span>,<span class="string">'r'</span>,encoding = <span class="string">'big5'</span>) <span class="keyword">as</span> ans:</span><br><span class="line">    row = csv.reader(ans,delimiter = <span class="string">','</span>)</span><br><span class="line">    <span class="keyword">for</span> r <span class="keyword">in</span> row:</span><br><span class="line">        <span class="keyword">if</span> rr != <span class="number">0</span>:</span><br><span class="line">            y.append(float(r[<span class="number">1</span>]))</span><br><span class="line">        rr += <span class="number">1</span></span><br><span class="line">y = np.array(y)</span><br></pre></td></tr></table></figure>
<ul>
<li>Load the weights</li>
</ul>
<figure class="highlight python"><table><tr><td class="code"><pre><span class="line">w = np.load(<span class="string">'model.npy'</span>)</span><br></pre></td></tr></table></figure>
<ul>
<li>Run the test</li>
</ul>
<figure class="highlight python"><table><tr><td class="code"><pre><span class="line">t = np.dot(x,w)</span><br><span class="line">L = t - y</span><br><span class="line">loss = []</span><br><span class="line">total = <span class="number">0</span> <span class="comment"># avoids shadowing the built-in sum()</span></span><br><span class="line"><span class="keyword">for</span> l <span class="keyword">in</span> L:</span><br><span class="line">    loss.append(abs(l))</span><br><span class="line">    total += abs(l)</span><br><span class="line">print(total / len(L)) <span class="comment"># mean absolute error</span></span><br><span class="line">plt.plot(y,color = <span class="string">"red"</span>,label = <span class="string">'target'</span>)</span><br><span class="line">plt.plot(t,color = <span class="string">"blue"</span>,label = <span class="string">'hypothesis'</span>)</span><br><span class="line">plt.ylabel(<span class="string">'pm 2.5'</span>)</span><br><span class="line">plt.legend() <span class="comment"># display the 'target' and 'hypothesis' labels</span></span><br><span class="line">plt.show()</span><br></pre></td></tr></table></figure>
<h3 id="结果"><a href="#结果" class="headerlink" title="结果"></a>Results</h3><p><img src="/images/PM25/model_1.png" alt="结论"></p>
<blockquote>
<p>PM2.5 error: <strong>14.427</strong><br><del>Too many parameters, so it overfits</del> The model was not trained well and got stuck somewhere; the training loss was also high.</p>
</blockquote>
<h2 id="再优化"><a href="#再优化" class="headerlink" title="再优化"></a>Further optimization</h2><blockquote>
<p>Use only NMHC, NO2, O3, PM10 and PM2.5 out of the 18 features.</p>
</blockquote>
<p><img src="/images/PM25/model_2.png" alt="model_2"></p>
<blockquote>
<p>PM2.5 error: <strong>8.987</strong><br>A decent start; keep optimizing.</p>
<p>Use only PM10 and PM2.5.</p>
</blockquote>
<p><img src="/images/PM25/model_3.png" alt="model_3"></p>
<blockquote>
<p>PM2.5 error: <strong>6.281</strong></p>
</blockquote>
<blockquote>
<p>What if we cut the features further?<br>Use only PM2.5.</p>
</blockquote>
<p><img src="/images/PM25/model_4.png" alt="model_4"></p>
<blockquote>
<p>PM2.5 error: <strong>5.406</strong><br>Surprisingly, when I did this before, underfitting pushed the error up to 7.4, yet this time it came out even lower.</p>
</blockquote>
<h1 id="使用DNN-对PM2-5进行预测"><a href="#使用DNN-对PM2-5进行预测" class="headerlink" title="使用DNN 对PM2.5进行预测"></a>Predicting PM2.5 with a DNN</h1><blockquote>
<p>This part uses tensorflow + keras.<br>The preparation steps are omitted.</p>
</blockquote>
<h2 id="Feature-Scaling"><a href="#Feature-Scaling" class="headerlink" title="Feature Scaling"></a>Feature Scaling</h2><blockquote>
<p>Two different feature scaling methods were tried. The results differ little, with standardization being slightly better.</p>
</blockquote>
<h3 id="Standardization"><a href="#Standardization" class="headerlink" title="Standardization"></a>Standardization</h3><figure class="highlight python"><table><tr><td class="code"><pre><span class="line"><span class="function"><span class="keyword">def</span> <span class="title">standardization</span><span class="params">(dataMatrix)</span>:</span></span><br><span class="line">    <span class="keyword">if</span> dataMatrix.shape[<span class="number">0</span>] == <span class="number">0</span>:</span><br><span class="line">        <span class="keyword">return</span> dataMatrix</span><br><span class="line">    <span class="keyword">for</span> i <span class="keyword">in</span> range(dataMatrix.shape[<span class="number">1</span>]):</span><br><span class="line">        sum = <span class="number">0</span></span><br><span class="line">        <span class="keyword">for</span> _x <span class="keyword">in</span> dataMatrix:</span><br><span class="line">            sum += _x[i]</span><br><span class="line">        mean = sum / dataMatrix.shape[<span class="number">0</span>]</span><br><span class="line">        SD = <span class="number">0</span></span><br><span class="line">        <span class="keyword">for</span> _x <span class="keyword">in</span> dataMatrix:</span><br><span class="line">            SD += (_x[i] - mean)**<span class="number">2</span></span><br><span class="line">        SD = np.sqrt(SD / dataMatrix.shape[<span class="number">0</span>])</span><br><span class="line"></span><br><span class="line">        <span class="keyword">for</span> _x <span class="keyword">in</span> dataMatrix:</span><br><span class="line">            _x[i] = (_x[i] - mean) / SD</span><br><span class="line">    <span class="keyword">return</span> dataMatrix</span><br></pre></td></tr></table></figure>
<h3 id="Mean-Normalization"><a href="#Mean-Normalization" class="headerlink" title="Mean Normalization"></a>Mean Normalization</h3><figure class="highlight python"><table><tr><td class="code"><pre><span class="line"><span class="function"><span class="keyword">def</span> <span class="title">meanNormalization</span><span class="params">(dataMatrix)</span>:</span></span><br><span class="line">    <span class="keyword">if</span> dataMatrix.shape[<span class="number">0</span>] == <span class="number">0</span>:</span><br><span class="line">        <span class="keyword">return</span> dataMatrix</span><br><span class="line">    <span class="keyword">for</span> i <span class="keyword">in</span> range(dataMatrix.shape[<span class="number">1</span>]):</span><br><span class="line">        sum = <span class="number">0</span></span><br><span class="line">        <span class="comment"># start from the first row so that columns which never cross 0 still get their true range</span></span><br><span class="line">        max = dataMatrix[<span class="number">0</span>][i]</span><br><span class="line">        min = dataMatrix[<span class="number">0</span>][i]</span><br><span class="line">        <span class="keyword">for</span> data <span class="keyword">in</span> dataMatrix:</span><br><span class="line">            sum += data[i]</span><br><span class="line">            <span class="keyword">if</span> data[i] &gt; max:</span><br><span class="line">                max = data[i]</span><br><span class="line">            <span class="keyword">if</span> data[i] &lt; min:</span><br><span class="line">                min = data[i]</span><br><span class="line">        mean = sum / dataMatrix.shape[<span class="number">0</span>]</span><br><span class="line">        <span class="keyword">if</span> (max - min) != <span class="number">0</span>:</span><br><span class="line">            <span class="keyword">for</span> data <span class="keyword">in</span> dataMatrix:</span><br><span class="line">                data[i] = (data[i] - mean) / (max - min)</span><br><span class="line">    <span class="keyword">return</span> dataMatrix</span><br></pre></td></tr></table></figure>
<figure class="highlight python"><table><tr><td class="code"><pre><span class="line">trainX = standardization(trainX)</span><br><span class="line">testX = standardization(testX)</span><br><span class="line"><span class="comment"># trainX = meanNormalization(trainX)</span></span><br><span class="line"><span class="comment"># testX = meanNormalization(testX)</span></span><br></pre></td></tr></table></figure>
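<p>The calls above standardize <code>trainX</code> and <code>testX</code> independently, each with its own mean and standard deviation. A common alternative, shown here only as a hedged sketch and not as what the post did, is to compute the statistics on the training set and reuse them for the test set, so both sets are scaled identically. The helper names <code>fit_standardizer</code> and <code>apply_standardizer</code> and the toy rows are assumptions made for this example, which uses only the standard library.</p>

```python
import statistics


def fit_standardizer(rows):
    """Return per-column (means, stds) computed from the given rows."""
    columns = list(zip(*rows))
    means = [statistics.fmean(col) for col in columns]
    # Population standard deviation, matching the division by N in the post.
    stds = [statistics.pstdev(col) for col in columns]
    return means, stds


def apply_standardizer(rows, means, stds):
    """Scale rows with precomputed statistics; constant columns map to 0."""
    scaled = []
    for row in rows:
        scaled.append([
            (value - mean) / std if std != 0 else 0.0
            for value, mean, std in zip(row, means, stds)
        ])
    return scaled


# Hypothetical toy data: statistics come from the training rows only.
train_rows = [[1.0, 10.0], [2.0, 10.0], [3.0, 10.0]]
test_rows = [[4.0, 10.0]]
means, stds = fit_standardizer(train_rows)
train_scaled = apply_standardizer(train_rows, means, stds)
test_scaled = apply_standardizer(test_rows, means, stds)
print(train_scaled, test_scaled)
```

<p>The guard for a zero standard deviation covers constant columns, which would otherwise cause a division by zero. Whether shared statistics would change the test loss reported below is not something this sketch can show.</p>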
<h2 id="训练-1"><a href="#训练-1" class="headerlink" title="训练"></a>Training</h2><blockquote>
<p>The model uses two hidden layers with 8 output units each and ReLU as the activation function (Sigmoid performed worse).</p>
</blockquote>
<h3 id="建立模型"><a href="#建立模型" class="headerlink" title="建立模型"></a>Building the model</h3><figure class="highlight python"><table><tr><td class="code"><pre><span class="line">model = keras.Sequential([</span><br><span class="line">    keras.layers.Dense(<span class="number">8</span>, activation=tf.nn.relu),</span><br><span class="line">    keras.layers.Dense(<span class="number">8</span>, activation=tf.nn.relu),</span><br><span class="line">    keras.layers.Dense(<span class="number">1</span>)</span><br><span class="line">    ])</span><br><span class="line">model.compile(loss=<span class="string">"mse"</span>,</span><br><span class="line">              optimizer=tf.train.RMSPropOptimizer(<span class="number">0.001</span>),</span><br><span class="line">              metrics=[<span class="string">'mae'</span>, <span class="string">'mse'</span>])</span><br></pre></td></tr></table></figure>
<h3 id="训练-2"><a href="#训练-2" class="headerlink" title="训练"></a>Training</h3><blockquote>
<p>After about 100 epochs the loss barely changes any more.<br><figure class="highlight python"><table><tr><td class="code"><pre><span class="line">history = model.fit(trainX,</span><br><span class="line">                    trainY,</span><br><span class="line">                    batch_size = <span class="number">64</span>,</span><br><span class="line">                    epochs = <span class="number">100</span>,</span><br><span class="line">                    validation_split = <span class="number">0.2</span>,</span><br><span class="line">                    verbose=<span class="number">0</span>,</span><br><span class="line">                    callbacks=[PrintDot()])</span><br></pre></td></tr></table></figure></p>
</blockquote>
<blockquote>
<p>Training loss<br><img src="/images/PM25/loss.png" alt="training_loss"></p>
</blockquote>
<h3 id="测试-1"><a href="#测试-1" class="headerlink" title="测试"></a>Testing</h3><blockquote>
<p>The test loss is <strong>7.47</strong></p>
</blockquote>
<p><img src="/images/PM25/dnn.png" alt="test"></p>
<h2 id="再优化-1"><a href="#再优化-1" class="headerlink" title="再优化"></a>Further optimization</h2><h3 id="不使用Feature-Scaling"><a href="#不使用Feature-Scaling" class="headerlink" title="不使用Feature Scaling"></a>Without feature scaling</h3><blockquote>
<p>Training loss<br><img src="/images/PM25/non_fs.png" alt="nonFS"></p>
</blockquote>
<blockquote>
<p>The test loss is <strong>5.170</strong><br><img src="/images/PM25/non_fs_rs.png" alt="noFSr"></p>
</blockquote>
<h3 id="增加layers"><a href="#增加layers" class="headerlink" title="增加layers"></a>Adding layers</h3><blockquote>
<p>Add one more layer.<br>Training loss:</p>
</blockquote>
<p><img src="/images/PM25/3layers_loss.png" alt="3layers"></p>
<blockquote>
<p>Test loss: <strong>7.5</strong></p>
</blockquote>
<blockquote>
<p>So the DNN reaches 5.17 at best, and the linear model 5.4 at best.<br>Not bad 🎉🎉🎉🎉🎉🎉</p>
</blockquote>
<h1 id="Repo"><a href="#Repo" class="headerlink" title="Repo"></a>Repo</h1><p><a href="https://github.com/VIXNESS/machine-learning-course.git" target="_blank" rel="noopener">VIXNESS/machine-learning-course</a></p>
]]></content>
<categories>
<category>machine learning</category>
</categories>
<tags>
<tag>machine learning</tag>
<tag>linear regression</tag>
</tags>
</entry>
<entry>
<title>Some Thoughts on Critical Thinking</title>
<url>/2020/04/20/%E5%87%A0%E7%82%B9%E6%89%B9%E5%88%A4%E6%80%A7%E6%80%9D%E7%BB%B4%E7%9A%84%E8%A7%81%E8%A7%A3/</url>
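<content><![CDATA[<h1 id="对批判性思维的几点见解"><a href="#对批判性思维的几点见解" class="headerlink" title="对批判性思维的几点见解"></a>Some Thoughts on Critical Thinking</h1><h2 id="论点的结构"><a href="#论点的结构" class="headerlink" title="论点的结构"></a>The structure of an argument</h2><p>For most of the arguments and viewpoints we encounter, we need to analyze what lies at their core.</p><p>In most cases, a <strong>complete</strong> viewpoint consists of four parts:</p><p><strong>topic, evidence (causes/reasons), conclusion (result), stance (values)</strong></p><p>Usually the conclusion and the evidence are stated explicitly, while the topic and the stance may have to be inferred.</p><a id="more"></a>
<p>When only a topic is given, what is being raised is an open discussion, and we only need to discuss it with an open mind.</p>
<h2 id="从结论开始切入"><a href="#从结论开始切入" class="headerlink" title="从结论开始切入"></a>Start from the conclusion</h2><p>The conclusion and the evidence do not appear in a fixed order. We must identify the conclusion first, and then analyze how the conclusion and the evidence relate.</p>
<h3 id="确立对方的结论和论据"><a href="#确立对方的结论和论据" class="headerlink" title="确立对方的结论和论据"></a>Identify the other party's conclusion and evidence</h3><p>Sometimes the other party states the reasons first and then presents a conclusion, or the other way around.</p>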
<ul>
<li>When the reasons come first and the conclusion follows, we can ask a confirming question once they have finished stating their view:</li>
</ul>
<blockquote>
<p>"You reached this conclusion because you believe it is caused by …, is that right?"</p>
</blockquote>
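<p>Ask questions like the one above to confirm the other party's conclusion and evidence.</p>
<ul>
<li>Clarify the definitions used in the statement</li>
</ul>
<p>Sometimes the key words in a statement are vague, poorly defined, or highly subjective. </p>
<p>We need to confirm these key definitions with the other party. </p>
<p>If the word describes a subjective feeling, we can agree on a range that both sides accept.</p>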
<blockquote>
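<ul>
<li>"The cafeteria food is terrible; this is irresponsible toward the students."</li>
</ul>
<p>What do you consider responsible? This question clarifies their definition of "responsibility".</p>
<ul>
<li>"This fruit can put you in a better mood."</li>
</ul>
<p>What do you mean by a good mood?</p>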
</blockquote>
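<h3 id="确立对方价值观-立场-目的"><a href="#确立对方价值观-立场-目的" class="headerlink" title="确立对方价值观/立场/目的"></a>Identify the other party's values, stance and purpose</h3><p>A major cause of disputes is the conflict that arises from differing values.</p>
<p>When someone presents a viewpoint, we need to infer their values, stance and purpose.</p>
<p>Consider what stance and purpose led them to their conclusion.</p>
<p>If it turns out to be a conflict of values, we should seek common ground while respecting the differences, rather than argue fiercely.</p>
<p>Changing another person's values is difficult, and there is no need to forcibly reverse them.</p>
<h3 id="挣脱二元性结论的枷锁"><a href="#挣脱二元性结论的枷锁" class="headerlink" title="挣脱二元性结论的枷锁"></a>Break free from binary conclusions</h3><p><strong>Whether a conclusion holds is conditional:</strong></p>
<p>People tend to offer yes-or-no conclusions: if it is not this, it must be that.</p>
<p>Our first reaction should not be "yes" or "no", but to qualify the conclusion by factors such as purpose, time and environment:</p>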
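<ul>
<li>Whether the conclusion holds in certain situations.</li>
<li>Whether it holds at certain times.</li>
<li>It holds for one purpose but not for another.</li>
<li>Discuss the pros and cons.</li>
</ul>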
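<p><strong>Broaden the set of conclusions to a question:</strong></p>
<ul>
<li>Many questions are not black and white (room for compromise).</li>
<li>Are there other solutions (many-sidedness)? Solutions can coexist; there is rarely just one.</li>
</ul>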
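<h3 id="从原因-立场-目的推出其他结论"><a href="#从原因-立场-目的推出其他结论" class="headerlink" title="从原因/立场/目的推出其他结论"></a>Derive other conclusions from the reasons, stance and purpose</h3><p>By analyzing the evidence that supports a conclusion, we can derive other conclusions that hold under a particular purpose, time or environment. </p>
<blockquote>
<p>"We should shut down all internet cafes, because internet cafes cause teenagers to become addicted to the internet."</p>
<p>Conclusion: we should shut down internet cafes.</p>
<p>Evidence: internet cafes cause teenagers to become addicted to the internet.</p>
<p>Broadening by purpose: the purpose is to address teenage internet addiction, so the alternatives include restricting teenagers' entry, limiting usage time, and guiding teenagers. </p>
</blockquote>
<p><strong>Broaden the scope of the discussion further.</strong></p>
<p>Clarify the topic and, starting from it, see whether there are other solutions.</p>
<p>Starting from the topic turns a head-on dispute into an inclusive, open discussion.</p>
<p>The core of a discussion is <strong>solving the problem</strong>, not arguing.</p>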
<hr>
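<h3 id="分析论据和结论的关系"><a href="#分析论据和结论的关系" class="headerlink" title="分析论据和结论的关系"></a>Analyze the relationship between evidence and conclusion</h3><p>Once the conclusion and the evidence have been identified, we analyze the relationship between them.</p>
<ol>
<li><p>Relevance check: is this evidence related to the conclusion? </p>
<ul>
<li><p>The most common mistake is tying two unrelated things together.</p>
</li>
<li><p>Because this mistake is so easy to make, check for it first.</p>
</li>
</ul>
</li>
<li><p>Causality check: is the relationship between the evidence and the conclusion causal? Could the result be due to other causes, or to this cause acting together with others? </p>
<ol>
<li>Try substituting other causes, and consider whether the stated cause is the direct, main or decisive cause of the conclusion.</li>
<li>Consider whether the result might be due to other causes, or whether the stated cause has no causal relationship with the conclusion at all.</li>
</ol>
</li>
</ol>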
<hr>
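<h2 id="常见的谬误"><a href="#常见的谬误" class="headerlink" title="常见的谬误"></a>Common fallacies</h2><ul>
<li>Polls: whatever the majority believes is taken to be correct. Truth lies neither with the majority nor with the minority; it exists objectively, independent of people.</li>
<li>Only a method that solves a problem perfectly is acceptable; anything less is rejected entirely.</li>
<li>Labeling: because he did A, he must be that kind of person.</li>
<li>Chained inference: because he did A, he will also do B. (Must not be used as valid evidence.)</li>
</ul>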
<hr>
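<h2 id="谨防真假参半的论证"><a href="#谨防真假参半的论证" class="headerlink" title="谨防真假参半的论证"></a>Beware of arguments that mix truth and falsehood</h2><blockquote>
<p>Out of ten statements, nine are true and one is false.</p>
</blockquote>
<ul>
<li>Verify the source.</li>
<li>Verify the statistics.</li>
<li>Verify the expert's credentials.</li>
</ul>
<p>General rule: without sufficient evidence, do not accept the claim.</p>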
<hr>
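<h2 id="批判思维对自我的批判"><a href="#批判思维对自我的批判" class="headerlink" title="批判思维对自我的批判"></a>Applying critical thinking to oneself</h2><ul>
<li>Slow thinking: consider several possible conclusions instead of rushing to prove the first idea that comes to mind.</li>
<li>Go beyond your own stance and values as far as possible, and consider the views and reasons of opponents and other parties.</li>
<li>Check whether the definitions you use are precise, make your purpose clear, and derive several candidate conclusions from that purpose.</li>
<li>Working back from the conclusion, check whether your evidence or cause is the sole determinant of the conclusion: <ul>
<li>Often your factor is not decisive, but only one of the factors that lead to the conclusion.</li>
<li>It may not be a cause at all.</li>
</ul>
</li>
<li>The conclusion is likely to be non-binary:<ul>
<li>A compromise: it is not black and white.</li>
<li>Many-sided: the conclusion is not unique, and more likely several conclusions coexist.</li>
</ul>
</li>
<li>Use analogies with caution: an analogy carries hidden conditions that differ from the original ones.</li>
<li>Be careful with goal-driven argument, that is, "setting a goal first and then gathering all kinds of evidence to support it". Its goal-directedness makes the argument one-sided, so be cautious and think more about the <em>other sides</em> and the <em>opposite side</em>.</li>
<li>No chained inference: because event A happened, event B will happen, and then event C will happen. (This may be considered as a possibility, but must not be used as evidence.)</li>
<li>Dare to say "I don't know": <ul>
<li>Not understanding something is not necessarily a matter of personal ability.</li>
<li>More often, sufficient evidence to support any claim is simply not available at the moment. </li>
<li>Refraining from rash assertions is a sign of prudence.</li>
</ul>
</li>
<li>Critical thinking is not about confronting others. On the contrary, approach every exchange with an attitude of learning and communication.</li>
</ul>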
]]></content>
<categories>
<category>others</category>
</categories>
<tags>
<tag>misc</tag>
</tags>
</entry>
<entry>
<title>Predicting Annual Income with a DNN</title>
<url>/2018/12/24/%E5%B9%B4%E6%94%B6%E5%85%A5%E9%A2%84%E6%B5%8B/</url>
<content><![CDATA[<h1 id="收入预测"><a href="#收入预测" class="headerlink" title="收入预测"></a>Income prediction</h1><blockquote>
<p>Predict a person's annual salary from occupation, marital status, education, family role, race and sex 🥳🥳</p>
</blockquote><h2 id="Data-Set"><a href="#Data-Set" class="headerlink" title="Data Set"></a>Data Set</h2><p><a href="https://github.com/VIXNESS/machine-learning-course/blob/master/winner/train.csv" target="_blank" rel="noopener">Training Data</a></p><p><a href="https://github.com/VIXNESS/machine-learning-course/blob/master/winner/test.csv" target="_blank" rel="noopener">Testing Data</a></p><h2 id="预备工作"><a href="#预备工作" class="headerlink" title="预备工作"></a>Preparation</h2><h3 id="所需类库"><a href="#所需类库" class="headerlink" title="所需类库"></a>Required libraries</h3><figure class="highlight python"><table><tr><td class="code"><pre><span class="line"><span class="keyword">import</span> tensorflow <span class="keyword">as</span> tf</span><br><span class="line"><span class="keyword">from</span> tensorflow <span class="keyword">import</span> keras</span><br><span class="line"><span class="keyword">import</span> numpy <span class="keyword">as</span> np</span><br><span class="line"><span class="keyword">import</span> sys</span><br><span class="line"><span class="keyword">import</span> csv</span><br></pre></td></tr></table></figure><a id="more"></a>
<h3 id="数据装载"><a href="#数据装载" class="headerlink" title="数据装载"></a>Loading the data</h3><blockquote>
<p>Data layout: [feature 1, … feature n; &gt;50K or &lt;=50K] 😵😵😵<br><figure class="highlight python"><table><tr><td class="code"><pre><span class="line"><span class="function"><span class="keyword">def</span> <span class="title">loadData</span><span class="params">(_x, _y, fileName)</span>:</span></span><br><span class="line"> col_1 = [<span class="string">' State-gov'</span>, </span><br><span class="line"> <span class="string">' Self-emp-not-inc'</span>, </span><br><span class="line"> <span class="string">' Private'</span>, </span><br><span class="line"> <span class="string">' Federal-gov'</span>, </span><br><span class="line"> <span class="string">' Local-gov'</span>, </span><br><span class="line"> <span class="string">' ?'</span>, </span><br><span class="line"> <span class="string">' Self-emp-inc'</span>, </span><br><span class="line"> <span class="string">' Without-pay'</span>, </span><br><span class="line"> <span class="string">' Never-worked'</span>]</span><br><span class="line"> col_3 = [<span class="string">' Bachelors'</span>, </span><br><span class="line"> <span class="string">' HS-grad'</span>, </span><br><span class="line"> <span class="string">' 11th'</span>, </span><br><span class="line"> <span class="string">' Masters'</span>, </span><br><span class="line"> <span class="string">' 9th'</span>, </span><br><span class="line"> <span class="string">' Some-college'</span>, </span><br><span class="line"> <span class="string">' Assoc-acdm'</span>, </span><br><span class="line"> <span class="string">' Assoc-voc'</span>, </span><br><span class="line"> <span class="string">' 7th-8th'</span>, </span><br><span class="line"> <span class="string">' Doctorate'</span>, </span><br><span class="line"> <span class="string">' Prof-school'</span>, </span><br><span class="line"> <span class="string">' 5th-6th'</span>, </span><br><span class="line"> <span class="string">' 10th'</span>, </span><br><span class="line"> <span class="string">' 1st-4th'</span>, </span><br><span 
class="line"> <span class="string">' Preschool'</span>, </span><br><span class="line"> <span class="string">' 12th'</span>]</span><br><span class="line"> col_5 = [<span class="string">' Never-married'</span>, </span><br><span class="line"> <span class="string">' Married-civ-spouse'</span>, </span><br><span class="line"> <span class="string">' Divorced'</span>, </span><br><span class="line"> <span class="string">' Married-spouse-absent'</span>, </span><br><span class="line"> <span class="string">' Separated'</span>, </span><br><span class="line"> <span class="string">' Married-AF-spouse'</span>, </span><br><span class="line"> <span class="string">' Widowed'</span>]</span><br><span class="line"> col_6 = [<span class="string">' Adm-clerical'</span>, </span><br><span class="line"> <span class="string">' Exec-managerial'</span>, </span><br><span class="line"> <span class="string">' Handlers-cleaners'</span>, </span><br><span class="line"> <span class="string">' Prof-specialty'</span>, </span><br><span class="line"> <span class="string">' Other-service'</span>, </span><br><span class="line"> <span class="string">' Sales'</span>, </span><br><span class="line"> <span class="string">' Craft-repair'</span>, </span><br><span class="line"> <span class="string">' Transport-moving'</span>, </span><br><span class="line"> <span class="string">' Farming-fishing'</span>, </span><br><span class="line"> <span class="string">' Machine-op-inspct'</span>, </span><br><span class="line"> <span class="string">' Tech-support'</span>, </span><br><span class="line"> <span class="string">' ?'</span>, </span><br><span class="line"> <span class="string">' Protective-serv'</span>, </span><br><span class="line"> <span class="string">' Armed-Forces'</span>, </span><br><span class="line"> <span class="string">' Priv-house-serv'</span>]</span><br><span class="line"> col_7 = [<span class="string">' Not-in-family'</span>, </span><br><span class="line"> <span class="string">' Husband'</span>, 
</span><br><span class="line"> <span class="string">' Wife'</span>, </span><br><span class="line"> <span class="string">' Own-child'</span>, </span><br><span class="line"> <span class="string">' Unmarried'</span>, </span><br><span class="line"> <span class="string">' Other-relative'</span>]</span><br><span class="line"> col_8 = [<span class="string">' White'</span>, </span><br><span class="line"> <span class="string">' Black'</span>, </span><br><span class="line"> <span class="string">' Asian-Pac-Islander'</span>, </span><br><span class="line"> <span class="string">' Amer-Indian-Eskimo'</span>, </span><br><span class="line"> <span class="string">' Other'</span>]</span><br><span class="line"> col_9 = [<span class="string">' Male'</span>, <span class="string">' Female'</span>]</span><br><span class="line"> col_13 = [<span class="string">' United-States'</span>, </span><br><span class="line"> <span class="string">' Cuba'</span>, </span><br><span class="line"> <span class="string">' Jamaica'</span>, </span><br><span class="line"> <span class="string">' India'</span>, </span><br><span class="line"> <span class="string">' ?'</span>, </span><br><span class="line"> <span class="string">' Mexico'</span>, </span><br><span class="line"> <span class="string">' South'</span>, </span><br><span class="line"> <span class="string">' Puerto-Rico'</span>, </span><br><span class="line"> <span class="string">' Honduras'</span>, </span><br><span class="line"> <span class="string">' England'</span>, </span><br><span class="line"> <span class="string">' Canada'</span>, </span><br><span class="line"> <span class="string">' Germany'</span>, </span><br><span class="line"> <span class="string">' Iran'</span>, </span><br><span class="line"> <span class="string">' Philippines'</span>, </span><br><span class="line"> <span class="string">' Italy'</span>, </span><br><span class="line"> <span class="string">' Poland'</span>, </span><br><span class="line"> <span class="string">' Columbia'</span>, 
</span><br><span class="line"> <span class="string">' Cambodia'</span>, </span><br><span class="line"> <span class="string">' Thailand'</span>, </span><br><span class="line"> <span class="string">' Ecuador'</span>, </span><br><span class="line"> <span class="string">' Laos'</span>, </span><br><span class="line"> <span class="string">' Taiwan'</span>, </span><br><span class="line"> <span class="string">' Haiti'</span>, </span><br><span class="line"> <span class="string">' Portugal'</span>, </span><br><span class="line"> <span class="string">' Dominican-Republic'</span>, </span><br><span class="line"> <span class="string">' El-Salvador'</span>, </span><br><span class="line"> <span class="string">' France'</span>, </span><br><span class="line"> <span class="string">' Guatemala'</span>, </span><br><span class="line"> <span class="string">' China'</span>, </span><br><span class="line"> <span class="string">' Japan'</span>, </span><br><span class="line"> <span class="string">' Yugoslavia'</span>, </span><br><span class="line"> <span class="string">' Peru'</span>, </span><br><span class="line"> <span class="string">' Outlying-US(Guam-USVI-etc)'</span>, </span><br><span class="line"> <span class="string">' Scotland'</span>, </span><br><span class="line"> <span class="string">' Trinadad&Tobago'</span>, </span><br><span class="line"> <span class="string">' Greece'</span>, </span><br><span class="line"> <span class="string">' Nicaragua'</span>, </span><br><span class="line"> <span class="string">' Vietnam'</span>, </span><br><span class="line"> <span class="string">' Hong'</span>, </span><br><span class="line"> <span class="string">' Ireland'</span>, </span><br><span class="line"> <span class="string">' Hungary'</span>, </span><br><span class="line"> <span class="string">' Holand-Netherlands'</span>]</span><br><span class="line"> col_14 = [<span class="string">' <=50K'</span>, <span class="string">' >50K'</span>, <span class="string">' <=50K.'</span>, <span class="string">' 
>50K.'</span>]</span><br><span class="line"> <span class="keyword">with</span> open(fileName) <span class="keyword">as</span> rawData:</span><br><span class="line"> rows = csv.reader(rawData, delimiter = <span class="string">","</span>)</span><br><span class="line"> <span class="keyword">for</span> r <span class="keyword">in</span> rows:</span><br><span class="line"> <span class="keyword">if</span> len(r) == <span class="number">0</span>:</span><br><span class="line"> <span class="keyword">continue</span></span><br><span class="line"> temp = []</span><br><span class="line"> <span class="keyword">for</span> i <span class="keyword">in</span> range(<span class="number">15</span>):</span><br><span class="line"> <span class="keyword">if</span> i == <span class="number">0</span>:</span><br><span class="line"> temp.append(float(r[i]))</span><br><span class="line"> <span class="keyword">if</span> i == <span class="number">1</span>:</span><br><span class="line"> cnt = <span class="number">0</span></span><br><span class="line"> <span class="keyword">for</span> c <span class="keyword">in</span> col_1:</span><br><span class="line"> <span class="keyword">if</span> c == r[i]:</span><br><span class="line"> temp.append(float(cnt))</span><br><span class="line"> <span class="keyword">break</span></span><br><span class="line"> cnt += <span class="number">1</span></span><br><span class="line"> <span class="keyword">if</span> i == <span class="number">2</span>:</span><br><span class="line"> temp.append(float(r[i]))</span><br><span class="line"> <span class="keyword">if</span> i == <span class="number">3</span>:</span><br><span class="line"> cnt = <span class="number">0</span></span><br><span class="line"> <span class="keyword">for</span> c <span class="keyword">in</span> col_3:</span><br><span class="line"> <span class="keyword">if</span> c == r[i]:</span><br><span class="line"> temp.append(float(cnt))</span><br><span class="line"> <span class="keyword">break</span></span><br><span 
class="line"> cnt += <span class="number">1</span></span><br><span class="line"> <span class="keyword">if</span> i == <span class="number">4</span>:</span><br><span class="line"> temp.append(float(r[i]))</span><br><span class="line"> <span class="keyword">if</span> i == <span class="number">5</span>:</span><br><span class="line"> cnt = <span class="number">0</span></span><br><span class="line"> <span class="keyword">for</span> c <span class="keyword">in</span> col_5:</span><br><span class="line"> <span class="keyword">if</span> c == r[i]:</span><br><span class="line"> temp.append(float(cnt))</span><br><span class="line"> <span class="keyword">break</span></span><br><span class="line"> cnt += <span class="number">1</span></span><br><span class="line"> <span class="keyword">if</span> i == <span class="number">6</span>:</span><br><span class="line"> cnt = <span class="number">0</span></span><br><span class="line"> <span class="keyword">for</span> c <span class="keyword">in</span> col_6:</span><br><span class="line"> <span class="keyword">if</span> c == r[i]:</span><br><span class="line"> temp.append(float(cnt))</span><br><span class="line"> <span class="keyword">break</span></span><br><span class="line"> cnt += <span class="number">1</span></span><br><span class="line"> <span class="keyword">if</span> i == <span class="number">7</span>:</span><br><span class="line"> cnt = <span class="number">0</span></span><br><span class="line"> <span class="keyword">for</span> c <span class="keyword">in</span> col_7:</span><br><span class="line"> <span class="keyword">if</span> c == r[i]:</span><br><span class="line"> temp.append(float(cnt))</span><br><span class="line"> <span class="keyword">break</span></span><br><span class="line"> cnt += <span class="number">1</span></span><br><span class="line"> <span class="keyword">if</span> i == <span class="number">8</span>:</span><br><span class="line"> cnt = <span class="number">0</span></span><br><span class="line"> <span 
class="keyword">for</span> c <span class="keyword">in</span> col_8:</span><br><span class="line"> <span class="keyword">if</span> c == r[i]:</span><br><span class="line"> temp.append(float(cnt))</span><br><span class="line"> <span class="keyword">break</span></span><br><span class="line"> cnt += <span class="number">1</span></span><br><span class="line"> <span class="keyword">if</span> i == <span class="number">9</span>:</span><br><span class="line"> cnt = <span class="number">0</span></span><br><span class="line"> <span class="keyword">for</span> c <span class="keyword">in</span> col_9:</span><br><span class="line"> <span class="keyword">if</span> c == r[i]:</span><br><span class="line"> temp.append(float(cnt))</span><br><span class="line"> <span class="keyword">break</span></span><br><span class="line"> cnt += <span class="number">1</span></span><br><span class="line"> <span class="keyword">if</span> i == <span class="number">10</span> <span class="keyword">or</span> i == <span class="number">11</span> <span class="keyword">or</span> i == <span class="number">12</span>:</span><br><span class="line"> temp.append(float(r[i]))</span><br><span class="line"> <span class="keyword">if</span> i == <span class="number">13</span>:</span><br><span class="line"> cnt = <span class="number">0</span></span><br><span class="line"> <span class="keyword">for</span> c <span class="keyword">in</span> col_13:</span><br><span class="line"> <span class="keyword">if</span> c == r[i]:</span><br><span class="line"> temp.append(float(cnt))</span><br><span class="line"> <span class="keyword">break</span></span><br><span class="line"> cnt += <span class="number">1</span></span><br><span class="line"> <span class="keyword">if</span> i == <span class="number">14</span>:</span><br><span class="line"> cnt = <span class="number">0</span></span><br><span class="line"> <span class="keyword">for</span> c <span class="keyword">in</span> col_14:</span><br><span class="line"> <span 
class="keyword">if</span> c == r[i]:</span><br><span class="line"> _y.append(float(cnt % <span class="number">2</span>))</span><br><span class="line"> <span class="keyword">break</span></span><br><span class="line"> cnt += <span class="number">1</span></span><br><span class="line"> _x.append(temp)</span><br></pre></td></tr></table></figure></p>
</blockquote>
<h2 id="模型定义"><a href="#模型定义" class="headerlink" title="模型定义"></a>Model Definition</h2><h3 id="Feature-Scaling"><a href="#Feature-Scaling" class="headerlink" title="Feature Scaling"></a>Feature Scaling</h3><blockquote>
<p>Feature scaling methods (three are implemented below)</p>
</blockquote>
<h4 id="Standardization"><a href="#Standardization" class="headerlink" title="Standardization"></a>Standardization</h4><figure class="highlight python"><table><tr><td class="code"><pre><span class="line"><span class="function"><span class="keyword">def</span> <span class="title">standardization</span><span class="params">(dataMatrix)</span>:</span> <span class="comment"># z-score per column: (x - mean) / SD</span></span><br><span class="line">    <span class="keyword">if</span> dataMatrix.shape[<span class="number">0</span>] == <span class="number">0</span>:</span><br><span class="line">        <span class="keyword">return</span> dataMatrix</span><br><span class="line">    <span class="keyword">for</span> i <span class="keyword">in</span> range(dataMatrix.shape[<span class="number">1</span>]):</span><br><span class="line">        total = <span class="number">0</span></span><br><span class="line">        <span class="keyword">for</span> _x <span class="keyword">in</span> dataMatrix:</span><br><span class="line">            total += _x[i]</span><br><span class="line">        mean = total / dataMatrix.shape[<span class="number">0</span>]</span><br><span class="line">        SD = <span class="number">0</span></span><br><span class="line">        <span class="keyword">for</span> _x <span class="keyword">in</span> dataMatrix:</span><br><span class="line">            SD += (_x[i] - mean)**<span class="number">2</span></span><br><span class="line">        SD = np.sqrt(SD / dataMatrix.shape[<span class="number">0</span>])</span><br><span class="line">        <span class="keyword">if</span> SD != <span class="number">0</span>: <span class="comment"># skip constant columns to avoid dividing by zero</span></span><br><span class="line">            <span class="keyword">for</span> _x <span class="keyword">in</span> dataMatrix:</span><br><span class="line">                _x[i] = (_x[i] - mean) / SD</span><br><span class="line">    <span class="keyword">return</span> dataMatrix</span><br></pre></td></tr></table></figure>
<h4 id="Mean-Normalization"><a href="#Mean-Normalization" class="headerlink" title="Mean Normalization"></a>Mean Normalization</h4><figure class="highlight python"><table><tr><td class="code"><pre><span class="line"><span class="function"><span class="keyword">def</span> <span class="title">meanNormalization</span><span class="params">(dataMatrix)</span>:</span></span><br><span class="line">    <span class="keyword">if</span> dataMatrix.shape[<span class="number">0</span>] == <span class="number">0</span>:</span><br><span class="line">        <span class="keyword">return</span> dataMatrix</span><br><span class="line">    <span class="keyword">for</span> i <span class="keyword">in</span> range(dataMatrix.shape[<span class="number">1</span>]):</span><br><span class="line">        total = <span class="number">0</span></span><br><span class="line">        maxVal = dataMatrix[<span class="number">0</span>][i] <span class="comment"># start from real data so the true max/min are found</span></span><br><span class="line">        minVal = dataMatrix[<span class="number">0</span>][i]</span><br><span class="line">        <span class="keyword">for</span> data <span class="keyword">in</span> dataMatrix:</span><br><span class="line">            total += data[i]</span><br><span class="line">            <span class="keyword">if</span> data[i] > maxVal:</span><br><span class="line">                maxVal = data[i]</span><br><span class="line">            <span class="keyword">if</span> data[i] < minVal:</span><br><span class="line">                minVal = data[i]</span><br><span class="line">        mean = total / dataMatrix.shape[<span class="number">0</span>]</span><br><span class="line">        <span class="keyword">if</span> (maxVal - minVal) != <span class="number">0</span>:</span><br><span class="line">            <span class="keyword">for</span> data <span class="keyword">in</span> dataMatrix:</span><br><span class="line">                data[i] = (data[i] - mean) / (maxVal - minVal)</span><br><span class="line">    <span class="keyword">return</span> dataMatrix</span><br></pre></td></tr></table></figure>
<h4 id="Rescaling"><a href="#Rescaling" class="headerlink" title="Rescaling"></a>Rescaling</h4><figure class="highlight python"><table><tr><td class="code"><pre><span class="line"><span class="function"><span class="keyword">def</span> <span class="title">rescaling</span><span class="params">(dataMatrix)</span>:</span></span><br><span class="line">    <span class="keyword">if</span> dataMatrix.shape[<span class="number">0</span>] == <span class="number">0</span>:</span><br><span class="line">        <span class="keyword">return</span> dataMatrix</span><br><span class="line">    <span class="keyword">for</span> i <span class="keyword">in</span> range(dataMatrix.shape[<span class="number">1</span>]):</span><br><span class="line">        maxVal = dataMatrix[<span class="number">0</span>][i] <span class="comment"># start from real data so the true max/min are found</span></span><br><span class="line">        minVal = dataMatrix[<span class="number">0</span>][i]</span><br><span class="line">        <span class="keyword">for</span> data <span class="keyword">in</span> dataMatrix:</span><br><span class="line">            <span class="keyword">if</span> data[i] > maxVal:</span><br><span class="line">                maxVal = data[i]</span><br><span class="line">            <span class="keyword">if</span> data[i] < minVal:</span><br><span class="line">                minVal = data[i]</span><br><span class="line">        <span class="keyword">if</span> maxVal - minVal != <span class="number">0</span>:</span><br><span class="line">            <span class="keyword">for</span> data <span class="keyword">in</span> dataMatrix:</span><br><span class="line">                data[i] = (data[i] - minVal) / (maxVal - minVal)</span><br><span class="line">    <span class="keyword">return</span> dataMatrix</span><br></pre></td></tr></table></figure>
<blockquote>
<p>Apply one of the methods above to scale the features.<br>In testing, Standardization worked best, followed by Mean Normalization.</p>
</blockquote>
<figure class="highlight python"><table><tr><td class="code"><pre><span class="line">trainX = []</span><br><span class="line">trainY = []</span><br><span class="line">testX = []</span><br><span class="line">testY = []</span><br><span class="line">loadData(trainX, trainY, <span class="string">'train.csv'</span>)</span><br><span class="line">loadData(testX, testY, <span class="string">'test.csv'</span>)</span><br><span class="line">trainX = np.array(trainX)</span><br><span class="line">trainY = np.array(trainY)</span><br><span class="line">testX = np.array(testX)</span><br><span class="line">testY = np.array(testY)</span><br><span class="line">trainX = standardization(trainX) <span class="comment">#Test accuracy: 0.8504391622392502</span></span><br><span class="line">testX = standardization(testX)</span><br><span class="line"><span class="comment"># trainX = meanNormalization(trainX) #Test accuracy: 0.8516675879786739</span></span><br><span class="line"><span class="comment"># testX = meanNormalization(testX)</span></span><br><span class="line"><span class="comment"># trainX = rescaling(trainX) #Test accuracy: 0.8477366255400303</span></span><br><span class="line"><span class="comment"># testX = rescaling(testX)</span></span><br></pre></td></tr></table></figure>
<h3 id="训练"><a href="#训练" class="headerlink" title="训练"></a>Training</h3><h4 id="设计模型"><a href="#设计模型" class="headerlink" title="设计模型"></a>Designing the Model</h4><blockquote>
<p>Two hidden layers with ReLU activation <del>(designed more or less at random)</del> 😅<br>The loss function is cross entropy: with ReLU, square error leaves many regions with no gradient, which is awkward 🥵</p>
</blockquote>
<figure class="highlight python"><table><tr><td class="code"><pre><span class="line">model = tf.keras.Sequential([</span><br><span class="line"> keras.layers.Dense(<span class="number">28</span>, activation=tf.nn.relu),</span><br><span class="line"> keras.layers.Dense(<span class="number">14</span>, activation=tf.nn.relu),</span><br><span class="line"> keras.layers.Dense(<span class="number">2</span>,activation=tf.nn.softmax)</span><br><span class="line">])</span><br><span class="line">model.compile(optimizer=tf.train.AdamOptimizer(), </span><br><span class="line"> loss=<span class="string">'sparse_categorical_crossentropy'</span>, </span><br><span class="line"> metrics=[<span class="string">'accuracy'</span>])</span><br></pre></td></tr></table></figure>
<blockquote>
<p>Training accuracy:</p>
</blockquote>
<p><img src="/images/Winner/acc.png" alt="acc"></p>
<blockquote>
<p>Test accuracy:</p>
</blockquote>
<figure class="highlight plain"><table><tr><td class="code"><pre><span class="line">16281/16281 [==============================] - 0s 25us/step</span><br><span class="line">Test loss: 0.3315070253370084 Test accuracy: 0.8466310422863189</span><br></pre></td></tr></table></figure>
<h2 id="负优化"><a href="#负优化" class="headerlink" title="负优化"></a>Negative Optimization</h2><h3 id="取消Feature-Scaling"><a href="#取消Feature-Scaling" class="headerlink" title="取消Feature Scaling"></a>Removing Feature Scaling</h3><blockquote>
<p>Painfully real 🥵</p>
</blockquote>
<p><img src="/images/Winner/nonfs.png" alt="non-fs"></p>
<figure class="highlight plain"><table><tr><td class="code"><pre><span class="line">16281/16281 [==============================] - 0s 24us/step</span><br><span class="line">Test loss: 12.310577790542826 Test accuracy: 0.23622627602315244</span><br></pre></td></tr></table></figure>
<h3 id="增加layers"><a href="#增加layers" class="headerlink" title="增加layers"></a>Adding Layers</h3><figure class="highlight python"><table><tr><td class="code"><pre><span class="line">model = tf.keras.Sequential([</span><br><span class="line"> keras.layers.Dense(<span class="number">28</span>, activation=tf.nn.relu),</span><br><span class="line"> keras.layers.Dense(<span class="number">28</span>, activation=tf.nn.relu),</span><br><span class="line"> keras.layers.Dense(<span class="number">14</span>, activation=tf.nn.relu),</span><br><span class="line"> keras.layers.Dense(<span class="number">2</span>,activation=tf.nn.softmax)</span><br><span class="line">])</span><br></pre></td></tr></table></figure>
<blockquote>
<p>No improvement 🤦 🤦 🤷 🤷</p>
</blockquote>
<p><img src="/images/Winner/more.png" alt="more layer"></p>
<figure class="highlight plain"><table><tr><td class="code"><pre><span class="line">16281/16281 [==============================] - 0s 26us/step</span><br><span class="line">Test loss: 0.33134235454223626 Test accuracy: 0.8481665745274117</span><br></pre></td></tr></table></figure>
<p>The end 🎉🎉🎉</p>
<h2 id="Repo"><a href="#Repo" class="headerlink" title="Repo"></a>Repo</h2><p><a href="https://github.com/VIXNESS/machine-learning-course/tree/master/winner" target="_blank" rel="noopener">>50K?</a></p>
]]></content>
<categories>
<category>machine learning</category>
</categories>
<tags>
<tag>machine learning</tag>
<tag>deep learning</tag>
</tags>
</entry>
<entry>
<title>Java Note</title>
<url>/2020/07/31/Java-Note/</url>
<content><![CDATA[<h1 id="Java-Note"><a href="#Java-Note" class="headerlink" title="Java Note"></a>Java Note</h1><h2 id="I-虚拟机内存布局"><a href="#I-虚拟机内存布局" class="headerlink" title="I 虚拟机内存布局"></a>I JVM Memory Layout</h2><h3 id="1-JVM运行时-内存布局"><a href="#1-JVM运行时-内存布局" class="headerlink" title="1. JVM运行时 内存布局"></a>1. JVM Runtime Memory Layout</h3><table>
<thead>
<tr>
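<th>Area</th>
<th>Purpose</th>
<th>Thread sharing</th>
<th>Errors</th>
<th>Cause</th>
</tr>
</thead>
<tbody>
<tr>
<td>Method area</td>
<td>Stores type information, constants, static variables, and the JIT code cache.</td>
<td>Shared by threads</td>
<td>OOM</td>
<td>Per the JVM specification, may be thrown when a memory allocation cannot be satisfied.</td>
</tr>
<tr>
<td>VM stack</td>
<td>Stores local variables, the operand stack, dynamic linking, method return information, etc.</td>
<td>Thread-private</td>
<td>OOM, SO</td>
<td>SO: the stack depth exceeds what the JVM allows;<br>OOM: the stack cannot obtain enough memory when it tries to expand.</td>
</tr>
<tr>
<td>Native method stack</td>
<td>Same role as the VM stack, but serves native methods.</td>
<td>Thread-private</td>
<td>OOM, SO</td>
<td>Same as above.</td>
</tr>
<tr>
<td>Heap</td>
<td>Where almost all objects are stored (except objects handled by stack allocation or scalar replacement).</td>
<td>Shared / private (TLAB)</td>
<td>OOM</td>
<td>Thrown when an instance cannot be allocated in the heap.</td>
</tr>
<tr>
<td>PC register</td>
<td>Address of the next instruction</td>
<td>Thread-private</td>
<td>None</td>
<td>None</td>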
</tr>
</tbody>
</table><a id="more"></a>
<blockquote>
<p>OOM: OutOfMemoryError</p>
<p>SO: StackOverflowError</p>
<p>TLAB: Thread Local Allocation Buffer, a thread-private allocation buffer</p>
</blockquote>
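<p>As a rough illustration of the SO row in the table above, the sketch below recurses without a base case until the JVM refuses to push another frame. The class and method names (<code>StackDepthDemo</code>, <code>measureDepth</code>) are made up for this example, and the depth reached depends on the JVM and its stack-size settings.</p>

```java
// Hypothetical demo class; the names are not from the original notes.
class StackDepthDemo {
    private static int depth = 0;

    // Recurses without a base case, so every call pushes a new frame on the VM stack.
    private static void recurse() {
        depth++;
        recurse();
    }

    // Returns how many frames were pushed before the JVM threw StackOverflowError.
    static int measureDepth() {
        depth = 0;
        try {
            recurse();
        } catch (StackOverflowError e) {
            // The stack depth exceeded what the JVM allows for this thread.
        }
        return depth;
    }

    public static void main(String[] args) {
        System.out.println("Frames before StackOverflowError: " + measureDepth());
    }
}
```

<p>The printed depth varies between runs and machines; only the fact that the error is a <code>StackOverflowError</code> rather than an <code>OutOfMemoryError</code> is the point here.</p>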
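<h3 id="2-对象内存布局"><a href="#2-对象内存布局" class="headerlink" title="2. 对象内存布局"></a>2. Object Memory Layout</h3><p>Layout of an object in the heap: object header + instance data + alignment padding.</p>
<p>Object header: lock information, GC information, and biased-locking information.</p>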
<h3 id="3-对象访问"><a href="#3-对象访问" class="headerlink" title="3. 对象访问"></a>3. Object Access</h3><h4 id="3-1-句柄访问"><a href="#3-1-句柄访问" class="headerlink" title="3.1 句柄访问"></a>3.1 Access via Handles</h4><p><img src="/images/Java_Note/0.png" alt="Handle"> </p>
<h4 id="3-2-指针访问"><a href="#3-2-指针访问" class="headerlink" title="3.2 指针访问"></a>3.2 Access via Direct Pointers</h4><p><img src="/images/Java_Note/1.png" alt="Pointer"> </p>
<h2 id="II-垃圾回收策略"><a href="#II-垃圾回收策略" class="headerlink" title="II 垃圾回收策略"></a>II Garbage Collection Strategy</h2><h3 id="1-回收理论"><a href="#1-回收理论" class="headerlink" title="1. 回收理论"></a>1. Collection Theory</h3><h4 id="对象存活判定"><a href="#对象存活判定" class="headerlink" title="对象存活判定"></a>Determining Object Liveness</h4><p>Objects to reclaim are identified by reachability analysis.</p>
<blockquote>
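<p>Reachability analysis:</p>
<p>Starting from a set of objects that can serve as GC Roots, follow the reference relationships outward. Any object that cannot be reached from a GC Root is unreachable and can be reclaimed; once a root goes away, everything that was reachable only through it becomes unreachable as well.</p>
<p>Objects that can serve as GC Roots:</p>
<ul>
<li>Objects referenced from the VM stack</li>
<li>Objects referenced by static variables in the method area</li>
<li>Objects referenced by constants</li>
<li>References held internally by the JVM (for example some Class objects, exception objects, and class loaders)</li>
<li>Objects held as locks by synchronized</li>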
</ul>
</blockquote>
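<p>The idea can be sketched as a plain graph traversal. This is a toy model, not how any real collector is implemented: objects are represented by hypothetical string names, and <code>refs</code> maps each object to the objects it references.</p>

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Toy model of reachability analysis; object names are illustrative only.
class ReachabilityDemo {
    // Returns every object reachable from the given roots by following references.
    static Set<String> reachable(Map<String, List<String>> refs, List<String> roots) {
        Set<String> marked = new HashSet<>(roots);
        Deque<String> pending = new ArrayDeque<>(roots);
        while (!pending.isEmpty()) {
            String current = pending.pop();
            for (String target : refs.getOrDefault(current, List.of())) {
                if (marked.add(target)) {   // add() is false if already marked
                    pending.push(target);
                }
            }
        }
        return marked;
    }

    public static void main(String[] args) {
        Map<String, List<String>> refs = Map.of(
            "root", List.of("a"),
            "a", List.of("b"),
            "c", List.of("d"),   // c and d reference each other...
            "d", List.of("c"));  // ...but no root leads to them
        System.out.println(reachable(refs, List.of("root")));
    }
}
```

<p>In this example <code>c</code> and <code>d</code> form a cycle yet are never marked, which is why reachability analysis handles circular references that simple reference counting would not.</p>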
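<h4 id="分代假说"><a href="#分代假说" class="headerlink" title="分代假说"></a>Generational Hypotheses</h4><ul>
<li>Most objects die young.</li>
<li>Objects that have survived several GCs are unlikely to die soon.</li>
<li>Cross-generation references (between the young and old generations) are very rare.</li>
</ul>
<p>The heap is therefore split into a young generation (an Eden space plus Survivor spaces) and an old generation.</p>
<p>Newly created objects are placed in Eden.</p>
<p>Objects that survive a Minor GC are moved into a Survivor space, and objects that keep surviving are eventually promoted to the old generation.</p>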
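<h4 id="回收算法"><a href="#回收算法" class="headerlink" title="回收算法"></a>Collection Algorithms</h4><ol>
<li>Mark-Sweep: (leaves a lot of fragmentation)<ol>
<li>Mark the objects to be reclaimed.</li>
<li>Sweep them away.</li>
</ol>
</li>
<li>Mark-Copy: (splits the space into two halves; on collection, live objects are copied from one half to the other and packed tightly)<ol>
<li>Mark the objects to be reclaimed.</li>
<li>Copy the live objects to the other half.</li>
</ol>
</li>
<li>Mark-Compact: (moving old-generation objects that survive every collection is very expensive, and it requires pausing user threads)<ol>
<li>Mark the objects to be reclaimed.</li>
<li>Compact the live objects so that they are packed tightly.</li>
</ol>
</li>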
</ol>
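<p>The three algorithms above can be contrasted with a toy heap. This is only a sketch under simplifying assumptions: the heap is an array of slots (<code>null</code> means free), the mark phase is assumed to have already produced the <code>live</code> set, and the names <code>CollectorSketch</code>, <code>markSweep</code>, <code>markCopy</code>, and <code>markCompact</code> are invented for this example.</p>

```java
import java.util.Arrays;
import java.util.Set;

// Hypothetical toy model: the heap is an array of slots, null means free.
class CollectorSketch {
    // Mark-Sweep: clear dead slots in place; survivors do not move, so holes remain.
    static void markSweep(String[] heap, Set<String> live) {
        for (int i = 0; i < heap.length; i++) {
            if (heap[i] != null && !live.contains(heap[i])) {
                heap[i] = null;
            }
        }
    }

    // Mark-Copy: copy survivors from the active half into the idle half, tightly packed.
    static String[] markCopy(String[] fromSpace, Set<String> live) {
        String[] toSpace = new String[fromSpace.length];
        int next = 0;
        for (String slot : fromSpace) {
            if (slot != null && live.contains(slot)) {
                toSpace[next++] = slot;
            }
        }
        return toSpace;
    }

    // Mark-Compact: slide survivors toward the start of the same space.
    static void markCompact(String[] heap, Set<String> live) {
        int next = 0;
        for (int i = 0; i < heap.length; i++) {
            String slot = heap[i];
            heap[i] = null;
            if (slot != null && live.contains(slot)) {
                heap[next++] = slot;
            }
        }
    }

    public static void main(String[] args) {
        Set<String> live = Set.of("a", "b", "c");

        String[] swept = {"a", "x", "b", "y", "c"};
        markSweep(swept, live);
        System.out.println(Arrays.toString(swept));      // [a, null, b, null, c]

        String[] compacted = {"a", "x", "b", "y", "c"};
        markCompact(compacted, live);
        System.out.println(Arrays.toString(compacted));  // [a, b, c, null, null]

        String[] toSpace = markCopy(new String[] {"a", "x", "b", "y", "c"}, live);
        System.out.println(Arrays.toString(toSpace));    // [a, b, c, null, null]
    }
}
```

<p>The swept heap keeps holes between survivors (fragmentation), while copying and compacting both leave one contiguous free region; copying pays for that with a second space, compacting with moving objects in place.</p>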
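<h3 id="2-回收时机"><a href="#2-回收时机" class="headerlink" title="2. 回收时机"></a>2. When Collection Happens</h3><p>Strong reference: an ordinary declared reference; as long as it exists, the object is not reclaimed.</p>
<p>Soft reference (SoftReference): for non-essential objects; reclaimed first, before an out-of-memory error would occur.</p>
<p>Weak reference (WeakReference): for non-essential objects; an object reachable only through weak references is reclaimed as soon as a GC runs.</p>
<p>Phantom reference (PhantomReference): does not affect the object's lifecycle; the instance cannot be obtained through it, and it serves only to receive a notification when the object is collected.</p>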
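<p>A small sketch of the four reference kinds using the standard <code>java.lang.ref</code> classes. The class name <code>ReferenceKindsDemo</code> and the helper <code>inspect</code> are made up for illustration. Because <code>System.gc()</code> is only a request, the sketch does not assume that a weak referent is actually cleared at any particular moment.</p>

```java
import java.lang.ref.PhantomReference;
import java.lang.ref.ReferenceQueue;
import java.lang.ref.SoftReference;
import java.lang.ref.WeakReference;
import java.util.Arrays;

// Hypothetical demo class; only the java.lang.ref types are real API.
class ReferenceKindsDemo {
    // Returns {soft sees the referent, weak sees the referent, phantom.get() is null}.
    static boolean[] inspect(Object strong) {
        SoftReference<Object> soft = new SoftReference<>(strong);
        WeakReference<Object> weak = new WeakReference<>(strong);
        ReferenceQueue<Object> queue = new ReferenceQueue<>();
        PhantomReference<Object> phantom = new PhantomReference<>(strong, queue);
        return new boolean[] {
            soft.get() == strong,   // still strongly reachable, so not cleared
            weak.get() == strong,   // same: the strong reference keeps it alive
            phantom.get() == null   // a phantom reference never hands out its referent
        };
    }

    public static void main(String[] args) {
        Object strong = new Object();
        WeakReference<Object> weak = new WeakReference<>(strong);
        System.out.println(Arrays.toString(inspect(strong)));

        strong = null;   // drop the only strong reference
        System.gc();     // a hint only; the JVM may or may not collect right now
        System.out.println("weak referent after GC request: " + weak.get());
    }
}
```

<p>While a strong reference exists, soft and weak references still return the object; the last line typically prints <code>null</code> on HotSpot, but that is not guaranteed.</p>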
<p>Minor GC triggers: an instance allocation fails because Eden is full.</p>
<p>Full GC triggers: <code>System.gc()</code>, a heap dump, or the objects surviving a Minor GC not fitting in the Survivor space.</p>
<blockquote>
<p>Minor GC: collects only the young generation</p>
</blockquote>
<h3 id="3-回收器例举"><a href="#3-回收器例举" class="headerlink" title="3. 回收器例举"></a>3. Example Collectors</h3><h4 id="3-1-Serial-ParNew"><a href="#3-1-Serial-ParNew" class="headerlink" title="3.1 Serial / ParNew"></a>3.1 Serial / ParNew</h4><p>Young generation: Mark-Copy</p>
<p>Old generation: Mark-Compact</p>
<h4 id="3-2-CMS-注重响应时间-停顿时间短"><a href="#3-2-CMS-注重响应时间-停顿时间短" class="headerlink" title="3.2 CMS(注重响应时间, 停顿时间短)"></a>3.2 CMS (focused on response time, short pauses)</h4><p>Uses the Mark-Sweep algorithm. </p>
<ol>
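<li>Initial mark: marks the objects directly referenced by GC Roots. (Requires Stop The World)</li>
<li>Concurrent mark: traverses the object graph starting from the GC Roots. </li>
<li>Remark: corrects for object changes made during concurrent marking. (Requires Stop The World)</li>
<li>Concurrent sweep. <strong>(Major GC)</strong></li>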
</ol>
<blockquote>
<ul>
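<li>CMS cannot handle <strong>floating garbage</strong>, which may end up triggering a Full GC. (Floating garbage is the new garbage that user threads produce while CMS is cleaning up concurrently.)</li>
<li>After a CMS collection finishes, the memory space is left fragmented.</li>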
</ul>
</blockquote>
<h4 id="3-3-G1-Mixed-GC"><a href="#3-3-G1-Mixed-GC" class="headerlink" title="3.3 G1 (Mixed GC)"></a>3.3 G1 (Mixed GC)</h4><blockquote>
<p>G1 divides the heap into equal-sized Regions; each Region can serve as Eden, Survivor, or old-generation space.</p>
</blockquote>
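<p>GC steps:</p>
<ol>
<li>Initial mark: same as CMS (requires Stop The World)</li>
<li>Concurrent mark: same as CMS</li>
<li>Final mark: corrects for object changes made during concurrent marking. (Requires Stop The World)</li>
<li>Evacuation: chooses which Regions to collect based on per-Region statistics and collects them</li>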
</ol>
<h4 id="3-4-ZGC"><a href="#3-4-ZGC" class="headerlink" title="3.4 ZGC"></a>3.4 ZGC</h4><blockquote>
<p>Like G1, ZGC splits the collected area into Regions, though ZGC's Regions vary in size.</p>
<p>It also uses colored pointers. (Details omitted.)</p>
</blockquote>
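<p>GC steps:</p>
<ol>
<li>Concurrent mark: same as CMS</li>
<li>Concurrent prepare for relocation: applies the selection criteria to work out which Regions need collecting (the relocation set).</li>
<li>Concurrent relocate: copies the live objects in the relocation set to new Regions.</li>
<li>Concurrent remap: fixes up references in the heap that still point to the old objects.</li>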
</ol>
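<h2 id="III-类加载机制"><a href="#III-类加载机制" class="headerlink" title="III 类加载机制"></a>III Class Loading</h2><h3 id="1-类加载步骤"><a href="#1-类加载步骤" class="headerlink" title="1. 类加载步骤"></a>1. Class Loading Steps</h3><ol>
<li><p>Loading: reads the binary byte stream from one of several possible sources and creates the Class object.</p>
</li>
<li><p>Verification: checks that the class file is valid.</p>
</li>
<li><p>Preparation: gives static variables their initial values (not the values assigned in the source, but each type's default, e.g. 0 for int). The memory is allocated in the method area.</p>
</li>
<li><p>Resolution: replaces symbolic references in the constant pool with direct references. (Covers class and interface resolution, field resolution, method resolution, and interface method resolution.)</p>
</li>
<li><p>Initialization: assigns the actual values to the variables by executing the class initializer <code>&lt;clinit&gt;()</code>.</p>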
</li>
</ol>
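<p>The difference between preparation and initialization can be observed from ordinary code. In the sketch below (all names are invented for the example), reading a compile-time constant does not initialize <code>Config</code>, while reading a non-constant static field is a first active use and runs its static initializer.</p>

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo; Config stands in for any class with static state.
class InitOrderDemo {
    static final List<String> LOG = new ArrayList<>();

    static class Config {
        static final int CONSTANT = 42;  // compile-time constant, inlined at the use site
        static int counter = 7;          // default 0 after preparation, 7 after <clinit>
        static {
            LOG.add("Config.<clinit> ran, counter=" + counter);
        }
    }

    // Records the order in which the reads and the class initializer happen.
    static List<String> run() {
        LOG.clear();
        int c = Config.CONSTANT;         // does not trigger initialization of Config
        LOG.add("read CONSTANT=" + c);
        int n = Config.counter;          // first active use: triggers <clinit> once
        LOG.add("read counter=" + n);
        return new ArrayList<>(LOG);
    }

    public static void main(String[] args) {
        run().forEach(System.out::println);
    }
}
```

<p>On the first call in a fresh JVM the initializer entry should appear between the two reads; on later calls it is absent, because a class is initialized at most once per class loader.</p>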
<h3 id="2-类加载器"><a href="#2-类加载器" class="headerlink" title="2. 类加载器"></a>2. Class Loaders</h3><blockquote>
<p>The same class loaded by different class loaders is not treated as the same class by the JVM.</p>
</blockquote>
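<h4 id="类加载机制"><a href="#类加载机制" class="headerlink" title="类加载机制"></a>Class Loading Mechanism</h4><p><strong>Parent delegation model</strong> (recursive loading)</p>
<ol>
<li>A class to be loaded is first handed to the parent class loader.</li>
<li>Only when the parent cannot load it does the child class loader load it.</li>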
</ol>
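<p>The two steps above can be observed with a loader that only logs. This is a sketch: <code>LoggingLoader</code> is a hypothetical class that never defines anything itself, and it relies on the default <code>ClassLoader.loadClass</code> behaviour of asking the parent before calling <code>findClass</code>.</p>

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical loader used only to observe delegation; it never defines classes.
class LoggingLoader extends ClassLoader {
    final List<String> findClassCalls = new ArrayList<>();

    LoggingLoader(ClassLoader parent) {
        super(parent);
    }

    // Reached only after the parent chain has failed to load the class.
    @Override
    protected Class<?> findClass(String name) throws ClassNotFoundException {
        findClassCalls.add(name);
        throw new ClassNotFoundException(name);
    }

    // Loads `name` through a fresh LoggingLoader and reports which names reached findClass.
    static List<String> namesReachingChild(String name) {
        LoggingLoader loader = new LoggingLoader(ClassLoader.getSystemClassLoader());
        try {
            loader.loadClass(name);
        } catch (ClassNotFoundException e) {
            // Neither the parents nor this loader could supply the class.
        }
        return loader.findClassCalls;
    }

    public static void main(String[] args) {
        System.out.println(namesReachingChild("java.lang.String"));  // []
        System.out.println(namesReachingChild("no.such.Type"));      // [no.such.Type]
    }
}
```

<p><code>java.lang.String</code> is supplied by the parent chain, so the child's <code>findClass</code> is never consulted; the made-up name <code>no.such.Type</code> falls through to the child only after every parent has failed.</p>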
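<p><strong>Models that break parent delegation</strong></p>
<ol>
<li>Thread context class loader: lets a parent class loader ask a child class loader to do the loading.</li>
<li>Module system: if the class loader responsible for the corresponding module can be found, the request is dispatched to it first; otherwise it is handed to the parent class loader.</li>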
</ol>
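<h3 id="3-反射系统"><a href="#3-反射系统" class="headerlink" title="3. 反射系统"></a>3. Reflection</h3><p>Reflection is mostly a set of operations on Class objects.</p>
<p>Whatever source a class loader loads a Java class from, the result is exactly one corresponding Class object under that class loader (and, of course, the same class loaded by different class loaders does not yield the same Class object).</p>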
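<p>A minimal sketch of working through the Class object, using only standard <code>java.lang.reflect</code> calls. <code>Greeter</code> and the helper names are hypothetical and exist only for this example.</p>

```java
import java.lang.reflect.Method;

class ReflectionDemo {
    // Hypothetical target class for the demo.
    static class Greeter {
        public String greet(String name) {
            return "Hello, " + name;
        }
    }

    // Looks up greet(String) on the Class object and invokes it reflectively.
    static String callGreet(String name) {
        try {
            Class<?> type = Greeter.class;
            Object instance = type.getDeclaredConstructor().newInstance();
            Method greet = type.getMethod("greet", String.class);
            return (String) greet.invoke(instance, name);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    // Every instance shares the single Class object created by its class loader.
    static boolean sameClassObject() {
        return new Greeter().getClass() == Greeter.class;
    }

    public static void main(String[] args) {
        System.out.println(callGreet("JVM"));   // Hello, JVM
        System.out.println(sameClassObject());  // true
    }
}
```

<p>Both lookups go through the same <code>Class</code> object, which is what makes it the natural entry point for reflection.</p>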
<h2 id="IV-多线程并发"><a href="#IV-多线程并发" class="headerlink" title="IV 多线程并发"></a>IV Multithreading and Concurrency</h2><h3 id="1-内存模型"><a href="#1-内存模型" class="headerlink" title="1. 内存模型"></a>1. Memory Model</h3><p>The Java memory model mainly addresses data consistency between each thread's working memory and main memory.</p>
<p><img src="/images/Java_Note/mermaid-diagram-20200731122443.png" alt="Memory"> </p>
<h4 id="一致性协议具体内容"><a href="#一致性协议具体内容" class="headerlink" title="一致性协议具体内容"></a>Details of the Consistency Protocol</h4><ul>