I have noticed that the value ranges of the feature maps (i.e. channels) from a given layer of VGG19 differ widely: the maxima of some feature maps are in the hundreds, while others stay below 1. As a result, quite a few feature maps are effectively useless because their values are so small.
To handle this problem, I tried applying instance normalization (instead of the original normalization method used in the paper) to each channel (feature map), and it seems to work well.
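
As a rough illustration of per-channel instance normalization, here is a minimal NumPy sketch. It runs on a made-up `(C, H, W)` array rather than real VGG19 activations. The function name `instance_normalize`, the `eps` value, and the example channel scales are assumptions for illustration only, not taken from the actual implementation.

```python
import numpy as np


def instance_normalize(features, eps=1e-5):
    """Normalize each channel of a (C, H, W) array to zero mean and unit variance."""
    # Statistics are computed per channel, over the spatial dimensions only.
    mean = features.mean(axis=(1, 2), keepdims=True)
    var = features.var(axis=(1, 2), keepdims=True)
    return (features - mean) / np.sqrt(var + eps)


rng = np.random.default_rng(0)

# Hypothetical activations: channel 0 reaches the hundreds, channel 1 stays below 1.
scales = np.array([300.0, 0.5]).reshape(2, 1, 1)
features = rng.random((2, 8, 8)) * scales

normalized = instance_normalize(features)

print("max per channel before:", features.max(axis=(1, 2)))
print("std per channel after: ", normalized.std(axis=(1, 2)))
```

After normalization both channels have roughly the same scale, so a small-valued feature map is no longer drowned out by a large-valued one. In PyTorch, `torch.nn.functional.instance_norm` applied to an `(N, C, H, W)` tensor should give a comparable result.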