Hi, the paper proves that as T goes to infinity, the estimate of the conditional mutual information converges to the true conditional mutual information between the output y and the parameters w. Why is this proof necessary? If I can derive a quantity that is proportional to the conditional mutual information, can I use it to measure uncertainty in the BALD sense? Why or why not?
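For concreteness, here is a minimal sketch of the kind of estimate I mean. It assumes a classifier evaluated with T stochastic forward passes (for example MC dropout, which is my assumption here); the function name `bald_score` and the `(T, N, C)` array layout are hypothetical and not taken from the paper's code.

```python
import numpy as np

def bald_score(probs, eps=1e-12):
    """Monte Carlo estimate of the mutual information between y and w.

    probs: array of shape (T, N, C) holding class probabilities from
           T stochastic forward passes over N inputs with C classes.
    Returns an array of shape (N,), one score per input.
    """
    # Predictive distribution averaged over the T sampled parameter sets.
    mean_probs = probs.mean(axis=0)                                  # (N, C)
    # Entropy of the averaged prediction.
    entropy_of_mean = -(mean_probs * np.log(mean_probs + eps)).sum(axis=-1)
    # Average of the per-sample entropies.
    mean_of_entropy = -(probs * np.log(probs + eps)).sum(axis=-1).mean(axis=0)
    # Their difference is the estimate; it uses only T samples, so it is
    # an approximation of the true value.
    return entropy_of_mean - mean_of_entropy

# Toy data standing in for real model outputs (purely illustrative).
rng = np.random.default_rng(0)
logits = rng.normal(size=(20, 5, 3))          # T=20, N=5, C=3
probs = np.exp(logits) / np.exp(logits).sum(axis=-1, keepdims=True)
scores = bald_score(probs)
```

My question is about the role of the T to infinity limit for a quantity like `scores`, and whether a version of it that is only known up to a constant factor would still be usable.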
Thanks!