Hi, Yanchao!
In the appendix of your paper, you mention that when ψ imposes no compression on I, training with the exponential loss leads to an estimate of P(f|I). But I wonder why it is described as compression. In other words, why do you split the network into an encoder and a decoder? I would be grateful if you could share some of the reasoning behind the proposal of CPN. I like the probabilistic model very much; it is very logical.