Hi! I really like this work. The paper is very precise and readable, but I am still curious about some details of how the potential functions are computed.
- To my understanding, if the model learns well, `sum_{y_t} psi_{st}(y_s, y_t)` should be equal to the `psi_s(y_s)` the model learns. I notice that in this implementation, when computing the edge potential function, the denominator is computed by

  ```python
  sum_s = torch.sum(logits, dim=2).unsqueeze(2) + eps
  sum_t = torch.sum(logits, dim=1).unsqueeze(1) + eps
  ```

  instead of by using `pred_node`. So I am curious: have you tested using `pred_node` instead? If yes, is the performance sensitive to this choice?
- I also notice that the aforementioned denominator is scaled by `norm_coef`, and I find the denominator can sometimes be a very small value in log-space. Is the model sensitive to this hyper-parameter? If yes, do you think that is caused by numerical stability issues, or simply by the model's limited ability to learn this probability, since the graph is sometimes sparse?
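For concreteness, here is a minimal sketch of the two denominator choices I am asking about, plus a log-space variant. The tensor shapes, the placement of `norm_coef`, and the `pred_node_s` / `pred_node_t` tensors are my own assumptions for illustration, not the repo's actual code:

```python
import torch

# Hypothetical shapes: logits holds pairwise edge potentials of shape
# (num_edges, num_labels, num_labels), i.e. logits[e, y_s, y_t].
num_edges, num_labels = 4, 3
eps = 1e-8
norm_coef = 0.1  # hypothetical value for the scaling hyper-parameter

logits = torch.rand(num_edges, num_labels, num_labels)

# Denominators as quoted above: marginalize the edge potentials
# themselves over the opposite endpoint's labels.
sum_s = torch.sum(logits, dim=2).unsqueeze(2) + eps  # shape (E, L, 1)
sum_t = torch.sum(logits, dim=1).unsqueeze(1) + eps  # shape (E, 1, L)

# One plausible normalization with the norm_coef scaling (an assumption;
# where norm_coef actually enters in the repo may differ).
edge_pot = logits / (norm_coef * sum_s * sum_t)

# The alternative I am asking about: reuse node predictions as the
# denominators (hypothetical pred_node_s / pred_node_t, one label
# distribution per edge endpoint) instead of re-marginalizing logits.
pred_node_s = torch.softmax(torch.rand(num_edges, num_labels), dim=-1)
pred_node_t = torch.softmax(torch.rand(num_edges, num_labels), dim=-1)
alt_edge_pot = logits / (pred_node_s.unsqueeze(2) * pred_node_t.unsqueeze(1) + eps)

# If very small denominators are the worry, the same normalization can be
# done in log-space with logsumexp, avoiding tiny products entirely:
log_pot = torch.log(logits + eps)
log_edge_pot = (log_pot
                - torch.logsumexp(log_pot, dim=2, keepdim=True)
                - torch.logsumexp(log_pot, dim=1, keepdim=True))
```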
Thanks in advance :)