Hi! I really like this work. The paper is very precise and readable, but I am still curious about some details of how the potential functions are computed.
To my understanding, if the model learns well, the marginal sum_{y_t} psi_{st}(y_s, y_t) should equal the node potential psi_s(y_s) that the model learns. I notice that in this implementation, when computing the edge potential function, the denominator is computed by sum_s = torch.sum(logits, dim=2).unsqueeze(2) + eps and sum_t = torch.sum(logits, dim=1).unsqueeze(1) + eps, rather than by using pred_node. So I am curious: have you tested using pred_node instead? If yes, is the performance sensitive to this choice?
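For concreteness, here is a minimal sketch of the computation I am referring to. The shapes and the contents of logits are my own assumptions for illustration, not copied from the repo:

```python
import torch

# Hypothetical setup: `logits` holds the edge potentials psi_{st}(y_s, y_t)
# with shape (num_edges, C, C), where dim 1 indexes y_s and dim 2 indexes y_t.
num_edges, C = 4, 3
logits = torch.rand(num_edges, C, C)
eps = 1e-6

# Denominator as computed in the implementation:
# marginalize the edge potential over y_t (resp. y_s).
sum_s = torch.sum(logits, dim=2).unsqueeze(2) + eps  # (num_edges, C, 1), plays the role of psi_s(y_s)
sum_t = torch.sum(logits, dim=1).unsqueeze(1) + eps  # (num_edges, 1, C), plays the role of psi_t(y_t)
```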
I also notice that this denominator is scaled by norm_coef, and I find that the denominator can sometimes be a very small value in log-space. I wonder whether the model is sensitive to this hyperparameter? If yes, do you think the sensitivity comes from numerical stability issues, or simply from the model's ability to learn this probability, given that the graph is sometimes sparse?
Thanks in advance :)
For psi_s(y_s), we actually tried both options: (1) directly using pred_node, or (2) using the marginal sum_{y_t} psi_{st}(y_s, y_t). The two options yielded close results, and we used option (2) in the model.
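A rough sketch of the two options, with illustrative names and shapes rather than the actual code:

```python
import torch

# Illustrative shapes only: edge_logits ~ psi_{st}(y_s, y_t) for each edge,
# pred_node ~ the node-level prediction for the source node of each edge.
num_edges, C = 4, 3
edge_logits = torch.rand(num_edges, C, C)
pred_node = torch.softmax(torch.rand(num_edges, C), dim=-1)

# Option (1): use the node prediction directly as psi_s(y_s).
psi_s_option1 = pred_node                  # (num_edges, C)

# Option (2): recover psi_s(y_s) by marginalizing the edge potential over y_t.
psi_s_option2 = edge_logits.sum(dim=2)     # (num_edges, C)

# If the model has learned consistent potentials, the two should be close
# (up to normalization).
```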
You are right that the denominator can sometimes be very small in log-space. This is because sum_s and sum_t in the denominator tend to be close to one-hot vectors (i.e., one dimension close to 1 and the others close to 0), and hence we obtain very negative values after taking the logarithm. These values can cause numerical stability issues. To address this, we tried a few options: (1) adding a hyperparameter norm_coef, as in the current code, (2) using a larger eps to make sum_s and sum_t smoother, and (3) adding an annealing temperature to make sum_s and sum_t smoother. These options also yielded similar results, and we picked option (1) because of its simplicity. In this setting, the results are indeed quite sensitive to norm_coef.
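A schematic of the three options, with placeholder values; the exact form in the code may differ:

```python
import torch

eps = 1e-6
logits = torch.rand(4, 3, 3)               # illustrative edge potentials
sum_s = torch.sum(logits, dim=2) + eps     # (num_edges, C), tends to be near one-hot

# Option (1): scale the log-denominator by a hyperparameter norm_coef,
# so its occasionally very negative values do not dominate.
norm_coef = 0.1                            # placeholder value
log_den_1 = norm_coef * torch.log(sum_s)

# Option (2): use a larger eps so sum_s never gets extremely close to zero.
larger_eps = 1e-2                          # placeholder value
log_den_2 = torch.log(torch.sum(logits, dim=2) + larger_eps)

# Option (3): smooth sum_s with an annealing temperature T before taking the log.
T = 2.0                                    # placeholder value
sum_s_smooth = torch.softmax(torch.log(sum_s) / T, dim=1)
log_den_3 = torch.log(sum_s_smooth)
```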
Thank you again for your interest, and let me know if you have any further questions.