Hi! I really like this work. The paper is very precise and readable, but I am still curious about some details of how the potential functions are computed.
- To my understanding, if the model learns well, `sum_{y_t} psi_{st}(y_s, y_t)` should be equal to the `psi_s(y_s)` the model learns. I notice that in this implementation, when computing the edge potential function, the denominator is computed by

  ```python
  sum_s = torch.sum(logits, dim=2).unsqueeze(2) + eps
  sum_t = torch.sum(logits, dim=1).unsqueeze(1) + eps
  ```

  instead of by using `pred_node`. So I am curious: have you tested using `pred_node` instead? If yes, is the performance sensitive to this choice?
- I also notice that the aforementioned denominator is scaled by `norm_coef`, and I find the denominator can sometimes be a very small value in log-space. Is the model sensitive to this hyper-parameter? If yes, do you think that is caused by numerical stability issues, or simply by the model's limited ability to learn this probability, since the graph is sometimes sparse?
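For concreteness, here is a minimal sketch of the two denominator choices I am asking about, plus a log-space variant. The tensor shapes, the placement of `norm_coef`, and the `pred_node_s` / `pred_node_t` tensors are my own assumptions for illustration, not the repo's actual code:

```python
import torch

# Hypothetical shapes: logits holds pairwise edge potentials of shape
# (num_edges, num_labels, num_labels), i.e. logits[e, y_s, y_t].
num_edges, num_labels = 4, 3
eps = 1e-8
norm_coef = 0.1  # hypothetical value for the scaling hyper-parameter

logits = torch.rand(num_edges, num_labels, num_labels)

# Denominators as quoted above: marginalize the edge potentials
# themselves over the opposite endpoint's labels.
sum_s = torch.sum(logits, dim=2).unsqueeze(2) + eps  # shape (E, L, 1)
sum_t = torch.sum(logits, dim=1).unsqueeze(1) + eps  # shape (E, 1, L)

# One plausible normalization with the norm_coef scaling (an assumption;
# where norm_coef actually enters in the repo may differ).
edge_pot = logits / (norm_coef * sum_s * sum_t)

# The alternative I am asking about: reuse node predictions as the
# denominators (hypothetical pred_node_s / pred_node_t, one label
# distribution per edge endpoint) instead of re-marginalizing logits.
pred_node_s = torch.softmax(torch.rand(num_edges, num_labels), dim=-1)
pred_node_t = torch.softmax(torch.rand(num_edges, num_labels), dim=-1)
alt_edge_pot = logits / (pred_node_s.unsqueeze(2) * pred_node_t.unsqueeze(1) + eps)

# If very small denominators are the worry, the same normalization can be
# done in log-space with logsumexp, avoiding tiny products entirely:
log_pot = torch.log(logits + eps)
log_edge_pot = (log_pot
                - torch.logsumexp(log_pot, dim=2, keepdim=True)
                - torch.logsumexp(log_pot, dim=1, keepdim=True))
```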
Thanks in advance :)