For cases like O->I, should I manually set the corresponding entries in the transition probability matrix to zero? #38

Open
lkqnaruto opened this issue Dec 4, 2021 · 4 comments

Comments

@lkqnaruto

Hi

Again, thank you for the amazing work! I wonder, in an NER task, for cases like O->I, should I manually set the corresponding entries in the transition probability matrix to zero? I went through the pytorch-crf code and didn't see such settings.

Thanks in advance!

@fabiocapsouza
Contributor

Hi @lkqnaruto ,
Yes, you can do that if you want to initialize the CRF layer with such constraints, but the pytorch-crf library does not expose an API for that, so you'll have to modify crf.transitions yourself and set those entries to a large negative value such as -1e5. Please let me know your results if you try it; I thought about doing it before but didn't have time :)

I wouldn't recommend putting these constraints on start_transitions, though, because for long documents that are broken into smaller spans, some spans can start with I- tags.
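
For concreteness, here is a minimal sketch of what that manual edit could look like with pytorch-crf, assuming BIO tag indices B=0, I=1, O=2 (the scheme used later in this thread) and that crf.transitions is indexed as [from_tag, to_tag]; verify that convention against the library source before relying on it.

    import torch
    from torchcrf import CRF

    NUM_TAGS = 3            # assumed BIO tag set: B=0, I=1, O=2
    B, I, O = 0, 1, 2

    crf = CRF(NUM_TAGS)

    # Forbid O -> I by giving it a very large negative transition score.
    # crf.transitions is an nn.Parameter, so edit it under no_grad().
    with torch.no_grad():
        crf.transitions[O, I] = -1e5    # assumes [from_tag, to_tag] indexing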

@lkqnaruto
Author

Thank you for the quick reply. Do you think putting such a constraint on the CRF layer could improve model performance compared to not using it? It looks like, without the constraint, the current model is actually "learning" such a constraint by itself.

@fabiocapsouza
Contributor

Yes, I believe so. I see it as a form of model initialization, similar to adjusting the bias terms of a classification layer to produce the prior probabilities of the classes on the dataset (see "init well"), which is a good practice.

It could make training easier, converge faster, etc., but it does not necessarily improve model performance.
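
As an illustration of that analogy (not from the thread, just a sketch): a common way to encode class priors is to set the classification layer's bias to the log of the class frequencies, so the untrained model starts out predicting the priors rather than a uniform distribution. The tag counts and hidden size below are made up.

    import torch
    import torch.nn as nn

    # Hypothetical tag counts from a training set (B, I, O).
    class_counts = torch.tensor([1000.0, 800.0, 20000.0])
    priors = class_counts / class_counts.sum()

    classifier = nn.Linear(768, 3)   # 768 is just an example hidden size
    with torch.no_grad():
        classifier.bias.copy_(priors.log())
    # For a zero input, softmax(classifier(x)) now reproduces the priors
    # exactly, instead of a roughly uniform distribution.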

@lkqnaruto
Author

I was trying to put the above constraint into the transition matrix, but I got a little confused about the indexing of the transition matrix. Basically, I was using the BIO scheme: B: 0, I: 1, O: 2

So the code I was trying to modify is:

    def reset_parameters(self) -> None:
        """Initialize the transition parameters.

        The parameters will be initialized randomly from a uniform distribution
        between -0.1 and 0.1.
        """
        nn.init.uniform_(self.start_transitions, -0.1, 0.1)
        nn.init.uniform_(self.end_transitions, -0.1, 0.1)
        nn.init.uniform_(self.transitions, -0.1, 0.1)
        self.transitions[2, 1] = -1e5                           # this is the corresponding entry for O->I

or should it be:

    self.transitions[1, 2] = -1e5

Which one is the correct way to implement this?

Thanks in advance.
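
One way to settle the index question without guessing (a sketch, not part of the thread) is to forbid each candidate entry in turn and check whether decoded paths ever contain O immediately followed by I; whichever entry drives that count to zero is the O->I transition.

    import torch
    from torchcrf import CRF

    torch.manual_seed(0)
    NUM_TAGS = 3      # BIO scheme from the comment above: B=0, I=1, O=2
    B, I, O = 0, 1, 2

    def count_o_then_i(crf: CRF, n_batches: int = 50) -> int:
        """Decode random emissions and count O followed by I in the output."""
        hits = 0
        for _ in range(n_batches):
            emissions = torch.randn(20, 8, NUM_TAGS)   # (seq_len, batch, num_tags)
            for path in crf.decode(emissions):
                hits += sum(1 for a, b in zip(path, path[1:]) if a == O and b == I)
        return hits

    for idx in [(2, 1), (1, 2)]:
        crf = CRF(NUM_TAGS)
        with torch.no_grad():
            crf.transitions[idx] = -1e5
        print(idx, count_o_then_i(crf))
    # The entry that prints 0 is the one that blocks O -> I.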
