
Evaluate ATE and ASC separately #35

Open
yassmine-lam opened this issue Sep 5, 2021 · 4 comments

Comments

@yassmine-lam

Hi,

Thank you for sharing your code with us.
I have a question about predictions. As I read in your paper, you reported a single result for these two tasks. However, is it possible to return evaluation scores for each task separately (as you did in this code: https://github.com/lixin4ever/E2E-TBSA)? That way, we could compare this model against single-task models.

Thank you

@lixin4ever
Owner

In this repo, we propose to handle the E2E-ABSA problem with a sequence tagging model. Since ATE can also be formulated as a sequence tagging task, you can evaluate the ATE performance by simply degrading the predicted tags and the gold-standard tags of the E2E-ABSA task. However, since ASC is a typical classification task, evaluating its performance within our sequence tagging model is not that straightforward.
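
For example, something like the following would do the degrading (an illustrative sketch with a hypothetical helper name, not code from this repo): each E2E-ABSA tag such as `B-POS` combines a boundary part (`B`) with a sentiment part (`POS`), and "degrading" just means dropping the sentiment part.

```python
# Illustrative sketch, not code from this repo: drop the sentiment part of
# each E2E-ABSA tag so that only the aspect boundary information remains.
def degrade_to_ate(tags):
    """Map E2E-ABSA tags (e.g. 'B-POS', 'I-NEG', 'O') to ATE tags ('B', 'I', 'O')."""
    return [tag.split('-')[0] for tag in tags]

print(degrade_to_ate(['O', 'B-POS', 'I-POS', 'O', 'S-NEG']))
# -> ['O', 'B', 'I', 'O', 'S']
```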

@yassmine-lam
Author

Thank you for your reply. Yes, you are right, but I was wondering how you managed to report two results, one for ATE and the other for ASC, in this repo: https://github.com/lixin4ever/E2E-TBSA. As I understood it, you separated the tags of each task and evaluated their results separately? Is that true? If so, it should be possible to do the same here, shouldn't it?

Sorry, I feel confused, so if you could explain this to me, I would be grateful.

Thank you

@lixin4ever
Owner

First of all, I want to clarify that ASC, a typical classification task, is different from E2E-ABSA (also called "targeted sentiment analysis" or "E2E-TBSA"), which is formulated as a sequence tagging task in our paper. In this issue, I mistakenly told you that "targeted sentiment analysis" is equivalent to "aspect sentiment classification", and I think that is the point that led to part of your confusion (sorry for the misinformation).

Returning to your question: in another work, namely https://github.com/lixin4ever/E2E-TBSA, we can report the results of both ATE and E2E-ABSA because it is a multi-task learning framework and the ATE predictions are explicitly provided. To report the ATE performance in this repo, you would need to degrade the predicted/gold tags of E2E-ABSA, i.e., only preserve the boundary tag and ignore the sentiment tag, and then do the evaluation.
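
Concretely, a degrade-then-evaluate step could look like the sketch below (again purely illustrative, not this repo's evaluation code; I map everything to a placeholder chunk type `ASP` so the degraded tags stay in the standard `prefix-type` format that entity-level scorers such as seqeval expect):

```python
# Purely illustrative: degrade E2E-ABSA tags, then score ATE at the entity level.
from seqeval.metrics import precision_score, recall_score, f1_score

def degrade(seq):
    # 'B-POS' -> 'B-ASP', 'I-NEG' -> 'I-ASP', 'O' stays 'O';
    # 'ASP' is a placeholder chunk type, not a tag used in the repo.
    return [t if t == 'O' else t.split('-')[0] + '-ASP' for t in seq]

gold = [['O', 'B-POS', 'I-POS', 'O']]
pred = [['O', 'B-NEG', 'I-NEG', 'O']]   # correct boundary, wrong sentiment

gold_ate = [degrade(s) for s in gold]
pred_ate = [degrade(s) for s in pred]

# After degrading, the sentiment error no longer counts as a miss,
# so ATE precision/recall/F1 are all 1.0 on this toy example.
print(precision_score(gold_ate, pred_ate),
      recall_score(gold_ate, pred_ate),
      f1_score(gold_ate, pred_ate))
```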

@yassmine-lam
Author

Thank you very much for the detailed answer; it is much clearer to me now. I really appreciate the effort and time you put into answering our questions. One more question, please: for a sequence labeling model, people generally use seqeval for evaluation, but in your code you used sklearn metrics. What is the difference between these two frameworks, and which one is the most suitable for this task (E2E-ABSA)?

Thank you in advance
