Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

A Classification Example in Forte Pipeline using CNN Classifier and Bert Classifier #336

Open
wants to merge 57 commits into
base: master
Choose a base branch
from
Open

A Classification Example in Forte Pipeline using CNN Classifier and Bert Classifier #336

wants to merge 57 commits into from

Conversation

ziqian98
Copy link
Collaborator

@ziqian98 ziqian98 commented Dec 11, 2020

This PR fixes [https://github.com//issues/328].

Description of changes

Provide Forte with a classification example using CNN
Provide Forte with a classification example using CNN
Merge the data augmentation features in Reader currently

Possible influences of this PR.

  1. Using the reader to parse the dataset as Sentence Datapack and Token Datapack
  2. Using data augmentation processor to process the initial Datapack from step 1
  3. Using the AttributeExtractor on the input to Build the word embedding table from a Token level and using the AttributeExtractor on the output to get the label information from the Sentence Level.
  4. Apply a Conv classifier and word embedder from the Texar for the sentiment classification task.
    For the Conv classifier:
    https://github.com/asyml/texar-pytorch/blob/master/texar/torch/modules/classifiers/conv_classifiers.py
  5. Apply a Bert classifier from the Texar for the sentiment classification task.
    For the Bert classifier:
    https://github.com/asyml/texar-pytorch/blob/master/texar/torch/modules/classifiers/bert_classifier.py

Test Conducted

Describe what test cases are included for the PR.
CNN version can train and predict correctly locally.
Bert version can train and predict correctly locally.
CNN version with data augmentation can train and predict correctly locally.

@ziqian98 ziqian98 marked this pull request as draft December 11, 2020 09:05
@ziqian98 ziqian98 changed the title A Classification Example in Forte Pipeline using CNN classifier A Classification Example in Forte Pipeline using CNN Classifier and Bert Classifier Dec 16, 2020
@ziqian98
Copy link
Collaborator Author

ziqian98 commented Dec 16, 2020

For the classification task using Conv_classifier:

  1. run main_train_cnn.py to train
  2. run main_predict_cnn.py to predict
  3. model is defined in cnn.py

For the classification task using Bert_classifier:

  1. run main_train_bert.py to train
  2. run main_predict_bert.py to predict

For the classification task using Conv_classifier and merging the data_augmentation feature:

  1. run main_train_cnn_data_augmentation.py to train
  2. run main_predict_cnn.py to predict
  3. model is defined in cnn.py
  4. in imdb_reader_data_augmentation.py, DataAugmentProcessor is added for yielding original datapack and augmented datapack.

@ziqian98 ziqian98 marked this pull request as ready for review December 16, 2020 04:40
Copy link
Member

@hunterhector hunterhector left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. You will need a readme to tell people what is this example and how to use this.
  2. Please add models to download and evaluation results, please include these in the readme.
  3. Please clean out your code, if you commented out lines, remove them. If you have functions that you reuse, create a utility.
  4. Don't randomly print stuff, it will pollute the user's terminal. If you really want to show something, use logging, where users can suppress.
  5. Don't call your folder Classification_new.
  6. The whole train cnn and train bert are just dupcliates of each other. Please don't repeat yourself, just make an example that work for both. Same apply for the test file.
  7. The same for the data augmentation variant and the normal variant, why are you copying everything?
  8. You are using the augmentation processors wrong. Why are you copying the reader implementation again?

examples/Classification_new/main_predict_bert.py Outdated Show resolved Hide resolved
examples/Classification_new/main_predict_bert.py Outdated Show resolved Hide resolved
tests/forte/data/readers/imdb_reader_test.py Outdated Show resolved Hide resolved
examples/Classification_new/main_predict_bert.py Outdated Show resolved Hide resolved
examples/Classification_new/main_predict_bert.py Outdated Show resolved Hide resolved
examples/Classification_new/main_predict_bert.py Outdated Show resolved Hide resolved
examples/Classification_new/main_predict_cnn.py Outdated Show resolved Hide resolved
examples/Classification_new/main_train_bert.py Outdated Show resolved Hide resolved
examples/Classification_new/main_train_bert.py Outdated Show resolved Hide resolved
@ziqian98 ziqian98 marked this pull request as draft December 20, 2020 01:31
Copy link
Collaborator

@jasonyanwenl jasonyanwenl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall, you missed the type hint for many variables.

@ziqian98 ziqian98 requested a review from hunterhector January 6, 2021 09:51
examples/classification_example/cnn.py Outdated Show resolved Hide resolved
examples/classification_example/main_train.py Show resolved Hide resolved
examples/classification_example/main_train.py Outdated Show resolved Hide resolved
examples/classification_example/main_train.py Outdated Show resolved Hide resolved
examples/classification_example/main_train.py Outdated Show resolved Hide resolved
examples/classification_example/main_train.py Outdated Show resolved Hide resolved
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants