A Classification Example in Forte Pipeline using CNN Classifier and Bert Classifier #336

ziqian98 · 2020-12-11T08:06:40Z

This PR fixes [https://github.com//issues/328].

Description of changes

Provide Forte with a classification example using CNN
Provide Forte with a classification example using Bert
Merge the data augmentation features into the Reader (for now)

Possible influences of this PR.

Use the reader to parse the dataset into DataPacks at the Sentence and Token levels.
Use the data augmentation processor to process the initial DataPack from the previous step.
Use an AttributeExtractor on the input to build the word embedding table at the Token level, and an AttributeExtractor on the output to get the label information at the Sentence level.
Apply a Conv classifier and word embedder from Texar for the sentiment classification task.
For the Conv classifier:
https://github.com/asyml/texar-pytorch/blob/master/texar/torch/modules/classifiers/conv_classifiers.py
Apply a Bert classifier from Texar for the sentiment classification task.
For the Bert classifier:
https://github.com/asyml/texar-pytorch/blob/master/texar/torch/modules/classifiers/bert_classifier.py
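The input/output extraction step described above (a token-level vocabulary for the embedding table, sentence-level labels for the targets) can be sketched in plain Python. This is not Forte's AttributeExtractor API: the `(text, label)` tuples, the whitespace tokenizer, and the `<pad>`/`<unk>` ids are assumptions made only to show the idea.

```python
from typing import Dict, List, Tuple

# Hypothetical stand-in for sentence-level annotations: (text, label).
# This is plain Python data, NOT Forte's DataPack API.
Sentence = Tuple[str, str]

PAD, UNK = "<pad>", "<unk>"  # assumed special tokens, ids 0 and 1


def tokenize(text: str) -> List[str]:
    """Whitespace tokenizer standing in for the reader's Token annotations."""
    return text.lower().split()


def build_token_vocab(sentences: List[Sentence]) -> Dict[str, int]:
    """Input side: token-level vocabulary, i.e. the rows of the embedding table."""
    vocab: Dict[str, int] = {PAD: 0, UNK: 1}
    for text, _ in sentences:
        for tok in tokenize(text):
            if tok not in vocab:
                vocab[tok] = len(vocab)
    return vocab


def build_label_vocab(sentences: List[Sentence]) -> Dict[str, int]:
    """Output side: map each sentence-level label to a class id."""
    labels: Dict[str, int] = {}
    for _, label in sentences:
        labels.setdefault(label, len(labels))
    return labels


def encode(text: str, vocab: Dict[str, int], max_len: int) -> List[int]:
    """Convert a sentence to fixed-length token ids (truncate, then pad)."""
    ids = [vocab.get(tok, vocab[UNK]) for tok in tokenize(text)][:max_len]
    return ids + [vocab[PAD]] * (max_len - len(ids))


if __name__ == "__main__":
    data = [("A great movie", "pos"), ("A dull movie", "neg")]
    vocab = build_token_vocab(data)
    print(encode("a great plot", vocab, max_len=5))  # [2, 3, 1, 0, 0]
    print(build_label_vocab(data))  # {'pos': 0, 'neg': 1}
```

Unknown words ("plot" here) fall back to the `<unk>` id, so the embedding table only needs rows for tokens seen while building the vocabulary.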

Test Conducted

Describe what test cases are included for the PR.
CNN version can train and predict correctly locally.
Bert version can train and predict correctly locally.
CNN version with data augmentation can train and predict correctly locally.

ziqian98 · 2020-12-16T04:39:26Z

For the classification task using Conv_classifier:

run main_train_cnn.py to train
run main_predict_cnn.py to predict
model is defined in cnn.py
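For readers unfamiliar with convolutional text classifiers, the forward pass can be sketched in dependency-free Python: embed the token ids, slide a set of 1D filters over time with ReLU, max-pool each filter over time, then apply a dense layer. This is a generic illustration of the technique, not the contents of `cnn.py` or of the Texar conv classifier linked in the PR description; all function names and weights here are hypothetical.

```python
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def embed(token_ids: List[int], table: Matrix) -> Matrix:
    """Word embedder: one row of the embedding table per token id."""
    return [table[i] for i in token_ids]


def conv1d_relu(seq: Matrix, kernels: List[Matrix], bias: Vector) -> Matrix:
    """Valid (unpadded) 1D convolution over time, followed by ReLU.

    seq is [time, emb_dim]; each kernel is [width, emb_dim];
    the result is [time - width + 1, n_filters].
    """
    width, dim = len(kernels[0]), len(seq[0])
    out: Matrix = []
    for t in range(len(seq) - width + 1):
        row: Vector = []
        for kernel, b in zip(kernels, bias):
            s = b + sum(
                kernel[i][d] * seq[t + i][d]
                for i in range(width)
                for d in range(dim)
            )
            row.append(max(0.0, s))
        out.append(row)
    return out


def max_over_time(features: Matrix) -> Vector:
    """Keep each filter's strongest activation across all positions."""
    return [max(column) for column in zip(*features)]


def linear(x: Vector, weights: Matrix, bias: Vector) -> Vector:
    """Dense output layer; weights is [n_classes, n_filters]."""
    return [b + sum(w * xi for w, xi in zip(row, x)) for row, b in zip(weights, bias)]


def classify(
    token_ids: List[int],
    table: Matrix,
    kernels: List[Matrix],
    conv_bias: Vector,
    out_w: Matrix,
    out_b: Vector,
) -> Tuple[Vector, int]:
    """Return (logits, index of the highest-scoring class)."""
    features = max_over_time(conv1d_relu(embed(token_ids, table), kernels, conv_bias))
    logits = linear(features, out_w, out_b)
    pred = max(range(len(logits)), key=logits.__getitem__)
    return logits, pred
```

Max-over-time pooling is what lets one set of filters handle reviews of any length: each filter contributes a single feature regardless of how many positions it was applied to.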

For the classification task using Bert_classifier:

run main_train_bert.py to train
run main_predict_bert.py to predict
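The Bert variant needs its input packed differently from the CNN variant. A common single-sentence layout is `[CLS] tokens [SEP]`, truncated and padded to a fixed length, with an attention mask and all-zero segment ids. The sketch below shows that layout in plain Python; the special-token ids are an assumption (they match the bert-base-uncased vocabulary) and the function name is hypothetical, not taken from the PR or from Texar.

```python
from typing import List, Tuple

# Assumed special-token ids (bert-base-uncased values); a real pipeline
# should read these from its tokenizer instead of hard-coding them.
CLS_ID, SEP_ID, PAD_ID = 101, 102, 0


def pack_single_sentence(
    wordpiece_ids: List[int], max_seq_length: int
) -> Tuple[List[int], List[int], List[int]]:
    """Build (input_ids, segment_ids, input_mask) for one-sentence classification."""
    # Truncate first, leaving room for [CLS] and [SEP].
    body = wordpiece_ids[: max_seq_length - 2]
    input_ids = [CLS_ID] + body + [SEP_ID]
    input_mask = [1] * len(input_ids)  # 1 = real token, 0 = padding
    pad = max_seq_length - len(input_ids)
    input_ids += [PAD_ID] * pad
    input_mask += [0] * pad
    segment_ids = [0] * max_seq_length  # single sentence: everything is segment 0
    return input_ids, segment_ids, input_mask
```

For example, `pack_single_sentence([7, 8, 9], 6)` gives input ids `[101, 7, 8, 9, 102, 0]` and mask `[1, 1, 1, 1, 1, 0]`.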

For the classification task using Conv_classifier and merging the data_augmentation feature:

run main_train_cnn_data_augmentation.py to train
run main_predict_cnn.py to predict
model is defined in cnn.py
In imdb_reader_data_augmentation.py, a DataAugmentProcessor is added to yield both the original DataPack and the augmented DataPack.
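The "yield the original, then the augmented copy" behaviour can be sketched as a generator wrapped around any stream of examples, which keeps augmentation separate from reading rather than duplicating the reader. This is not Forte's DataAugmentProcessor API: examples are plain `(tokens, label)` tuples, and random word swapping is a swapped-in augmentation chosen only for illustration (the PR does not say which augmentation it uses).

```python
import random
from typing import Iterable, Iterator, List, Tuple

Example = Tuple[List[str], str]  # hypothetical (tokens, label) pair


def random_swap(tokens: List[str], rng: random.Random, n_swaps: int = 1) -> List[str]:
    """Return a copy of tokens with n_swaps random pairs of positions swapped."""
    out = list(tokens)
    if len(out) < 2:
        return out
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(out)), 2)
        out[i], out[j] = out[j], out[i]
    return out


def with_augmentation(examples: Iterable[Example], seed: int = 0) -> Iterator[Example]:
    """Yield each original example followed by one augmented copy.

    The label is carried over unchanged, on the assumption that swapping
    words does not flip the sentiment.
    """
    rng = random.Random(seed)  # seeded so runs are reproducible
    for tokens, label in examples:
        yield tokens, label
        yield random_swap(tokens, rng), label
```

Because the wrapper only consumes an iterable, the same reader output can be fed to training with or without augmentation, which avoids maintaining a second copy of the reader.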

hunterhector

You will need a README to explain what this example is and how to use it.
Please add the models for download and the evaluation results, and include these in the README.
Please clean up your code: if you commented out lines, remove them. If you have functions that you reuse, create a utility.
Don't randomly print stuff; it will pollute the user's terminal. If you really want to show something, use logging, which users can suppress.
Don't call your folder Classification_new.
The train CNN and train Bert scripts are just duplicates of each other. Please don't repeat yourself; make one example that works for both. The same applies to the test file.
The same goes for the data augmentation variant and the normal variant: why are you copying everything?
You are using the augmentation processors wrong. Why are you copying the reader implementation again?
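One way to address the duplication, printing, and type-hint comments in this review is a single typed entry point that selects the model from the command line and reports progress through a module-level logger. The sketch below is hypothetical: `build_cnn`, `build_bert`, `MODEL_BUILDERS`, and the `--model`/`--num-classes` flags are illustrative names, not part of the PR.

```python
import argparse
import logging
from typing import Callable, Dict, List, Optional

# Module-level logger: users can silence or redirect it through logging
# configuration, unlike print() output.
logger = logging.getLogger(__name__)

Config = Dict[str, int]


def build_cnn(config: Config) -> str:
    # Hypothetical placeholder for constructing the Conv classifier.
    return f"cnn(num_classes={config['num_classes']})"


def build_bert(config: Config) -> str:
    # Hypothetical placeholder for constructing the Bert classifier.
    return f"bert(num_classes={config['num_classes']})"


# One registry instead of one copied script per model.
MODEL_BUILDERS: Dict[str, Callable[[Config], str]] = {
    "cnn": build_cnn,
    "bert": build_bert,
}


def main(argv: Optional[List[str]] = None) -> str:
    parser = argparse.ArgumentParser(description="Train a sentiment classifier.")
    parser.add_argument("--model", choices=sorted(MODEL_BUILDERS), default="cnn")
    parser.add_argument("--num-classes", type=int, default=2)
    args = parser.parse_args(argv)

    model = MODEL_BUILDERS[args.model]({"num_classes": args.num_classes})
    logger.info("Built model: %s", model)
    # The shared reader, data loading and training loop would follow here,
    # identical for both models.
    return model


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    main()
```

With this layout, `python main_train.py --model bert` and `python main_train.py --model cnn` share every line except the model builder, and a quiet run only needs a higher logging level.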

examples/Classification_new/main_predict_bert.py

tests/forte/data/readers/imdb_reader_test.py

examples/Classification_new/main_train_cnn_data_augmentation.py

examples/Classification_new/main_predict_bert.py

examples/Classification_new/main_predict_cnn.py

examples/Classification_new/main_train_bert.py

examples/classification_example/main_train.py

jasonyanwenl

Overall, you are missing type hints for many variables.

examples/classification_example/main_train.py

examples/classification_example/cnn.py

examples/classification_example/main_train.py

ziqian98 added 2 commits December 11, 2020 02:49

LZQ:Add Classification Task__CNN

c575f12

LZQ: Reader and Reader Test

aac73a6

ziqian98 requested a review from hunterhector December 11, 2020 08:06

ziqian98 self-assigned this Dec 11, 2020

ziqian98 added the model_interface label Dec 11, 2020

ziqian98 added 2 commits December 11, 2020 03:25

LZQ:Update main train

c9a4771

LZQ:Update CNN model

ff24adc

ziqian98 marked this pull request as draft December 11, 2020 09:05

ziqian98 added 8 commits December 11, 2020 15:10

LZQ: Update minor bugs in main train

9b28b8d

LZQ: Update minor bugs in main predict

f2acf1a

LZQ: Update for minor changes

b74d77c

LZQ: update minor changes in main_train

9ae5fb4

LZQ: Add Bert in Classification Task

a73d7e6

LZQ: Add Bert in Classification Prediction

3b215e9

LZQ: Merge data augmentation feature with main_train_cnn

7480e5a

LZQ: Add reader for merging data_augmentation feature

83cbea7

ziqian98 changed the title ~~A Classification Example in Forte Pipeline using CNN classifier~~ A Classification Example in Forte Pipeline using CNN Classifier and Bert Classifier Dec 16, 2020

ziqian98 marked this pull request as ready for review December 16, 2020 04:40

Merge branch 'master' into lzq_new_classification

456f66d

hunterhector requested changes Dec 17, 2020


Reform main train

7ccffee

ziqian98 marked this pull request as draft December 20, 2020 01:31

ziqian98 added 5 commits December 19, 2020 20:34

LZQ: Reform redaer

9b8d1aa

LZQ: Delete old files

7f560d7

LZQ: Delete old augmentation reader

9ba6808

LZQ: Upload imdb data samples to lzq_new_classification

cad9fb7

LZQ: Format

63740ec

jasonyanwenl reviewed Dec 21, 2020


examples/classification_example/main_train.py

jasonyanwenl reviewed Dec 21, 2020


examples/classification_example/main_train.py (outdated)