SMP2019中文隐式情感分析评测

这里说明如何在SMP2019-ECISA数据集上进行stacking集成，从而得到SOTA结果。更多的关于模型集成的说明请参考这里。SMP2019-ECISA数据集可以在下游任务数据集章节中下载。首先使用K折交叉验证训练，得到分类器和训练集特征：

CUDA_VISIBLE_DEVICES=0,1 python3 finetune/run_classifier_cv.py --pretrained_model_path models/google_zh_model.bin \
                                                               --vocab_path models/google_zh_vocab.txt \
                                                               --config_path models/bert/base_config.json \
                                                               --output_model_path models/ecisa_classifier_model_0.bin \
                                                               --train_path datasets/smp2019-ecisa/train.tsv \
                                                               --train_features_path datasets/smp2019-ecisa/train_features_0.npy \
                                                               --epochs_num 3 --batch_size 32 --folds_num 5

CUDA_VISIBLE_DEVICES=0,1 python3 finetune/run_classifier_cv.py --pretrained_model_path models/review_roberta_large_model.bin \
                                                               --vocab_path models/google_zh_vocab.txt \
                                                               --config_path models/bert/large_config.json \
                                                               --output_model_path models/ecisa_classifier_model_1.bin \
                                                               --train_path datasets/smp2019-ecisa/train.tsv \
                                                               --train_features_path datasets/smp2019-ecisa/train_features_1.npy \
                                                               --epochs_num 3 --batch_size 32 --folds_num 5

CUDA_VISIBLE_DEVICES=0 python3 finetune/run_classifier_cv.py --pretrained_model_path models/cluecorpussmall_roberta_base_seq512_model.bin-250000 \
                                                             --vocab_path models/google_zh_vocab.txt \
                                                             --config_path models/bert/base_config.json \
                                                             --output_model_path models/ecisa_classifier_model_2.bin \
                                                             --train_path datasets/smp2019-ecisa/train.tsv \
                                                             --train_features_path datasets/smp2019-ecisa/train_features_2.npy \
                                                             --epochs_num 3 --batch_size 32 --seq_length 160 --folds_num 5

CUDA_VISIBLE_DEVICES=0 python3 finetune/run_classifier_cv.py --pretrained_model_path models/cluecorpussmall_gpt2_seq1024_model.bin-250000 \
                                                             --vocab_path models/google_zh_vocab.txt \
                                                             --config_path models/gpt2/config.json \
                                                             --output_model_path models/ecisa_classifier_model_3.bin \
                                                             --train_path datasets/smp2019-ecisa/train.tsv \
                                                             --train_features_path datasets/smp2019-ecisa/train_features_3.npy \
                                                             --epochs_num 3 --batch_size 32 --seq_length 100 --folds_num 5 \
                                                             --pooling mean

CUDA_VISIBLE_DEVICES=0,1 python3 finetune/run_classifier_cv.py --pretrained_model_path models/mixed_corpus_bert_large_model.bin \
                                                               --vocab_path models/google_zh_vocab.txt \
                                                               --config_path models/bert/large_config.json \
                                                               --output_model_path models/ecisa_classifier_model_4.bin \
                                                               --train_path datasets/smp2019-ecisa/train.tsv \
                                                               --train_features_path datasets/smp2019-ecisa/train_features_4.npy \
                                                               --epochs_num 3 --batch_size 32 --folds_num 5

CUDA_VISIBLE_DEVICES=0,1 python3 finetune/run_classifier_cv.py --pretrained_model_path models/chinese_roberta_wwm_large_ext_pytorch.bin \
                                                               --vocab_path models/google_zh_vocab.txt \
                                                               --config_path models/bert/large_config.json \
                                                               --output_model_path models/ecisa_classifier_model_5.bin \
                                                               --train_path datasets/smp2019-ecisa/train.tsv \
                                                               --train_features_path datasets/smp2019-ecisa/train_features_5.npy \
                                                               --folds_num 5 --epochs_num 3 --batch_size 32

使用K折交叉验证推理，得到验证集特征：

CUDA_VISIBLE_DEVICES=0 python3 inference/run_classifier_cv_infer.py --load_model_path models/ecisa_classifier_model_0.bin \
                                                                    --vocab_path models/google_zh_vocab.txt \
                                                                    --config_path models/bert/base_config.json \
                                                                    --test_path datasets/smp2019-ecisa/dev.tsv \
                                                                    --test_features_path datasets/smp2019-ecisa/test_features_0.npy \
                                                                    --folds_num 5 --labels_num 3

CUDA_VISIBLE_DEVICES=0 python3 inference/run_classifier_cv_infer.py --load_model_path models/ecisa_classifier_model_1.bin \
                                                                    --vocab_path models/google_zh_vocab.txt \
                                                                    --config_path models/bert/large_config.json \
                                                                    --test_path datasets/smp2019-ecisa/dev.tsv \
                                                                    --test_features_path datasets/smp2019-ecisa/test_features_1.npy \
                                                                    --folds_num 5 --labels_num 3

CUDA_VISIBLE_DEVICES=0 python3 inference/run_classifier_cv_infer.py --load_model_path models/ecisa_classifier_model_2.bin \
                                                                    --vocab_path models/google_zh_vocab.txt \
                                                                    --config_path models/bert/base_config.json \
                                                                    --test_path datasets/smp2019-ecisa/dev.tsv \
                                                                    --test_features_path datasets/smp2019-ecisa/test_features_2.npy \
                                                                    --seq_length 160 --folds_num 5 --labels_num 3

CUDA_VISIBLE_DEVICES=0 python3 inference/run_classifier_cv_infer.py --load_model_path models/ecisa_classifier_model_3.bin \
                                                                    --vocab_path models/google_zh_vocab.txt \
                                                                    --config_path models/gpt2/config.json \
                                                                    --test_path datasets/smp2019-ecisa/dev.tsv \
                                                                    --test_features_path datasets/smp2019-ecisa/test_features_3.npy \
                                                                    --seq_length 100 --folds_num 5 --labels_num 3 \
                                                                    --pooling mean

CUDA_VISIBLE_DEVICES=0 python3 inference/run_classifier_cv_infer.py --load_model_path models/ecisa_classifier_model_4.bin \
                                                                    --vocab_path models/google_zh_vocab.txt \
                                                                    --config_path models/bert/large_config.json \
                                                                    --test_path datasets/smp2019-ecisa/dev.tsv \
                                                                    --test_features_path datasets/smp2019-ecisa/test_features_4.npy \
                                                                    --folds_num 5 --labels_num 3

CUDA_VISIBLE_DEVICES=0 python3 inference/run_classifier_cv_infer.py --load_model_path models/ecisa_classifier_model_5.bin \
                                                                    --vocab_path models/google_zh_vocab.txt \
                                                                    --config_path models/bert/large_config.json \
                                                                    --test_path datasets/smp2019-ecisa/dev.tsv \
                                                                    --test_features_path datasets/smp2019-ecisa/test_features_5.npy \
                                                                    --folds_num 5 --labels_num 3

LightGBM超参数搜索：

python3 scripts/run_lgb_cv_bayesopt.py --train_path datasets/smp2019-ecisa/train.tsv \
                                       --train_features_path datasets/smp2019-ecisa/ \
                                       --models_num 6 --folds_num 5 --labels_num 3 --epochs_num 100

使用LightGBM进行训练和验证：

python3 scripts/run_lgb.py --train_path datasets/smp2019-ecisa/train.tsv --test_path datasets/smp2019-ecisa/dev.tsv \
                           --train_features_path datasets/smp2019-ecisa/ --test_features_path datasets/smp2019-ecisa/ \
                           --models_num 6 --labels_num 3

可以在比赛主页上找到更多详细信息。

Home
主页
- 项目特色
- 依赖环境
- 快速上手
- 预训练数据
- 下游任务数据集
- 预训练模型仓库
- 使用说明
- 竞赛解决方案
  - 中文任务测评基准CLUE
  - SMP2020-EWECT
  - SMP2019-ECISA
  - CCF-BDCI2021-面向黑灰产治理的恶意短信变体字还原
  - 英文任务测评基准GLUE
  - 视觉任务评测基准
- 引用

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

SMP2019中文隐式情感分析评测

Clone this wiki locally