Hyperparameter setting for allsides data #3

Open · slchun-mmlab opened this issue Feb 26, 2024 · 1 comment

@slchun-mmlab
Dear authors, thank you for your great work, and also for your efforts in sharing the code and data.

Could you let me know which hyperparameters were used for the AllSides data?

With the default hyperparameter settings from the README, test accuracy on ALLSIDES-S saturates at roughly 70% (full log below); validation accuracy plateaus around 0.67–0.70 while training accuracy climbs past 0.94, which suggests overfitting under these settings. I could not find all of the required parameters in the paper.

Again, thank you for your work and assistance.
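
For completeness, I launched training along these lines (`main.py` is my guess at the entry-point name, not confirmed from the repo; the flags are exactly those echoed in the log below):

```
python main.py --gpu_index=0 --batch_size=16 --num_epochs=50 --learning_rate=0.001 \
  --max_sentence=20 --embed_size=256 --dropout=0.3 --num_layer=1 --num_head=4 \
  --d_hid=128 --dataset=ALLSIDES-S --alpha=0.6 --beta=0.2
```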

==================================================================================
--gpu_index=0
--batch_size=16
--num_epochs=50
--learning_rate=0.001
--max_sentence=20
--embed_size=256
--dropout=0.3
--num_layer=1
--num_head=4
--d_hid=128
--dataset=ALLSIDES-S
--alpha=0.6
--beta=0.2
Count of using GPUs: 2
Count of using GPUs: 2
====================================TRAIN INFO START====================================

  • TRAINING MODEL = KHAN
    • Embedding Size = 256
    • Maximum Length = 20
    • Number of Transformer Encoder Layers = 1
    • Number of Multi-head Attentions = 4
    • Hidden Layer Dimension = 128
    • Dropout Probability = 0.3
    • Alpha = 0.6
    • Beta = 0.2
  • DATASET = ALLSIDES-S
  • BATCH SIZE = 16
  • NUM EPOCHS = 50
  • LEARNING RATE = 0.001

==================================== Training Start ====================================

  • Training data size: 13304
  • Test data size: 1479
    [2.233713901947616, 3.180492469519484, 4.20347551342812]
  • Reading Pre-trained Knowledge Embeddings...
    50
    torch.Size([256])
    Total params: 122.80M
    Fold: 1 | Epoch: 1 | Loss: 1.3780 | TrainAcc: 0.4497 | ValAcc: 0.6105 | Time: 40.98
    Fold: 1 | Epoch: 2 | Loss: 0.9669 | TrainAcc: 0.5270 | ValAcc: 0.5916 | Time: 40.51
    Fold: 1 | Epoch: 3 | Loss: 0.8768 | TrainAcc: 0.5714 | ValAcc: 0.5625 | Time: 40.43
    Fold: 1 | Epoch: 4 | Loss: 0.8125 | TrainAcc: 0.6040 | ValAcc: 0.5991 | Time: 40.43
    Fold: 1 | Epoch: 5 | Loss: 0.7928 | TrainAcc: 0.6197 | ValAcc: 0.6045 | Time: 40.58
    Fold: 1 | Epoch: 6 | Loss: 0.7621 | TrainAcc: 0.6335 | ValAcc: 0.6241 | Time: 40.37
    Fold: 1 | Epoch: 7 | Loss: 0.7471 | TrainAcc: 0.6439 | ValAcc: 0.6302 | Time: 40.31
    Fold: 1 | Epoch: 8 | Loss: 0.7436 | TrainAcc: 0.6512 | ValAcc: 0.6504 | Time: 40.32
    Fold: 1 | Epoch: 9 | Loss: 0.7038 | TrainAcc: 0.6757 | ValAcc: 0.6234 | Time: 40.48
    Fold: 1 | Epoch: 10 | Loss: 0.6852 | TrainAcc: 0.6941 | ValAcc: 0.6376 | Time: 40.37
    Fold: 1 | Epoch: 11 | Loss: 0.6664 | TrainAcc: 0.7068 | ValAcc: 0.6092 | Time: 40.42
    Fold: 1 | Epoch: 12 | Loss: 0.6353 | TrainAcc: 0.7223 | ValAcc: 0.6795 | Time: 40.41
    Fold: 1 | Epoch: 13 | Loss: 0.6127 | TrainAcc: 0.7389 | ValAcc: 0.6160 | Time: 40.42
    Fold: 1 | Epoch: 14 | Loss: 0.5859 | TrainAcc: 0.7540 | ValAcc: 0.6599 | Time: 40.39
    Fold: 1 | Epoch: 15 | Loss: 0.5598 | TrainAcc: 0.7653 | ValAcc: 0.6633 | Time: 40.53
    Fold: 1 | Epoch: 16 | Loss: 0.5337 | TrainAcc: 0.7862 | ValAcc: 0.6849 | Time: 40.34
    Fold: 1 | Epoch: 17 | Loss: 0.5187 | TrainAcc: 0.7892 | ValAcc: 0.6545 | Time: 40.52
    Fold: 1 | Epoch: 18 | Loss: 0.5091 | TrainAcc: 0.8012 | ValAcc: 0.6910 | Time: 40.27
    Fold: 1 | Epoch: 19 | Loss: 0.4569 | TrainAcc: 0.8190 | ValAcc: 0.6728 | Time: 40.29
    Fold: 1 | Epoch: 20 | Loss: 0.4471 | TrainAcc: 0.8238 | ValAcc: 0.6842 | Time: 40.32
    Fold: 1 | Epoch: 21 | Loss: 0.4305 | TrainAcc: 0.8340 | ValAcc: 0.6795 | Time: 40.32
    Fold: 1 | Epoch: 22 | Loss: 0.4135 | TrainAcc: 0.8422 | ValAcc: 0.6802 | Time: 40.24
    Fold: 1 | Epoch: 23 | Loss: 0.4013 | TrainAcc: 0.8430 | ValAcc: 0.6883 | Time: 40.26
    Fold: 1 | Epoch: 24 | Loss: 0.3912 | TrainAcc: 0.8534 | ValAcc: 0.6890 | Time: 40.31
    Fold: 1 | Epoch: 25 | Loss: 0.3343 | TrainAcc: 0.8771 | ValAcc: 0.6910 | Time: 40.36
    Fold: 1 | Epoch: 26 | Loss: 0.3321 | TrainAcc: 0.8761 | ValAcc: 0.6687 | Time: 40.28
    Fold: 1 | Epoch: 27 | Loss: 0.2972 | TrainAcc: 0.8899 | ValAcc: 0.6795 | Time: 40.23
    Fold: 1 | Epoch: 28 | Loss: 0.2997 | TrainAcc: 0.8887 | ValAcc: 0.6728 | Time: 40.29
    Fold: 1 | Epoch: 29 | Loss: 0.2795 | TrainAcc: 0.9000 | ValAcc: 0.6782 | Time: 40.30
    Fold: 1 | Epoch: 30 | Loss: 0.2816 | TrainAcc: 0.8957 | ValAcc: 0.6660 | Time: 40.23
    Fold: 1 | Epoch: 31 | Loss: 0.2515 | TrainAcc: 0.9103 | ValAcc: 0.6795 | Time: 40.37
    Fold: 1 | Epoch: 32 | Loss: 0.2407 | TrainAcc: 0.9112 | ValAcc: 0.6863 | Time: 40.35
    Fold: 1 | Epoch: 33 | Loss: 0.2225 | TrainAcc: 0.9213 | ValAcc: 0.6802 | Time: 40.33
    Fold: 1 | Epoch: 34 | Loss: 0.2215 | TrainAcc: 0.9216 | ValAcc: 0.6957 | Time: 40.29
    Fold: 1 | Epoch: 35 | Loss: 0.2202 | TrainAcc: 0.9204 | ValAcc: 0.6788 | Time: 40.44
    Fold: 1 | Epoch: 36 | Loss: 0.2118 | TrainAcc: 0.9250 | ValAcc: 0.6775 | Time: 40.34
    Fold: 1 | Epoch: 37 | Loss: 0.1921 | TrainAcc: 0.9335 | ValAcc: 0.6768 | Time: 40.32
    Fold: 1 | Epoch: 38 | Loss: 0.1914 | TrainAcc: 0.9305 | ValAcc: 0.6707 | Time: 40.18
    Fold: 1 | Epoch: 39 | Loss: 0.1904 | TrainAcc: 0.9324 | ValAcc: 0.6694 | Time: 40.15
    Fold: 1 | Epoch: 40 | Loss: 0.1721 | TrainAcc: 0.9393 | ValAcc: 0.6755 | Time: 40.25
    Fold: 1 | Epoch: 41 | Loss: 0.1718 | TrainAcc: 0.9390 | ValAcc: 0.6897 | Time: 40.23
    Fold: 1 | Epoch: 42 | Loss: 0.1819 | TrainAcc: 0.9363 | ValAcc: 0.6829 | Time: 40.25
    Fold: 1 | Epoch: 43 | Loss: 0.1778 | TrainAcc: 0.9379 | ValAcc: 0.6700 | Time: 40.27
    Fold: 1 | Epoch: 44 | Loss: 0.1685 | TrainAcc: 0.9392 | ValAcc: 0.6897 | Time: 40.25
    Fold: 1 | Epoch: 45 | Loss: 0.1669 | TrainAcc: 0.9425 | ValAcc: 0.6714 | Time: 40.21
    Fold: 1 | Epoch: 46 | Loss: 0.1521 | TrainAcc: 0.9460 | ValAcc: 0.6667 | Time: 40.26
    Fold: 1 | Epoch: 47 | Loss: 0.1477 | TrainAcc: 0.9480 | ValAcc: 0.6755 | Time: 40.08
    Fold: 1 | Epoch: 48 | Loss: 0.1444 | TrainAcc: 0.9484 | ValAcc: 0.6761 | Time: 40.00
    Fold: 1 | Epoch: 49 | Loss: 0.1621 | TrainAcc: 0.9441 | ValAcc: 0.6856 | Time: 40.00
    Fold: 1 | Epoch: 50 | Loss: 0.1584 | TrainAcc: 0.9445 | ValAcc: 0.6640 | Time: 39.97

FOLD - 1
Test Accuracy: 0.6957, Training time: 2096.58 (sec.)

@slchun-mmlab (Author)
Also, how can I reproduce the results for the SemEval data? With the parameter settings in the README, the model's performance on SemEval also saturates near 80% (result below). My PyTorch version was 1.10.0 with torchtext 0.11.0, which differs slightly from what is mentioned in the README, but I also evaluated the code with torch > 2.0, and it made little difference.
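
For reference, I confirmed the versions in my environment with the standard version attributes:

```python
import torch
import torchtext

# In my environment these print 1.10.0 and 0.11.0 respectively.
print(torch.__version__)
print(torchtext.__version__)
```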

=============================== 10-Folds Training Result ===============================
=============== Total Accuracy: 0.7906, Training time: 925.95 (sec.) ================
=============== Best Accuracy: 0.8438, Accuracy variance: 0.0013 ================
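
As I read this output, "Total Accuracy" is the mean over the 10 per-fold test accuracies, "Best Accuracy" their maximum, and "Accuracy variance" their variance. A minimal sketch of that aggregation (my reading of the reporting, with hypothetical fold values purely for illustration):

```python
import statistics

# Hypothetical per-fold test accuracies, for illustration only.
fold_accs = [0.79, 0.77, 0.81, 0.84, 0.78, 0.80, 0.79, 0.76, 0.82, 0.75]

total_acc = statistics.mean(fold_accs)     # reported as "Total Accuracy"
best_acc = max(fold_accs)                  # reported as "Best Accuracy"
acc_var = statistics.pvariance(fold_accs)  # reported as "Accuracy variance"
print(f"Total: {total_acc:.4f} | Best: {best_acc:.4f} | Variance: {acc_var:.4f}")
```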
