Problems with training #69
Replies: 1 comment
-
First and foremost, it seems that you are in the process of searching for the optimal training settings. In that case, I recommend that you first look for effective training settings using a smaller model rather than starting with a large one. When leveraging the knowledge learned from one model to train another model, fine-tuning is the recommended approach. In this process, please keep in mind that the decoder learned for one model cannot be reused as the decoder of another model. Therefore, when starting the training of the other model, specify initial values only for the encoder rather than loading the whole previous model.
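For illustration, a minimal PyTorch sketch of encoder-only initialization is shown below. It assumes the checkpoint is a plain state_dict and that the encoder parameters share an "encoder." key prefix; the actual checkpoint layout and option names in kanachan may differ.

```python
# Minimal sketch of encoder-only initialization, assuming the checkpoint is a
# plain PyTorch state_dict and the encoder submodule is named "encoder".
# File layout and module names here are placeholders, not kanachan's actual ones.
import torch

def load_encoder_only(new_model: torch.nn.Module, checkpoint_path: str) -> None:
    """Copy only the encoder weights from a previous run into new_model."""
    state_dict = torch.load(checkpoint_path, map_location='cpu')

    # Keep only the parameters that belong to the encoder; the decoder of the
    # previous model (e.g. the BC policy head) is deliberately discarded.
    encoder_state = {
        key: value for key, value in state_dict.items()
        if key.startswith('encoder.')
    }

    # strict=False leaves everything not provided here (e.g. the new Q-value
    # head) at its fresh random initialization.
    missing, unexpected = new_model.load_state_dict(encoder_state, strict=False)
    print(f'loaded {len(encoder_state)} encoder tensors; '
          f'{len(missing)} tensors left at their initial values')
```

The point of strict=False is that only the encoder tensors are overwritten, while the new model's decoder keeps its fresh initialization.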
Please don't worry about it at all. I'm here to help, and your questions are always welcome!
-
Hello Cryolite,
I encountered some issues while training models using kanachan, and I can't pinpoint the problem accurately.
I've tried many parameters and processes, but the models obtained through offline_rl are always unsatisfactory.
My most recent attempt used the code in the v2 branch: I obtained a model with bc and then applied its encoder to cql. Since the cql module doesn't seem to support this directly, I modified some of the code, but those modifications didn't touch the training logic itself.
I used the following parameters:
I deliberately set the index to 0.
Then I checked the first saved model, and the results were very poor: the Q values had no distribution, and every option received the same value.
I analyzed the weights of the decoder and found that most of the values were 0, which seems abnormal.
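To illustrate what I mean, a check along the following lines (a generic sketch; "decoder" stands for whatever the decoder submodule is actually called) reports the fraction of near-zero values in each parameter tensor:

```python
# Generic sketch for spotting mostly-zero parameter tensors in a decoder.
import torch

def report_near_zero_weights(decoder: torch.nn.Module, threshold: float = 1e-6) -> None:
    """Print, per parameter tensor, the fraction of values whose magnitude is below threshold."""
    for name, param in decoder.named_parameters():
        near_zero = (param.detach().abs() < threshold).float().mean().item()
        print(f'{name}: {near_zero:.1%} of {param.numel()} values below {threshold}')
```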
After that, I tried the "single"-size decoder, and the results initially seemed normal, with some distribution in the Q values. However, as training progressed, the model's performance deteriorated: the Q values no longer showed a distribution but collapsed to extremes, either 0 or 0.9. The training data was generated by annotation, then converted and randomly shuffled by annotate4rl before being extracted.
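To show what I mean by "no distribution", this is the kind of quick check I have in mind (a generic sketch, assuming a (batch_size, num_actions) tensor of Q values can be obtained from the model):

```python
# Generic sketch for quantifying Q-value collapse, assuming q_values is a
# (batch_size, num_actions) tensor produced by the model under inspection.
import torch

def q_value_spread(q_values: torch.Tensor) -> None:
    """Print statistics showing whether Q values still vary across candidate actions."""
    per_state_std = q_values.std(dim=1)  # spread across actions, per state
    print(f'mean per-state std: {per_state_std.mean().item():.4f}')
    print(f'global min / max:   {q_values.min().item():.4f} / {q_values.max().item():.4f}')
```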
The reward function used is "End-of-Game Ranking + Raw Points" from the wiki. Can you offer any help?
For example, some references on processes and parameters.
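For reference, the general shape of that reward is something like the sketch below; the placement bonuses and the point scaling here are placeholders of my own, not the exact values from the wiki:

```python
# Illustrative sketch of an "end-of-game ranking + raw points" style reward.
# The placement bonuses and the point scale are placeholders, not the values
# defined in the kanachan wiki.
def end_of_game_reward(final_rank: int, final_points: int) -> float:
    """final_rank: 0 (1st place) .. 3 (4th place); final_points: raw end-of-game points."""
    placement_bonus = [90.0, 45.0, 0.0, -135.0]  # placeholder bonuses
    starting_points = 25000                      # assumed starting stack
    return placement_bonus[final_rank] + (final_points - starting_points) / 1000.0
```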
I realize that I frequently ask you questions, which may cause some inconvenience. I am sincerely sorry for this. I truly appreciate your ongoing assistance and patience.
Thank you.