Problems with training #69
Replies: 1 comment
-
First and foremost, it seems that you are in the process of searching for the optimal training settings. In that case, I recommend that you first look for effective training settings using a smaller model rather than starting with a large one. When leveraging the knowledge learned from one model to train another model, fine-tuning is the recommended approach. In this process, please keep in mind that the decoder learned for one model cannot be reused as the decoder of another model. Therefore, when starting the training of the other model, specify initial values only for the encoder rather than loading the whole previous model.
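For illustration, a minimal PyTorch sketch of encoder-only initialization is shown below. It assumes the checkpoint is a plain state_dict and that the encoder parameters share an "encoder." key prefix; the actual checkpoint layout and option names in kanachan may differ.

```python
# Minimal sketch of encoder-only initialization, assuming the checkpoint is a
# plain PyTorch state_dict and the encoder submodule is named "encoder".
# File layout and module names here are placeholders, not kanachan's actual ones.
import torch

def load_encoder_only(new_model: torch.nn.Module, checkpoint_path: str) -> None:
    """Copy only the encoder weights from a previous run into new_model."""
    state_dict = torch.load(checkpoint_path, map_location='cpu')

    # Keep only the parameters that belong to the encoder; the decoder of the
    # previous model (e.g. the BC policy head) is deliberately discarded.
    encoder_state = {
        key: value for key, value in state_dict.items()
        if key.startswith('encoder.')
    }

    # strict=False leaves everything not provided here (e.g. the new Q-value
    # head) at its fresh random initialization.
    missing, unexpected = new_model.load_state_dict(encoder_state, strict=False)
    print(f'loaded {len(encoder_state)} encoder tensors; '
          f'{len(missing)} tensors left at their initial values')
```

The point of strict=False is that only the encoder tensors are overwritten, while the new model's decoder keeps its fresh initialization.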
Please don't worry about it at all. I'm here to help, and your questions are always welcome!
-
Hello Cryolite,
I encountered some issues while training models using kanachan, and I can't pinpoint the problem accurately.
I've tried many parameters and processes, but the models obtained through offline_rl are always unsatisfactory.
My most recent attempt used the code in the v2 branch: I obtained a model with bc and then applied its encoder to cql. Since the cql module doesn't seem to support this directly, I modified some of the code, but those modifications didn't touch the training logic itself.
I used the following parameters:
I deliberately set the index to 0.
Then I checked the first saved model, and the results were very poor: the Q values had no distribution, and every option received the same value.
I analyzed the weights of the decoder and found that most of the values were 0, which seems abnormal.
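To illustrate what I mean, a check along the following lines (a generic sketch; "decoder" stands for whatever the decoder submodule is actually called) reports the fraction of near-zero values in each parameter tensor:

```python
# Generic sketch for spotting mostly-zero parameter tensors in a decoder.
import torch

def report_near_zero_weights(decoder: torch.nn.Module, threshold: float = 1e-6) -> None:
    """Print, per parameter tensor, the fraction of values whose magnitude is below threshold."""
    for name, param in decoder.named_parameters():
        near_zero = (param.detach().abs() < threshold).float().mean().item()
        print(f'{name}: {near_zero:.1%} of {param.numel()} values below {threshold}')
```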
After that, I tried the "single"-size decoder, and the results initially seemed normal, with some distribution in the Q values. However, as training progressed, the model's performance deteriorated: the Q values no longer showed a distribution but collapsed to extremes, either 0 or 0.9. The training data was generated by annotation, then converted and randomly shuffled by annotate4rl before being extracted.
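To show what I mean by "no distribution", this is the kind of quick check I have in mind (a generic sketch, assuming a (batch_size, num_actions) tensor of Q values can be obtained from the model):

```python
# Generic sketch for quantifying Q-value collapse, assuming q_values is a
# (batch_size, num_actions) tensor produced by the model under inspection.
import torch

def q_value_spread(q_values: torch.Tensor) -> None:
    """Print statistics showing whether Q values still vary across candidate actions."""
    per_state_std = q_values.std(dim=1)  # spread across actions, per state
    print(f'mean per-state std: {per_state_std.mean().item():.4f}')
    print(f'global min / max:   {q_values.min().item():.4f} / {q_values.max().item():.4f}')
```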
The reward function used is "End-of-Game Ranking + Raw Points" from the wiki. Can you offer any help?
For example, some references on processes and parameters.
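For reference, the general shape of that reward is something like the sketch below; the placement bonuses and the point scaling here are placeholders of my own, not the exact values from the wiki:

```python
# Illustrative sketch of an "end-of-game ranking + raw points" style reward.
# The placement bonuses and the point scale are placeholders, not the values
# defined in the kanachan wiki.
def end_of_game_reward(final_rank: int, final_points: int) -> float:
    """final_rank: 0 (1st place) .. 3 (4th place); final_points: raw end-of-game points."""
    placement_bonus = [90.0, 45.0, 0.0, -135.0]  # placeholder bonuses
    starting_points = 25000                      # assumed starting stack
    return placement_bonus[final_rank] + (final_points - starting_points) / 1000.0
```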
I realize that I frequently ask you questions, which may cause some inconvenience. I am sincerely sorry for this. I truly appreciate your ongoing assistance and patience.
Thank you.