What is the feature?
During training, the commonly saved files and their destinations are:
config dump -> work_dir
log.txt -> _log_dir (work_dir + timestamp)
checkpoint (best) -> work_dir
checkpoint (iter/epoch) and its txt record -> work_dir
vis (TensorBoard) -> work_dir
In addition, I customize hooks to save the project code and the validation visualizations.
I have tried several ways to get every file from a single training run saved into one folder, but each runs into problems.
Method 1: before creating the runner, change work_dir to work_dir + experiment_name (timestamp). However, the timestamp may be inconsistent across multiple GPUs, so dist_init has to be moved in front of runner creation, the timestamp synchronized across processes, and only then can the runner be created.
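Method 1 boils down to fixing one run directory before the Runner exists. A minimal sketch, using a hypothetical `make_run_dir` helper (my own name, not an mmengine API); in a multi-GPU job the timestamp would have to be decided once on rank 0 and broadcast after dist_init so that every process agrees on the same folder:

```python
import os
import time

def make_run_dir(work_dir, experiment_name, timestamp=None):
    # Hypothetical helper: build work_dir/experiment_name_timestamp.
    # In distributed training, pass in a timestamp that was produced on
    # rank 0 and broadcast to all ranks, instead of letting each process
    # read its own clock here.
    if timestamp is None:
        timestamp = time.strftime("%Y%m%d_%H%M%S")
    return os.path.join(work_dir, "%s_%s" % (experiment_name, timestamp))
```

The resulting path would then be passed as `work_dir` when building the runner, so everything it saves lands in one per-run folder.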
Method 2: inherit from Runner and unify the save path to _log_dir (work_dir + timestamp). Because _log_dir is hard-coded, the config dump and checkpoint saving have to be overridden to achieve this with minimal changes. However, when checkpoints (iter/epoch) are saved this way, their txt record still lands in work_dir, so keeping only the last three checkpoints no longer works.
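To illustrate why Method 2 needs a separate override per artifact, here is a pure-Python mock (not the real mmengine Runner; the method names are illustrative): the base class hard-codes some destinations to work_dir and others to _log_dir, so a subclass must redirect each one individually:

```python
import os

class MockRunner:
    # Toy stand-in for a runner whose save paths are fixed at init time.
    def __init__(self, work_dir, timestamp):
        self.work_dir = work_dir
        self._log_dir = os.path.join(work_dir, timestamp)  # hard-coded

    def config_path(self):
        return os.path.join(self.work_dir, "config.py")    # goes to work_dir

    def checkpoint_path(self, name):
        return os.path.join(self.work_dir, name)           # goes to work_dir

class UnifiedRunner(MockRunner):
    # Redirect every artifact into _log_dir so one run lives in one folder.
    def config_path(self):
        return os.path.join(self._log_dir, "config.py")

    def checkpoint_path(self, name):
        return os.path.join(self._log_dir, name)
```

Any save path not routed through an overridable method (like the txt record written inside the checkpoint hook) escapes this scheme, which is exactly the breakage described above.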
Both methods can achieve the goal indirectly, but the result feels like patchwork and is very uncomfortable.
Going by the save logic, the work directory should really be self.work_dir + self._experiment_name. For example, I may run the same experiment XX many times, so the save paths would be XX/experiment_A, XX/experiment_B, and so on. Although Runner's __init__ takes _experiment_name, it is not reflected in the save paths.
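The layout being asked for could be sketched like this (an assumption about the desired behavior, not the current API): each run of experiment XX gets its own subfolder named by the experiment name, and every artifact of that run lives under it:

```python
import os

def run_dir(work_dir, experiment_name):
    # Desired rule: every artifact of one run (config dump, logs,
    # checkpoints and their txt record, TensorBoard files) is rooted
    # at work_dir/experiment_name.
    return os.path.join(work_dir, experiment_name)

# Repeated runs of experiment "XX" would then sit side by side:
layout = [run_dir("XX", name) for name in ("experiment_A", "experiment_B")]
```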
I also understand that the saved files may be scattered across different paths by design, but some of those paths are hard-coded and can only be changed by subclassing Runner, while others are set in Runner's __init__ and cannot be modified even with hooks. It drives my obsessive-compulsive side crazy.
Any other context?
No response