You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
def deepspeed_load_checkpoint(deepspeed_engine, checkpoint_path, load_module_strict=True):
# it's possible that the user is trying to resume from model_path, which doesn't necessarily
# contain a deepspeed checkpoint. e.g. examples just check if the dir exists and assume it's
# a resume from a checkpoint and not just a local pretrained weight. So we check here if the
# path contains what looks like a deepspeed checkpoint
import glob
deepspeed_checkpoint_dirs = sorted(glob.glob(f"{checkpoint_path}/global_step*"))
if len(deepspeed_checkpoint_dirs) > 0:
logger.info(f"Attempting to resume from {checkpoint_path}")
# this magically updates self.optimizer and self.lr_scheduler
load_path, _ = deepspeed_engine.load_checkpoint(
checkpoint_path,
load_module_strict=load_module_strict,
load_optimizer_states=True,
load_lr_scheduler_states=True,
)
if load_path is None:
raise ValueError(f"[deepspeed] failed to resume from checkpoint {checkpoint_path}")
else:
raise ValueError(f"Can't find a valid checkpoint at {checkpoint_path}")
The text was updated successfully, but these errors were encountered:
Question
默认预训练加载模型是在 deepspeed_checkpoint_dirs = sorted(glob.glob(f"{checkpoint_path}/global_step*")),可是预训练保存模型的时候根本就没有这个目录
The text was updated successfully, but these errors were encountered: