There is a training process and GPU memory usage, but the GPU is not working. #58

Open
yinkaaiwu opened this issue Oct 19, 2023 · 2 comments

yinkaaiwu commented Oct 19, 2023

Hello, here is my code:

```python
ml_potential = FinetunerCalc(
    checkpoint_path="gemnet_t_direct_h512_all.pt",
    mlp_params={
        "tuner": {
            "unfreeze_blocks": [
                "out_blocks.3.seq_forces",
                "out_blocks.3.scale_rbf_F",
                "out_blocks.3.dense_rbf_F",
                "out_blocks.3.out_forces",
                "out_blocks.2.seq_forces",
                "out_blocks.2.scale_rbf_F",
                "out_blocks.2.dense_rbf_F",
                "out_blocks.2.out_forces",
                "out_blocks.1.seq_forces",
                "out_blocks.1.scale_rbf_F",
                "out_blocks.1.dense_rbf_F",
                "out_blocks.1.out_forces",
            ],
            "num_threads": 32
        },
        "optim": {
            "batch_size": 1,
            "num_workers": 0,
            "max_epochs": 400,
            "lr_initial": 0.0003,
            "factor": 0.9,
            "eval_every": 1,
            "patience": 3,
            "checkpoint_every": 100000,
            "scheduler_loss": "train",
            "weight_decay": 0,
            "eps": 1e-8,
            "optimizer_params": {
                "weight_decay": 0,
                "eps": 1e-8,
            },
        },
        "task": {
            "primary_metric": "loss",
        },
        "local_rank": 0
    },
)
ml_potential.train(parent_dataset=train_dataset[:2])
```

My CUDA version is 11.3. nvidia-smi shows the training process and its GPU memory usage, but Volatile GPU-Util stays at 0% and the power consumption has not increased. Is there a problem with my parameter settings?
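To check the symptom over time rather than from a single nvidia-smi snapshot, the utilization, memory, and power figures can be polled from a script. The following is only a rough sketch: it relies on the `--query-gpu` and `--format=csv,noheader,nounits` options of nvidia-smi, and the function names (`parse_gpu_stats`, `read_gpu_stats`) and dictionary keys are made up for illustration.

```python
import subprocess

# Fields requested from nvidia-smi, in the order they are parsed below.
QUERY = "utilization.gpu,memory.used,power.draw"


def _to_float(value):
    """Convert a CSV field to float; return None for values such as "[N/A]"."""
    try:
        return float(value)
    except ValueError:
        return None


def parse_gpu_stats(text):
    """Parse nvidia-smi CSV rows (one per GPU) into a list of dicts."""
    stats = []
    for line in text.strip().splitlines():
        util, mem, power = [field.strip() for field in line.split(",")]
        stats.append(
            {
                "util_percent": _to_float(util),
                "memory_used_mib": _to_float(mem),
                "power_draw_w": _to_float(power),
            }
        )
    return stats


def read_gpu_stats():
    """Run nvidia-smi once and return the parsed per-GPU statistics."""
    result = subprocess.run(
        ["nvidia-smi", f"--query-gpu={QUERY}", "--format=csv,noheader,nounits"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_gpu_stats(result.stdout)
```

Calling `read_gpu_stats()` every few seconds while training runs would show whether utilization ever rises above zero. Memory in use with utilization stuck at zero suggests the process holds a CUDA context but does its computation elsewhere.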

jiaozihao18 commented Oct 25, 2023

@yinkaai You could try adding "cpu": False to the mlp_params dict (ref: update oal example for gpu usage #36).
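A minimal sketch of that change, assuming the flag belongs at the top level of mlp_params (alongside "local_rank"). The helper name `with_gpu_enabled` and the trimmed-down config are illustrative only and not part of the library:

```python
import copy


def with_gpu_enabled(mlp_params):
    """Return a copy of mlp_params with the top-level "cpu" flag set to False."""
    params = copy.deepcopy(mlp_params)  # leave the caller's dict untouched
    params["cpu"] = False  # assumption: the flag is read at the top level
    return params


# Shortened version of the config from the original post.
mlp_params = {
    "tuner": {"num_threads": 32},
    "optim": {"batch_size": 1, "num_workers": 0, "max_epochs": 400},
    "task": {"primary_metric": "loss"},
    "local_rank": 0,
}

gpu_params = with_gpu_enabled(mlp_params)
```

The resulting dict would then be passed as `mlp_params=gpu_params` to `FinetunerCalc` in place of the original one.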

yinkaaiwu (Author) commented

> @yinkaai You could try adding "cpu": False to the mlp_params dict (ref: update oal example for gpu usage #36).

Thank you!
