Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Blitz Bayesian model wrapped in DataParallel not using multiple GPUs #84

Open
jsanket123 opened this issue Jun 13, 2021 · 2 comments
Open

Comments

@jsanket123
Copy link

jsanket123 commented Jun 13, 2021

I have been trying to use Bayesian linear regression example given by Blitz authors and parallelize their model by wrapping it with torch.nn.DataParallel. However, it seems that the given code is only using one gpu and not multiple gpus. Below is the same code from the bayesian_regression_boston.py example with model wrapped in DataParallel.

import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
import numpy as np

from blitz.modules import BayesianLinear
from blitz.utils import variational_estimator

from sklearn.datasets import load_boston
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split

X, y = load_boston(return_X_y=True)
X = StandardScaler().fit_transform(X)
y = StandardScaler().fit_transform(np.expand_dims(y, -1))

X_train, X_test, y_train, y_test = train_test_split(X,
                                                    y,
                                                    test_size=.25,
                                                    random_state=42)


X_train, y_train = torch.tensor(X_train).float(), torch.tensor(y_train).float()
X_test, y_test = torch.tensor(X_test).float(), torch.tensor(y_test).float()


@variational_estimator
class BayesianRegressor(nn.Module):
    def __init__(self, input_dim, output_dim):
        super().__init__()
        #self.linear = nn.Linear(input_dim, output_dim)
        self.blinear1 = BayesianLinear(input_dim, 512)
        self.blinear2 = BayesianLinear(512, output_dim)
        
    def forward(self, x):
        print("\tIn Model: input size", x.size())
        x_ = self.blinear1(x)
        x_ = F.relu(x_)
        return self.blinear2(x_)


def evaluate_regression(regressor,
                        X,
                        y,
                        samples = 100,
                        std_multiplier = 2):
    preds = [regressor(X) for i in range(samples)]
    preds = torch.stack(preds)
    means = preds.mean(axis=0)
    stds = preds.std(axis=0)
    ci_upper = means + (std_multiplier * stds)
    ci_lower = means - (std_multiplier * stds)
    ic_acc = (ci_lower <= y) * (ci_upper >= y)
    ic_acc = ic_acc.float().mean()
    return ic_acc, (ci_upper >= y).float().mean(), (ci_lower <= y).float().mean()

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
regressor = BayesianRegressor(13, 1).to(device)
optimizer = optim.Adam(regressor.parameters(), lr=0.01)
criterion = torch.nn.MSELoss()

if torch.cuda.device_count() > 1:
  print("Let's use", torch.cuda.device_count(), "GPUs!")
  regressor = nn.DataParallel(regressor)
regressor.to(device)

ds_train = torch.utils.data.TensorDataset(X_train, y_train)
dataloader_train = torch.utils.data.DataLoader(ds_train, batch_size=16, shuffle=True)

ds_test = torch.utils.data.TensorDataset(X_test, y_test)
dataloader_test = torch.utils.data.DataLoader(ds_test, batch_size=16, shuffle=True)


iteration = 0
for epoch in range(100):
    for i, (datapoints, labels) in enumerate(dataloader_train):
        optimizer.zero_grad()
        
        print("Outside: input size", datapoints.size())
        loss = regressor.module.sample_elbo(inputs=datapoints.to(device),
                           labels=labels.to(device),
                           criterion=criterion,
                           sample_nbr=1,
                           complexity_cost_weight=1/X_train.shape[0])
        loss.backward()
        optimizer.step()
        
        iteration += 1
        if iteration%100==0:
            ic_acc, under_ci_upper, over_ci_lower = evaluate_regression(regressor,
                                                                        X_test.to(device),
                                                                        y_test.to(device),
                                                                        samples=25,
                                                                        std_multiplier=3)
            
            print("CI acc: {:.2f}, CI upper acc: {:.2f}, CI lower acc: {:.2f}".format(ic_acc, under_ci_upper, over_ci_lower))
            print("Loss: {:.4f}".format(loss))

Below I provide the portion of the output. I print out the input dimension before calling the loss.module.sample_elbo and that should have the batch_size x no_variables (which is correctly printed out). However, inside the model's forward map it should print out the dimensions of each smaller batch taken by each of the GPU so it should have printed out 8 lines as 'In Model: input size torch.Size([2, 13])'. But apparently, it is only putting data on one GPU.

Could you please let me know what needs to be done for this to work on multiple GPUs? FYI: I use the methods suggested here to check whether multiple GPUs are actually being used for processing the input batch.

Let's use 8 GPUs!
Outside: input size torch.Size([16, 13])
        In Model: input size torch.Size([16, 13])
Outside: input size torch.Size([16, 13])
        In Model: input size torch.Size([16, 13])
Outside: input size torch.Size([16, 13])
        In Model: input size torch.Size([16, 13])
Outside: input size torch.Size([16, 13])
        In Model: input size torch.Size([16, 13])
Outside: input size torch.Size([16, 13])
        In Model: input size torch.Size([16, 13])
Outside: input size torch.Size([16, 13])
        In Model: input size torch.Size([16, 13])
Outside: input size torch.Size([16, 13])
        In Model: input size torch.Size([16, 13])
Outside: input size torch.Size([16, 13])
        In Model: input size torch.Size([16, 13])
Outside: input size torch.Size([16, 13])
        In Model: input size torch.Size([16, 13])
@piEsposito
Copy link
Owner

That is still not on our roadmap, and I think that the probabilistic layers do not implement all the needed inferfaces to comply with Torch DataParallel. if you want to work on that, it would be much appreciated.

@jsanket123
Copy link
Author

Hey thanks for getting back to me. I was able to implement my model in a parallelized way. It definitely takes lots of restructuring than what we normally do for one GPU. This is my current work so I will not be releasing it until I go through the review process for my paper.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants