Profiling PyTorch DataLoader Method #382

Closed
dbl001 opened this issue Jun 12, 2022 · 11 comments
dbl001 commented Jun 12, 2022

I ported some AI-Feynman code to use PyTorch's new 'mps' backend.
The original 'vanilla' code runs fine, but my 'mps' version slows down after each iteration in NN_train
and virtually stops at iteration 11, by which point the process has grown to 48 GB.
First, I'd like to profile just this line of code:

for i, data in enumerate(my_dataloader):

Second, the whole function NN_train.

Any suggestions on how to profile just the iteration over 'my_dataloader'?

def NN_train(pathdir, filename, epochs=1000, lrs=1e-2, N_red_lr=4, pretrained_path=""):
    # create the output directories up front; exist_ok avoids the bare try/except
    os.makedirs("results/NN_trained_models/models/", exist_ok=True)
    try:
        n_variables = np.loadtxt(pathdir+"%s" %filename, dtype='str').shape[1]-1
        variables = np.loadtxt(pathdir+"%s" %filename, dtype = np.float32, usecols=(0,))

        # epochs = 200*n_variables
        if len(variables)<5000:
            print('WARNING: tripling epochs since len(variables)<5000...')
            epochs = epochs*3

        if n_variables==0 or n_variables==1:
            return 0

        else:
            for j in range(1,n_variables):
                v = np.loadtxt(pathdir+"%s" %filename, dtype = np.float32, usecols=(j,))
                variables = np.column_stack((variables,v))

        f_dependent = np.loadtxt(pathdir+"%s" %filename, dtype = np.float32, usecols=(n_variables,))
        f_dependent = np.reshape(f_dependent,(len(f_dependent),1))

        # move the inputs to the active device
        factors = torch.from_numpy(variables)
        if is_cuda:
            factors = factors.cuda()
        elif is_mps:
            factors = factors.to('mps')
        factors = factors.float()

        product = torch.from_numpy(f_dependent)
        if is_cuda:
            product = product.cuda()
        elif is_mps:
            product = product.to('mps')
        product = product.float()

        class SimpleNet(nn.Module):
            def __init__(self, ni):
                super().__init__()
                self.linear1 = nn.Linear(ni, 128)
                self.linear2 = nn.Linear(128, 128)
                self.linear3 = nn.Linear(128, 64)
                self.linear4 = nn.Linear(64,64)
                self.linear5 = nn.Linear(64,1)
            
            def forward(self, x):
                # torch.tanh: F.tanh is deprecated
                x = torch.tanh(self.linear1(x))
                x = torch.tanh(self.linear2(x))
                x = torch.tanh(self.linear3(x))
                x = torch.tanh(self.linear4(x))
                x = self.linear5(x)
                return x

        my_dataset = utils.TensorDataset(factors, product)  # create your dataset
        my_dataloader = utils.DataLoader(my_dataset, batch_size=bs, shuffle=False)  # bs (batch size) is defined elsewhere in the module

        if is_cuda:
            model_feynman = SimpleNet(n_variables).cuda()
        elif is_mps:
            model_feynman = SimpleNet(n_variables).to('mps')
        else:
            model_feynman = SimpleNet(n_variables)

        if pretrained_path!="":
            model_feynman.load_state_dict(torch.load(pretrained_path))

        check_es_loss = 10000

        for i_i in range(N_red_lr):
            optimizer_feynman = optim.Adam(model_feynman.parameters(), lr = lrs)
            for epoch in range(epochs):
                model_feynman.train()
                for i, data in enumerate(my_dataloader):  # <-- the line I'd like to profile
                    optimizer_feynman.zero_grad()
                    print(".")
                    if is_cuda:
                        fct = data[0].float().cuda()
                        prd = data[1].float().cuda()
                    elif is_mps:
                        fct = data[0].float().to('mps')
                        prd = data[1].float().to('mps')
                    else:
                        fct = data[0].float()
                        prd = data[1].float()
                    
                    loss = rmse_loss(model_feynman(fct),prd)
                    loss.backward()
                    optimizer_feynman.step()
                
                '''
                # Early stopping
                if epoch%20==0 and epoch>0:
                    if check_es_loss < loss:
                        break
                    else:
                        torch.save(model_feynman.state_dict(), "results/NN_trained_models/models/" + filename + ".h5")
                        check_es_loss = loss
                if epoch==0:
                    if check_es_loss < loss:
                        torch.save(model_feynman.state_dict(), "results/NN_trained_models/models/" + filename + ".h5")
                        check_es_loss = loss
                '''
                torch.save(model_feynman.state_dict(), "results/NN_trained_models/models/" + filename + ".h5")   
            lrs = lrs/10

        return model_feynman

    except NameError:
        print("Error in file: %s" %filename)
        raise

itamarst (Collaborator)

Fil has the ability to profile only certain functions; see https://pythonspeed.com/fil/docs/api.html

If that doesn't answer your question, let me know and I'll try to give an example.
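
As a concrete sketch of that API (following the usage shown at https://pythonspeed.com/fil/docs/api.html; consume_dataloader is a hypothetical helper, and since my_dataloader is local to NN_train you would put this wrapper where the loader is built):

from filprofiler.api import profile  # Fil's programmatic profiling API

def consume_dataloader(dataloader):
    # Iterate the DataLoader and discard the batches, so the profile
    # covers only the allocations made while fetching data.
    for i, data in enumerate(dataloader):
        pass

# Writes the peak-memory report (index.html, peak-memory.svg, ...) into
# the given directory and returns whatever the callable returns.
profile(lambda: consume_dataloader(my_dataloader), "/tmp/fil-dataloader")

Note that the script itself still has to be launched under Fil for profile() to record anything, as in the run shown later in this thread.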

dbl001 (Author) commented Jun 13, 2022

Where does 'generate_report' come from?

...
print("Training a NN on the data... \n")
result = profile(lambda: NN_train(pathdir, filename, NN_epochs), "/tmp/fil-result")
generate_report(result)
Training a NN on the data... 


=fil-profile= Preparing to write to /tmp/fil-result
=fil-profile= Wrote flamegraph to "/tmp/fil-result/peak-memory.svg"
=fil-profile= Wrote flamegraph to "/tmp/fil-result/peak-memory-reversed.svg"
Traceback (most recent call last):
  File "/Users/davidlaxer/AI-Feynman/examples/example.py", line 3, in <module>
    run_aifeynman("../example_data/", "example1.txt", 30,
  File "/Users/davidlaxer/anaconda3/envs/AI-Feynman/lib/python3.9/site-packages/aifeynman/S_run_aifeynman.py", line 276, in run_aifeynman
    PA = run_AI_all(pathdir,filename+"_train",BF_try_time,BF_ops_file_type, polyfit_deg, NN_epochs, PA=PA)
  File "/Users/davidlaxer/anaconda3/envs/AI-Feynman/lib/python3.9/site-packages/aifeynman/S_run_aifeynman.py", line 77, in run_AI_all
    generate_report(result)
NameError: name 'generate_report' is not defined

(AI-Feynman) davidlaxer@x86_64-apple-darwin13 aifeynman % pip show filprofiler
Name: filprofiler
Version: 2022.5.0
Summary: A memory profiler for data batch processing applications.
Home-page: https://pythonspeed.com/fil/
Author: 
Author-email: 
License: Apache 2.0
Location: /Users/davidlaxer/anaconda3/envs/AI-Feynman/lib/python3.9/site-packages
Requires: threadpoolctl
Required-by: 

 % ls -l /tmp/fil-result
total 368
-rw-r--r--  1 davidlaxer  wheel   5090 Jun 13 12:08 index.html
-rw-r--r--  1 davidlaxer  wheel  59594 Jun 13 12:08 peak-memory-reversed.svg
-rw-r--r--  1 davidlaxer  wheel  76255 Jun 13 12:08 peak-memory.prof
-rw-r--r--  1 davidlaxer  wheel  39343 Jun 13 12:08 peak-memory.svg

[Screenshot: Fil peak-memory output, Jun 13 2022]

itamarst (Collaborator)

That's just part of the example code; you can delete that line.
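
For reference, dropping the undefined helper leaves just:

print("Training a NN on the data... \n")
result = profile(lambda: NN_train(pathdir, filename, NN_epochs), "/tmp/fil-result")
# generate_report(result)  # not part of Fil's API; safe to remove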

itamarst (Collaborator)

And you can use the resulting memory profile; it's still valid.

dbl001 (Author) commented Jun 13, 2022 via email

itamarst (Collaborator) commented Jun 13, 2022

It depends on how it's implemented. If memory is allocated at import time, Fil won't catch it when you're only profiling a function, and you'll want to profile the whole program instead. If memory is allocated only during your function call, then Fil will (almost certainly) catch it, since Fil profiles low-level C memory allocation APIs like malloc().

There are also some caveats: for example, a giant mmap() isn't really allocated until you start writing to it, and I don't have a good way to handle that right now (#308). For a detailed discussion of this issue, see https://pythonspeed.com/articles/measuring-memory-python/
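
A minimal illustration of that mmap() caveat (plain Python, nothing Fil-specific; the 1 GiB size is just for illustration):

import mmap

# Reserve 1 GiB of anonymous memory. This only reserves address space;
# no physical pages are resident yet, even though an allocation-tracking
# profiler sees the full 1 GiB request here.
buf = mmap.mmap(-1, 1024 ** 3)

# Touching the pages is what actually commits memory: writing one byte
# per 4 KiB page forces the OS to back each page with real RAM.
for offset in range(0, len(buf), 4096):
    buf[offset] = 1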

dbl001 (Author) commented Jun 13, 2022 via email

itamarst (Collaborator)

Fil shows peak memory, so if you have a leak, you'll have a callstack X->Y->Z adding more and more memory over time, and it will show up in the profile. I would just look at the final report after all the iterations, focus on the memory listed there, and see (a) what's allocating it, which Fil tells you, (b) how it's used, and (c) why it might be sticking around, if you think it's a memory leak.

itamarst (Collaborator)

Do you have any more questions, or shall I close this issue?

dbl001 (Author) commented Jun 19, 2022

I opened an issue with PyTorch. Awaiting their response:

pytorch/pytorch#79840

dbl001 (Author) commented Jul 18, 2022

This is resolved; see pytorch/pytorch#79840.

dbl001 closed this as completed Jul 18, 2022