Profiling PyTorch DataLoader Method #382

Closed
dbl001 opened this issue Jun 12, 2022 · 11 comments
dbl001 commented Jun 12, 2022

I ported some AI-Feynman code to use PyTorch's new 'mps' backend.
The original 'vanilla' code runs fine, but my 'mps' version slows down after each iteration in NN_train
and virtually stops at iteration 11, by which point the process has grown to 48 GB.
First, I'd like to profile just this line of code:

for i, data in enumerate(my_dataloader):

Second, the whole function NN_train.

Any suggestions on how to profile just the iteration over 'my_dataloader'?

def NN_train(pathdir, filename, epochs=1000, lrs=1e-2, N_red_lr=4, pretrained_path=""):
    # create the output directories up front; exist_ok avoids the bare try/except
    os.makedirs("results/NN_trained_models/models/", exist_ok=True)
    try:
        n_variables = np.loadtxt(pathdir+"%s" %filename, dtype='str').shape[1]-1
        variables = np.loadtxt(pathdir+"%s" %filename, dtype = np.float32, usecols=(0,))

        # epochs = 200*n_variables
        if len(variables)<5000:
            print('WARNING: tripling epochs since len(variables)<5000...')
            epochs = epochs*3

        if n_variables==0 or n_variables==1:
            return 0

        else:
            for j in range(1,n_variables):
                v = np.loadtxt(pathdir+"%s" %filename, dtype = np.float32, usecols=(j,))
                variables = np.column_stack((variables,v))

        f_dependent = np.loadtxt(pathdir+"%s" %filename, dtype = np.float32, usecols=(n_variables,))
        f_dependent = np.reshape(f_dependent,(len(f_dependent),1))

        # move the inputs to the active device
        factors = torch.from_numpy(variables)
        if is_cuda:
            factors = factors.cuda()
        elif is_mps:
            factors = factors.to('mps')
        factors = factors.float()

        product = torch.from_numpy(f_dependent)
        if is_cuda:
            product = product.cuda()
        elif is_mps:
            product = product.to('mps')
        product = product.float()

        class SimpleNet(nn.Module):
            def __init__(self, ni):
                super().__init__()
                self.linear1 = nn.Linear(ni, 128)
                self.linear2 = nn.Linear(128, 128)
                self.linear3 = nn.Linear(128, 64)
                self.linear4 = nn.Linear(64,64)
                self.linear5 = nn.Linear(64,1)
            
            def forward(self, x):
                # torch.tanh: F.tanh is deprecated
                x = torch.tanh(self.linear1(x))
                x = torch.tanh(self.linear2(x))
                x = torch.tanh(self.linear3(x))
                x = torch.tanh(self.linear4(x))
                x = self.linear5(x)
                return x

        my_dataset = utils.TensorDataset(factors, product)  # create your dataset
        my_dataloader = utils.DataLoader(my_dataset, batch_size=bs, shuffle=False)  # bs (batch size) is defined elsewhere in the module

        if is_cuda:
            model_feynman = SimpleNet(n_variables).cuda()
        elif is_mps:
            model_feynman = SimpleNet(n_variables).to('mps')
        else:
            model_feynman = SimpleNet(n_variables)

        if pretrained_path!="":
            model_feynman.load_state_dict(torch.load(pretrained_path))

        check_es_loss = 10000

        for i_i in range(N_red_lr):
            optimizer_feynman = optim.Adam(model_feynman.parameters(), lr = lrs)
            for epoch in range(epochs):
                model_feynman.train()
                for i, data in enumerate(my_dataloader):  # <-- the line I'd like to profile
                    optimizer_feynman.zero_grad()
                    print(".")
                    if is_cuda:
                        fct = data[0].float().cuda()
                        prd = data[1].float().cuda()
                    elif is_mps:
                        fct = data[0].float().to('mps')
                        prd = data[1].float().to('mps')
                    else:
                        fct = data[0].float()
                        prd = data[1].float()
                    
                    loss = rmse_loss(model_feynman(fct),prd)
                    loss.backward()
                    optimizer_feynman.step()
                
                '''
                # Early stopping
                if epoch%20==0 and epoch>0:
                    if check_es_loss < loss:
                        break
                    else:
                        torch.save(model_feynman.state_dict(), "results/NN_trained_models/models/" + filename + ".h5")
                        check_es_loss = loss
                if epoch==0:
                    if check_es_loss < loss:
                        torch.save(model_feynman.state_dict(), "results/NN_trained_models/models/" + filename + ".h5")
                        check_es_loss = loss
                '''
                torch.save(model_feynman.state_dict(), "results/NN_trained_models/models/" + filename + ".h5")   
            lrs = lrs/10

        return model_feynman

    except NameError:
        print("Error in file: %s" %filename)
        raise

itamarst (Collaborator)

Fil has the ability to profile only certain functions; see https://pythonspeed.com/fil/docs/api.html

If that doesn't answer your question, let me know and I'll try to give an example.
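
As a concrete sketch of that API (following the usage shown at https://pythonspeed.com/fil/docs/api.html; consume_dataloader is a hypothetical helper, and since my_dataloader is local to NN_train you would put this wrapper where the loader is built):

from filprofiler.api import profile  # Fil's programmatic profiling API

def consume_dataloader(dataloader):
    # Iterate the DataLoader and discard the batches, so the profile
    # covers only the allocations made while fetching data.
    for i, data in enumerate(dataloader):
        pass

# Writes the peak-memory report (index.html, peak-memory.svg, ...) into
# the given directory and returns whatever the callable returns.
profile(lambda: consume_dataloader(my_dataloader), "/tmp/fil-dataloader")

Note that the script itself still has to be launched under Fil for profile() to record anything, as in the run shown later in this thread.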

dbl001 (Author) commented Jun 13, 2022

Where does 'generate_report' come from?

...
print("Training a NN on the data... \n")
result = profile(lambda: NN_train(pathdir, filename, NN_epochs), "/tmp/fil-result")
generate_report(result)
Training a NN on the data... 


=fil-profile= Preparing to write to /tmp/fil-result
=fil-profile= Wrote flamegraph to "/tmp/fil-result/peak-memory.svg"
=fil-profile= Wrote flamegraph to "/tmp/fil-result/peak-memory-reversed.svg"
Traceback (most recent call last):
  File "/Users/davidlaxer/AI-Feynman/examples/example.py", line 3, in <module>
    run_aifeynman("../example_data/", "example1.txt", 30,
  File "/Users/davidlaxer/anaconda3/envs/AI-Feynman/lib/python3.9/site-packages/aifeynman/S_run_aifeynman.py", line 276, in run_aifeynman
    PA = run_AI_all(pathdir,filename+"_train",BF_try_time,BF_ops_file_type, polyfit_deg, NN_epochs, PA=PA)
  File "/Users/davidlaxer/anaconda3/envs/AI-Feynman/lib/python3.9/site-packages/aifeynman/S_run_aifeynman.py", line 77, in run_AI_all
    generate_report(result)
NameError: name 'generate_report' is not defined

(AI-Feynman) davidlaxer@x86_64-apple-darwin13 aifeynman % pip show filprofiler
Name: filprofiler
Version: 2022.5.0
Summary: A memory profiler for data batch processing applications.
Home-page: https://pythonspeed.com/fil/
Author: 
Author-email: 
License: Apache 2.0
Location: /Users/davidlaxer/anaconda3/envs/AI-Feynman/lib/python3.9/site-packages
Requires: threadpoolctl
Required-by: 

 % ls -l /tmp/fil-result
total 368
-rw-r--r--  1 davidlaxer  wheel   5090 Jun 13 12:08 index.html
-rw-r--r--  1 davidlaxer  wheel  59594 Jun 13 12:08 peak-memory-reversed.svg
-rw-r--r--  1 davidlaxer  wheel  76255 Jun 13 12:08 peak-memory.prof
-rw-r--r--  1 davidlaxer  wheel  39343 Jun 13 12:08 peak-memory.svg

[Screenshot: Fil peak-memory output, Jun 13 2022]

itamarst (Collaborator)

That's just part of the example code; you can delete that line.
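
For reference, dropping the undefined helper leaves just:

print("Training a NN on the data... \n")
result = profile(lambda: NN_train(pathdir, filename, NN_epochs), "/tmp/fil-result")
# generate_report(result)  # not part of Fil's API; safe to remove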

itamarst (Collaborator)

And you can use the resulting memory profile; it's still valid.

dbl001 (Author) commented Jun 13, 2022 via email

itamarst (Collaborator) commented Jun 13, 2022

It depends on how it's implemented. If memory is allocated at import time, Fil won't catch it when you're only profiling a function, and you'll want to profile the whole program instead. If memory is allocated only during your function call, then Fil will (almost certainly) catch it, since Fil profiles low-level C memory allocation APIs like malloc().

There are also some caveats: for example, a giant mmap() isn't really allocated until you start writing to it, and I don't have a good way to handle that right now (#308). For a detailed discussion of this issue, see https://pythonspeed.com/articles/measuring-memory-python/
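
A minimal illustration of that mmap() caveat (plain Python, nothing Fil-specific; the 1 GiB size is just for illustration):

import mmap

# Reserve 1 GiB of anonymous memory. This only reserves address space;
# no physical pages are resident yet, even though an allocation-tracking
# profiler sees the full 1 GiB request here.
buf = mmap.mmap(-1, 1024 ** 3)

# Touching the pages is what actually commits memory: writing one byte
# per 4 KiB page forces the OS to back each page with real RAM.
for offset in range(0, len(buf), 4096):
    buf[offset] = 1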

dbl001 (Author) commented Jun 13, 2022 via email

itamarst (Collaborator)

Fil shows peak memory, so if you have a leak, you'll have a callstack X->Y->Z adding more and more memory over time, and it will show up in the profile. I would just look at the final report after all the iterations, focus on the memory listed there, and see (a) what's allocating it, which Fil tells you, (b) how it's used, and (c) why it might be sticking around, if you think it's a memory leak.

itamarst (Collaborator)

Do you have any more questions, or shall I close this issue?

dbl001 (Author) commented Jun 19, 2022

I opened an issue with PyTorch. Awaiting their response:

pytorch/pytorch#79840

dbl001 (Author) commented Jul 18, 2022

This is resolved; see pytorch/pytorch#79840.

dbl001 closed this as completed Jul 18, 2022