Make IterToMap load data more lazily #454
This would be very much appreciated for my use case. I also would like to ask a related question: I am using […]. I found, somewhat to my surprise, that 3) gives the fastest training speed (in terms of number of samples per second) compared to 1) and even 2). In fact, 3) can be 2-4x faster than 1). However, 3) is very memory-intensive and simply infeasible even for a relatively small dataset (100k samples), since we need to load the whole […] into memory. I also found that 2) is slower than 3), but still faster than 1). The main problem with 2) is that handling errors during tensor generation in […]. Preferably, I would have expected 1) to be about as fast as 3), perhaps just a little slower; e.g., collating an Iterator into batches may add some overhead. Please help me understand if I am missing anything here. I have tried […]. Also, should I open a new Issue for this? |
@linminhtoo Thanks for providing feedback on your use case. In terms of speed, could you please also share how you measure it? For the overall time, 3) should be about the same as 1). If we are talking about speed per iteration, 3) should be faster, as all the loading logic happens at the beginning of the epoch, whereas 1) loads data lazily per iteration. |
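The distinction above (eager preloading vs. lazy per-iteration loading) can be illustrated with a plain-Python sketch. This does not use torchdata itself; `load_sample` and the setup numbers are invented stand-ins for the pipelines being discussed:

```python
import time

def load_sample(i):
    # Hypothetical stand-in for an expensive per-sample load.
    time.sleep(0.001)
    return i * i

N = 100

# Eager (like setup 3): pay the whole loading cost up front; lookups are then free.
eager = [load_sample(i) for i in range(N)]
eager_first = [eager[i] for i in range(3)]

# Lazy (like setup 1): each iteration pays its own loading cost as it goes.
lazy = (load_sample(i) for i in range(N))
lazy_first = [next(lazy) for _ in range(3)]

# Both produce the same samples; only when the loading cost is paid differs.
assert eager_first == lazy_first == [0, 1, 4]
```

Summed over a full epoch the total loading work is identical; the eager version just concentrates it at epoch start (and holds everything in memory), which is consistent with the per-iteration-vs-overall distinction drawn above.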
Thanks for your very prompt reply. Before the last step, the samples only contain the identity and metadata, such as paths. I'm in the cheminformatics space, where we generate various features for a given molecule or protein. We may, for example, generate Morgan fingerprints of dimension 2048, or a large 3D surface mesh of the protein. These can be both computationally and memory expensive. I define a new class myself because there are many different features we want to generate for each sample. As for the timing, I'm simply measuring the time taken for an epoch of training a very simple model (such that the bottleneck is more likely to be shifted to the data loading) over about 100k samples. 3) can be as fast as 30 seconds for the 1st epoch, and even faster at 20 seconds from the 2nd epoch onwards, but it almost maxes out my 32 GB of RAM and does not scale. 1) keeps my RAM usage nice and low, but can take 2 to 3 minutes per epoch. I realized I have not tried 2) on 100k samples (on a smaller set of 3k samples, it was faster than 1)), but I can double-check that. I may be misunderstanding something: if I run the computationally expensive step under 1) or 2), there should also be little difference in speed and memory, right? 3) is not really usable in practice, but its sheer speedup relative to 1) is a little concerning. |
Hi, I have run 2) on the larger 100k-sample dataset and I discovered the root cause of why it is faster than 1).

For 2): […]

For 1): […]

As you can see, clearly, […]. To be clear, I am doing exactly the same last step (the expensive step) in both cases; the only difference is whether it's a […]. For sharding, the […]. For 2), as […]. |
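The behavior of setup 2) discussed above can be mimicked with a small sketch. This is an illustrative pure-Python model of how an iterable-to-map converter behaves today (class and function names are invented; it is not torchdata's actual `IterToMap` code): the first lookup drains the entire source, so even one `__getitem__` call materializes every sample.

```python
class IterToMapSketch:
    """Sketch of eager conversion: the whole source iterable is
    drained into a dict on the first lookup."""

    def __init__(self, source):
        self._source = source
        self._map = None

    def __getitem__(self, key):
        if self._map is None:
            # First access triggers a full load of the source.
            self._map = dict(self._source)
        return self._map[key]

loaded = []

def expensive(k):
    # Record which samples were materialized.
    loaded.append(k)
    return (k, k * 10)

dp = IterToMapSketch(expensive(k) for k in range(5))
print(dp[0])    # a single lookup...
print(loaded)   # ...has materialized all five samples
```

After the single `dp[0]` access, `loaded` contains all five keys, which matches the up-front loading cost (and memory footprint) observed for setup 2) on the 100k-sample dataset.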
Sorry for another comment. I had a weird intuition to modify the order of operations for 1), from […] to […]. Now, […].

For 1): […]

By placing the `sharding_filter` earlier in the pipeline, […]. |
It is definitely the right approach to place shuffle and sharding as early as possible to get rid of redundant operations. We should add a few words about this to our tutorial: https://pytorch.org/data/beta/tutorial.html#working-with-dataloader

When you measure the total time per epoch, do you include the time spent creating the iterator of […]? |
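The recommendation above (shard before the expensive operations, not after) can be demonstrated with a plain-Python sketch. The `shard` helper below is an invented stand-in for round-robin worker sharding in the style of `sharding_filter`; the point is only the call counts:

```python
calls = {"late": 0, "early": 0}

def expensive(x, key):
    # Hypothetical expensive transform; we just count invocations.
    calls[key] += 1
    return x * x

data = list(range(8))
num_workers = 4

def shard(seq, worker_id, n):
    # Round-robin sharding: each worker keeps every n-th element.
    return [x for i, x in enumerate(seq) if i % n == worker_id]

# Sharding AFTER the expensive map: every worker recomputes all samples.
for w in range(num_workers):
    shard([expensive(x, "late") for x in data], w, num_workers)

# Sharding BEFORE the expensive map: each worker processes only its slice.
for w in range(num_workers):
    [expensive(x, "early") for x in shard(data, w, num_workers)]

assert calls["late"] == len(data) * num_workers   # redundant work
assert calls["early"] == len(data)                # each sample done once
```

With the late placement, the expensive step runs `num_workers` times more often in total, which is the redundant work the maintainer's comment refers to.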
I went back to my code to do more extensive and more rigorous speed benchmarks. As I realized the thread is getting a little long, I decided to create a new Issue and link to this one. |
Summary: See the feedback from a user: #454 (comment) We should explicitly ask users to place `sharding_filter` as early as possible. Pull Request resolved: #487 Reviewed By: wenleix Differential Revision: D36812259 Pulled By: ejguan fbshipit-source-id: 4c983f3216a80be398f85b20871e65b0e41627e0
🚀 The feature
Currently, `IterToMap` starts to load all data from the prior `IterDataPipe` when the first `__getitem__` is invoked, here:

data/torchdata/datapipes/iter/util/converter.py, line 78 at commit 13b574c

We can stop loading data from the prior `IterDataPipe` as soon as we find the requested index. We might also need to add a flag to prevent loading the data multiple times.

Motivation, pitch
This would improve performance when users simply iterate over the `MapDataPipe`, since we don't need to pre-load everything at the beginning of the iteration; basically, it simulates the behavior of an `IterDataPipe`.

Alternatives
No response
Additional context
No response
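The proposed lazier behavior can be sketched in plain Python. This is an illustrative model, not torchdata's implementation; the class name is invented. It pulls from the source only until the requested key appears, caches everything seen so far, and uses a flag so the source is never consumed twice:

```python
class LazyIterToMapSketch:
    """Sketch of the proposal: stop draining the source as soon as
    the requested key is found, caching partial results."""

    def __init__(self, source):
        self._it = iter(source)
        self._cache = {}
        self._exhausted = False  # flag to avoid re-consuming the source

    def __getitem__(self, key):
        if key in self._cache:
            return self._cache[key]
        while not self._exhausted:
            try:
                k, v = next(self._it)
            except StopIteration:
                self._exhausted = True
                break
            self._cache[k] = v
            if k == key:
                return v
        raise KeyError(key)

pulled = []

def src():
    for k in range(100):
        pulled.append(k)  # record how far the source was drained
        yield k, k * 2

dp = LazyIterToMapSketch(src())
assert dp[3] == 6
assert pulled == [0, 1, 2, 3]  # stopped as soon as key 3 appeared
assert dp[1] == 2              # served from cache, no further pulls
```

Sequential iteration over such a `MapDataPipe` would then pull roughly one sample per access, matching the per-iteration cost of an `IterDataPipe` instead of paying the full load on the first lookup.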