Missing data collection #992
In the log file, does it detect the open? If it's reasonably sized, you
might be able to post the generated log as well.
kevin
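One way to answer the "does it detect the open?" question is to dump the log to text with `darshan-parser` and look for the dataset files in the POSIX records. Below is a minimal sketch, assuming the usual column layout of that text output (module, rank, record id, counter, value, file name, mount point, fs type). The sample lines and the `/data/train` path are hypothetical and only illustrate the layout; `counters_for_path` is a helper name made up for this example, not part of Darshan.

```python
from collections import defaultdict

# Hypothetical sample in the layout printed by `darshan-parser`:
# <module> <rank> <record id> <counter> <value> <file name> <mount pt> <fs type>
SAMPLE = "\n".join([
    "# hypothetical sample, not taken from the log attached to this issue",
    "POSIX\t0\t1111\tPOSIX_OPENS\t1\t/data/train/img_0.tfrecord\t/\text4",
    "POSIX\t0\t1111\tPOSIX_READS\t4\t/data/train/img_0.tfrecord\t/\text4",
    "POSIX\t0\t2222\tPOSIX_OPENS\t1\t/tmp/other.txt\t/\text4",
])


def counters_for_path(parser_text, path_fragment,
                      wanted=("POSIX_OPENS", "POSIX_READS", "POSIX_BYTES_READ")):
    """Sum the wanted counters for each record whose file name contains path_fragment."""
    totals = defaultdict(lambda: defaultdict(int))
    for line in parser_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and header comments
        # Assumption: fields are tab-separated; fall back to whitespace otherwise.
        fields = line.split("\t") if "\t" in line else line.split()
        if len(fields) < 6:
            continue
        counter, value, file_name = fields[3], fields[4], fields[5]
        if counter in wanted and path_fragment in file_name:
            try:
                totals[file_name][counter] += int(value)
            except ValueError:
                continue  # ignore values that are not integers
    return {name: dict(counts) for name, counts in totals.items()}


if __name__ == "__main__":
    for name, counts in counters_for_path(SAMPLE, "/data/train").items():
        print(name, counts)
```

If the dataset directory yields an empty result while other paths show up, the reads are probably not reaching the instrumented calls at all, which narrows down where to look next.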
On Sat, Jun 8, 2024 at 11:45 AM, fakerst wrote:
I'm using Darshan to collect I/O data for the MLPerf Storage benchmark, but I cannot collect any data for the dataset reads. When I wrote a simple function that reads a dataset serially, Darshan detected the reads and everything was normal. I'm wondering whether the function call is nested too deeply or whether there is a conflict between DLIO and Darshan. The dataset is read with tf.data.TFRecordDataset(), which reads the TFRecord files. I cannot collect data for any I/O operations inside __call__(), such as reading the image, but I can collect data outside of it. Perhaps it's because __call__() is called too frequently?
Note: I modified the Darshan source code at configure time so that it can collect data from the directory where my dataset is located.
image.png: <https://github.com/darshan-hpc/darshan/assets/119720309/7c6fde39-fd34-444c-b1bc-9cf4c5cd29b2>
2024-06-08-22-11-21.txt
Thank you very much for your help!
Looking at your log output, I guess this is the record of interest for you?
And the issue is that this particular file is read by your code, but apparently not included in the instrumentation in the log file (i.e.,
Without diving into your code or DLIO specifics, which I'm not sure I fully understand from your description above, maybe we can start with some simple things:
I'm mostly just speculating that some usage of
I'm sure no other Darshan logs were generated, and I have also tried the latest release (3.4.5).
Hmm, yeah, I don't see any evidence of the
By default, Darshan doesn't trace data in the