
ENH: Python derived/accum interface #839

Merged
merged 20 commits into darshan-hpc:main from treddy_derived_metrics_1
Feb 14, 2023

Conversation

tylerjereddy
Collaborator

  • early draft of Python/CFFI interface to derived metrics/accumulators described in:
  • for now this definitely doesn't work, and it feels like I'm basically reconstituting C control flow in CFFI/Python instead of using a sensible exposure point between C and Python to pull populated structs out of a single entry point

  • perhaps folks can just help me sort out the current issues noted in the source changes rather than providing a convenient API, though once thorough regression tests are in place that might be something to consider in the future... (or even just maintaining it in pandas/Python someday if the performance is ~similar)

@carns
Contributor

carns commented Oct 28, 2022

darshan_accumulator isn't supposed to be a visible type (so I wouldn't think you need to define it in api_def_c.py). It's intentionally a forward declaration in darshan-logutils.h just so we can use it as an opaque reference.

Does that change things/help to start with?

The darshan_derived_metrics struct (and anything that branches off of it) will be the crucial datatypes for the Python bindings, I would think. Happy to make some helpers if that particular struct is awkward for bindings, though.
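
As a point of reference, keeping the handle opaque on the CFFI side would look roughly like this (a sketch only, not the actual api_def_c.py text; darshan_module_id is simplified to a plain int typedef here):

# sketch: declare darshan_accumulator as an opaque handle in a CFFI cdef;
# the struct body is never given, so callers can only hold and pass the
# pointer, never inspect or allocate its contents themselves
from cffi import FFI

ffi = FFI()
ffi.cdef("""
    typedef int darshan_module_id;  /* stand-in for the real enum */
    typedef struct darshan_accumulator_st* darshan_accumulator;

    int darshan_accumulator_create(darshan_module_id id,
                                   int64_t job_nprocs,
                                   darshan_accumulator* new_accumulator);
""")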

@tylerjereddy
Collaborator Author

tylerjereddy commented Oct 29, 2022

Does that change things/help to start with?

Not really, I think the exposed accumulation functions are just too easy to segfault right now. Granted, the segfault may happen from misuse, but it should be possible to make segfaults effectively impossible with enough struct/type inspection guards that return useful feedback.

For example, I isolated the segfault in the current version of this branch to this block of code in darshan_accumulator_inject():

    if(!mod_logutils[acc->module_id]->log_agg_records ||
       !mod_logutils[acc->module_id]->log_sizeof_record ||
       !mod_logutils[acc->module_id]->log_record_metrics) {
        /* this module doesn't support this operation */
        return(-1);
    }

Only one of those three conditions is needed to trigger a segfault. A quick look at the C code makes me wonder where mod_logutils even comes from... it isn't an argument to the function? darshan_accumulator_inject() is in a "public header" in darshan-logutils.h, so I'm a bit confused about how it can be safe to assume that mod_logutils is in scope if the function's primary purpose is consumption by an external Python API that will call the function in isolation?
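
To make the "guards" point concrete, here is a sketch of the kind of Python-side check I have in mind (assuming the backend's existing log_open()/log_close()/log_get_modules() helpers); the idea is to fail with a useful exception before ever reaching the C accumulator calls:

# sketch: refuse to call into the accumulator API unless the module is
# actually present in the log, raising a clear Python exception instead of
# risking a segfault at the C level
from darshan.backend.cffi_backend import log_open, log_close, log_get_modules

def _require_module(log_path: str, mod_name: str):
    log = log_open(log_path)
    try:
        if mod_name not in log_get_modules(log):
            raise ValueError(f"module {mod_name} is not present in {log_path}")
    finally:
        log_close(log)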

@carns
Contributor

carns commented Oct 30, 2022

mod_logutils is a global array containing function pointers for each module's module-specific log parsing functions. That could probably be cleaned up a little as an orthogonal code quality improvement, but it's the same convention presently used by the rest of the logutils library and should work OK.

As for going out of bounds on the array index there, we could put a guard on the module_id value (as is done at https://github.com/darshan-hpc/darshan/blob/main/darshan-util/darshan-logutils-accumulator.c#L56), but the reason there isn't a guard in this function is that the value is not being directly set by the user of the API. By the time you get to darshan_accumulator_inject(), the function is relying on a field in the accumulator that should have already been initialized and sanity checked. This sounds like the accumulator struct is already corrupted going into this function? Can you tell me how to reproduce it?

I don't want to go overboard checking for rational values in that accumulator input struct because at the end of the day it's C and a user can pass in whatever invalid pointer they want and it will segfault anyway as soon as we try to dereference the struct.

@tylerjereddy
Collaborator Author

This sounds like the accumulator struct is already corrupted going into this function? Can you tell me how to reproduce it?

The memory is either getting freed between the create and inject calls because of CFFI/language barrier details, or the fancy pointer stuff in create isn't actually assigning back to the memory at the CFFI interface to begin with (ownership with opacity may be tricky). Either way, assuming the API calls are done in the correct order and that things work perfectly from one API function to the next seems a bit fragile so I'm going to open a PR that adds some of the error handling I had to add anyway.

tylerjereddy added a commit to tylerjereddy/darshan that referenced this pull request Oct 30, 2022
* returning `-1` from a public API C function isn't sufficient
to provide useful error information when working in Python/CFFI--it
only tells you something went wrong if you check for a return
code in the first place (which isn't normally done in Python
anyway--normally you end execution at the error point and `exit()`
with an appropriate error message)

* when there are multiple conditions that can trigger the `-1`
return value, the situation is even worse: one literally has
to sprinkle `printf` calls through the source to figure out
what went wrong where

* as a compromise, I'll leave the `-1` approach in since that
is quite common in standard `C`, but I'm going to add in
prints to `stderr` so that Python can then intercept the `-1`
and refer the user to `stderr`

* also, `darshan_accumulator_inject()` assumed that the `module_id`
was reasonable because it was set/checked in
`darshan_accumulator_create()`; however, my experience in darshan-hpcgh-839
was that the accumulator memory location can get freed, or not
properly set at the CFFI boundary after calling the creation function,
so I think the assumption that a previous function was called
and worked perfectly is too fragile--I'm adding error handling
to prevent a hard segfault on nonsense values of that structure
member as a result
@carns
Contributor

carns commented Oct 31, 2022

This sounds like the accumulator struct is already corrupted going into this function? Can you tell me how to reproduce it?

The memory is either getting freed between the create and inject calls because of CFFI/language barrier details, or the fancy pointer stuff in create isn't actually assigning back to the memory at the CFFI interface to begin with (ownership with opacity may be tricky). Either way, assuming the API calls are done in the correct order and that things work perfectly from one API function to the next seems a bit fragile so I'm going to open a PR that adds some of the error handling I had to add anyway.

Re: the potential for memory to be freed, there isn't any memory that the caller should be aware of that it could plausibly free. For example, in this function prototype:

int darshan_accumulator_create(darshan_module_id id,
                               int64_t job_nprocs,
                               darshan_accumulator*   new_accumulator)

the darshan_accumulator is just an opaque type whose value is filled in by the function. That darshan_accumulator value is then subsequently used in other functions. Under the covers it technically does have memory associated with it, but that's managed internally and will be released automatically when darshan_accumulator_destroy() is called later. darshan_accumulator could just as easily be a 64-bit integer type, where create fills in the value of the integer and the value is then passed in to later functions. In fact, we can make it an integer type if that's helpful for clarity/translation in the Python bindings (this would be analogous to the POSIX open()/read()/close() functions using an integer to refer to file descriptors). It's mainly a style/convention difference. Either way, the fact that a darshan_accumulator value (whether the type is an integer or an opaque forward declaration) refers to memory that's been allocated internal to the darshan-util library is an implementation detail that the caller shouldn't make assumptions about.

I'll have a look at the error handling PR later today, I just wanted to clarify that point.

I don't know how Python bindings work really, but I could imagine that an integer type might possibly be easier to handle than an opaque type; if that's a change that would be useful then we can update that.

@tylerjereddy
Collaborator Author

There may be some things that are surprising re: memory ownership from https://cffi.readthedocs.io/en/latest/using.html, though I'm not clear if that's actually biting us here.

Unlike C, the returned pointer object has ownership on the allocated memory: when this exact object is garbage-collected, then the memory is freed.

I figured that exposing a single function which calls create, inject, emit, and destroy internally (keeping the memory handling, the passage of implementation details, and the "private structs" completely concealed on the C side) would reduce the likelihood of complications, though I'm not sure that's worth doing at this point.

It is almost certainly way easier for me to do this in Cython, because it will just produce a C file that passes things around in C like you expect them to be, but then I'd have to contend with a debate about introducing another dependency/technology.

The fix on the bindings side is probably some trick about first allocating a pointer, then a pointer to that pointer, or that kind of thing. CFFI certainly wasn't my choice, but it is what I have to work with at the moment. I basically end up iterating with stuff like ffi.new("void **") vs. ffi.new("struct accumulator **") vs. ffi.new("struct accumulator *") and then trying to dereference with [0] or [0][0] or whatever--it is marginally better than guesswork sometimes.

@shanedsnyder
Contributor

I took a closer look at this and found a couple of issues:

  1. We need to be careful that the types and function prototypes match exactly what's defined in the C library headers, e.g.:
/* opaque accumulator reference */
struct darshan_accumulator_st;
typedef struct darshan_accumulator_st* darshan_accumulator;

/* Instantiate a stateful accumulator for a particular module type.
 */
int darshan_accumulator_create(darshan_module_id id,
                               int64_t job_nprocs,
                               darshan_accumulator*   new_accumulator);

I think the existing definitions in api_def_c.py incorrectly use struct darshan_accumulator *, which will probably lead to issues. We want to use darshan_accumulator *.

  2. We need to be sure to error-check the return value from accumulator_create() before calling accumulator_inject() (or any other accumulator functions). Not all modules implement accumulators, and for those create() returns an error -- trying to call inject() on the returned pointer will also lead to segfaults. I noticed this after resolving the issue in 1) but was still hitting segfaults, which turned out to occur only for the Lustre module, which does not implement an accumulator.

I've got to take off for now, but I can clean up and push code tomorrow that fixes 1.
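
Roughly, 1) and 2) together look like this on the CFFI side (a sketch only; ffi/libdutil are the backend's existing CFFI objects):

# sketch: allocate the out-parameter for the opaque handle as
# "darshan_accumulator *" (not "struct darshan_accumulator *") and check the
# create() return code before using the handle anywhere else
from darshan.backend.cffi_backend import ffi, libdutil

def _create_accumulator(mod_id, job_nprocs):
    acc = ffi.new("darshan_accumulator *")
    if libdutil.darshan_accumulator_create(mod_id, job_nprocs, acc) != 0:
        # e.g. the Lustre module does not implement an accumulator
        raise RuntimeError(f"accumulator_create failed for module {mod_id}")
    return acc[0]  # the handle to pass to inject()/emit()/destroy()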

@tylerjereddy
Collaborator Author

tylerjereddy commented Oct 31, 2022

We need to be sure to error check the return value from accumulator_create() before calling accumulator_inject() (or any other accumulator functions). Not all modules implement accumulators and thus create() returns an error

Indeed, see: #840 -- I'd prefer to know not only that there was an error, but why, so I tried to make that clearer over there.

We need to be careful that the types and function prototypes match exactly what's defined in the C library headers.

I did intentionally mutate some of the prototypes to deal with other errors, for now. As I understand it I've even exposed a struct that isn't meant to be exposed, since this is still being debugged.

Also, I still think this is a pain point:

Why does it make sense to have the user of the API manage 4 functions instead of wrapping them in 1, where the internal handling of inter-function data passage is managed privately? I.e., "give me the derived metrics data for this module", rather than: create, did create work? inject, did inject work? emit, did emit work? destroy, did destroy work?

For example, the early prototype of log_get_accumulator() in this PR aims to expose only one approach -- "give me the derived metrics struct for a module" -- at least so far. Are there use cases for the more granular calls that we really care about, that can't be handled with a few extra function arguments?

Mock-up might be something like this:

int log_get_derived_metrics(darshan_fd fd, char* mod_name,
                            struct darshan_derived_metrics* metrics) {

   // the darshan_accumulator struct is effectively a private implementation detail
   // that is not part of the public interface of this function; we only ask the user
   // to provide a file handle for a log file and a module name for which they wish
   // to retrieve the derived metrics struct--CFFI will own the memory of the metrics
   // struct--this C function is only allowed to assign to its structure members

   // call the create, inject, emit, and destroy accum functions as needed, passing
   // whatever private structures are needed between them without bothering the
   // consumer of the interface with these details; if a failure occurs, return a
   // non-zero exit code, preferably one that is unique to the failure type so Python
   // can make sensible decisions based on it, and also try to write to `stderr`
   // when this happens

}

You may need to add an argument for the number of records you wish to accumulate if there are compelling use cases for not using them all, etc.

I guess my contention here is that if we need to be doing a bunch of shimming and error checking, we may as well just do it in C and expose the thing we actually want to call. Popping in and out of the CFFI language boundary isn't very fluid, and IMO should be restricted to just the most crucial calls that give us what we want.

@shanedsnyder
Contributor

Why does it make sense to have the user of the API manage 4 functions instead of wrapping them in 1, where the internal handling of inter-function data passage is managed privately? I.e., "give me the derived metrics data for this module", rather than: create, did create work? inject, did inject work? emit, did emit work? destroy, did destroy work?

For example, the early prototype of log_get_accumulator() in this PR aims to expose only one approach -- "give me the derived metrics struct for a module" -- at least so far. Are there use cases for the more granular calls that we really care about, that can't be handled with a few extra function arguments?

I think the main driver for this decision is that this accumulator API is envisioned for more use cases than just "accumulate every record in this module". Otherwise, I agree your proposed changes make sense for that workflow. You could imagine users wanting to accumulate metrics for different reasons, e.g.:

  • accumulate derived metrics for all files on a particular file system (i.e. matching a specific prefix like /gpfs/ or something like that)
  • accumulate an aggregate record for per-process records referring to the same file, which can be useful in a couple of instances:
    • user uses DARSHAN_DISABLE_SHARED_REDUCTION at runtime
    • app opens files on a subset of processes, but not all

I believe the darshan-parser utility is at least relying on that 2nd item I mention about accumulating details on partially-shared files, so it's not a contrived example or anything.

So, the answer is really just that the API allows for a lot of flexibility right now.

I do agree that we could hide a lot of the back-and-forth in a helper function to simplify the "give me everything for this module" approach, especially since we'll be using that in the job summary tool. We should probably still offer bindings for all functions though, in case PyDarshan users want more control over what records to accumulate on.
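
Such a helper might look roughly like this on the Python side (a sketch only; the inject signature and the emit buffer size are assumptions borrowed from this branch):

# sketch: hide the create -> inject -> emit -> destroy sequence behind one
# call and hand back only the derived metrics struct; the individual bindings
# would still be exposed for finer-grained accumulation
from darshan.backend.cffi_backend import ffi, libdutil

def derived_metrics_for_module(mod_id, job_nprocs, record_buffers):
    acc = ffi.new("darshan_accumulator *")
    if libdutil.darshan_accumulator_create(mod_id, job_nprocs, acc) != 0:
        raise RuntimeError("module does not support accumulators")
    metrics = ffi.new("struct darshan_derived_metrics *")
    agg_rec = ffi.new("char[81920]")  # scratch buffer for the aggregate record
    try:
        for buf in record_buffers:
            if libdutil.darshan_accumulator_inject(acc[0], buf, 1) != 0:
                raise RuntimeError("darshan_accumulator_inject() failed")
        if libdutil.darshan_accumulator_emit(acc[0], metrics, agg_rec) != 0:
            raise RuntimeError("darshan_accumulator_emit() failed")
    finally:
        libdutil.darshan_accumulator_destroy(acc[0])  # always release the handle
    return metrics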

@shanedsnyder
Contributor

I just pushed a commit that appears to correctly create an accumulator and inject into it, using the opaque types Darshan defines in darshan-logutils.h. You still ultimately get a seg fault for the existing tests, since there is no error handling currently to limit to modules that actually implement an accumulator, and the log under test includes the Lustre module (which does not implement an accumulator).

I'm not sure what the cleanest way to handle this in PyDarshan is, but maybe rather than maintaining a separate list of supported modules, we just catch any error and gracefully skip the module for now? I agree that ideally the darshan-util library would have actual return codes that PyDarshan could try to catch, but that's a more involved change -- I think we can get by just catching general errors for now.
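
I.e., something as simple as this on the PyDarshan side (a sketch, assuming log_get_derived_metrics() raises on any nonzero return, as it does in the current branch):

# sketch: skip modules whose accumulation fails rather than maintaining a
# separate hard-coded list of supported modules
from darshan.backend.cffi_backend import log_get_derived_metrics

def derived_metrics_by_module(log_path, mod_names):
    results = {}
    for mod_name in mod_names:
        try:
            results[mod_name] = log_get_derived_metrics(log_path, mod_name)
        except (RuntimeError, ValueError):
            # e.g. Lustre: no accumulator support, so just move on
            continue
    return results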

@carns
Contributor

carns commented Nov 1, 2022

Thanks @shanedsnyder .

Re: the complexity of the accumulator API, that's right that the intent was to be able to run it on more granular subsets than the entire module (and darshan-parser already uses that functionality). We could make a helper function to hide this for sure, but there is a potential performance downside to hiding the individual records; it would require an additional pass over the log file. The way it's set up right now, you can read a record and do whatever you would like with it in addition to passing it through an accumulator (and the accumulator calculation overhead is small). The darshan-parser utility also takes advantage of that; no matter which options you select it only does one linear pass through a log file.

If the Python tools can do something similar (use a single pass both to retrieve individual records and to do derived metric accumulation) then we might want to keep the same set up. If the Python tools are going to go back to the log separately for the derived metrics anyway then we can just wrap it in a helper.
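
In Python terms, the single-pass pattern would be something like the sketch below (the get_record/inject call forms here are assumptions, following the conventions in the existing cffi_backend module):

# sketch: one linear pass over a module's records, handing each record both
# to the per-record analysis and to the accumulator so the log is read once
from darshan.backend.cffi_backend import ffi, libdutil

def single_pass(log_handle, mod_idx, acc, handle_record):
    buf = ffi.new("void **")
    while libdutil.darshan_log_get_record(log_handle, mod_idx, buf) > 0:
        handle_record(buf[0])                                # per-record work
        libdutil.darshan_accumulator_inject(acc, buf[0], 1)  # derived metrics
    libdutil.darshan_free(buf[0])  # the C library reuses one buffer internally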

@tylerjereddy force-pushed the treddy_derived_metrics_1 branch from a6cadf4 to ae5b0c1 on November 4, 2022
@tylerjereddy
Collaborator Author

tylerjereddy commented Nov 8, 2022

In case it helps, with this diff I get a bit farther, past POSIX and on to MPI-IO:

--- a/darshan-util/pydarshan/darshan/backend/cffi_backend.py
+++ b/darshan-util/pydarshan/darshan/backend/cffi_backend.py
@@ -705,9 +705,10 @@ def log_get_derived_metrics(log_path: str, mod_name: str):
     print("after inject")
     darshan_derived_metrics = ffi.new("struct darshan_derived_metrics *")
     print("before emit")
+    buf_agg = ffi.new("char[81920]")
     r = libdutil.darshan_accumulator_emit(darshan_accumulator[0],
                                           darshan_derived_metrics,
-                                          rbuf[0])
+                                          buf_agg)
     if r != 0:
         raise RuntimeError("A nonzero exit code was received from "
                            "darshan_accumulator_emit() at the C level. "
@@ -716,4 +717,7 @@ def log_get_derived_metrics(log_path: str, mod_name: str):
                            "stream.")
     print("after emit")
     #libdutil.darshan_accumulator_destroy(darshan_accumulator)
+    print("darshan_derived_metrics:", darshan_derived_metrics)
+    print("darshan_derived_metrics.total_bytes:", darshan_derived_metrics.total_bytes)
+    print("darshan_derived_metrics.unique_io_total_time_by_slowest:", darshan_derived_metrics.unique_io_total_time_by_slowest)
tests/test_cffi_misc.py testing mod_name: POSIX
before create
after create
before inject
after inject
before emit
after emit
darshan_derived_metrics: <cdata 'struct darshan_derived_metrics *' owning 128 bytes>
darshan_derived_metrics.total_bytes: 0
darshan_derived_metrics.unique_io_total_time_by_slowest: 0.00012302398681640625
testing mod_name: MPI-IO
before create
after create
before inject
after inject
before emit
after emit
darshan_derived_metrics: <cdata 'struct darshan_derived_metrics *' owning 128 bytes>
darshan_derived_metrics.total_bytes: 25098793816
darshan_derived_metrics.unique_io_total_time_by_slowest: 0.0
double free or corruption (!prev)
Fatal Python error: Aborted

tylerjereddy added a commit to tylerjereddy/darshan that referenced this pull request Nov 9, 2022
* this is a Python/pandas-only version of doing
some simple derived metrics; I don't think we'll
actually do this, but I was exploring a bit because
of the difficulties in darshan-hpcgh-839

* this matches pretty well with the `perl` based reports
for total bytes, but even simple cases can sometimes
disagree on bandwidth per darshan-hpcgh-847, so now I'm curious
what is going on

* one problem with doing this is that we'd have the same
algorithms implemented in two different languages; the advantages
include:
- not reading all the records a second time, crossing the CFFI
boundary each time
- easier to debug/maintain because bounds checking/no segfaults, etc.
@tylerjereddy
Collaborator Author

I may test the waters with moving the "wrapper" to C itself, a bit like copying/modifying what is done in darshan-parser.c and calling into just that wrapper from CFFI. That would still leave the other public prototypes exposed of course.

A Python-only approach is in #848, but I don't think we'll rewrite the algorithms beyond the prototype there at this point, though it does have potential advantages someday in the future perhaps.

@carns
Contributor

carns commented Nov 10, 2022

I pushed some fixes just now that have the test working for me:

  • added a missing array of structs within the derived metrics struct (this was the primary problem; it caused the struct memory allocated in CFFI to be too small, and then the C library overflowed trying to memset it)
  • quieted a warning related to the module_id enum
  • added some explicit free wrapper calls (including I think a missing error handling path in another function); I just followed the convention I saw elsewhere in here and used it in cases where ** pointers were allocated.

C memory leak checkers report a lot of leaks running this test, but they're probably false positives because the checkers don't understand Python memory management.

@tylerjereddy
Collaborator Author

I started expanding the code and testing to compare against the strings from the old Perl report, but it looks like only a subset of modules/cases currently work to reproduce the numbers in the old report--you can see the xfail marks in the test that is added in this PR.

My initial question is whether I should be able to easily reproduce strings like the ones below using only the data present in the derived metrics struct I'm getting back, in all cases/for all reports and modules?

(screenshot: bandwidth/IO summary strings from the old Perl report)

Even for some of the simple cases that do reproduce here, I'm having to go back and use Python/pandas to do arithmetic on the report record data, which brings me back to the concerns raised in the Python version of this PR (gh-848)--if I can't cherry-pick most of what I need from the derived structs, and still have to encode some complex logic on what is in the records--what am I gaining through the structs? At the moment all I get is the total bytes, which seems trivial to extract.

Hopefully I'm just missing something, and all the values I want, like the bandwidth in MiB/s, can be directly cherry-picked from the structs, or at least reconstituted with some simple math, without complex logic on a per-module basis or needing switches for shared and unique IO in my math code.

@carns
Contributor

carns commented Nov 18, 2022

Hi @tylerjereddy (sorry for the delay, I was busy at SC22), you should be able to use this single value directly:

https://github.com/darshan-hpc/darshan/blob/main/darshan-util/darshan-logutils.h#L367

(agg_perf_by_slowest)

The Python logic in that log_get_bytes_bandwidth() routine looks more like an average rate per process. The accumulator API does something a little different, in that it considers each file separately, and for each one it chooses the elapsed time for that file to be the time taken by the slowest rank (if multiple ranks accessed it). This tends to be a more accurate representation of what the application observes as the throughput, especially for bulk synchronous/collective patterns. In those cases what really matters is the time it took for the app as a whole to finish accessing the file, even if IO was delegated to a subset of processes or there were stragglers. It gets a little complicated doing this generally to account for fully shared files (which each have a single aggregated record), partially shared files (which have many records), or unique files.

It should match the Perl report exactly, because that report relies on the same C code by way of invoking the darshan-parser utility as a shell command.

(and we should continue to make sure to mark the value as an "estimate", because the true bandwidth is tricky to represent for some applications, particularly if they are doing truly uncoordinated I/O)
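
In other words, the Python side can build the summary string almost directly from the struct, e.g. (a sketch of what log_get_bytes_bandwidth() can reduce to; exact wording aside):

# sketch: read total_bytes and agg_perf_by_slowest straight off the derived
# metrics struct rather than recomputing an average rate from the records
from darshan.backend.cffi_backend import log_get_derived_metrics

def bytes_and_bandwidth_str(log_path: str, mod_name: str) -> str:
    metrics = log_get_derived_metrics(log_path, mod_name)
    total_mib = metrics.total_bytes / 2 ** 20
    return (f"I/O performance estimate (at the {mod_name} layer): "
            f"transferred {total_mib:.1f} MiB at "
            f"{metrics.agg_perf_by_slowest:.2f} MiB/s")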

@tylerjereddy force-pushed the treddy_derived_metrics_1 branch from fbb726e to 3714795 on November 21, 2022
@tylerjereddy
Collaborator Author

Ok, this has some substantial updates now. I've added comments to the branch for sections where discussion may still be needed. I'll try to highlight a few points below as well:

  • I think we decided that if we get 0 exit codes then we include a bandwidth string in the appropriate section of the report. Note that this differs from the Perl report, where POSIX and MPI-IO are often not reported in the same PDF--for example, with imbalanced-io.darshan this branch produces what is shown below for the Python summary report, where MPI-IO, STDIO, and POSIX all get per-section summary strings. I suspect the Perl report may be trying to avoid reporting the same data twice, and this is also probably related to the special MPI-IO shims I had to add to log_get_bytes_bandwidth--from my perspective, it would be much easier if the struct just gave me what I should report for a given module rather than needing to shim around like that. Of course, then you'd need meta-logic/error-handling around what the accumulator interface does, I think (i.e., "your report contains POSIX and MPI-IO, so I'm going to do something quite different now...").

(screenshots: the Python summary report for imbalanced-io.darshan, with per-section bandwidth summary strings for MPI-IO, STDIO, and POSIX)

  • should the new text be red? Possibly not--that seems to be the default for the way I injected the string at the moment, which is itself pretty awkward, with a string masquerading as a ReportFigure

  • what's going on with APMPI and other add-on modules? There's an xfail test case for that with e3sm_io_heatmap_only.darshan; we discussed avoiding filters/ban-lists for modules on the Python side, so how would you like that handled? Is the absence of, e.g., APMPI from _structdefs a bug in our control flow here, or?

@carns
Contributor

carns commented Nov 21, 2022

Thanks @tylerjereddy.

On the first point (re: shims and POSIX/MPI-IO bytes discrepancy), let's keep it simple, remove the special-case handling, and have each module report its own statistics (if present) in all cases. I'm not sure why the old report would have been reporting the POSIX bytes in the MPI-IO summary; I'm guessing that was a mistake in the old code. The imbalanced-io.darshan log definitely shows different volumes at the two levels, perhaps because MPI-IO elided some overlapping bytes or because one module hit the memory limit at a different time from the other. At any rate, that would be the correct thing to report from an analysis tools perspective.

On the second point (re: text color) I agree. Might be nice to have it stand out a little since it is a key piece of information, maybe blue or something, but red has a bad connotation and we aren't necessarily reporting bad information here :)

I don't know about the 3rd point (APMPI), but I wanted to go ahead and respond about the other two. Is this issue particular to the accumulator / derived metric stuff? Or a more general question?

@shanedsnyder
Contributor

On the first point (re: shims and POSIX/MPI-IO bytes discrepancy), let's keep it simple, remove the special-case handling, and have each module report its own statistics (if present) in all cases. I'm not sure why the old report would have been reporting the POSIX bytes in the MPI-IO summary; I'm guessing that was a mistake in the old code. The imbalanced-io.darshan log definitely shows different volumes at the two levels, perhaps because MPI-IO elided some overlapping bytes or because one module hit the memory limit at a different time from the other. At any rate, that would be the correct thing to report from an analysis tools perspective.

The old Perl reports only show POSIX estimates if there is no MPI-IO data in the log, so it shouldn't ever report estimates for both MPI-IO and POSIX for the same log. I think the original reasoning was that apps that use MPI-IO might naturally just prefer MPI-IO-level metrics since they are closer to what is observed by the app. I agree it's confusing in hindsight -- no reason we shouldn't just provide estimates at all layers.

What's up with the reported byte values in the most recent report screenshots? I had to double-check the log to confirm, but Phil is right that the total bytes moved are different for MPI-IO and POSIX for imbalanced-io.darshan in this test case, yet they have the same values in the new report. The POSIX value is correct, but I think we want 126326.82 MiB for MPI-IO. I also checked output from the old Perl tool and see that the value is misreported exactly the same way there, too... So, maybe we have a bug here, or maybe we are just replicating the faulty logic in the Perl code, but we probably should fix that here? I'm using output from darshan-parser --show-incomplete --perf on this log as ground truth (snipped below):

# *******************************************************
# POSIX module data
# *******************************************************

# *WARNING*: The POSIX module contains incomplete data!
#            This happens when a module runs out of
#            memory to store new record data.

# To avoid this error, consult the darshan-runtime
# documentation and consider setting the
# DARSHAN_EXCLUDE_DIRS environment variable to prevent
# Darshan from instrumenting unecessary files.

# performance
# -----------
# total_bytes: 106730099902

# *******************************************************
# MPI-IO module data
# *******************************************************

# performance
# -----------
# total_bytes: 132463273244
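
(For reference, the 126326.82 MiB figure is just the MPI-IO total_bytes above converted from bytes:)

# quick sanity check: darshan-parser's MPI-IO total_bytes converted to MiB
132463273244 / 2**20   # ~126326.82 MiB, the value the MPI-IO section should show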

@tylerjereddy
Collaborator Author

Yes, I treated the Perl report as the oracle of all truth and added some spaghetti to do what it did--sounds like you and Phil are giving me the OK to remove the shims and just report by module, which will give you that difference.

@shanedsnyder
Contributor

* what's going on with `APMPI` and other add-on modules? There's an `xfail` test case for that with `e3sm_io_heatmap_only.darshan`; we discussed avoiding filters/ban-lists for modules on the Python side, so how would you like that handled? Is the absence of, e.g., `APMPI` from `_structdefs` a bug in our control flow here, or?

Yeah, ideally the accumulator API just knows that AutoPerf modules aren't supported and it returns 0, as well, but I guess that's not the case? There is some AutoPerf data in the _structdefs and we seem to be able to handle it gracefully in other tests (though we aren't reporting on any of it), AFAICT? In any case, I wouldn't consider this to be an expected failure.

* a number of additional `MPI-IO` and `STDIO` test cases
were added from the logs repo to `test_derived_metrics_bytes_and_bandwidth()`

* for the `MPI-IO` cases to pass, special casing was added
to `log_get_bytes_bandwidth()` such that `total_bytes` is
actually extracted from `POSIX`
* removed an invalid `darshan_free()` from `log_get_derived_metrics()`--
the `buf` object didn't even exist at that point in the control flow

* add a `LUSTRE` test case, which raises a `RuntimeError` as expected

* add a tentative `POSIX` test case, which reports a bandwidth string
at the Python level, but is not included in the Perl summary reports...
* when `log_get_derived_metrics()` receives a
module name that doesn't exist in the log
file it received, it will now raise a `ValueError`
for clarity of feedback

* update `test_derived_metrics_bytes_and_bandwidth()`
accordingly, and also start regex matching on expected
error messages in this test
* add the bandwidth summary string to the Python
report proper, and include a test for the presence
of this string in logs repo-based summary reports
* add one of the tricky `APMPI` cases I discovered
to `test_derived_metrics_bytes_and_bandwidth()`, pending
discussion with team re: how I should handle this
* adjust tests to more closely match `darshan-parser` instead
of the old Perl report in cases where MPI-IO and POSIX are
both involved; this allows me to remove the weird MPI-IO
shim in `log_get_bytes_bandwidth()`
* the bandwidth text in the Python summary report is now
colored "blue," along with a regression test, based on
reviewer feedback

* added `skew-app.darshan` log to
`test_derived_metrics_bytes_and_bandwidth()`--we get the same
results as `darshan-parser`

* replaced the `xfail` for `e3sm_io_heatmap_only.darshan` with
an expected `KeyError` when handling `APMPI` (this should already
be handled gracefully/ignored by the Python summary report)
* the testsuite now always uses `DarshanReport` with a context
manager to avoid shenanigans with `__del__` and garbage
collection/`pytest`/multiple threads

* this appears to fix the problem with testsuite hangs
described in darshan-hpcgh-839 and darshan-hpcgh-851
@tylerjereddy force-pushed the treddy_derived_metrics_1 branch from 2419408 to b89884f on December 16, 2022
* `cffi_backend` module changes requested from PR review
  - remove a spurious `darshan_free` from `_log_get_heatmap_record()`
  - fix the scoping of the `darshan_free` of `buf` object used with
    `darshan_accumulator_inject` in `log_get_derived_metrics`
  - adding a missing `log_close()` to `log_get_derived_metrics` (maybe
    we can wrap in Python contexts in the future though)
  - use a separate buffer for `darshan_accumulator_emit()` inside
    `log_get_derived_metrics`

* note that making the above CFFI/free-related changes caused
a segfault in the testsuite, so in the end I adjusted the location
of the memory freeing as I saw fit to avoid segfaults--I'd say at this
point please provide concrete evidence with a memory leak plot or
failing test for additional adjustments there, or just push the change
in

* in the end, there is a slightly more concise usage of `darshan_free()`
but no meaningful change in the free operations

* I also reverted the suggested change to `darshan_accumulator_emit()`
usage--there was no testable evidence of an issue, and it was also
causing segfaults.

* address many of the discussion points that came up in darshan-hpcgh-868:
  - `log_get_derived_metrics()` now uses an LRU cache, which effectively
    means that we use memoization to return derived metrics data
    rather than doing another pass over the log file if the same
    log path and module name have already been accumulated from; we
    still need to pass over a given log twice in most cases--once at
    initial read-in and once for using `log_get_derived_metrics`; how
    we decide to add filtering of records prior to accumulation
    interface in Python is probably a deeper discussion/for later
  - `log_get_bytes_bandwidth()` and its associated testing have been
     migrated to modules not named after "CFFI", like in the above
     PR, because I think we should only use the "CFFI" named modules
     for direct CFFI interaction/testing, and for other analyses we
     should probably use more distinct names. Also, to some extent
      everything depends on the CFFI layer, so trying to restrict
      "CFFI" modules to direct rather than indirect interaction will
     help keep them manageably sized, especially given the proclivity
     for surprising memory issues/segfaults in those parts of the code.
  - add a proper docstring with examples for `log_get_bytes_bandwidth()`
@tylerjereddy force-pushed the treddy_derived_metrics_1 branch from b89884f to 96806fa on December 16, 2022
@tylerjereddy
Collaborator Author

tylerjereddy commented Dec 16, 2022

Revisions pushed in, from commit message:

  • cffi_backend module changes requested from PR review

    • remove a spurious darshan_free from _log_get_heatmap_record()
    • fix the scoping of the darshan_free of buf object used with
      darshan_accumulator_inject in log_get_derived_metrics
    • adding a missing log_close() to log_get_derived_metrics (maybe
      we can wrap in Python contexts in the future though)
    • use a separate buffer for darshan_accumulator_emit() inside
      log_get_derived_metrics
  • note that making the above CFFI/free-related changes caused
    a segfault in the testsuite, so in the end I adjusted the location
    of the memory freeing as I saw fit to avoid segfaults--I'd say at this
    point please provide concrete evidence with a memory leak plot or
    failing test for additional adjustments there, or just push the change
    in

  • in the end, there is a slightly more concise usage of darshan_free()
    but no meaningful change in the free operations

  • I also reverted the suggested change to darshan_accumulator_emit()
    usage--there was no testable evidence of an issue, and it was also
    causing segfaults.

  • address many of the discussion points that came up in ENH: file count summary #868:

    • log_get_derived_metrics() now uses an LRU cache, which effectively
      means that we use memoization to return derived metrics data
      rather than doing another pass over the log file if the same
      log path and module name have already been accumulated from; we
      still need to pass over a given log twice in most cases--once at
      initial read-in and once for using log_get_derived_metrics; how
      we decide to add filtering of records prior to accumulation
      interface in Python is probably a deeper discussion/for later
    • log_get_bytes_bandwidth() and its associated testing have been
      migrated to modules not named after "CFFI", like in the above
      PR, because I think we should only use the "CFFI" named modules
      for direct CFFI interaction/testing, and for other analyses we
      should probably use more distinct names. Also, to some extent
      everything depends on the CFFI layer, so trying to restrict
      "CFFI" modules to direct rather than indirect interaction will
      help keep them manageably sized, especially given the proclivity
      for surprising memory issues/segfaults in those parts of the code.
    • add a proper docstring with examples for log_get_bytes_bandwidth()

Memory pressure check with memory_profiler:

from darshan.log_utils import get_log_path
from darshan.lib.accum import log_get_bytes_bandwidth

for i in range(10_000_000):
    log_path = get_log_path("imbalanced-io.darshan")
    log_get_bytes_bandwidth(log_path, "POSIX")

(screenshot: memory_profiler plot for the loop above)
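
For reference, the memoization mentioned above amounts to decorating the backend function with functools.lru_cache, roughly (a sketch; the actual cache size/placement in cffi_backend.py may differ):

import functools

# sketch: memoize derived-metrics accumulation so repeated requests for the
# same (log_path, mod_name) pair reuse the cached struct instead of doing
# another pass over the log through the CFFI layer
@functools.lru_cache(maxsize=32)
def log_get_derived_metrics(log_path: str, mod_name: str):
    # (body as in cffi_backend.py: open the log, create/inject/emit/destroy,
    # and return the derived metrics struct)
    ...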

@tylerjereddy changed the title from "WIP: Python derived/accum interface" to "Python derived/accum interface" on Dec 16, 2022
@tylerjereddy changed the title from "Python derived/accum interface" to "ENH: Python derived/accum interface" on Dec 16, 2022
@tylerjereddy added the "enhancement" (New feature or request) label on Dec 16, 2022
@shanedsnyder
Contributor

Changes look good.

My bad on the wild goose chase on pointers/segfaults. I'm not sure exactly how the code changed from previous commits, but I think what I see there now makes sense. I see that you are allocating the initial record pointer ahead of the loop, but what I may have missed is that the C library will only allocate the memory on the very first call to get_record(), so subsequent calls can just keep reusing this memory without needing an allocation/free every iteration. Then it gets freed after the loop finishes, as you have here.

I'll have another look at #868 before approving/merging this just to be safe.

@shanedsnyder
Contributor

I just did a fresh review of this, and I think I'm mostly good with it.

I noticed one issue: we don't ever call accumulator_destroy(), so I don't think we're freeing up all the memory allocated by the C library. I did confirm there is no obvious memory leak using the same test from Tyler above. The reason for that is actually the LRU cache put in for this particular function -- it should prevent continued calls into the C library and cap the memory leak to a single accumulator, I think. I tested this theory by removing that cache, and the result seems to confirm it:
(figure: memory usage plot with the LRU cache removed)

In any case, it's probably worth adding a quick fix for that. I'll try to push something momentarily.

shanedsnyder added a commit to tylerjereddy/darshan that referenced this pull request Jan 27, 2023
* move `log_get_bytes_bandwidth()` out of CFFI module to
  darshan/lib/accum.py
* adopt LRU cache and other cleanup tweaks for
  log_get_derived_metrics()
shanedsnyder previously approved these changes Jan 27, 2023
@shanedsnyder
Contributor

I just approved, but I'll wait to merge until double-checking some things in #868.

@shanedsnyder
Contributor

Change of plans. I just merged #886 into main, and so wanted to merge main back into this branch and refactor changes on top of it.

@shanedsnyder merged commit 6f5f716 into darshan-hpc:main on Feb 14, 2023
@shanedsnyder
Contributor

We replaced this with #898 and merged into main, so closing this.
