Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Does NetCDF in Trilinos now require HDF5? #11426

Closed
ikalash opened this issue Dec 23, 2022 · 21 comments
Closed

Does NetCDF in Trilinos now require HDF5? #11426

ikalash opened this issue Dec 23, 2022 · 21 comments
Assignees
Labels
client: Albany Issue impacting the Albany project client: LCM Solid mechanics fork of Albany CLOSED_DUE_TO_INACTIVITY Issue or PR has been closed by the GitHub Actions bot due to inactivity. MARKED_FOR_CLOSURE Issue or PR is marked for auto-closure by the GitHub Actions bot. type: bug The primary issue is a bug in Trilinos code or tests

Comments

@ikalash
Copy link
Contributor

ikalash commented Dec 23, 2022

A bunch of our Albany nightlies began failing last night, b/c it appears NetCDF requires HDF5 now in Trilinos. Could someone please confirm that that's the case?

CMake Error at cmake/tribits/core/package_arch/TribitsAdjustPackageEnables.cmake:952 (message):
   ***
   *** ERROR: Setting TPL_ENABLE_Netcdf=OFF which was 'ON' because Netcdf has a required library dependence on disabled package HDF5!
   ***

https://sems-cdash-son.sandia.gov/cdash/build/43480/configure

Tagging @lxmota who would be interested also in this issue.

@ikalash ikalash added type: bug The primary issue is a bug in Trilinos code or tests client: Albany Issue impacting the Albany project client: LCM Solid mechanics fork of Albany labels Dec 23, 2022
@spdomin
Copy link
Contributor

spdomin commented Dec 23, 2022

I am not sure if this is related, however our Nalu builds all started to fail due to the use of this apparently now illegal config command:

-DTrilinos_ASSERT_MISSING_PACKAGES=OFF \

Perhaps if this check is no longer occurring, some sort of build process would allow a build with the missing package and then fail at run time.

I believe an app code would want to be notified of a missing package, however, perhaps that is now default.

I need to check my records on Trilinos breaks during the Holiday break:)

@ikalash
Copy link
Contributor Author

ikalash commented Dec 23, 2022

@spdomin : the nightly builds for Albany that did not fail with the NetCDF/HDF5 error failed with the error you are referring to:

https://sems-cdash-son.sandia.gov/cdash/build/43456/configure

Applying the fix suggested in the error message resolved the problem.

FWIW, I'm not sure pushing big changes to Trilinos that will break everything the day before the shutdown is the best idea in general...

@cgcgcg
Copy link
Contributor

cgcgcg commented Dec 23, 2022

@bartlettroscoe Could the recent TriBITS changes have caused this?

@ikalash
Copy link
Contributor Author

ikalash commented Dec 23, 2022

As expected, it appears turning on HDF5 in the build fixes the problem. I'm OK with this, but would be curious to know why HDF5 is now required for NetCDF in Trilinos. That was not the case before.

@spdomin
Copy link
Contributor

spdomin commented Dec 23, 2022

On the Nalu side, we already require HDF5, NetCDF, and Parallel-NetCDF. In general, I believe that from an application perspective, we need to understand all new dependencies that are created on our behalf. HDF5 and NetCDF are not the easiest TPLs to build. As such, I would make sure that these are really required for an Application.

From what I have seen over the years with STK facilitating Boost deprecation, and KLU allowing for SuperLU deprecation, we are generally improving the TPL dependencies. However, I am sure that there is a Trilinos process to be followed when adding a new dependency, right? (I know that adding a new package to Trilinos-proper has a process.)

My change to the Trilinos config for Nalu resulted in a successful build.

Looking forward to more discussion in the New Year.

@dridzal
Copy link
Contributor

dridzal commented Dec 23, 2022

We are seeing some unexplained memory use with HDF5 (even when it is not explicitly called in our code). Until this is resolved, we need to be able to build with NetCDF and without HDF5. Generally speaking, I would prefer it if HDF5 wasn't required for NetCDF.

@dridzal
Copy link
Contributor

dridzal commented Dec 23, 2022

I should add that we need to start paying attention to the memory footprint of Trilinos. In several heavier builds, where we are not super-careful about which TPLs and packages are turned on, each MPI rank seems to consume on the order of 1GB of RAM. For explicit methods, where we want to use all available MPI ranks on a node, this lowers the total usable memory on many systems by 25%.

@ikalash
Copy link
Contributor Author

ikalash commented Dec 23, 2022

I agree with your statements @dridzal . It also seems to me with the HDF5 enabled, the builds are taking longer, though that will not be 100% clear until the nightlies run again tonight.

@bartlettroscoe
Copy link
Member

@ikalash, this is interesting. This is likely due to the big TriBITS refactoring that I just merged in PR #11380. This is a change in backward compatibility that I was not expecting.

I will add a test case for this use case to TriBITS and update the behavior.

@bartlettroscoe bartlettroscoe self-assigned this Dec 24, 2022
@bartlettroscoe
Copy link
Member

So this is news to me that you can configure NetCDF without HDF5 so this should not be a required TPL dependency.

I guess a question I need to ask is:

  • Should there be TPLs that have required dependencies? For example, does it ever make sense to enable LAPACK without also needing BLAS?

@ikalash
Copy link
Contributor Author

ikalash commented Dec 24, 2022

@bartlettroscoe : just so you know, we had other things in Albany get broken we think due to tribits - see for example this issue: sandialabs/Albany#879 . If you read to the bottom, it seems that tribits might be clashing with the Albany cmake config. I think we might have something in LCM that is similar - I am testing now.

@ikalash
Copy link
Contributor Author

ikalash commented Dec 24, 2022

I think you absolutely can enable and use NetCDF in Trilinos w/o HDF5 - we do this regularly. My understanding was HDF5 was really only needed for very very large files, but perhaps that's not quite accurate.

@bartlettroscoe
Copy link
Member

FYI: It seems this use case was a known break in backward compatibility (see TriBITSPub/TriBITS#557). (I am not sure why I was convinced at the time why that was a good idea. (But I don't have anyone to bounce ideas off of while I do this work towards TriBITSPub/TriBITS#63 so we end with up delayed feedback like this issue.)

Longer term, I think we are going to need to extend support in TriBITS for external package/TPL dependencies to be either optional or required on a case-by-case basis (e.g. LAPACK => BLAS required, NetCDF => HDF5 optional). What is implemented in TriBITS right now for TPL dependencies is not consistent with either optional or required dependencies (and is a mix of both).

The right solution for now, and the solution that maintains perfect backward compatibility is to make all TPL dependencies optional.

@bartlettroscoe
Copy link
Member

bartlettroscoe commented Dec 24, 2022

Note that the disable of NetCDF in this case with this version of TriBITS can be avoided by just not enabling HDF5 (instead of disabling it) as described in:

But this is still a change in behavior and hence the problem reported here.

@gsjaardema
Copy link
Contributor

We are seeing some unexplained memory use with HDF5 (even when it is not explicitly called in our code). Until this is resolved, we need to be able to build with NetCDF and without HDF5. Generally speaking, I would prefer it if HDF5 wasn't required for NetCDF.

There should be no memory use due to hdf5 unless it is being used. As best I can tell and from the tests I ran and showed you earlier, the only time HDF5 resulted in a memory bump (due to a buffer allocation) was when I was explicitly using an exodus file that use the netcdf-4 format which is hdf5-bnsed. Are you seeing memory increases just due to having hdf5 enabled in netcdf (other than the size of the library itself(

@dridzal
Copy link
Contributor

dridzal commented Dec 25, 2022

Yes, massif reports about 20% of memory used in HDF5. The test example I have does not perform any output. Is HDF5 also used for input? I haven't been able to use stk without some memory being allocated by HDF5. I was hoping that only NetCDF would be used, which is why I tried compiling without HDF5.

@gsjaardema
Copy link
Contributor

gsjaardema commented Dec 25, 2022 via email

@bartlettroscoe
Copy link
Member

FYI: This is resolved with TriBITSPub/TriBITS#558 and snapshotted into Trilinos in #11458

@bartlettroscoe
Copy link
Member

Can we go ahead and close this? This should have been resolved by PR:

#11458

merged back on 1/19/2023.

Copy link

This issue has had no activity for 365 days and is marked for closure. It will be closed after an additional 30 days of inactivity.
If you would like to keep this issue open please add a comment and/or remove the MARKED_FOR_CLOSURE label.
If this issue should be kept open even with no activity beyond the time limits you can add the label DO_NOT_AUTOCLOSE.
If it is ok for this issue to be closed, feel free to go ahead and close it. Please do not add any comments or change any labels or otherwise touch this issue unless your intention is to reset the inactivity counter for an additional year.

@github-actions github-actions bot added the MARKED_FOR_CLOSURE Issue or PR is marked for auto-closure by the GitHub Actions bot. label Feb 24, 2024
Copy link

This issue was closed due to inactivity for 395 days.

@github-actions github-actions bot added the CLOSED_DUE_TO_INACTIVITY Issue or PR has been closed by the GitHub Actions bot due to inactivity. label Mar 27, 2024
@github-actions github-actions bot closed this as not planned Won't fix, can't repro, duplicate, stale Mar 27, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
client: Albany Issue impacting the Albany project client: LCM Solid mechanics fork of Albany CLOSED_DUE_TO_INACTIVITY Issue or PR has been closed by the GitHub Actions bot due to inactivity. MARKED_FOR_CLOSURE Issue or PR is marked for auto-closure by the GitHub Actions bot. type: bug The primary issue is a bug in Trilinos code or tests
Projects
Development

No branches or pull requests

6 participants