Fix up FO_AIS tests and get them to run with IOPX #761

Open
ikalash opened this issue Nov 13, 2021 · 26 comments
@ikalash
Collaborator

ikalash commented Nov 13, 2021

Request by @rstumin . Was hoping to get help from @mperego and @bartgol to fix the input options so these tests run again. Code complains about Field Origin: Output field.

@mperego
Collaborator

mperego commented Nov 13, 2021

I think I fixed the tests.
However, I'm wondering if requiring Iopx library for using serial mesh is the right thing to do. On my workstation I don't have Iopx but "Use Serial Mesh: true" works fine. @bartgol @jewatkins

@ikalash curious about the ProSPect$ label... we haven't used it in years...

@ikalash
Collaborator Author

ikalash commented Nov 14, 2021

Thanks @mperego I'll close. No idea where the ProSPect$ label came from. I was curious about the $ symbol, but maybe it means "prospect money" (things funded by prospect)?

@ikalash ikalash closed this as completed Nov 14, 2021
@ikalash ikalash reopened this Nov 15, 2021
@ikalash
Collaborator Author

ikalash commented Nov 15, 2021

@mperego : do you know why tests that don't use FROSch but have it in the input file are failing?

https://testing.sandia.gov/cdash/viewBuildError.php?buildid=10240241

@mperego
Collaborator

mperego commented Nov 15, 2021

@ikalash oh right, I forgot about that. That's a bit annoying, but because of the way the preconditioner parameter list is validated, if FROSch is not enabled, it cannot be listed among the preconditioners. I'll remove the FROSch preconditioners from the input files.
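
For readers following the thread, here is a minimal Python sketch of the behavior described in this comment: a validator that only knows the enabled preconditioners rejects an input that merely lists a disabled one, even when that entry is never selected. This is not the actual Trilinos or Albany validation code. The key names, the set `ENABLED_PRECONDITIONERS`, and the helpers `validate_preconditioners` and `strip_disabled` are hypothetical and exist only for illustration.

```python
# Hypothetical illustration; not the actual Trilinos/Albany validation code.
ENABLED_PRECONDITIONERS = {"Ifpack2", "MueLu"}  # assumption: FROSch is not built


def validate_preconditioners(params, enabled=ENABLED_PRECONDITIONERS):
    """Raise if the input lists any preconditioner that is not enabled."""
    listed = params.get("Preconditioner Types", {})
    unknown = sorted(set(listed) - set(enabled))
    if unknown:
        raise ValueError(f"unknown preconditioner sublist(s): {unknown}")


def strip_disabled(params, enabled=ENABLED_PRECONDITIONERS):
    """Return a copy of params without sublists for disabled preconditioners."""
    cleaned = dict(params)
    listed = params.get("Preconditioner Types", {})
    cleaned["Preconditioner Types"] = {
        name: sub for name, sub in listed.items() if name in enabled
    }
    return cleaned


params = {
    "Preconditioner Type": "MueLu",  # the preconditioner actually selected
    "Preconditioner Types": {"MueLu": {}, "FROSch": {}},  # FROSch is only listed
}
```

In this sketch `validate_preconditioners(params)` raises even though the test selects MueLu, while `validate_preconditioners(strip_disabled(params))` passes. That corresponds to the fix of removing the FROSch entries from the input files.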

@mperego
Collaborator

mperego commented Nov 15, 2021

@jewatkins do you understand what's the issue here:
https://sems-cdash-son.sandia.gov/cdash/test/1844066

@mperego
Collaborator

mperego commented Nov 15, 2021

OK, so I "should" have fixed all the issues now, except the Cuda one.

@bartgol
Collaborator

bartgol commented Nov 15, 2021

> I'm wondering if requiring Iopx library for using serial mesh is the right thing to do. On my workstation I don't have Iopx but "Use Serial Mesh: true" works fine.

Isn't Iopx used to run decomp and/or to load a partitioned mesh? I wonder if it's not necessary for serial mesh which are partitioned at runtime...

@mperego
Collaborator

mperego commented Nov 15, 2021

> > I'm wondering if requiring Iopx library for using serial mesh is the right thing to do. On my workstation I don't have Iopx but "Use Serial Mesh: true" works fine.
>
> Isn't Iopx used to run decomp and/or to load a partitioned mesh? I wonder if it's not necessary for serial mesh which are partitioned at runtime...

right, so what's the minimum requirement for "Use Serial Mesh" to work? Based on the dashboard, it seems that the build on Camobap
https://sems-cdash-son.sandia.gov/cdash/viewTest.php?onlyfailed&buildid=25282
cannot handle "Use Serial Mesh". (In this test "Use Serial Mesh" was wrongly set to true independently of the check on Seacas and Iopx).

@jewatkins
Collaborator

jewatkins commented Nov 15, 2021

Doesn't seacas do the decomp at runtime? Maybe try turning that off to see if it fails?

@mperego
Collaborator

mperego commented Nov 15, 2021

> Doesn't seacas do the decomp at runtime? Maybe try turning that off to see if it fails?

Isn't Seacas required for the "decomp" function? Decomp works on Camobap, so if that's the case, checking for Seacas is not enough to guarantee that "Use Serial Mesh" works.

@jewatkins
Collaborator

okay, what about Ioss? I know that's needed.

@mperego
Collaborator

mperego commented Nov 15, 2021

Maybe we need to check if NetCDF library is built with PNetCDF support:

This is the error on Camobap

 Warning/Error: [ex_open_par_int]
	EXODUS: ERROR: Attempting to open the NetCDF file:
	'antarctica_2d.exo'
	The NetCDF library was not built with PNetCDF support as required for parallel access to this file.

	NetCDF: Unknown file format

@jewatkins
Collaborator

maybe in cmake, check if something like this gets pulled: TPL_Netcdf_Enables_PNetcdf
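
A related check can be sketched outside CMake. The Python snippet below assumes the netCDF installation ships a `netcdf_meta.h` header that reports build features through macros such as `NC_HAS_PNETCDF`, as recent netCDF-C releases do. The helper name `netcdf_features` and the sample header text are made up for illustration, and none of this is Albany's or Trilinos's actual detection logic.

```python
import re


def netcdf_features(header_text):
    """Map NC_HAS_* macro names found in netcdf_meta.h text to booleans."""
    pattern = re.compile(r"^\s*#\s*define\s+(NC_HAS_\w+)\s+(\d+)", re.MULTILINE)
    return {name: value != "0" for name, value in pattern.findall(header_text)}


# Made-up excerpt standing in for a netcdf_meta.h from a build without PnetCDF.
sample = """
#define NC_HAS_HDF5      1
#define NC_HAS_PNETCDF   0 /*!< PnetCDF support. */
#define NC_HAS_PARALLEL  0
"""

features = netcdf_features(sample)
print(features.get("NC_HAS_PNETCDF", False))  # prints False for this sample
```

On a real system the header text would be read from the netCDF include directory. A missing macro is treated as "feature absent" by the `.get(..., False)` default.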

@mperego
Collaborator

mperego commented Nov 15, 2021

> @jewatkins do you understand what's the issue here: https://sems-cdash-son.sandia.gov/cdash/test/1844066

B.t.w. @jewatkins have you seen this. This is different from the exodus issue.

@ikalash
Collaborator Author

ikalash commented Nov 15, 2021

I don't have PNetCDF on camobap. I build my own libs there and there are some problems with building netcdf with pnetcdf. I would say the code should not barf even if the user does not have pnetcdf - I'm not sure why it does.

@mperego
Collaborator

mperego commented Nov 15, 2021

> I don't have PNetCDF on camobap. I build my own libs there and there are some problems with building netcdf with pnetcdf. I would say the code should not barf even if the user does not have pnetcdf - I'm not sure why it does.

@ikalash I understand why it complains. It was an issue with my fix that should have been solved. However, what I'm trying to understand here, is how to better detect when "Use Serial Mesh" works. With the check on Iopx we are doing now, we are basically disabling "Use Serial Mesh" on all builds, which is not needed.

@jewatkins
Collaborator

> > @jewatkins do you understand what's the issue here: https://sems-cdash-son.sandia.gov/cdash/test/1844066
>
> B.t.w. @jewatkins have you seen this. This is different from the exodus issue.

I have not seen that. There's probably a compatibility issue now when using Kokkos::Cuda + MueLu without kokkos refactor on. The performance tests use kokkos so that's why they're working still. It's worth discussing with other muelu folks to see if that's the desired behavior.

@jewatkins
Collaborator

If it is the desired behavior, we will have to create a second test with the kokkos syntax and turn off this test for cuda builds.

@ikalash
Collaborator Author

ikalash commented Nov 15, 2021

I don't understand why "Use Serial Mesh" would not work on camobap whereas it would on other platforms... maybe a special configuration of netcdf is needed? I can try to look into this but not right away. I'm also curious whether this problem happens on @lxmota 's machines, which are identical to camobap in terms of OS. I can try building Albany there to check.

@mperego
Collaborator

mperego commented Nov 15, 2021

@ikalash It does not work there because you don't have parallel netcdf or hdf5, which Exodus needs in order to import a file in parallel. I'm working on a fix.
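
The condition discussed here can be written as a small predicate. This is an illustration of the reasoning in the thread, not Albany's actual CMake logic. It assumes that "parallel netcdf or hdf5" means either PnetCDF support or parallel HDF5 support in the netCDF build, and the function and argument names are hypothetical.

```python
def serial_mesh_tests_enabled(have_seacas, netcdf_has_pnetcdf, netcdf_has_parallel_hdf5):
    """Enable "Use Serial Mesh" tests only if Exodus can read a file in parallel."""
    # Assumption: either parallel backend is enough for parallel Exodus access.
    parallel_io = netcdf_has_pnetcdf or netcdf_has_parallel_hdf5
    return have_seacas and parallel_io


# A build like the one described for camobap in this thread:
# Seacas is available (decomp works), but there is no parallel netCDF/HDF5.
camobap_like = serial_mesh_tests_enabled(
    have_seacas=True,
    netcdf_has_pnetcdf=False,
    netcdf_has_parallel_hdf5=False,
)
```

With these inputs `camobap_like` is `False`, so the serial-mesh tests would be skipped on that build but kept on builds whose netCDF has a parallel backend.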

@mperego
Collaborator

mperego commented Nov 15, 2021

> > > @jewatkins do you understand what's the issue here: https://sems-cdash-son.sandia.gov/cdash/test/1844066
> >
> > B.t.w. @jewatkins have you seen this. This is different from the exodus issue.
>
> I have not seen that. There's probably a compatibility issue now when using Kokkos::Cuda + MueLu without kokkos refactor on. The performance tests use kokkos so that's why they're working still. It's worth discussing with other muelu folks to see if that's the desired behavior.

@jewatkins Would you be willing to bring this up with the MueLu folks?

@mperego
Collaborator

mperego commented Nov 16, 2021

@jewatkins Now that we fixed the issue with exodus, the MueLu tests seem to fail also on OpenMP builds.

p=0: *** Caught standard std::exception of type 'Teuchos::bad_any_cast' :

 /nightlyAlbanyTests/Results/Trilinos/packages/muelu/src/MueCentral/MueLu_VariableContainer.hpp:101:
 
 Throw number = 1
 
 Throw test that evaluated to true: data_->type() != typeid(T)
 
 Error, cast to type Data<Teuchos::RCP<MueLu::GraphBase<int, long long, Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::OpenMP, Kokkos::HostSpace> > >> failed since the actual underlying type is 'Teuchos::RCP<MueLu::LWGraph_kokkos<int, long long, Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::OpenMP, Kokkos::HostSpace> > >!
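
The check quoted in the log, `data_->type() != typeid(T)`, compares the exact stored type with the requested one. Below is a toy Python sketch of that kind of type-erased container. It is not MueLu's implementation, and the classes `GraphBase` and `LWGraphKokkos` are empty stand-ins named after the types in the error message.

```python
class VariableContainer:
    """Toy type-erased holder that only returns data as its exact stored type."""

    def __init__(self, data):
        self._data = data

    def get(self, requested_type):
        # Mirrors the quoted check `data_->type() != typeid(T)`: exact match only.
        if type(self._data) is not requested_type:
            raise TypeError(
                f"cast to {requested_type.__name__} failed since the actual "
                f"underlying type is {type(self._data).__name__}"
            )
        return self._data


class GraphBase:  # stand-in for the requested type
    pass


class LWGraphKokkos:  # stand-in for the type that was actually stored
    pass


container = VariableContainer(LWGraphKokkos())
```

Here `container.get(GraphBase)` raises, while `container.get(LWGraphKokkos)` succeeds. A failure of this kind appears whenever the code that stores the data and the code that requests it disagree about the type.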

@jewatkins
Collaborator

That looks like the same issue. Yes, I can post the issue.

@jewatkins
Collaborator

Done. trilinos/Trilinos#9943

Now that I have thought about this some more, I think I do remember that error message. Even if muelu were to work for openmp/cuda, it would probably try to do everything in serial. So we would need to create a new test for openmp/cuda builds anyways (or just modify the test so that it uses kokkos for everything).

@jewatkins
Collaborator

I'll check tomorrow to see if turning off kokkos refactor fixes the issue for openmp/cuda.

@mperego
Collaborator

mperego commented Nov 18, 2021

@jewatkins thanks for fixing this!
