MPI 4.0 API #28

Merged
merged 10 commits into from
Jan 23, 2024

@dalcinl
Collaborator

dalcinl commented Jan 18, 2024

With these changes, mpi4py is able to build cleanly and some tests pass.

@jeffhammond
Owner

Thanks. I'll test later today.

@jeffhammond
Owner

Something you did broke the macOS builds:

+ MPI_LIB=/opt/homebrew/Cellar/open-mpi/5.0.1/lib/libmpi.dylib
+ /opt/homebrew/Cellar/open-mpi/5.0.1/bin/mpirun -n 2 ./testbottom.x
I am 0 of 2
I am 1 of 2
all done
+ MPI_LIB=/opt/homebrew/Cellar/mpich/4.1.2/lib/libmpi.dylib
+ /opt/homebrew/Cellar/mpich/4.1.2/bin/mpirun -n 2 ./testbottom.x
dlopen of /opt/homebrew/Cellar/mpich/4.1.2/lib/libmpi.dylib failed: dlopen(/opt/homebrew/Cellar/mpich/4.1.2/lib/libmpi.dylib, 0x0005): symbol not found in flat namespace '_ADIOI_Datarep_head'
dlopen of /opt/homebrew/Cellar/mpich/4.1.2/lib/libmpi.dylib failed: dlopen(/opt/homebrew/Cellar/mpich/4.1.2/lib/libmpi.dylib, 0x0005): symbol not found in flat namespace '_ADIOI_Datarep_head'

@dalcinl
Collaborator Author

dalcinl commented Jan 19, 2024

This looks like a fundamental issue with the MPICH shared library. Loading libpmpi.dylib directly fails in the same way:

In [1]: import ctypes

In [2]: ctypes.CDLL("/opt/homebrew/Cellar/mpich/4.1.2/lib/libpmpi.dylib")
---------------------------------------------------------------------------
OSError                                   Traceback (most recent call last)
Cell In[2], line 1
----> 1 ctypes.CDLL("/opt/homebrew/Cellar/mpich/4.1.2/lib/libpmpi.dylib")

File /opt/homebrew/Cellar/python@3.11/3.11.7_1/Frameworks/Python.framework/Versions/3.11/lib/python3.11/ctypes/__init__.py:376, in CDLL.__init__(self, name, mode, handle, use_errno, use_last_error, winmode)
    373 self._FuncPtr = _FuncPtr
    375 if handle is None:
--> 376     self._handle = _dlopen(self._name, mode)
    377 else:
    378     self._handle = handle

OSError: dlopen(/opt/homebrew/Cellar/mpich/4.1.2/lib/libpmpi.dylib, 0x0006): symbol not found in flat namespace '_ADIOI_Datarep_head'
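The same probe can be scripted so that the missing symbol is reported instead of a full traceback. This is only a minimal sketch using the standard `ctypes` and `re` modules; `probe_library` and `missing_symbol` are hypothetical helper names, and the regular expression targets only the macOS message format quoted above.

```python
import ctypes
import re

# Matches the macOS dyld wording quoted above; other platforms phrase it differently.
_MISSING = re.compile(r"symbol not found in flat namespace '([^']+)'")


def missing_symbol(message):
    """Return the unresolved symbol named in a dlopen error message, or None."""
    match = _MISSING.search(message)
    return match.group(1) if match else None


def probe_library(path):
    """Try to dlopen `path`; return (loaded, error_message)."""
    try:
        ctypes.CDLL(path)
    except OSError as exc:
        return False, str(exc)
    return True, ""
```

Calling `probe_library` on the `libpmpi.dylib` path from the session above and passing the returned message to `missing_symbol` should yield the symbol name without the surrounding dyld text.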

@jeffhammond
Owner

The issue does not exist in main so it's specifically triggered by your changes. I'm not saying it's your fault, but also, you broke it 😄

@dalcinl
Collaborator Author

dalcinl commented Jan 19, 2024

Yes, the issue most likely comes from the change that removes explicit linking against the MPI libraries. You have to grant me that a plain and basic dlopen("libmpi.dylib") is something that should work out of the box. But for MPICH on macOS, that is not the case, and then things broke badly. I need to investigate what's going on.

@dalcinl
Collaborator Author

dalcinl commented Jan 19, 2024

I think the issue comes from using RTLD_LOCAL. Moving to RTLD_GLOBAL would fix it, but I'm not sure things will still work properly. IMHO, the problem is in the PMPI library.
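To compare the two modes outside the wrapper, a rough sketch along these lines could be used. It relies only on `ctypes.CDLL` and its `mode` argument; `load_with_fallback` is a hypothetical helper, and nothing here confirms that `RTLD_GLOBAL` is a safe choice for the real code.

```python
import ctypes


def load_with_fallback(path):
    """Try RTLD_LOCAL first, then RTLD_GLOBAL.

    Returns (handle, mode_name) for the first mode that loads,
    or (None, None) if both attempts raise OSError.
    """
    attempts = (
        ("RTLD_LOCAL", ctypes.RTLD_LOCAL),
        ("RTLD_GLOBAL", ctypes.RTLD_GLOBAL),
    )
    for mode_name, mode in attempts:
        try:
            return ctypes.CDLL(path, mode=mode), mode_name
        except OSError:
            continue  # this mode failed; try the next one
    return None, None
```

If the library loads only under `RTLD_GLOBAL`, that would be consistent with the guess above, but it would not show that the rest of the stack behaves correctly with globally visible symbols.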

The PMPI lib is not self-contained: it depends on symbols from the MPI library, even on "MPI" symbols! And that does not sound correct. Therefore, I believe my change here just unmasked an issue in MPICH. Looks like the ROMIO stuff is not landing in the proper library. @hzhou What do you think?

nm /opt/homebrew/Cellar/mpich/4.1.2/lib/libpmpi.dylib | grep ' U ' | grep -E '(ADIO|MPI)'
                 U _ADIOI_Datarep_head
                 U _ADIOI_Datatype_iscontig
                 U _ADIOI_Free_fn
                 U _ADIOI_Get_byte_offset
                 U _ADIOI_Get_eof_offset
                 U _ADIOI_Get_position
                 U _ADIOI_Malloc_fn
                 U _ADIOI_Shfp_fname
                 U _ADIOI_Strdup
                 U _ADIOI_Strncpy
                 U _ADIOI_Type_ispredef
                 U _ADIO_Close
                 U _ADIO_Get_shared_fp
                 U _ADIO_ImmediateOpen
                 U _ADIO_Open
                 U _ADIO_ResolveFileType
                 U _ADIO_Set_shared_fp
                 U _ADIO_Set_view
                 U _ADIO_same_amode
                 U _MPIO_Completed_request_create
                 U _MPIO_Err_create_code
                 U _MPIO_Err_return_file
                 U _MPIO_File_c2f
                 U _MPIO_File_f2c
                 U _MPIO_File_free
                 U _MPIO_File_resolve
                 U _MPIR_Comm_split_filesystem
                 U _MPIR_MPIOInit
                 U _MPIR_ROMIO_Get_file_errhand
                 U _MPIR_ROMIO_Set_file_errhand
                 U _MPIR_Status_set_bytes
                 U _MPIU_datatype_full_size
                 U _MPIU_external32_buffer_setup
                 U _MPIU_read_external32_conversion_fn
                 U _MPI_File_c2f
                 U _MPI_File_close
                 U _MPI_File_delete
                 U _MPI_File_f2c
                 U _MPI_File_get_amode
                 U _MPI_File_get_atomicity
                 U _MPI_File_get_byte_offset
                 U _MPI_File_get_errhandler
                 U _MPI_File_get_group
                 U _MPI_File_get_info
                 U _MPI_File_get_position
                 U _MPI_File_get_position_shared
                 U _MPI_File_get_size
                 U _MPI_File_get_type_extent
                 U _MPI_File_get_type_extent_c
                 U _MPI_File_get_view
                 U _MPI_File_iread
                 U _MPI_File_iread_all
                 U _MPI_File_iread_all_c
                 U _MPI_File_iread_at
                 U _MPI_File_iread_at_all
                 U _MPI_File_iread_at_all_c
                 U _MPI_File_iread_at_c
                 U _MPI_File_iread_c
                 U _MPI_File_iread_shared
                 U _MPI_File_iread_shared_c
                 U _MPI_File_iwrite
                 U _MPI_File_iwrite_all
                 U _MPI_File_iwrite_all_c
                 U _MPI_File_iwrite_at
                 U _MPI_File_iwrite_at_all
                 U _MPI_File_iwrite_at_all_c
                 U _MPI_File_iwrite_at_c
                 U _MPI_File_iwrite_c
                 U _MPI_File_iwrite_shared
                 U _MPI_File_iwrite_shared_c
                 U _MPI_File_open
                 U _MPI_File_preallocate
                 U _MPI_File_read
                 U _MPI_File_read_all
                 U _MPI_File_read_all_begin
                 U _MPI_File_read_all_begin_c
                 U _MPI_File_read_all_c
                 U _MPI_File_read_all_end
                 U _MPI_File_read_at
                 U _MPI_File_read_at_all
                 U _MPI_File_read_at_all_begin
                 U _MPI_File_read_at_all_begin_c
                 U _MPI_File_read_at_all_c
                 U _MPI_File_read_at_all_end
                 U _MPI_File_read_at_c
                 U _MPI_File_read_c
                 U _MPI_File_read_ordered
                 U _MPI_File_read_ordered_begin
                 U _MPI_File_read_ordered_begin_c
                 U _MPI_File_read_ordered_c
                 U _MPI_File_read_ordered_end
                 U _MPI_File_read_shared
                 U _MPI_File_read_shared_c
                 U _MPI_File_seek
                 U _MPI_File_seek_shared
                 U _MPI_File_set_atomicity
                 U _MPI_File_set_errhandler
                 U _MPI_File_set_info
                 U _MPI_File_set_size
                 U _MPI_File_set_view
                 U _MPI_File_sync
                 U _MPI_File_write
                 U _MPI_File_write_all
                 U _MPI_File_write_all_begin
                 U _MPI_File_write_all_begin_c
                 U _MPI_File_write_all_c
                 U _MPI_File_write_all_end
                 U _MPI_File_write_at
                 U _MPI_File_write_at_all
                 U _MPI_File_write_at_all_begin
                 U _MPI_File_write_at_all_begin_c
                 U _MPI_File_write_at_all_c
                 U _MPI_File_write_at_all_end
                 U _MPI_File_write_at_c
                 U _MPI_File_write_c
                 U _MPI_File_write_ordered
                 U _MPI_File_write_ordered_begin
                 U _MPI_File_write_ordered_begin_c
                 U _MPI_File_write_ordered_c
                 U _MPI_File_write_ordered_end
                 U _MPI_File_write_shared
                 U _MPI_File_write_shared_c
                 U _MPI_Register_datarep
                 U _MPI_Register_datarep_c
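A listing this long is easier to read when grouped by prefix. The sketch below is an illustration only: `undefined_symbols` and `count_by_prefix` are hypothetical helpers, the prefix list is simply read off the output above, and the embedded sample is a handful of lines copied from it rather than the full listing.

```python
from collections import Counter

# Prefixes taken from the nm output above (an assumption, not an MPICH convention list).
PREFIXES = ("_ADIOI_", "_ADIO_", "_MPIO_", "_MPIR_", "_MPIU_", "_MPI_")


def undefined_symbols(nm_output):
    """Return the names that nm marks as undefined (type letter 'U')."""
    names = []
    for line in nm_output.splitlines():
        fields = line.split()
        # Undefined entries have no address column: just 'U' and the name.
        if len(fields) == 2 and fields[0] == "U":
            names.append(fields[1])
    return names


def count_by_prefix(names, prefixes=PREFIXES):
    """Count names per prefix; the first matching prefix wins."""
    counts = Counter()
    for name in names:
        for prefix in prefixes:
            if name.startswith(prefix):
                counts[prefix] += 1
                break
    return counts


sample = """\
                 U _ADIOI_Datarep_head
                 U _ADIO_Open
                 U _MPIR_MPIOInit
                 U _MPI_File_open
                 U _MPI_Register_datarep
"""

if __name__ == "__main__":
    print(count_by_prefix(undefined_symbols(sample)))
```

Feeding the complete `nm` output through the same two functions would show how many of the unresolved references are public `MPI_` entry points versus internal ROMIO (`ADIO`, `ADIOI`, `MPIO`) symbols.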

@hzhou

hzhou commented Jan 19, 2024

> I think the issue comes from using RTLD_LOCAL. Moving to RTLD_GLOBAL would fix it, but I'm not sure things will still work properly. IMHO, the problem is in the PMPI library.
>
> The PMPI lib is not self-contained: it depends on symbols from the MPI library, even on "MPI" symbols! And that does not sound correct. Therefore, I believe my change here just unmasked an issue in MPICH. Looks like the ROMIO stuff is not landing in the proper library. @hzhou What do you think?

I agree with your insight. It is on my agenda to redo the ROMIO bindings.

@hzhou

hzhou commented Jan 19, 2024

cc @raffenet

@dalcinl dalcinl force-pushed the fixes branch 4 times, most recently from bd6ec41 to f6b7292 Compare January 22, 2024 17:37
@jeffhammond jeffhammond merged commit 399530f into main Jan 23, 2024
2 checks passed
@jeffhammond jeffhammond deleted the fixes branch January 23, 2024 08:14