problem with installation and running tests #106

Closed
gonsie opened this issue Apr 23, 2024 · 14 comments
Labels
documentation Improvements or additions to documentation

Comments

@gonsie

gonsie commented Apr 23, 2024

This is related to openjournals/joss-reviews#6451.

I'm trying to install using the docker instructions, found on this page. I've been able to get a dolfinx image running, and install an adios4dolfinx package, but I can't run any successful tests.

I've run:

docker run -ti dolfinx/dolfinx:stable

Then within the container I have tried various ways to install adios4dolfinx:

python3 -m pip install adios4dolfinx[test]
python3 -m pip install adios4dolfinx[test]@git+https://github.com/jorgensd/[email protected]

Both still require me to manually install ipyparallel, which the instructions suggest I shouldn't have to do.

But installing with pip doesn't quite get me to a test environment, so I've separately cloned the repo and tried running the tests:

git clone https://github.com/jorgensd/adios4dolfinx.git
cd adios4dolfinx
python3 -m pytest .

gives the following errors:

python3 -m pytest .
============================================================= test session starts =============================================================
platform linux -- Python 3.10.12, pytest-7.4.2, pluggy-1.3.0
rootdir: /root/adios4dolfinx
configfile: pyproject.toml
plugins: xdist-3.3.1
collected 1145 items / 1 error                                                                                                                

=================================================================== ERRORS ====================================================================
_____________________________________________ ERROR collecting tests/test_snapshot_checkpoint.py ______________________________________________
ImportError while importing test module '/root/adios4dolfinx/tests/test_snapshot_checkpoint.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
tests/test_snapshot_checkpoint.py:12: in <module>
    from adios4dolfinx.adios2_helpers import resolve_adios_scope
E   ImportError: cannot import name 'resolve_adios_scope' from 'adios4dolfinx.adios2_helpers' (/usr/local/lib/python3.10/dist-packages/adios4dolfinx/adios2_helpers.py)
============================================================== warnings summary ===============================================================
../../usr/local/lib/python3.10/dist-packages/ufl/core/ufl_type.py:56
../../usr/local/lib/python3.10/dist-packages/ufl/core/ufl_type.py:56
../../usr/local/lib/python3.10/dist-packages/ufl/core/ufl_type.py:56
../../usr/local/lib/python3.10/dist-packages/ufl/core/ufl_type.py:56
../../usr/local/lib/python3.10/dist-packages/ufl/core/ufl_type.py:56
../../usr/local/lib/python3.10/dist-packages/ufl/core/ufl_type.py:56
../../usr/local/lib/python3.10/dist-packages/ufl/core/ufl_type.py:56
../../usr/local/lib/python3.10/dist-packages/ufl/core/ufl_type.py:56
  /usr/local/lib/python3.10/dist-packages/ufl/core/ufl_type.py:56: DeprecationWarning: attach_operators_from_hash_data deprecated, please use UFLObject instead.
    warnings.warn("attach_operators_from_hash_data deprecated, please use UFLObject instead.", DeprecationWarning)

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
=========================================================== short test summary info ===========================================================
ERROR tests/test_snapshot_checkpoint.py
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Interrupted: 1 error during collection !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
======================================================== 8 warnings, 1 error in 0.15s =========================================================
root@141d6cc482a7:~/adios4dolfinx# 

I appreciate your assistance.

@jorgensd
Owner

jorgensd commented Apr 23, 2024

Apologies for the lack of clarity. I mentioned in #95 (comment) that the version under review is 0.8.0 of the software, which is compatible with dolfinx nightly (as 0.8.0 has not been released yet, although it is only a few days away).
Once it is released one should be able to use the new dolfinx/dolfinx:stable image, or conda once binaries land there.

There were some API changes over the last few days in the main branch, which I have addressed in: #107. These have just been merged into the main branch.
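The ImportError in the first post is consistent with the test suite from the main branch running against an older installed release. A minimal sketch for checking which versions are actually installed in the container follows; the helper name `installed_versions` is hypothetical, and the distribution name `fenics-dolfinx` is an assumption about how dolfinx is registered with pip.

```python
from importlib import metadata


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # The distribution is not installed in this environment.
            versions[name] = None
    return versions


if __name__ == "__main__":
    # "fenics-dolfinx" is an assumed distribution name; adjust if pip lists it differently.
    for name, version in installed_versions(["adios4dolfinx", "fenics-dolfinx"]).items():
        print(f"{name}: {version or 'not installed'}")
```

If the reported adios4dolfinx version is older than the checkout the tests come from, import errors of this kind are to be expected.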

Here is the way I tested this locally:

docker pull ghcr.io/fenics/dolfinx/dolfinx:nightly
docker run -ti -v $(pwd):/root/shared -w /root/shared --rm ghcr.io/fenics/dolfinx/dolfinx:nightly

Then either install adios4dolfinx with:

python3 -m pip install adios4dolfinx[test]@git+https://github.com/jorgensd/adios4dolfinx@main

or

python3 -m pip install -e .[test]

if you have the repository locally (remember to pull the main branch for the latest changes added 10 minutes ago).

Then run tests with

python3 -m pytest -xvs .

or

mpirun -n 2 python3 -m pytest -xvs .
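For anyone who wants to run both variants from one script, the two commands above could be wrapped roughly as follows. This is only a sketch: the helper names `pytest_command` and `run_suite` are hypothetical, and it assumes `python3` and `mpirun` are on the PATH, as they are in the commands above.

```python
import subprocess


def pytest_command(nprocs=1, path=".", flags=("-xvs",)):
    """Build the serial pytest command, or the mpirun-wrapped one when nprocs > 1."""
    cmd = ["python3", "-m", "pytest", *flags, path]
    if nprocs > 1:
        cmd = ["mpirun", "-n", str(nprocs), *cmd]
    return cmd


def run_suite(nprocs=1, path="."):
    """Run the test suite and return the process exit code (0 means all tests passed)."""
    return subprocess.run(pytest_command(nprocs, path), check=False).returncode


if __name__ == "__main__":
    print(" ".join(pytest_command()))
    print(" ".join(pytest_command(nprocs=2)))
```

Calling `run_suite(1)` and then `run_suite(2)` from the repository root would execute the serial and the two-process run in sequence.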

@jorgensd jorgensd added the documentation Improvements or additions to documentation label Apr 23, 2024
@jorgensd
Owner

@gonsie Let me know if this clarifies and resolves your issue :)

@gonsie
Author

gonsie commented Apr 25, 2024

Thanks, with the nightly build I have been able to run the parallel and serial versions of the tests.

Not sure if you're expecting all tests to pass... I got these results:

  • Serial: 158 failed, 1010 passed, 5 skipped in 39.41s
  • Parallel: 133 failed, 855 passed, 185 skipped in 31.99s
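To compare such summaries between runs or machines, the counts can be pulled out of the summary line with a few lines of Python. This is a sketch with hypothetical helper names (`parse_summary`, `failure_rate`); it only recognises the three outcomes that appear in the lines above.

```python
import re


def parse_summary(line):
    """Extract outcome counts from a summary like '158 failed, 1010 passed, 5 skipped in 39.41s'."""
    return {
        match.group(2): int(match.group(1))
        for match in re.finditer(r"(\d+) (failed|passed|skipped)", line)
    }


def failure_rate(counts):
    """Fraction of collected tests that failed (0.0 if nothing was collected)."""
    total = sum(counts.values())
    return counts.get("failed", 0) / total if total else 0.0


if __name__ == "__main__":
    for label, line in [
        ("Serial", "158 failed, 1010 passed, 5 skipped in 39.41s"),
        ("Parallel", "133 failed, 855 passed, 185 skipped in 31.99s"),
    ]:
        counts = parse_summary(line)
        print(label, counts, sum(counts.values()), f"{failure_rate(counts):.1%}")
```

Both lines add up to 1173 tests, so the same set of tests was collected in the serial and the parallel run; only the outcomes differ.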

Thanks.

@jorgensd
Owner

> Thanks, with the nightly build I have been able to run the parallel and serial versions of the tests.
>
> Not sure if you're expecting all tests to pass... I got these results:
>
> * Serial: 158 failed, 1010 passed, 5 skipped in 39.41s
> * Parallel: 133 failed, 855 passed, 185 skipped in 31.99s
>
> Thanks.

I am expecting all tests to pass. You can see that they are executed on CI at every run:
https://github.com/jorgensd/adios4dolfinx/actions/runs/8829098883/job/24239460728

I just re-ran the commands I posted in the previous post, and they all pass for me.
Could you post the commands you executed and the traceback of one of the failing tests?
For instance, by running python3 -m pytest -xvs .

@gonsie
Author

gonsie commented Apr 25, 2024

I ran the following:

docker run -ti dolfinx/dolfinx:nightly
python3 -m pip install adios4dolfinx[test]@git+https://github.com/jorgensd/adios4dolfinx@main
git clone https://github.com/jorgensd/adios4dolfinx.git
cd adios4dolfinx
python3 -m pytest -vs .
mpirun -n 2 python3 -m pytest -vs .

The first failed serial run was:

__________________________________________ test_read_write_P_3D[mesh_3D0-read_comm0-4-Lagrange-True] __________________________________________

read_comm = <mpi4py.MPI.Intracomm object at 0x1555543b4e30>, family = 'Lagrange', degree = 4, is_complex = True
mesh_3D = <dolfinx.mesh.Mesh object at 0x15553301f580>, get_dtype = <function get_dtype.<locals>._get_dtype at 0x155533107eb0>
write_function = <function write_function.<locals>._write_function at 0x15553bc5cca0>
read_function = <function read_function.<locals>._read_function at 0x155533309cf0>

    @pytest.mark.parametrize("is_complex", [True, False])
    @pytest.mark.parametrize("family", ["Lagrange", "DG"])
    @pytest.mark.parametrize("degree", [1, 4])
    @pytest.mark.parametrize("read_comm", [MPI.COMM_SELF, MPI.COMM_WORLD])
    def test_read_write_P_3D(
        read_comm, family, degree, is_complex, mesh_3D, get_dtype, write_function, read_function
    ):
        mesh = mesh_3D
        f_dtype = get_dtype(mesh.geometry.x.dtype, is_complex)
        el = basix.ufl.element(
            family,
            mesh.ufl_cell().cellname(),
            degree,
            basix.LagrangeVariant.gll_warped,
            shape=(mesh.geometry.dim,),
        )
    
        def f(x):
            values = np.empty((3, x.shape[1]), dtype=f_dtype)
            values[0] = np.pi + x[0]
            values[1] = x[1] + 2 * x[0]
            values[2] = np.cos(x[2])
            if is_complex:
                values[0] -= 2j * x[2]
                values[2] += 1j * x[1]
            return values
    
        hash = write_function(mesh, el, f, f_dtype)
    
        MPI.COMM_WORLD.Barrier()
>       read_function(read_comm, el, f, hash, f_dtype)

tests/test_checkpointing.py:105: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
tests/conftest.py:68: in _read_function
    np.testing.assert_allclose(v.x.array, v_ex.x.array, atol=10 * res, rtol=10 * res)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

args = (<function assert_allclose.<locals>.compare at 0x1555334c9120>, array([3.94159265+2.77800107e-17j, 1.6       +0.000000...    -1.05453137e-17j, ..., 3.19159265-1.90000000e+00j,
       1.        +0.00000000e+00j, 0.58168309+9.00000000e-01j]))
kwds = {'equal_nan': True, 'err_msg': '', 'header': 'Not equal to tolerance rtol=1e-14, atol=1e-14', 'verbose': True}

    @wraps(func)
    def inner(*args, **kwds):
        with self._recreate_cm():
>           return func(*args, **kwds)
E           AssertionError: 
E           Not equal to tolerance rtol=1e-14, atol=1e-14
E           
E           Mismatched elements: 16830 / 27783 (60.6%)
E           Max absolute difference: 8016162.74067225
E           Max relative difference: 800859.99683645
E            x: array([3.941593+2.778001e-17j, 1.6     +0.000000e+00j,
E                  1.      -1.054531e-17j, ..., 3.191593-1.900000e+00j,
E                  1.      +0.000000e+00j, 0.581683+9.000000e-01j])
E            y: array([3.941593+2.778001e-17j, 1.6     +0.000000e+00j,
E                  1.      -1.054531e-17j, ..., 3.191593-1.900000e+00j,
E                  1.      +0.000000e+00j, 0.581683+9.000000e-01j])

/usr/lib/python3.10/contextlib.py:79: AssertionError
=========================================================== short test summary info ===========================================================
FAILED tests/test_checkpointing.py::test_read_write_P_3D[mesh_3D0-read_comm0-4-Lagrange-True] - AssertionError: 
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
======================================================== 1 failed, 262 passed in 3.24s ========================================================

The first failed parallel test was:

__________ test_read_write_P_2D[mesh_2D3-read_comm0-4-Lagrange-True] ___________

read_comm = <mpi4py.MPI.Intracomm object at 0x155554414490>, family = 'Lagrange'
degree = 4, is_complex = True
mesh_2D = <dolfinx.mesh.Mesh object at 0x155532ed47c0>
get_dtype = <function get_dtype.<locals>._get_dtype at 0x1555450ed5a0>
write_function = <function write_function.<locals>._write_function at 0x155532e2f760>
read_function = <function read_function.<locals>._read_function at 0x155532e2e050>

    @pytest.mark.parametrize("is_complex", [True, False])
    @pytest.mark.parametrize("family", ["Lagrange", "DG"])
    @pytest.mark.parametrize("degree", [1, 4])
    @pytest.mark.parametrize("read_comm", [MPI.COMM_SELF, MPI.COMM_WORLD])
    def test_read_write_P_2D(
        read_comm, family, degree, is_complex, mesh_2D, get_dtype, write_function, read_function
    ):
        mesh = mesh_2D
        f_dtype = get_dtype(mesh.geometry.x.dtype, is_complex)
    
        el = basix.ufl.element(
            family,
            mesh.ufl_cell().cellname(),
            degree,
            basix.LagrangeVariant.gll_warped,
            shape=(mesh.geometry.dim,),
            dtype=mesh.geometry.x.dtype,
        )
    
        def f(x):
            values = np.empty((2, x.shape[1]), dtype=f_dtype)
            values[0] = np.full(x.shape[1], np.pi) + x[0]
            values[1] = x[0]
            if is_complex:
                values[0] += 1j * x[1]
                values[1] -= 3j * x[1]
            return values
    
        hash = write_function(mesh, el, f, f_dtype)
        MPI.COMM_WORLD.Barrier()
>       read_function(read_comm, el, f, hash, f_dtype)

tests/test_checkpointing.py:72: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
tests/conftest.py:68: in _read_function
    np.testing.assert_allclose(v.x.array, v_ex.x.array, atol=10 * res, rtol=10 * res)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

args = (<function assert_allclose.<locals>.compare at 0x155532e2f370>, array([3.14159265e+00+0.9j       , 2.77555756e-18-2.7j...5e+00+1.j        , ...,  9.82732684e-01-0.15j      ,
        4.12432534e+00+0.08273268j,  9.82732684e-01-0.24819805j]))
kwds = {'equal_nan': True, 'err_msg': '', 'header': 'Not equal to tolerance rtol=1e-14, atol=1e-14', 'verbose': True}

    @wraps(func)
    def inner(*args, **kwds):
        with self._recreate_cm():
>           return func(*args, **kwds)
E           AssertionError: 
E           Not equal to tolerance rtol=1e-14, atol=1e-14
E           
E           Mismatched elements: 78 / 3362 (2.32%)
E           Max absolute difference: 10755.05379197
E           Max relative difference: 599.98195738
E            x: array([3.141593e+00+0.9j     , 2.775558e-18-2.7j     ,
E                  3.141593e+00+1.j      , ..., 9.827327e-01-0.15j    ,
E                  4.124325e+00+0.082733j, 9.827327e-01-0.248198j])
E            y: array([ 3.141593e+00+0.9j     , -2.775558e-18-2.7j     ,
E                   3.141593e+00+1.j      , ...,  9.827327e-01-0.15j    ,
E                   4.124325e+00+0.082733j,  9.827327e-01-0.248198j])

/usr/lib/python3.10/contextlib.py:79: AssertionError
=========================== short test summary info ============================
FAILED tests/test_checkpointing.py::test_read_write_P_2D[mesh_2D3-read_comm0-4-Lagrange-True]
!!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
======================== 1 failed, 102 passed in 9.61s =========================
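In both reports the printed `x` and `y` arrays look identical, presumably because NumPy elides the middle of long arrays and the mismatching entries sit in the elided part. A dependency-free sketch of the element-wise criterion used by `assert_allclose`, `|actual - desired| <= atol + rtol * |desired|`, that lists the offending indices is given below. The function name `mismatches` and the sample values are hypothetical, and NaN handling (`equal_nan`) is left out.

```python
def mismatches(actual, desired, rtol=1e-14, atol=1e-14):
    """Return (index, actual, desired, abs_error) for every entry outside the tolerance."""
    if len(actual) != len(desired):
        raise ValueError("sequences must have the same length")
    bad = []
    for i, (a, d) in enumerate(zip(actual, desired)):
        err = abs(a - d)  # works for real and complex values
        if err > atol + rtol * abs(d):
            bad.append((i, a, d, err))
    return bad


if __name__ == "__main__":
    # Made-up data: only the entry at index 2 differs.
    actual = [3.941593 + 0j, 1.6 + 0j, 8016163.0 + 0j, 1.0 + 0j]
    desired = [3.941593 + 0j, 1.6 + 0j, 1.0 + 0j, 1.0 + 0j]
    print([index for index, *_ in mismatches(actual, desired)])  # prints [2]
```

Applied to `v.x.array` and `v_ex.x.array` from the failing test, a listing like this would show which degrees of freedom are affected and whether the wrong values follow a pattern.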

@jorgensd
Owner

I cannot reproduce those error messages running the exact commands you sketched above:

root@6efd03467dae:~/adios4dolfinx# history
    1  python3 -m pip install adios4dolfinx[test]@git+https://github.com/jorgensd/adios4dolfinx@main
    2  git clone https://github.com/jorgensd/adios4dolfinx.git
    3  cd adios4dolfinx
    4  python3 -m pytest -vs .
    5  mpirun -n 2 python3 -m pytest -vs .
    6  history

Could you print the output of:

 python3 -c "import dolfinx; print(dolfinx.git_commit_hash)"

Also what kind of system are you running on?
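Both answers can be gathered in one go with a short script. This is a sketch with a hypothetical function name, `environment_report`; it reads `dolfinx.git_commit_hash` as in the command above and falls back to `None` when dolfinx cannot be imported.

```python
import importlib
import platform
import sys


def environment_report():
    """Collect basic facts about the interpreter, the system, and the dolfinx build."""
    report = {
        "python": sys.version.split()[0],
        "platform": platform.platform(),
        "machine": platform.machine(),
    }
    try:
        dolfinx = importlib.import_module("dolfinx")
    except ImportError:
        # dolfinx is not available in this environment.
        report["dolfinx_commit"] = None
    else:
        report["dolfinx_commit"] = getattr(dolfinx, "git_commit_hash", None)
    return report


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"{key}: {value}")
```

Note that inside a container the platform string describes the container's userland and the host kernel, so the host distribution still has to be reported separately.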

@nate-sime
Contributor

nate-sime commented Apr 25, 2024

I also cannot reproduce those error messages using the commands:

docker run -ti dolfinx/dolfinx:nightly
python3 -m pip install adios4dolfinx[test]@git+https://github.com/jorgensd/adios4dolfinx@main
git clone https://github.com/jorgensd/adios4dolfinx.git
cd adios4dolfinx
python3 -m pytest -vs .
mpirun -n 2 python3 -m pytest -vs .

In the serial case I get

1168 passed, 5 skipped in 50.99s

and the parallel case

988 passed, 185 skipped in 24.49s

For completeness I also see:

python3 -c "import dolfinx; print(dolfinx.git_commit_hash)"
d5fb3db2a8544a3e93a3f3aa52d201846b2b9278

My host device is running Ubuntu 22.04.4 LTS

@gonsie
Author

gonsie commented Apr 25, 2024

I see:

python3 -c "import dolfinx; print(dolfinx.git_commit_hash)"
d5fb3db2a8544a3e93a3f3aa52d201846b2b9278

I'm running the container on a Red Hat Enterprise Linux v8.9 HPC system.

@jorgensd
Owner

Very strange... It would be interesting to see if it can be reproduced with conda on your system once:
conda-forge/fenics-dolfinx-feedstock#71
has been merged.

Unfortunately, I do not have a Red Hat system to try to reproduce this on.

@jorgensd
Owner

What MPI is installed on your HPC system?

@jorgensd
Owner

Other relevant questions:

  • What specifications does your HPC system have (for example, what kind of nodes)?
  • Are you able to run the dolfinx Python unit tests in serial in the docker image?
  • Is it an AMD or ARM system?
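The MPI question can be answered from inside the container with mpi4py. The sketch below uses a hypothetical function name, `mpi_report`, and reports the MPI library that mpi4py is linked against in the current environment, which is not necessarily the MPI installed on the host.

```python
def mpi_report():
    """Describe the MPI library behind mpi4py, or return None if mpi4py is unavailable."""
    try:
        from mpi4py import MPI
    except ImportError:
        return None
    return {
        "library": MPI.Get_library_version().strip(),  # implementation banner string
        "standard": MPI.Get_version(),                 # supported MPI standard, e.g. (major, minor)
        "vendor": MPI.get_vendor(),                    # implementation name and version
    }


if __name__ == "__main__":
    report = mpi_report()
    if report is None:
        print("mpi4py is not installed")
    else:
        for key, value in report.items():
            print(f"{key}: {value}")
```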

@jorgensd
Owner

Is there a user manual for your HPC system that I could consult?

@gonsie
Author

gonsie commented Apr 26, 2024

I'm using the poodle system at LLNL, intel CPUs. Of note, I'm also using podman rather than docker directly, so there may be some issues there. That or an MPI issue would be the first places I would look. However, I'm not really interested in debugging this further. I'm happy to share the list of tests that are failing, if that would be helpful.
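If the list of failing tests is shared, grouping it by test function is a quick way to see whether the failures cluster. A sketch that does this from the `FAILED` lines of pytest's short test summary follows; the names `FAILED_LINE` and `group_failures` are hypothetical, and the two sample lines are the ones posted earlier in this thread.

```python
import re
from collections import Counter

# Matches e.g. "FAILED tests/test_x.py::test_name[param-ids] - AssertionError"
FAILED_LINE = re.compile(r"^FAILED\s+(?P<file>\S+?)::(?P<test>\w+)")


def group_failures(lines):
    """Count failures per (file, test function), ignoring the parametrization ids."""
    counts = Counter()
    for line in lines:
        match = FAILED_LINE.match(line.strip())
        if match:
            counts[(match.group("file"), match.group("test"))] += 1
    return counts


if __name__ == "__main__":
    summary = [
        "FAILED tests/test_checkpointing.py::test_read_write_P_3D[mesh_3D0-read_comm0-4-Lagrange-True] - AssertionError: ",
        "FAILED tests/test_checkpointing.py::test_read_write_P_2D[mesh_2D3-read_comm0-4-Lagrange-True]",
    ]
    for (path, test), count in group_failures(summary).most_common():
        print(f"{count:4d}  {path}::{test}")
```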

@jorgensd
Owner

It would be interesting to find out what is wrong, but it is unlikely that I will have a similar system to replicate that setup on. Therefore, I'll close this issue.
