
GETRF checking fails when multiple threads are bound to the same core. #18

Open
abouteiller opened this issue Apr 13, 2020 · 1 comment
Labels
bug Something isn't working high priority This is an important feature

Comments

@abouteiller
Contributor

Original report by Nuria Losada (Bitbucket: nuriallv, GitHub: nuriallv).


The result check fails on GETRF when multiple threads are bound to the same core.
Setting info[2] = 1 in the body of getrf in zgetrf.jdf appears to fix it.

Small reproducer using the dplasma/tests (run on saturn, nodes c0*):

mpirun -np 1 ./testing_dgetrf -c 4 -N 20 -t 2 -x -- --parsec_bind file:binding_onecore
$ cat binding_onecore 
0

@abouteiller
Contributor Author

Replicated with:

PMIX_MCA_psec='' SLURM_TIMELIMIT=100 PARSEC_MCA_device_cuda_enabled=1 PARSEC_MCA_device_cuda_memory_use=10 OMPI_MCA_rmaps_base_oversubscribe=true salloc -wleconte -n 8 --gpus-per-task=1 --cpus-per-task=1 srun -n 1 --pty gdb --args tests/testing_dgetrf_1d -c 4 -N 20 -t 2 -x -- --mca bind_map file:binding_onecore --mca debug_verbose 0

cat binding_onecore
0,0,0,0
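A small sketch for generating the single-core binding file for a different thread count. It assumes, based only on the 0,0,0,0 file used with -c 4 above, that the file takes one comma-separated core index per compute thread; NTHREADS is a hypothetical parameter and should match the -c value passed to the test.

```shell
#!/bin/sh
# Hypothetical helper: write a binding file that places every compute
# thread on core 0 (assumes one comma-separated entry per thread).
# NTHREADS should match the -c value given to testing_dgetrf.
NTHREADS=${NTHREADS:-4}

# Emit NTHREADS lines of "0", then join them with commas: 0,0,0,0
line=$(yes 0 | head -n "$NTHREADS" | paste -sd, -)
printf '%s\n' "$line" > binding_onecore
cat binding_onecore
```

With the default NTHREADS=4 this reproduces the binding_onecore file shown in the comment above.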
