wm_cluster_from_atlas.py crashes on a 128 GB RAM, Python 3.12 environment #240

Open
tashrifbillah opened this issue Sep 26, 2024 · 4 comments

Comments

@tashrifbillah
Contributor

tashrifbillah commented Sep 26, 2024

wm_cluster_from_atlas.py crashes at the following line, and the shell reports only Killed:

similarities = np.exp(-distance / (sigmasq))

We installed the package with pip on a machine with 128 GB RAM running Python 3.12 on Red Hat 9. On an otherwise identical machine with 512 GB RAM, it does not crash. Feel free to share your thoughts.


Edit: we collected a few statistics in pdb at the point where the crash happens:

-> similarities = np.exp(-distance / (sigmasq))
(Pdb) distance.max()
24388.2132106186
(Pdb) distance.min()
0.0
(Pdb) distance.shape
(2500, 1860721)
(Pdb) np.exp(-distance / (sigmasq))
Killed

On the 128 GB machine, RAM usage jumps to the full 128 GB at the moment of the crash. The same crash occurs on a machine with 256 GB RAM.
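For scale, here is a rough back-of-the-envelope sketch of how large a single array of the reported shape is. It assumes distance is float64 (NumPy's default floating type; the dtype is not shown in the pdb output above), and array_gigabytes is a hypothetical helper written for this illustration, not part of the package.

```python
import math

import numpy as np


def array_gigabytes(shape, dtype):
    """Size in GB (1e9 bytes) of a dense array with the given shape and dtype."""
    n_elements = math.prod(shape)
    return n_elements * np.dtype(dtype).itemsize / 1e9


# Shape reported by pdb in this issue.
shape = (2500, 1860721)

# Assumption: distance is float64; float32 is shown for comparison.
gb_float64 = array_gigabytes(shape, np.float64)
gb_float32 = array_gigabytes(shape, np.float32)

print(f"float64: {gb_float64:.2f} GB per array")
print(f"float32: {gb_float32:.2f} GB per array")
```

Under that assumption one array is about 37 GB, and the expression needs at least one more full-size array for its result, possibly with further intermediates, on top of distance itself and whatever else the script already holds in memory.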

@tashrifbillah
Contributor Author

Command used:

wm_cluster_from_atlas.py \
sub-4003_ses-2_dir-416_desc-XcUnEdEp_reg.vtk \
/software/rocky9/ORG-Atlases-1.2/ORG-800FC-100HCP \
wma/sub-4003_ses-2_dir-416_desc-XcUnEdEp/FiberClustering/InitialClusters \
-l 40 -j 1

@tashrifbillah
Contributor Author

Upon further investigation, I found that this expression alone also fails:

-distance / (sigmasq)
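A likely reason, sketched here on a tiny stand-in array: the expression does not modify distance but returns a newly allocated array of the same size, so memory use grows by at least one more full copy. The array and the sigmasq value below are made up for illustration.

```python
import numpy as np

# Tiny stand-in for the real (2500, 1860721) matrix; values are illustrative.
distance = np.arange(12, dtype=np.float64).reshape(3, 4)
sigmasq = 2.0  # illustrative value, not the one used by the script

# Same form as the failing expression.
scaled = -distance / sigmasq

# The result lives in its own buffer, separate from `distance`.
result_is_new_buffer = not np.shares_memory(scaled, distance)
same_size = scaled.nbytes == distance.nbytes

print(result_is_new_buffer, same_size)
```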

@tashrifbillah
Contributor Author

tashrifbillah commented Sep 27, 2024

I replaced that line with:

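    # Negate and scale in place so no extra full-size result array is created.
    np.multiply(distance, -1, out=distance, dtype=np.float32)
    np.divide(distance, sigmasq, out=distance, dtype=np.float32)

    # Allocate the result once, as float32 (half the size of float64).
    M = distance.shape[0]
    N = distance.shape[1]
    similarities = np.zeros((M,N), dtype=np.float32)
    np.exp(distance, out=similarities, dtype=np.float32)

    # Drop the local reference to the distance matrix.
    del distance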

This fixes only the distance_to_similarity() function. The script then fails somewhere downstream, again because it runs out of memory.


The general idea is that dtype=np.float32 has to be used everywhere downstream, and the out= argument should be passed wherever possible.
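As a minimal sketch of that pattern (not the package's actual code), the function below overwrites a float32 distance array with exp(-distance / sigmasq) using only out=, so no second full-size array is allocated. The name similarity_inplace and the test values are hypothetical.

```python
import numpy as np


def similarity_inplace(distance, sigmasq):
    """Overwrite `distance` with exp(-distance / sigmasq) and return it.

    Hypothetical helper: every step writes back into the input buffer.
    """
    np.negative(distance, out=distance)
    np.divide(distance, sigmasq, out=distance)
    np.exp(distance, out=distance)
    return distance


# Small float32 stand-in for the real distance matrix.
rng = np.random.default_rng(0)
d = rng.random((4, 6), dtype=np.float32) * 100.0

# Reference result, computed the original way in float64 for comparison.
expected = np.exp(-d.astype(np.float64) / 50.0)

s = similarity_inplace(d, 50.0)
print(s.dtype, s is d)
```

Note that the caller's array is destroyed by this approach, so it only fits code paths where distance is not needed afterwards.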

