p-type tip high memory usage #10

Open
eimrek opened this issue Oct 10, 2023 · 0 comments

When a non-zero p_tip_ratios is specified, the script can run into memory problems.

The current design of the code is:

  1. class Cp2kGridOrbitals puts the orbitals on a grid and divides them between the MPI processes such that each process holds a subset of the orbitals over their full spatial extent;
  2. class STM "re-divides" the orbitals such that every MPI process holds all orbitals, but only in a specific spatial region; this allows each MPI process to run the STM analysis independently of the others (a small illustrative sketch of the two layouts follows this list).
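
For illustration, here is a minimal sketch of the two data layouts described above. The array shapes, rank count, and variable names are made up for the example and are not the actual class attributes.

```python
import numpy as np

# Hypothetical sizes: 8 orbitals on a (nx, ny, nz) grid, 4 MPI ranks.
n_orb, nx, ny, nz, mpi_size = 8, 120, 90, 60, 4

# Layout 1 (Cp2kGridOrbitals): each rank holds a subset of the orbitals
# over the full spatial grid.
orb_per_rank = n_orb // mpi_size
layout1_shape = (orb_per_rank, nx, ny, nz)     # e.g. (2, 120, 90, 60)

# Layout 2 (after the STM "re-division"): each rank holds all orbitals,
# but only an x-slab of the grid.
nx_per_rank = nx // mpi_size
layout2_shape = (n_orb, nx_per_rank, ny, nz)   # e.g. (8, 30, 90, 60)

# Both layouts store the same number of grid points per rank; the memory
# problem comes from the additional p-tip gradient grids discussed below.
assert np.prod(layout1_shape) == np.prod(layout2_shape)
```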

If p_tip_ratios is enabled, the current implementation calculates the gradient of the orbitals before the "re-division" of step 2 and then keeps two grids on each process: the orbital and its gradient. Intuitively, this should double the memory usage, but in practice the usage can be even higher and often causes the program to run out of memory (a rough accounting is sketched after the snippet). The relevant code is here:

```python
if self.ptip_enabled:
    ### Calculate and divide also the p-tip contribution,
    ### as derivatives are hard to account for after dividing the orbitals in space
    p_tip_contrib = (np.gradient(self.cgo.morb_grids[ispin], axis=1) / self.dv[0]) ** 2
    p_tip_contrib += (np.gradient(self.cgo.morb_grids[ispin], axis=2) / self.dv[1]) ** 2
    for rank in range(self.mpi_size):
        ix_start, ix_end = self.x_ind_per_rank(rank)
        if self.mpi_rank == rank:
            recvbuf = np.empty(sum(orbitals_per_rank) * num_spatial_points, dtype=self.cgo.dtype)
        else:
            recvbuf = None
        sendbuf = p_tip_contrib[:, ix_start:ix_end, :, :].ravel()
        self.mpi_comm.Gatherv(sendbuf=sendbuf,
                              recvbuf=[recvbuf, orbitals_per_rank * num_spatial_points],
                              root=rank)
        if self.mpi_rank == rank:
            self.local_orbital_ptip.append(
                recvbuf.reshape(total_orb, self.local_cell_n[0], self.local_cell_n[1], self.local_cell_n[2]))
```
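
For reference, a rough back-of-the-envelope accounting of why the usage can end up well above a factor of two (assuming float64 grids and made-up dimensions; none of these numbers are measured). Besides p_tip_contrib itself, np.gradient and the division/squaring create transient full-size temporaries, the .ravel() of the non-contiguous slice copies each rank's slab, and the received slab is kept in local_orbital_ptip:

```python
import numpy as np

# Hypothetical sizes, just to put numbers on the argument (not measured values).
n_orb, nx, ny, nz = 100, 300, 200, 150
mpi_size = 4
grid_bytes = n_orb * nx * ny * nz * np.dtype(np.float64).itemsize  # one full set of grids

orbitals      = grid_bytes          # self.cgo.morb_grids[ispin], kept throughout
p_tip_contrib = grid_bytes          # the persistent extra grid (the intuitive "doubling")
temporaries   = 2 * grid_bytes      # np.gradient output + divided/squared intermediate,
                                    # alive at the same time while the expression is evaluated
sendbuf_copy  = grid_bytes // mpi_size   # .ravel() of a non-contiguous slice makes a copy
recv_slab     = grid_bytes // mpi_size   # per-rank receive buffer, kept in local_orbital_ptip

peak = orbitals + p_tip_contrib + temporaries + sendbuf_copy + recv_slab
print(f"orbitals alone: {orbitals / 1e9:.1f} GB, rough peak: {peak / 1e9:.1f} GB "
      f"({peak / orbitals:.1f}x)")
```

The exact peak depends on NumPy's temporary elision and on when intermediate arrays are freed, so this is only an order-of-magnitude estimate.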

Why do I need to store the gradient at this step? Because of the spatial division between the MPI processes: if I just calculated the gradient separately on each process, there could be anomalies at the edges of the spatial regions (see the small demonstration below).
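
To make the edge issue concrete, here is a small self-contained demonstration with a 2-D stand-in array (illustrative only): np.gradient falls back to one-sided differences at array boundaries, so the gradient of a slab does not match the corresponding slab of the full-array gradient at its first and last planes.

```python
import numpy as np

rng = np.random.default_rng(0)
full = rng.random((64, 32))                         # stand-in for an orbital on a grid

grad_then_slice = np.gradient(full, axis=0)[16:32]  # gradient on the full array, then slice
slice_then_grad = np.gradient(full[16:32], axis=0)  # slice first, then gradient on the slab

# Interior rows agree exactly, but the first and last rows of the slab differ,
# because np.gradient uses one-sided differences at the slab boundaries.
print(np.allclose(grad_then_slice[1:-1], slice_then_grad[1:-1]))  # True
print(np.allclose(grad_then_slice[0],  slice_then_grad[0]))       # False
print(np.allclose(grad_then_slice[-1], slice_then_grad[-1]))      # False
```

Only the boundary planes differ, which is exactly what a single extra ghost plane per side would fix (see the sketch below).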

A possible alternative would be to divide the spatial region between the MPI processes such that neighbouring regions overlap slightly. This would allow each process to independently calculate the gradient when needed and would likely reduce the memory usage considerably (a minimal sketch of the idea follows).
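
A minimal sketch of that overlapping-region (ghost-plane) idea, assuming one ghost plane per side is enough for np.gradient's central differences; the function names and signatures here are hypothetical, not existing code:

```python
import numpy as np

def local_slab_with_ghosts(morb_grids, ix_start, ix_end, n_ghost=1):
    """Return the x-slab [ix_start, ix_end) padded with up to n_ghost ghost planes."""
    lo = max(ix_start - n_ghost, 0)
    hi = min(ix_end + n_ghost, morb_grids.shape[1])
    return morb_grids[:, lo:hi, :, :], ix_start - lo   # padded slab + number of left ghosts

def ptip_contrib_local(padded_slab, n_left_ghost, ix_len, dv):
    """p-tip contribution on the rank's own slab; gradients taken on the padded slab."""
    contrib  = (np.gradient(padded_slab, axis=1) / dv[0]) ** 2
    contrib += (np.gradient(padded_slab, axis=2) / dv[1]) ** 2
    return contrib[:, n_left_ghost:n_left_ghost + ix_len, :, :]   # drop the ghost planes

# Example on a single "rank": 4 orbitals on a (40, 20, 10) grid, slab x = [10, 20).
grids = np.random.default_rng(1).random((4, 40, 20, 10))
padded, n_left = local_slab_with_ghosts(grids, 10, 20)
contrib = ptip_contrib_local(padded, n_left, ix_len=20 - 10, dv=(0.2, 0.2))
print(contrib.shape)   # (4, 10, 20, 10)
```

With one ghost plane on each side, every plane a rank actually keeps should get the same finite-difference values as on the full grid, and no rank ever needs to hold a full-grid gradient, so the persistent extra memory per rank drops from a full set of grids to a single slab.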
