p-type tip high memory usage #10

Open
eimrek opened this issue Oct 10, 2023 · 0 comments

When a non-zero p_tip_ratios is specified, the script can run into memory problems.

The current design of the code is:

  1. class Cp2kGridOrbitals puts the orbitals on a grid and divides them between the MPI processes such that each process holds a subset of the orbitals over their full spatial extent;
  2. class STM "re-divides" the orbitals such that every MPI process holds all orbitals, but only in a specific spatial region; this allows each MPI process to run the STM analysis independently of the others (a small illustrative sketch of the two layouts follows this list).
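
For illustration, here is a minimal sketch of the two data layouts described above. The array shapes, rank count, and variable names are made up for the example and are not the actual class attributes.

```python
import numpy as np

# Hypothetical sizes: 8 orbitals on a (nx, ny, nz) grid, 4 MPI ranks.
n_orb, nx, ny, nz, mpi_size = 8, 120, 90, 60, 4

# Layout 1 (Cp2kGridOrbitals): each rank holds a subset of the orbitals
# over the full spatial grid.
orb_per_rank = n_orb // mpi_size
layout1_shape = (orb_per_rank, nx, ny, nz)     # e.g. (2, 120, 90, 60)

# Layout 2 (after the STM "re-division"): each rank holds all orbitals,
# but only an x-slab of the grid.
nx_per_rank = nx // mpi_size
layout2_shape = (n_orb, nx_per_rank, ny, nz)   # e.g. (8, 30, 90, 60)

# Both layouts store the same number of grid points per rank; the memory
# problem comes from the additional p-tip gradient grids discussed below.
assert np.prod(layout1_shape) == np.prod(layout2_shape)
```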

If p_tip_ratios is enabled, the current implementation calculates the gradient of the orbitals before the "re-division" of step 2 and then keeps two grids on each process: the orbital and its gradient. Intuitively, this should double the memory usage, but in practice the usage can be even higher and often causes the program to run out of memory (a rough accounting is sketched after the snippet). The relevant code is here:

```python
if self.ptip_enabled:
    ### Calculate and divide also the p-tip contribution,
    ### as derivatives are hard to account for after dividing the orbitals in space
    p_tip_contrib = (np.gradient(self.cgo.morb_grids[ispin], axis=1) / self.dv[0]) ** 2
    p_tip_contrib += (np.gradient(self.cgo.morb_grids[ispin], axis=2) / self.dv[1]) ** 2
    for rank in range(self.mpi_size):
        ix_start, ix_end = self.x_ind_per_rank(rank)
        if self.mpi_rank == rank:
            recvbuf = np.empty(sum(orbitals_per_rank) * num_spatial_points, dtype=self.cgo.dtype)
        else:
            recvbuf = None
        sendbuf = p_tip_contrib[:, ix_start:ix_end, :, :].ravel()
        self.mpi_comm.Gatherv(sendbuf=sendbuf,
                              recvbuf=[recvbuf, orbitals_per_rank * num_spatial_points],
                              root=rank)
        if self.mpi_rank == rank:
            self.local_orbital_ptip.append(
                recvbuf.reshape(total_orb, self.local_cell_n[0], self.local_cell_n[1], self.local_cell_n[2]))
```
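
For reference, a rough back-of-the-envelope accounting of why the usage can end up well above a factor of two (assuming float64 grids and made-up dimensions; none of these numbers are measured). Besides p_tip_contrib itself, np.gradient and the division/squaring create transient full-size temporaries, the .ravel() of the non-contiguous slice copies each rank's slab, and the received slab is kept in local_orbital_ptip:

```python
import numpy as np

# Hypothetical sizes, just to put numbers on the argument (not measured values).
n_orb, nx, ny, nz = 100, 300, 200, 150
mpi_size = 4
grid_bytes = n_orb * nx * ny * nz * np.dtype(np.float64).itemsize  # one full set of grids

orbitals      = grid_bytes          # self.cgo.morb_grids[ispin], kept throughout
p_tip_contrib = grid_bytes          # the persistent extra grid (the intuitive "doubling")
temporaries   = 2 * grid_bytes      # np.gradient output + divided/squared intermediate,
                                    # alive at the same time while the expression is evaluated
sendbuf_copy  = grid_bytes // mpi_size   # .ravel() of a non-contiguous slice makes a copy
recv_slab     = grid_bytes // mpi_size   # per-rank receive buffer, kept in local_orbital_ptip

peak = orbitals + p_tip_contrib + temporaries + sendbuf_copy + recv_slab
print(f"orbitals alone: {orbitals / 1e9:.1f} GB, rough peak: {peak / 1e9:.1f} GB "
      f"({peak / orbitals:.1f}x)")
```

The exact peak depends on NumPy's temporary elision and on when intermediate arrays are freed, so this is only an order-of-magnitude estimate.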

Why do I need to store the gradient at this step? Because of the spatial division between the MPI processes: if I just calculated the gradient separately on each process, there could be anomalies at the edges of the spatial regions (see the small demonstration below).
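
To make the edge issue concrete, here is a small self-contained demonstration with a 2-D stand-in array (illustrative only): np.gradient falls back to one-sided differences at array boundaries, so the gradient of a slab does not match the corresponding slab of the full-array gradient at its first and last planes.

```python
import numpy as np

rng = np.random.default_rng(0)
full = rng.random((64, 32))                         # stand-in for an orbital on a grid

grad_then_slice = np.gradient(full, axis=0)[16:32]  # gradient on the full array, then slice
slice_then_grad = np.gradient(full[16:32], axis=0)  # slice first, then gradient on the slab

# Interior rows agree exactly, but the first and last rows of the slab differ,
# because np.gradient uses one-sided differences at the slab boundaries.
print(np.allclose(grad_then_slice[1:-1], slice_then_grad[1:-1]))  # True
print(np.allclose(grad_then_slice[0],  slice_then_grad[0]))       # False
print(np.allclose(grad_then_slice[-1], slice_then_grad[-1]))      # False
```

Only the boundary planes differ, which is exactly what a single extra ghost plane per side would fix (see the sketch below).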

A possible alternative would be to divide the spatial region between the MPI processes such that neighbouring regions overlap slightly. This would allow each process to independently calculate the gradient when needed and would likely reduce the memory usage considerably (a minimal sketch of the idea follows).
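
A minimal sketch of that overlapping-region (ghost-plane) idea, assuming one ghost plane per side is enough for np.gradient's central differences; the function names and signatures here are hypothetical, not existing code:

```python
import numpy as np

def local_slab_with_ghosts(morb_grids, ix_start, ix_end, n_ghost=1):
    """Return the x-slab [ix_start, ix_end) padded with up to n_ghost ghost planes."""
    lo = max(ix_start - n_ghost, 0)
    hi = min(ix_end + n_ghost, morb_grids.shape[1])
    return morb_grids[:, lo:hi, :, :], ix_start - lo   # padded slab + number of left ghosts

def ptip_contrib_local(padded_slab, n_left_ghost, ix_len, dv):
    """p-tip contribution on the rank's own slab; gradients taken on the padded slab."""
    contrib  = (np.gradient(padded_slab, axis=1) / dv[0]) ** 2
    contrib += (np.gradient(padded_slab, axis=2) / dv[1]) ** 2
    return contrib[:, n_left_ghost:n_left_ghost + ix_len, :, :]   # drop the ghost planes

# Example on a single "rank": 4 orbitals on a (40, 20, 10) grid, slab x = [10, 20).
grids = np.random.default_rng(1).random((4, 40, 20, 10))
padded, n_left = local_slab_with_ghosts(grids, 10, 20)
contrib = ptip_contrib_local(padded, n_left, ix_len=20 - 10, dv=(0.2, 0.2))
print(contrib.shape)   # (4, 10, 20, 10)
```

With one ghost plane on each side, every plane a rank actually keeps should get the same finite-difference values as on the full grid, and no rank ever needs to hold a full-grid gradient, so the persistent extra memory per rank drops from a full set of grids to a single slab.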
