unknown error in the calculation of large system #315

Open
16 tasks
JTaozhang opened this issue Apr 19, 2024 · 0 comments

Describe the bug

Hi there,
I am using ABACUS to calculate the band structure of a large system (504 atoms). I assigned 12 tasks to the job (12 nodes, each with 192 GB of memory), with 56 threads per task (each node has 56 cores). However, I am stuck on an error, shown below.

Error file:

terminate called after throwing an instance of 'std::bad_alloc'
what(): std::bad_alloc
terminate called after throwing an instance of 'std::bad_alloc'
what(): std::bad_alloc
terminate called after throwing an instance of 'std::bad_alloc'
what(): std::bad_alloc

Log file:
ELEMENT ORBITALS NBASE NATOM XC
W 4s2p2d2f1g-8au 43 168
Te 2s2p2d1f-7au 25 336

Initial plane wave basis and FFT box

DONE(2.25912 SEC) : INIT PLANEWAVE

NONSELF-CONSISTENT :

START CHARGE : file

===================================================================================
= BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
= RANK 4 PID 28525 RUNNING AT node136
= KILLED BY SIGNAL: 6 (Aborted)
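
As a rough size check, the basis counts in the log table can be turned into a matrix dimension. This is only a sketch: the factor of 2 for `lspinorb 1` and the dense complex double-precision storage are assumptions, and the per-process footprint in a parallel run will differ.

```python
# Per-element basis counts and atom counts copied from the log table above.
species = {
    "W": {"nbase": 43, "natom": 168},
    "Te": {"nbase": 25, "natom": 336},
}

n_atoms = sum(s["natom"] for s in species.values())
n_orbitals = sum(s["nbase"] * s["natom"] for s in species.values())

# Assumption: lspinorb 1 doubles the matrix dimension (two spinor components).
n_dim = 2 * n_orbitals

# Assumption: one dense complex double-precision matrix, 16 bytes per element.
dense_bytes = n_dim * n_dim * 16

print(f"atoms: {n_atoms}")
print(f"spatial orbitals: {n_orbitals}")
print(f"matrix dimension with spinors: {n_dim}")
print(f"one dense complex matrix: {dense_bytes / 1024**3:.1f} GiB")
```

The number is an upper-bound style estimate for a single undistributed matrix, useful only for judging the order of magnitude against the 192 GB available per node.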

The input files are listed below.
INPUT file:
INPUT_PARAMETERS
suffix WTe2
ntype 2
nelec 0.0
lspinorb 1
pseudo_dir /share/home/zhangtao/work/WTe2/abacus/pseudo
orbital_dir /share/home/zhangtao/work/WTe2/abacus/orbital
ecutwfc 100 #unit Ryberg 13.606 eV
scf_thr 1e-6 #unit Ryberg 13.606 eV
basis_type lcao
calculation nscf
#parameters(vdw)
vdw_method d2
#Parameters (File)
init_chg file
out_band 1
out_dos 1

KPT:
K_POINTS
3
Line
0.0000000000 0.0000000000 0.0000000000 5 # G
0.5000000000 0.0000000000 0.0000000000 5 # X
0.5000000000 0.5000000000 0.0000000000 1 # S
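
For clarity, the path above can be expanded into explicit k-points. This is a minimal sketch, assuming that the integer on each line is the number of points generated from that special point toward the next one (the next point excluded), and that the final entry contributes only the point itself. The function name `expand_kpath` is hypothetical and not part of ABACUS.

```python
def expand_kpath(points):
    """Expand [((kx, ky, kz), n), ...] into a list of k-points.

    Assumption: n points are placed on each segment, starting at the
    segment's first special point and stopping before the next one.
    The last special point is appended once at the end.
    """
    path = []
    for i, (start, n) in enumerate(points):
        if i == len(points) - 1:
            path.append(tuple(float(c) for c in start))
            break
        end = points[i + 1][0]
        for j in range(n):
            t = j / n  # fractional position along the segment
            path.append(tuple(s + t * (e - s) for s, e in zip(start, end)))
    return path


# Special points copied from the KPT file above (G, X, S).
kpath = expand_kpath([
    ((0.0, 0.0, 0.0), 5),
    ((0.5, 0.0, 0.0), 5),
    ((0.5, 0.5, 0.0), 1),
])

print(len(kpath))
for k in kpath:
    print("%.4f %.4f %.4f" % k)
```

Under these assumptions the nscf run only has to handle a small number of k-points, so the k-point count itself is unlikely to dominate the cost for this input.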

part of information of STRU:
ATOMIC_SPECIES
W 183.841 W_ONCV_PBE_FR-1.0.upf
Te 127.603 Te_ONCV_PBE_FR-1.1.upf

NUMERICAL_ORBITAL
W_gga_8au_100Ry_4s2p2d2f1g.orb
Te_gga_7au_100Ry_2s2p2d1f.orb

LATTICE_CONSTANT
1.889726

LATTICE_VECTORS
32.21631806204 0.000000000000 0.000000000000
0.000000000000 28.28261899040 0.000000000000
0.000000000000 0.000000000000 29.99607617032

The charge density file obtained from the scf calculation has also been provided to this calculation. After running the program, running_nscf.log shows:
SETUP SEARCHING RADIUS FOR PROGRAM TO SEARCH ADJACENT ATOMS
longest orb rcut (Bohr) = 8
longest nonlocal projector rcut (Bohr) = 3.2
==> atom_arrange::search 224 GB 7.14 s
searching radius is (Bohr)) = 22.4
searching radius unit is (Bohr)) = 1.89
==> Atom_input::Atom_input 224 GB 7.14 s
==> Atom_input::Expand_Grid 224 GB 7.14 s
==> Atom_input::calculate_cells 224 GB 7.14 s
==> SLTK_Grid::init 224 GB 7.14 s
==> SLTK_Grid::setMemberVariables 224 GB 7.14 s
==> SLTK_Grid::Build_Cell 224 GB 7.14 s
==> SLTK_Grid::Build_Hash_Table 224 GB 7.14 s
==> SLTK_Grid::Fold_Hash_Table 224 GB 7.14 s
==> Grid_Technique::init 224 GB 7.32 s

SETUP EXTENDED REAL SPACE GRID FOR GRID INTEGRATION
real space grid = [ 400, 360, 375 ]
big cell numbers in grid = [ 80, 72, 125 ]
meshcell numbers in big cell = [ 5, 5, 3 ]
==> Grid_MeshCell::init_latvec 224 GB 7.32 s
==> Grid_BigCell::init_big_latvec 224 GB 7.32 s
==> Grid_BigCell::init_grid_expansion 224 GB 7.32 s
extended fft grid = [ 11, 11, 18 ]
dimension of extened grid = [ 103, 95, 162 ]
==> Grid_MeshK::cal_extended_cell 224 GB 7.32 s
UnitCellTotal = 27
==> Grid_BigCell::init_tau_in_bigcell 224 GB 7.32 s
==> Grid_MeshBall::delete_meshball_positions 224 GB 7.32 s
==> Grid_MeshBall::init_meshball 224 GB 7.32 s
==> Grid_Technique::init_atoms_on_grid 224 GB 7.41 s
==> Grid_Technique::get_startind 224 GB 7.41 s

Warning_Memory_Consuming allocated: GT::index2normal 6.05 MB
==> Grid_BigCell::grid_expansion_index 224 GB 7.41 s
No atoms on this sub-FFT-mesh.
==> Grid_Techinique::init_atoms_on_grid2 224 GB 7.55 s
==> Grid_Technique::cal_trace_lo 224 GB 7.55 s
Atom number in sub-FFT-grid = 0
Local orbitals number in sub-FFT-grid = 0
==> Record_adj::for_2d 224 GB 7.55 s
ParaV.nnr = 18004308
==> LCAO_nnr::cal_nnrg 224 GB 7.75 s
==> LCAO_nnr::cal_max_box_index 224 GB 7.75 s
nnrg = 0
==> LCAO_domain::grid_prepare 224 GB 7.75 s
==> Gint_k::prep_grid 224 GB 7.75 s
==> Potential::pot_register 224 GB 7.75 s
==> Potential::get_pot_type 224 GB 7.75 s
==> Potential::get_pot_type 224 GB 7.75 s
==> Potential::get_pot_type 224 GB 7.75 s
==> Veff::initialize_HR 224 GB 7.75 s
==> Gint::initialize_pvpR 224 GB 7.78 s
==> Gint_k::destroy_pvpR 224 GB 7.79 s
==> Gint_k::allocate_pvpR 224 GB 7.79 s
==> OverlapNew::initialize_SR 224 GB 7.79 s
==> EkineticNew::initialize_HR 223 GB 7.82 s
==> NonlocalNew::initialize_HR 223 GB 7.86 s

Warning_Memory_Consuming allocated: HamiltLCAO::hR 279 MB

Warning_Memory_Consuming allocated: HamiltLCAO::sR 161 MB
==> Local_Orbital_Charge::allocate_dm_wfc 223 GB 7.93 s
==> Local_Orbital_wfc::allocate_k 223 GB 7.93 s
==> Local_Orbital_Charge::allocate_k 223 GB 7.93 s
nnrg_last = 0
nnrg_now = 0
==> Charge::set_rho_core 223 GB 7.93 s
init_chg = file
try to read charge from file :
==> ModuleIO::read_rhog 223 GB 7.93 s
==> ModuleIO::read_cube 223 GB 7.93 s
Find the file, try to read charge from file.
read in fermi energy = 0.41

Based on this output, a friend told me that the problem is not an out-of-memory condition and that the error appears to occur while reading the charge density. If anyone could give me some advice, I would greatly appreciate the help.
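
One way to test the charge-density hypothesis is to compare the header of the charge file with the atom count and real space grid reported in the log. The sketch below assumes the file follows the common Gaussian cube header layout (two comment lines, then the atom count, then three grid lines); this has not been checked against the ABACUS source, and the function names and the sample header values are hypothetical.

```python
from math import prod


def parse_cube_header(lines):
    """Return (natom, (nx, ny, nz)) from the first six lines of a cube file.

    Assumed layout (standard cube convention):
      lines 1-2: free-form comments
      line 3:    number of atoms, then the grid origin
      lines 4-6: number of grid points on each axis, then the step vector
    """
    lines = list(lines)
    natom = abs(int(lines[2].split()[0]))
    grid = tuple(int(lines[i].split()[0]) for i in (3, 4, 5))
    return natom, grid


def check_against_log(lines, expected_natom, expected_grid):
    """Return a list of human-readable mismatches (empty if consistent)."""
    natom, grid = parse_cube_header(lines)
    problems = []
    if natom != expected_natom:
        problems.append(f"atom count {natom} != {expected_natom}")
    if grid != tuple(expected_grid):
        problems.append(f"grid {grid} != {tuple(expected_grid)}")
    return problems


# Made-up header for illustration only; the step vectors are placeholders.
sample = [
    "comment line 1",
    "comment line 2",
    "504 0.0 0.0 0.0",
    "400 0.1 0.0 0.0",
    "360 0.0 0.1 0.0",
    "375 0.0 0.0 0.1",
]

# Expected values taken from this issue: 504 atoms, grid [400, 360, 375].
print(check_against_log(sample, 504, (400, 360, 375)))

# Size of one density array on that grid, assuming 8-byte reals.
n_points = prod((400, 360, 375))
print(n_points, "grid points,", n_points * 8 / 1024**2, "MiB per array")
```

For a real file, the first six lines can be collected with `[next(f) for _ in range(6)]` on the opened file and passed to `check_against_log`. A mismatch would point to the charge file rather than to the Hamiltonian setup.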

Best,
Tao

Expected behavior

No response

To Reproduce

STRU.txt
To reproduce it, the INPUT file for the scf part is:


INPUT_PARAMETERS
suffix WTe2
ntype 2
nelec 0.0
lspinorb 1
pseudo_dir /share/home/zhangtao/work/WTe2/abacus/pseudo
orbital_dir /share/home/zhangtao/work/WTe2/abacus/orbital

#Prameters(general)
ecutwfc 100 #unit Ryberg 13.606 eV
scf_thr 1e-6 #unit Ryberg 13.606 eV
basis_type lcao
symmetry 1
#gamma_only 1

#Parameters (Accuracy)
calculation scf
#force_thr_ev 0.01

#parameters(vdw)
vdw_method d2

#Parameters (smearing)
#smearing_method gauss
#smearing_sigma 0.01

#Parameters (File)
out_chg 1

KPT file:
K_POINTS
0 //total number of k-point, `0' means generate automatically
Gamma //Gamma or MP
1 1 1 0 0 0 //first three number: subdivisions along reciprocal vectors //last three number: shift of the mesh

The STRU file is attached.

Environment

I used ABACUS 3.6.1 for the nscf part and ABACUS 3.4.1 for the scf part. Both builds were compiled with intel-oneapi2021, and they work well on other small systems.

Additional Context

No response

Task list for Issue attackers (only for developers)

  • Verify the issue is not a duplicate.
  • Describe the bug.
  • Steps to reproduce.
  • Expected behavior.
  • Error message.
  • Environment details.
  • Additional context.
  • Assign a priority level (low, medium, high, urgent).
  • Assign the issue to a team member.
  • Label the issue with relevant tags.
  • Identify possible related issues.
  • Create a unit test or automated test to reproduce the bug (if applicable).
  • Fix the bug.
  • Test the fix.
  • Update documentation (if necessary).
  • Close the issue and inform the reporter (if applicable).