gpudev update #92

cjknight · 2024-05-14T16:35:32Z

Merged with latest dev
Light tuning of get_jk() CUDA kernels (kept light because the kernels still need a straightforward port to a portable programming model)
Added mini-app for evaluating tall-skinny matrix transposes
Updated Makefile and arch files to support mini-app builds
Enabled "not with_k" branch in get_jk()
Added exploratory code to bypass the dfobj.loop() copy; not needed at the moment
Added support for per-device streams and handles
Added multi-gpu support for get_jk()
Temporarily disabled gpu-offload of hessop_get_veff because of an issue with multi-gpu support
Added OpenMP thread affinity info to output
Updated arch and builds after Polaris@ALCF software upgrade
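
For readers unfamiliar with the tall-skinny transpose problem that the mini-app evaluates, here is a minimal NumPy sketch of a tiled transpose checked against the built-in one. It is only an illustration of the access pattern, not the CUDA mini-app in this PR; the helper name `blocked_transpose`, the tile size, and the matrix shape are all assumptions.

```python
import numpy as np

def blocked_transpose(a, block=64):
    """Transpose a 2D array tile by tile (hypothetical helper, not from mrh)."""
    nrow, ncol = a.shape
    out = np.empty((ncol, nrow), dtype=a.dtype)
    for i in range(0, nrow, block):
        for j in range(0, ncol, block):
            # Read one tile of the input and write it transposed into `out`.
            # Slices that run past the edge are clipped by NumPy.
            out[j:j + block, i:i + block] = a[i:i + block, j:j + block].T
    return out

# Tall-skinny case: many rows, few columns (sizes chosen arbitrarily).
rng = np.random.default_rng(0)
a = rng.standard_normal((1000, 7))
a_t = blocked_transpose(a, block=32)
```

On a GPU the same tiling is typically staged through on-chip memory so that both the reads and the writes are contiguous; with very few columns most tiles are only partly filled, which is what makes the tall-skinny shape worth benchmarking separately.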

LASSI op_o1 now aggressively identifies linearly-equivalent rootspaces and caches the relevant unitary transformation matrices to reduce the number of interactions computed explicitly. The exception is the ContractHamCI class, where I still don't know exactly how to implement this.
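
The idea of reusing an interaction through a cached unitary can be sketched in plain NumPy. This is not the op_o1 code: it assumes "linearly equivalent" means two bases that span the same space, and the function name `equivalence_umat`, the random symmetric stand-in for the Hamiltonian, and the tolerance are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 12, 3

# Orthonormal basis A (n x k) for a reference space.
a, _ = np.linalg.qr(rng.standard_normal((n, k)))
# A k x k orthogonal matrix defines an equivalent basis B = A U.
u_true, _ = np.linalg.qr(rng.standard_normal((k, k)))
b = a @ u_true

# Symmetric stand-in for the Hamiltonian (assumption for illustration).
h = rng.standard_normal((n, n))
h = 0.5 * (h + h.T)

def equivalence_umat(a, b, tol=1e-8):
    """Return U with b = a @ U if both bases span the same space, else None."""
    u = a.conj().T @ b
    same_span = np.allclose(a @ u, b, atol=tol)
    unitary = np.allclose(u.conj().T @ u, np.eye(u.shape[1]), atol=tol)
    return u if (same_span and unitary) else None

u = equivalence_umat(a, b)
h_aa = a.conj().T @ h @ a             # computed explicitly once
h_bb_cached = u.conj().T @ h_aa @ u   # reused for the equivalent space
h_bb_direct = b.conj().T @ h @ b      # explicit reference for comparison
```

The saving comes from replacing the explicit evaluation in the full space with a small k x k transformation whenever an equivalence is detected.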


MatthewRHermes and others added 30 commits March 20, 2024 13:58

h1e zip fcisolver nroots loop behavior (MatthewRHermes#34)

03f3f24

Hack H1EZip to not "unroll" the Hilbert-space layer and the single state layer together, so that averaging over the latter can be abstracted into the fcibox.fcisolvers individual RDM generators.

Store local state averaging data in las fcisolvers

608aa6a

Merge branch 'issue34_laslayer' into issue34

b56e5dd

lroots and lweights in lasscf.state_average

9656f61

LASSIS unittest fails because lroots now needs to be passed from one level of LASSIS prepare_states to another, along with charges, spins, smults, and wfnsyms.

fix buffer reallocs ; host backend correct again

0062e9b

simply transpose kernel to prepare for opt

dbbaf38

move lasscf guess-len manip to get_init_guess_ci

d10e0de

This now breaks lasscf_rdm instead, because the guess items are not CI vectors but RDM arrays.

refactor las get_init_guess_ci

4990503

lasscf_rdm transparently works for now

productstate guess ci vector math error fix

7135f55

implement lasscf_rdm._combine_init_guess_ci

ca8e862

delete comment

28d1463

Merge branch 'get_init_guess_ci_bugcheck' into lsa_lasscf

10ffb83

Docstrings for *_make_*dm functions of LASSCF.

52dac91

Include unambiguous statement that they average over local states, resolving issue MatthewRHermes#34.

[skip ci] resolve issue MatthewRHermes#34

ef4bf4a

LASSCF RDMSolver dump_chk support.

433b43d

The casdm1rs and casdm2r go into the slot for "rootspace 0" and "rootspace 1", respectively, that the usual algorithm would make. Also use setUpModule in test_lasscf_rdm.py

Create lasscf.py

041e323

tall-skinny transpose mini-app

8bba23d

More complete logging in LASSIS

6d13c64

change ncharge counting

b873b86

Use n(elec,hole)(u,d) instead of n(elec,hole)(a,b) to compute number of charge-separated single excitations. Changes total energies in test_c2h4n4, but all delta-E tests in test_lassis_targets_slow still pass.

lassis lroots sanity check assert

61c4ad6

create debug_lassis_targets_slow.py

05d0482

Merge branch 'debug_lassis_file' into lassis_lroots_counting

ad2335d

SLM-enabled transpose for nonsquare matrices

55adf9b

update Makefiles and archs to support mini-apps

0b52b91

lassis ncharge = min(norb[src],norb[dest])

b1d6a8f

A much simpler, but less flexible/more costly choice.

lassis ncharge spin-agnostic

76992fc

Ignore both the s-change of each fragment and the putative m value of the hopping electron, and assume that electrons can always come out of both doubly- and singly-occupied orbitals and can always go into both singly-occupied and unoccupied orbitals.

Create h2-631g.py

e065002

Test on H2 for lasscf geometry optimization

support for not with_k + additional timers

f4e7432

missed timer fixes

266b25a

LASSI analyze method kwargs

b2f9b99

MatthewRHermes and others added 29 commits April 23, 2024 20:04

safety commit

38af989

must implement smart lindep handling for the branch to be viable

smart lindep handling in LASSI kernel

ec640ad
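
The commit does not say how the linear dependencies are handled. One standard approach, shown here purely as an assumed illustration, is canonical orthogonalization: diagonalize the overlap matrix and discard eigenvectors whose eigenvalues fall below a threshold. The helper name `drop_lindep` and the tolerance are hypothetical.

```python
import numpy as np

def drop_lindep(ovlp, tol=1e-8):
    """Return X with X^T S X = 1 on the linearly independent subspace.

    Hypothetical helper; the threshold `tol` is an arbitrary assumption.
    """
    evals, evecs = np.linalg.eigh(ovlp)
    keep = evals > tol
    # Scale the retained eigenvectors so the transformed overlap is the identity.
    return evecs[:, keep] / np.sqrt(evals[keep])

# Three model states, the third being the sum of the first two.
rng = np.random.default_rng(2)
v = rng.standard_normal((10, 2))
vecs = np.column_stack([v[:, 0], v[:, 1], v[:, 0] + v[:, 1]])
ovlp = vecs.T @ vecs

x = drop_lindep(ovlp)
n_indep = x.shape[1]  # number of linearly independent states
```

Projecting the Hamiltonian into the retained subspace with `x` turns the generalized eigenvalue problem into an ordinary one of dimension `n_indep`.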

more rigorous equality checking LSTDM1

b6eb893

since spin-shuffle spoils bitwise agreement

Log # of linearly independent states in LASSI

ca1ab3e

entmap as nested tuple rather than ndarray

7661a67

storage space

better redundancy checking in op_o1

fec6f8a

Still need to fix signs, but in a way which doesn't break the op_o0/op_o1 agreement check

tighten lassi op_o1 identity checking tol

2e461f1

hopefully make tests work

sitools umat_dot_1frag_

1a3d502

profiling & code cleanup & tighten equiv tol

46a4522

Profile the timing of lassi op_o1 more thoroughly. Move two fns from sitools to citools (the latter being safer to import). Tighten exact-equivalence tolerance in lassi op_o1 screening.

lassi.states -> lassi.spaces & op_o1 profile

44062f5

For op_o1 profiling, use more descriptive names in the output and add it around the HamS2Ovlp full overlap matrix steps.

Adding keyword to plot only active space orbitals

9196c06

update for Polaris upgrade

8809b33

clean up debug

4d543f8

op_o1 reduce verbosity

ac175a1

"unique" messages demoted to DEBUG

Merge pull request MatthewRHermes#89 from JangidBhavnesh/dev

ab148b2

Adding keyword to plot only active space orbitals

Merge branch 'dev' of https://github.com/MatthewRHermes/mrh into dev

650637e

protect when gpu has no work

48e40e1
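
A guard of this kind matters whenever there are fewer work items than devices. The sketch below assumes a round-robin split of blocks across devices, which the commit does not specify; `split_blocks` and `run` are hypothetical names, and the "launch" is just recorded in a list.

```python
def split_blocks(nblocks, ndev):
    """Assign block indices to devices round-robin (hypothetical helper)."""
    return [list(range(dev, nblocks, ndev)) for dev in range(ndev)]

def run(nblocks, ndev):
    """Return the (device, blocks) pairs that would actually be launched."""
    launched = []
    for dev, blocks in enumerate(split_blocks(nblocks, ndev)):
        if not blocks:
            # Skip devices that received no blocks, so no buffers are
            # allocated and no kernels are launched for them.
            continue
        launched.append((dev, blocks))
    return launched

# Three blocks on four devices: the last device has nothing to do.
work = run(nblocks=3, ndev=4)
```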

assert: ref, not copy

21a2b11

LASSI slight refactor: LASSCF obj -> spacelist

2c9fda3

LASSIS refactor: reduce repeated spin_shuffle

471ea28

undo

a6c2030

undo

52486aa

Memory efficiency lassis _spin_shuffle_ci_ (MatthewRHermes#90)

de53375

PySCF version update

9f6179a

lassis spin_shuffle_ci nonsinglet ref (MatthewRHermes#91)

5e4d244

add openmp affinity info

2c9e3f3

update polaris arch

64c8a67

Merge branch 'dev' of https://github.com/MatthewRHermes/mrh into gpudev

1e54635

MatthewRHermes merged commit 3ccde0d into MatthewRHermes:gpu May 14, 2024
3 checks passed
