Problems with first and second derivatives o splines of density #10

gabrielasd · 2024-02-01T22:38:42Z

Description

A common issue across the compiled datasets with density properties (gaussian, numeric, slater) is that the tests for the 1st and 2nd derivatives of the density fail:

AtomDB/atomdb/test/test_gaussian.py

Line 78 in 85031c5

# FIXME: density derivatives tests fail.

AtomDB/atomdb/test/test_numeric.py

Line 143 in 85031c5

@pytest.mark.xfail

AtomDB/atomdb/test/test_wfn_slater.py

Line 128 in 85031c5

@pytest.mark.xfail

The wrong derivatives are computed through the interpolated splines of the density (cubic splines). E.g. evaluation of 1st derivative of density from the Slater dataset tests:

AtomDB/atomdb/test/test_wfn_slater.py

Lines 137 to 138 in 85031c5

    
           spline = sp.interpolate_dens(spin='ab', log=False) 
        
           gradient = spline(grid, deriv=1)

rather than done analytically.

The reference data being used to compare against comes from the raw data (usually the 1st derivatives were stored), for e.g in the case of the Slater dataset:
https://github.com/theochem/AtomDB/blob/85031c5fc08b7c920a093c84ab4f6458b5d48825/atomdb/test/test_wfn_slater.py#L139-142
or by using Numpy gradient.

One problem may be that the log keyword argument should be set to True, although I haven't seen before that it fixes the problem. It also may be that the values that make the tests crash are derivatives of the density close to the atomic center.

Tasks & Progress

Revise gradient tests
Check spline interpolation function cubic_interp
Fix tests

The text was updated successfully, but these errors were encountered:

msricher · 2024-02-14T20:00:46Z

I don't see anything wrong with the simple (non-log) spline. I think it may be that we should spline the derivatives directly rather than just taking derivatives of the original spline.

By the way, I think the proper use of np.gradient should be to specify the spacings rather than a single stepsize. I do this: gradient = np.gradient(dens, sp.rs).

I'm not sure what else we can do here.

PaulWAyers · 2024-02-15T00:45:05Z

After computing derivative values at the available points, I think that (re)splining the derivative and storing them is a good idea. In some cases we could do that directly from reference data, but I'm not sure it is worth it....

gabrielasd · 2024-02-16T00:38:07Z

I think it may be that we should spline the derivatives directly rather than just taking derivatives of the original spline.

We could do it for the databases based on Hartree-Fock numeric and using Slater basis. The problem are the other two that use Gaussian basis sets. For those we have the issue with the need to spherically average the density property.
These are comments in the old repository about this:
https://github.com/QuantumElephant/atomdb/issues/6#issuecomment-805169710
https://github.com/QuantumElephant/atomdb/issues/6#issuecomment-1079788959

So we ended only storing the densities (spherically averaged in the case where Gaussian basis are used), and getting the derivatives from the splines.

PaulWAyers · 2024-02-16T16:04:34Z

I was thinking something simpler: spline the density and compute the first derivative at the initial points. Then make a new spline of the derivative from that numeric data. This is slightly different from just taking the derivative of the spline, as it accomodates higher-order differentiation and (more importantly) means that one doesn't compute, but merely accesses, derivatives from their corresponding spline.

Does this make sense?

marco-2023 · 2024-02-16T16:47:45Z

I am sorry if this is a dumb question (this is new to me). If I understand it correctly the problem is for the database using gaussian functions. Yesterday @gabrielasd and I were playing with the following idea:

# 1. Create an atomic grid using grid.
atgrid = AtomGrid(...)

# 2. Compute the gradient at the grid points using gbasis
rho, grad = gbasis(...)

# compute the projection of the gradient in the radial coordinate for each point
rad_grad = np.einsum('ij,ij -> i', grad, atgrid.points) / np.linalg.norm(atgrid.points)

# 3. find the spherical average of the points' gradient radial component
# (this is done by grid and returns a spline)
ragd_grad_spline = spherical_average(...)

Can this approach be used?

gabrielasd · 2024-03-05T20:41:41Z

I was thinking something simpler: spline the density and compute the first derivative at the initial points. Then make a new spline of the derivative from that numeric data. This is slightly different from just taking the derivative of the spline, as it accomodates higher-order differentiation and (more importantly) means that one doesn't compute, but merely accesses, derivatives from their corresponding spline.

@PaulWAyers I tried this but I didn't found it to work either.
Because the numeric dataset has reference values for gradient and laplacian, I made the spline from that gradient array (skipping the 1st step you indicated), and computed the laplacian as the 1st derivative of that spline. But those values do not match the available reference data, there is a huge difference close to the nuclei center.

gabrielasd · 2024-03-05T21:01:20Z

@PaulWAyers @msricher @marco-2023
This is a status update of this issue:
We reviewed the tests for the gradient and laplacian of the density for the three datasets that support this type of properties (changes introduced in PR #28). The purpose was to verify whether these derivatives, obtained from the cubic spline of the density, were being computed correctly.

Like @msricher commented above, there doesn't seem to be any problem with the function cubic_interp itself:
https://github.com/theochem/AtomDB/blob/1830ecff53b7bcef9b47e6635d4fe8696750a083/atomdb/api.py#L441C5-L441C17
The interpolation is just not very good close to the nuclei (eve if log of density is what is being interpolated).

The bigger problem is the 2nd derivative, all tests for this fail for all datasets.
For the gradient, the tests based on the numeric and slater datasets (which have reference values to compare against) do pass, although excluding points close to the center with the numeric HF dataset. With the gaussian dataset both gradient and lapalcian tests still fail.

On the other hand, @marco-2023 figured a way to get the spherically averaged gradient and laplacian for the datasets that rely on gaussian basis sets; with this we would not need to get them through the splines of the density anymore. In the API we would still build the interpolation functions, but it would be directly from the stored arrays.

msricher · 2024-03-05T22:13:29Z

There's likely nothing we can do about the close-to-the-nucleus problem, the function can't be modelled by a cubic there. I think it's okay to test from a certain radius outwards. We should just check, when evaluating quantities that involve integrating over 3D space (esp. for a promolecule), that we recover sensible results.

gabrielasd · 2024-03-26T17:11:31Z

I'll close this issue since after PR #41 we no longer evaluate the derivatives of the density from its cubic splines.
Now we store the values of the derivatives evaluated analytically (two of the raw sources of data already supported them, the HF numeric and HF with Slater basis ones)

gabrielasd assigned gabrielasd and marco-2023 Feb 1, 2024

gabrielasd added the bug Something isn't working label Feb 1, 2024

gabrielasd closed this as completed Mar 26, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Problems with first and second derivatives o splines of density #10

Problems with first and second derivatives o splines of density #10

gabrielasd commented Feb 1, 2024 •

edited

Loading

msricher commented Feb 14, 2024

PaulWAyers commented Feb 15, 2024

gabrielasd commented Feb 16, 2024

PaulWAyers commented Feb 16, 2024

marco-2023 commented Feb 16, 2024

gabrielasd commented Mar 5, 2024

gabrielasd commented Mar 5, 2024

msricher commented Mar 5, 2024

gabrielasd commented Mar 26, 2024

Problems with first and second derivatives o splines of density #10

Problems with first and second derivatives o splines of density #10

Comments

gabrielasd commented Feb 1, 2024 • edited Loading

Description

Tasks & Progress

msricher commented Feb 14, 2024

PaulWAyers commented Feb 15, 2024

gabrielasd commented Feb 16, 2024

PaulWAyers commented Feb 16, 2024

marco-2023 commented Feb 16, 2024

gabrielasd commented Mar 5, 2024

gabrielasd commented Mar 5, 2024

msricher commented Mar 5, 2024

gabrielasd commented Mar 26, 2024

gabrielasd commented Feb 1, 2024 •

edited

Loading