Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Problems with first and second derivatives o splines of density #10

Closed
3 tasks done
gabrielasd opened this issue Feb 1, 2024 · 9 comments
Closed
3 tasks done
Assignees
Labels
bug Something isn't working

Comments

@gabrielasd
Copy link
Collaborator

gabrielasd commented Feb 1, 2024

Description

A common issue across the compiled datasets with density properties (gaussian, numeric, slater) is that the tests for the 1st and 2nd derivatives of the density fail:

# FIXME: density derivatives tests fail.

@pytest.mark.xfail

@pytest.mark.xfail

The wrong derivatives are computed through the interpolated splines of the density (cubic splines). E.g. evaluation of 1st derivative of density from the Slater dataset tests:

spline = sp.interpolate_dens(spin='ab', log=False)
gradient = spline(grid, deriv=1)

rather than done analytically.

The reference data being used to compare against comes from the raw data (usually the 1st derivatives were stored), for e.g in the case of the Slater dataset:
https://github.com/theochem/AtomDB/blob/85031c5fc08b7c920a093c84ab4f6458b5d48825/atomdb/test/test_wfn_slater.py#L139-142
or by using Numpy gradient.

One problem may be that the log keyword argument should be set to True, although I haven't seen before that it fixes the problem. It also may be that the values that make the tests crash are derivatives of the density close to the atomic center.

Tasks & Progress

  • Revise gradient tests
  • Check spline interpolation function cubic_interp
  • Fix tests
@gabrielasd gabrielasd added the bug Something isn't working label Feb 1, 2024
@msricher
Copy link
Collaborator

I don't see anything wrong with the simple (non-log) spline. I think it may be that we should spline the derivatives directly rather than just taking derivatives of the original spline.

By the way, I think the proper use of np.gradient should be to specify the spacings rather than a single stepsize. I do this: gradient = np.gradient(dens, sp.rs).

I'm not sure what else we can do here.

@PaulWAyers
Copy link
Member

After computing derivative values at the available points, I think that (re)splining the derivative and storing them is a good idea. In some cases we could do that directly from reference data, but I'm not sure it is worth it....

@gabrielasd
Copy link
Collaborator Author

I think it may be that we should spline the derivatives directly rather than just taking derivatives of the original spline.

We could do it for the databases based on Hartree-Fock numeric and using Slater basis. The problem are the other two that use Gaussian basis sets. For those we have the issue with the need to spherically average the density property.
These are comments in the old repository about this:
https://github.com/QuantumElephant/atomdb/issues/6#issuecomment-805169710
https://github.com/QuantumElephant/atomdb/issues/6#issuecomment-1079788959

So we ended only storing the densities (spherically averaged in the case where Gaussian basis are used), and getting the derivatives from the splines.

@PaulWAyers
Copy link
Member

I was thinking something simpler: spline the density and compute the first derivative at the initial points. Then make a new spline of the derivative from that numeric data. This is slightly different from just taking the derivative of the spline, as it accomodates higher-order differentiation and (more importantly) means that one doesn't compute, but merely accesses, derivatives from their corresponding spline.

Does this make sense?

@marco-2023
Copy link
Collaborator

I am sorry if this is a dumb question (this is new to me). If I understand it correctly the problem is for the database using gaussian functions. Yesterday @gabrielasd and I were playing with the following idea:

# 1. Create an atomic grid using grid.
atgrid = AtomGrid(...)

# 2. Compute the gradient at the grid points using gbasis
rho, grad = gbasis(...)

# compute the projection of the gradient in the radial coordinate for each point
rad_grad = np.einsum('ij,ij -> i', grad, atgrid.points) / np.linalg.norm(atgrid.points)

# 3. find the spherical average of the points' gradient radial component
# (this is done by grid and returns a spline)
ragd_grad_spline = spherical_average(...)

Can this approach be used?

@gabrielasd
Copy link
Collaborator Author

I was thinking something simpler: spline the density and compute the first derivative at the initial points. Then make a new spline of the derivative from that numeric data. This is slightly different from just taking the derivative of the spline, as it accomodates higher-order differentiation and (more importantly) means that one doesn't compute, but merely accesses, derivatives from their corresponding spline.

@PaulWAyers I tried this but I didn't found it to work either.
Because the numeric dataset has reference values for gradient and laplacian, I made the spline from that gradient array (skipping the 1st step you indicated), and computed the laplacian as the 1st derivative of that spline. But those values do not match the available reference data, there is a huge difference close to the nuclei center.

@gabrielasd
Copy link
Collaborator Author

@PaulWAyers @msricher @marco-2023
This is a status update of this issue:
We reviewed the tests for the gradient and laplacian of the density for the three datasets that support this type of properties (changes introduced in PR #28). The purpose was to verify whether these derivatives, obtained from the cubic spline of the density, were being computed correctly.

Like @msricher commented above, there doesn't seem to be any problem with the function cubic_interp itself:
https://github.com/theochem/AtomDB/blob/1830ecff53b7bcef9b47e6635d4fe8696750a083/atomdb/api.py#L441C5-L441C17
The interpolation is just not very good close to the nuclei (eve if log of density is what is being interpolated).

The bigger problem is the 2nd derivative, all tests for this fail for all datasets.
For the gradient, the tests based on the numeric and slater datasets (which have reference values to compare against) do pass, although excluding points close to the center with the numeric HF dataset. With the gaussian dataset both gradient and lapalcian tests still fail.

On the other hand, @marco-2023 figured a way to get the spherically averaged gradient and laplacian for the datasets that rely on gaussian basis sets; with this we would not need to get them through the splines of the density anymore. In the API we would still build the interpolation functions, but it would be directly from the stored arrays.

@msricher
Copy link
Collaborator

msricher commented Mar 5, 2024

There's likely nothing we can do about the close-to-the-nucleus problem, the function can't be modelled by a cubic there. I think it's okay to test from a certain radius outwards. We should just check, when evaluating quantities that involve integrating over 3D space (esp. for a promolecule), that we recover sensible results.

@gabrielasd
Copy link
Collaborator Author

I'll close this issue since after PR #41 we no longer evaluate the derivatives of the density from its cubic splines.
Now we store the values of the derivatives evaluated analytically (two of the raw sources of data already supported them, the HF numeric and HF with Slater basis ones)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

4 participants