Implementation of the power spectrum inference with GPs #832
base: main
Conversation
Hello @mlefkir! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:
Comment last updated at 2024-10-03 11:36:06 UTC
@mlefkir thanks for your contribution to Stingray, and sorry for my late reply! From the point of view of the requirements for PRs to Stingray:
Yes, @mlefkir, do you have any use case example for this method (something like an .ipynb notebook)? Also @dhuppenkothen, can you have a look into the usefulness of this method for Stingray, and whether it should go into the same file as the gpmodeling part?
@Gaurav17Joshi @matteobachetti I made two examples available here: Examples. One uses nested sampling with the jaxns sampler already called in Stingray, and the other uses NUTS with NumPyro.
@mlefkir I'm playing with your PR. Really sorry for the slow progress, but my knowledge of these methods is pretty poor, it takes me a lot of time just to understand how it works, and... my agenda is pretty full 😅.
@matteobachetti This model is an improvement on the available models, which currently have fixed low- and high-frequency slopes. For instance, the DRW/Exponential kernel has a Lorentzian power spectrum with a low-frequency slope of 0 and a high-frequency slope of -2. This method allows modelling spectral shapes with flexible bend frequencies and slopes, which can be between 0 and 4, using a sum of basis functions, as shown in the figure below.

[Figure: bending power-law power spectra built from a sum of basis functions]

This method is designed for Gaussian process regression, so it can be used for any Gaussian time series with (or without) error bars. While the algorithm in tinygp is fast, there are limitations on the number of points in terms of computational cost, so I would use the method only for irregularly sampled data, or data with gaps, with fewer than 10,000 points.
@matteobachetti If you are interested, the preprint of the paper describing the method is available here: https://arxiv.org/pdf/2501.05886
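To make the basis-function idea above concrete, here is a minimal sketch (not code from the PR; the weights `a` and bend frequencies `f_c` are placeholders that would normally come from fitting the target bending power law, and `sho_power_spectrum` mirrors the `SHO_power_spectrum` used later in the diff):

```python
import jax.numpy as jnp

def sho_power_spectrum(f, a, f_c):
    # One SHO basis component: flat below the bend f_c, slope -4 above it.
    return a / (1.0 + (f / f_c) ** 4)

f = jnp.geomspace(1e-3, 1e2, 500)    # frequency grid
f_c = jnp.geomspace(1e-2, 1e1, 20)   # placeholder bend frequencies
a = jnp.ones_like(f_c)               # placeholder component weights

# Summing the components produces a flexible bending power-law shape.
psd_approx = sho_power_spectrum(f[None, :], a[:, None], f_c[:, None]).sum(axis=0)
```

Summing components like these, with bends spread across the frequency range, is what lets the approximated spectra take slopes anywhere between 0 and -4.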
@mlefkir Sorry for the long wait. This method is very interesting and certainly deserves inclusion in Stingray. In my tests, I found that `DefaultNestedSampler` is not recognized; I think `jaxns` has changed its API, but monkeypatching it to point to `NestedSampler` made it work.
Besides this, everything seems to work, and I think the best test is now to put it out there and let people use it. The method itself, in the Python implementation, seems quite slow (the sampling took hours on my laptop). Do we need to do anything specific to make it work with GPUs? My computer does not have one, so I don't know how to test this.
We also need to document it properly. Would you mind adding the Pioran Stingray notebooks you already shared, possibly with a more in-depth explanation of the methods, to our notebook repository?
Minor thing: the branch needs to be merged or rebased with `main`. I also suggest running `black -l 100 stingray/` from the main directory and committing the results, to fix the code style failure.
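On the GPU question above: as far as I know, JAX dispatches to an accelerator automatically once a CUDA-enabled jaxlib is installed, with no changes to the model code. A quick way to check which backend is active:

```python
import jax

print(jax.devices())          # e.g. [CudaDevice(id=0)] on a GPU machine
print(jax.default_backend())  # "cpu", "gpu", or "tpu"
```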
```python
fig1, fig2 = run_prior_checks(
    self.kernel_type, self.kernel_params, self.priors, loglike, 5.5e-2, 0.5
)
plt.fignum_exists(1)
```
Note that this would also work if some other test has opened a figure.
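A more targeted check (a sketch reusing the names from the test above, including the test-class attributes) would assert on the returned objects and clean up after itself:

```python
import matplotlib.pyplot as plt
from matplotlib.figure import Figure

fig1, fig2 = run_prior_checks(
    self.kernel_type, self.kernel_params, self.priors, loglike, 5.5e-2, 0.5
)
# Assert on what the function actually returned, not on global pyplot state.
assert isinstance(fig1, Figure)
assert isinstance(fig2, Figure)
# Close the figures so they cannot leak into other tests.
plt.close(fig1)
plt.close(fig2)
```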
```python
noise_color = "C5"
window_color = "k"

fig, ax = plt.subplots(1, 1, figsize=(6, 4))
```
Probably good to give a unique name to these plots (also good for testing purposes!)
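For example (a sketch; the figure label is made up), `plt.subplots` forwards the `num` argument to `plt.figure`, so the plot can carry a unique name that tests can query unambiguously:

```python
import matplotlib.pyplot as plt

fig, ax = plt.subplots(1, 1, figsize=(6, 4), num="gpmodeling_diagnostics")

assert plt.fignum_exists("gpmodeling_diagnostics")  # unlike figure number 1
```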
Codecov Report
Attention: Patch coverage is …
Additional details and impacted files:

```diff
@@            Coverage Diff             @@
##             main     #832      +/-   ##
==========================================
- Coverage   96.03%   94.71%    -1.33%
==========================================
  Files          48       48
  Lines        9770     9950     +180
==========================================
+ Hits         9383     9424      +41
- Misses        387      526     +139
```

☔ View full report in Codecov by Sentry.
Hey @mlefkir! This is great, thank you for submitting this, and I'm sorry it's taken me so long to get to it!
Do you perhaps have a Jupyter notebook for inclusion in the documentation? If so, it'd be great if you could share it with me, because it'd also help me understand how the API is supposed to work. :D
Overall, great work! I'm excited that we get to have this in Stingray!
In terms of feedback, I mostly have nitpicky comments about docstrings and code consistency:
- make sure your docstrings are consistent with the code (I've found a few places where the docstring does not match the function definition).
- expand your docstrings. Users will rely on these to understand what a function does, what its inputs are, and what it returns. Make sure there is enough information for users to understand (1) what the function does and how, (2) what they need to pass into the function and what form those inputs take, and (3) what comes out of the function. I've highlighted a number of places where I had trouble figuring out what's happening, but probably not all of them.
- code consistency: make sure your code is consistent across functions. For example, input variables of the same name should ideally always take the same type and kind of information, and variable names should match the kind of information passed in. Document extensively where you break that pattern for whatever reason.
- make sure you implement consistent behaviour across functions. For example, some functions produce plots that are saved to file, while others just return the Figure (or Figure and Axes) objects. Unless necessary, make sure these functions behave consistently, and document extensively where you break the pattern (one possible convention is sketched below).
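One possible convention, as a sketch (the helper name is hypothetical): every plotting helper returns `(fig, ax)` and leaves saving to the caller.

```python
import matplotlib.pyplot as plt

def plot_psd(f, psd, ax=None):
    # Hypothetical helper illustrating the "always return fig, ax" rule.
    if ax is None:
        fig, ax = plt.subplots(1, 1, figsize=(6, 4))
    else:
        fig = ax.figure
    ax.loglog(f, psd)
    ax.set_xlabel("Frequency")
    ax.set_ylabel("Power")
    return fig, ax
```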
```python
        Log likelihood function.
    n_samples : int
        Number of samples. Default is 3000.
    """
```
I have some nitpicky documentation questions here. From just looking at the docs/function, it's really hard to understand what's supposed to go in here. For example, `kernel_params` is missing from the list of parameters. Is it a list of strings? Of values? What goes into `priors`? A list of what? Functions? Something else? Be aware that these are the kinds of questions a new user will ask themselves.
```python
# get the prior model
prior_dict = dict(zip(kernel_params, priors))
prior_model = get_prior(kernel_params, prior_dict)
```
These lines imply that the priors in `priors` need to be in the correct order according to the order of `kernel_params`, yes? Why not enforce this automatically (and avoid user error) by having the user pass in that dictionary directly, rather than constructing it here from two lists? A sketch of that is below.
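A sketch of the dict-first signature this suggests (the wrapper name is hypothetical; `get_prior` is the PR's existing function):

```python
def get_prior_from_dict(prior_dict):
    # The user supplies {parameter_name: prior} directly, so two parallel
    # lists can never get out of sync.
    kernel_params = list(prior_dict.keys())
    return get_prior(kernel_params, prior_dict)
```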
approximate_with="SHO", | ||
with_normalisation=False, | ||
): # -> tuple[NDArray[Any], NDArray[Any]]: | ||
"""Get the PSD and the approximate PSD for a given set of parameters and samples. |
I'm not entirely sure what this does. Specifically, why does this function need prior samples? Computing the PSD and the approximate PSD should be analytical, no?
Note that there are parameters above that are not in the list of parameters. On the other hand, the docstring below claims there is a parameter `n_samples` that does not appear in the function definition. I think it's worth going through the docstrings and making sure they are consistent. It would also be good to expand the information in them, both in terms of what the function does and especially in terms of what goes into the input arguments, to help new users understand what they need to do.
So, having read the function itself, I'm wondering whether it would make sense to split out the construction of the PSD/approximate PSD for a single set of parameters, and then have a separate function that loops over a set of samples (do they have to be prior samples, by the way, as the function definition insists?). Feel free to tell me I'm wrong, but I could well imagine that someone would like to take a single set of parameters and compute the PSD and approximate PSD from those. A skeleton of that split is sketched below.
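A skeleton of that split (all names hypothetical):

```python
def get_psd_for_params(f, kernel_type, param_dict,
                       n_approx_components=20, approximate_with="SHO"):
    """Compute (psd, psd_approx) for a single parameter set."""
    raise NotImplementedError  # the PR's per-sample body would move here

def get_psd_over_samples(f, kernel_type, samples, **kwargs):
    # Loop the single-set function over any iterable of parameter dicts,
    # prior samples or otherwise.
    return [get_psd_for_params(f, kernel_type, s, **kwargs) for s in samples]
```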
```python
for k in range(n_samples):
    param_dict = {}
    for i, params in enumerate(kernel_params):
        if params[0:4] == "log_":
```
What does this line do? It seems to hardcode some kind of log-value handling? What for? And in what ways could that break, either with a different kernel or through user error?
```python
param_dict = {}
for i, params in enumerate(kernel_params):
    if params[0:4] == "log_":
        param_dict[params[4:]] = jnp.exp(prior_samples[params][k])
```
I'm probably just confused here, but these two lines seem to indicate that if the first four characters of `params` are `log_`, we'll exponentiate the other ones? I.e. not the ones with `log_` in front of the name? Is that what's intended, or am I missing something here?
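For what it's worth, my reading of the loop (restated as a runnable sketch) is that the test is on the parameter's name: parameters sampled in log space carry a `log_` prefix, and exactly those get exponentiated and stored under the name with the prefix stripped. The parameter name and samples below are made up:

```python
import jax.numpy as jnp

kernel_params = ["log_amplitude"]                          # hypothetical name
prior_samples = {"log_amplitude": jnp.array([0.0, 1.0])}   # hypothetical samples
k = 0  # index of the sample being converted

param_dict = {}
for name in kernel_params:
    if name.startswith("log_"):
        # "log_amplitude" -> "amplitude", value exp()'d back to linear space.
        param_dict[name[4:]] = jnp.exp(prior_samples[name][k])

print(param_dict)  # {'amplitude': Array(1., ...)}
```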
```python
def get_psd_approx_samples(
    f, kernel_type, kernel_params, f_min, f_max, n_approx_components=20, approximate_with="SHO"
```
I'm mildly confused why this has `samples` in the name?
```python
    n_approx_components=n_approx_components,
    approximate_with=approximate_with,
)
psd_SHO = SHO_power_spectrum(f, a[..., None], f_c[..., None]).sum(axis=0)
```
Does this function return anything? Right now, it doesn't seem to?
```python
def get_kernel(kernel_type, kernel_params):
```
Needs docstring?
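For reference, a docstring sketch of the kind requested (the exact parameter semantics are the author's to confirm):

```python
def get_kernel(kernel_type, kernel_params):
    """Build the covariance kernel for the Gaussian process.

    Parameters
    ----------
    kernel_type : str
        Name of the kernel to construct.
    kernel_params : dict
        Mapping from parameter names to values for the chosen kernel.

    Returns
    -------
    kernel
        The tinygp kernel object to use in the Gaussian process.
    """
```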
```python
    raise NotImplementedError(f"Approximation {approximate_with} not implemented")


def _psd_model(kernel_type, kernel_params):
```
Curious why this has a `_` in front of it? Do we not expect users to want to get the power spectrum model, or is there another function that fulfils this task?
```diff
@@ -50,11 +650,22 @@ def get_kernel(kernel_type, kernel_params):
     kernel_type: string
```
So, I think your code uses `kernel_type` to mean the power spectral shape, whereas the previous use was for the actual covariance function used in the GP, no? It'd be good to make sure these don't get confused.
This is the Python/JAX implementation of a method to infer the power spectral density of irregular time series using Gaussian process regression. The method is described in a forthcoming paper and in the Julia package Pioran.jl; it relies on approximating a bending power-law model by a sum of scalable kernels implemented in tinygp.
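For readers unfamiliar with tinygp, a generic usage sketch (not this PR's API) of what the underlying likelihood evaluation looks like: a GP with a scalable quasiseparable kernel is conditioned on irregularly sampled data, and the sampler explores its log-likelihood.

```python
import jax
import jax.numpy as jnp
from tinygp import GaussianProcess, kernels

# Irregularly sampled times and a toy signal.
t = jnp.sort(jax.random.uniform(jax.random.PRNGKey(0), (200,)) * 100.0)
y = jnp.sin(2 * jnp.pi * t / 20.0)

# One SHO component; the method sums many of these to shape the PSD.
kernel = kernels.quasisep.SHO(omega=1.0, quality=1.0 / jnp.sqrt(2.0), sigma=1.0)
gp = GaussianProcess(kernel, t, diag=1e-2)  # diag = measurement variance

print(gp.log_probability(y))  # the quantity the sampler explores
```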