
Excessive loading times for short-cadence curves #120

Open
LucaNap opened this issue Jul 14, 2024 · 2 comments
LucaNap commented Jul 14, 2024

Hi everyone. I just wanted to report that when trying to fit 20-second TESS data, the "loading times" (before the fit actually starts) are very long for me (30-90 minutes??). I found this to be the case with multiple targets, even when there are only two sectors. Is this happening only to me? I am currently using version 2.2.4 of juliet on Windows.


LucaNap commented Jul 18, 2024

Ok, I admit the solution might be as simple as remembering to remove all data points with flux = 0; juliet seems to start much more quickly now. In any case, there should be a warning somewhere if there is something wrong in the light curve (such as null fluxes) that makes the loading time incredibly long.
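
For reference, a minimal sketch of that cleaning step (the file name and array names are purely illustrative, not juliet's API):

import numpy as np

# Hypothetical light-curve columns; replace with however the data are loaded.
t, f, ferr = np.loadtxt('lightcurve.dat', unpack=True)

# Keep only finite, non-zero fluxes before handing the arrays to juliet.
mask = np.isfinite(f) & (f != 0.0)
t, f, ferr = t[mask], f[mask], ferr[mask]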


Jayshil (Contributor) commented Nov 22, 2024

Hi @LucaNap (cc @nespinoza),

I think this may be a real issue with juliet, and it likely happens because of the large number of data points. A single TESS sector contains roughly 50,000 data points (or more) at 20-second cadence, and with multiple sectors this number grows quickly.

juliet internally builds big combined arrays (time, flux, etc.) that store the data points from all instruments in a single array. It currently uses two for loops to create these arrays, and that seems to be what takes so much time. In particular, I think the second for loop at Line 314 is the problem: it loops over every data point and appends each one to the big array. When the total number of data points is large, this loop takes a ridiculously long time.
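
For illustration, the pattern is roughly the following (a sketch of the behaviour described above, not the verbatim juliet source). Growing a numpy array element by element forces a reallocation and copy on every append, so the loop is O(n²) overall:

import numpy as np

# Sketch only: appending one point at a time reallocates and copies
# the whole array on every iteration.
all_times = np.array([])
for instrument in instruments:
    for i in range(len(t[instrument])):
        all_times = np.append(all_times, t[instrument][i])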

@nespinoza: this should be fairly easy to fix. The way I see it, we can simply remove this second for loop at Line 314 and replace it with an np.hstack call, like the following (the code below starts at Line 313):

# Assumes all_times, all_y, all_yerr and all_instruments were
# initialized as empty numpy arrays before the loop.
for instrument in instruments:
    all_times = np.hstack((all_times, t[instrument]))
    all_y = np.hstack((all_y, y[instrument]))
    all_yerr = np.hstack((all_yerr, yerr[instrument]))
    all_instruments = np.hstack((all_instruments, np.repeat(instrument, len(t[instrument]))))

This is much faster than the per-point loop. The difference is negligible when the total number of data points is small, but the loop becomes very slow as the number of data points grows. I am not sure whether there was any particular reason for using the for loops, but I'd suggest using the code above. Let me know what you think.
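
As a side note, one could go a step further and avoid reallocating inside the loop entirely, by collecting the per-instrument arrays in Python lists and concatenating once at the end (a sketch using the same dictionaries t, y and yerr as above):

import numpy as np

# Collect per-instrument arrays first...
times_list, y_list, yerr_list, inst_list = [], [], [], []
for instrument in instruments:
    times_list.append(t[instrument])
    y_list.append(y[instrument])
    yerr_list.append(yerr[instrument])
    inst_list.append(np.repeat(instrument, len(t[instrument])))

# ...then concatenate once, so no array is copied repeatedly.
all_times = np.concatenate(times_list)
all_y = np.concatenate(y_list)
all_yerr = np.concatenate(yerr_list)
all_instruments = np.concatenate(inst_list)

Either way, the cost becomes linear in the total number of data points.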

Cheers,
Jayshil
