
Excessive loading times for short-cadence curves #120

Open
LucaNap opened this issue Jul 14, 2024 · 2 comments
LucaNap commented Jul 14, 2024

Hi everyone. I just wanted to report that when trying to fit 20-second TESS data, the "loading times" (before the fit actually starts) are very long for me (30-90 minutes??). I found this to be the case with multiple targets, even when there are only two sectors. Is this happening only to me? I am currently using version 2.2.4 of juliet on Windows.


LucaNap commented Jul 18, 2024

Ok, I admit the solution might be as simple as remembering to remove all data points with flux = 0; juliet seems to start much more quickly now. In any case, there should be a warning somewhere if there is something wrong in the light curve (such as null fluxes) that makes the loading time incredibly long.
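
For reference, a minimal sketch of that cleaning step (the file name and array names are purely illustrative, not juliet's API):

import numpy as np

# Hypothetical light-curve columns; replace with however the data are loaded.
t, f, ferr = np.loadtxt('lightcurve.dat', unpack=True)

# Keep only finite, non-zero fluxes before handing the arrays to juliet.
mask = np.isfinite(f) & (f != 0.0)
t, f, ferr = t[mask], f[mask], ferr[mask]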


Jayshil (Contributor) commented Nov 22, 2024

Hi @LucaNap (cc @nespinoza),

I think this may be a real issue with juliet, and it likely happens because of the large number of data points. A single TESS sector contains roughly 50,000 data points (or more) at 20-second cadence, and with multiple sectors this number grows quickly.

juliet internally builds big combined arrays (time, flux, etc.) that store the data points from all instruments in a single array. It currently uses two for loops to create these arrays, and that seems to be what takes so much time. In particular, I think the second for loop at Line 314 is the problem: it loops over every data point and appends each one to the big array. When the total number of data points is large, this loop takes a ridiculously long time.
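
For illustration, the pattern is roughly the following (a sketch of the behaviour described above, not the verbatim juliet source). Growing a numpy array element by element forces a reallocation and copy on every append, so the loop is O(n²) overall:

import numpy as np

# Sketch only: appending one point at a time reallocates and copies
# the whole array on every iteration.
all_times = np.array([])
for instrument in instruments:
    for i in range(len(t[instrument])):
        all_times = np.append(all_times, t[instrument][i])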

@nespinoza: this should be fairly easy to fix. The way I see it, we can simply remove this second for loop at Line 314 and replace it with an np.hstack call, like the following (the code below starts at Line 313):

# Assumes all_times, all_y, all_yerr and all_instruments were
# initialized as empty numpy arrays before the loop.
for instrument in instruments:
    all_times = np.hstack((all_times, t[instrument]))
    all_y = np.hstack((all_y, y[instrument]))
    all_yerr = np.hstack((all_yerr, yerr[instrument]))
    all_instruments = np.hstack((all_instruments, np.repeat(instrument, len(t[instrument]))))

This is much faster than the per-point loop. The difference is negligible when the total number of data points is small, but the loop becomes very slow as the number of data points grows. I am not sure whether there was any particular reason for using the for loops, but I'd suggest using the code above. Let me know what you think.
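
As a side note, one could go a step further and avoid reallocating inside the loop entirely, by collecting the per-instrument arrays in Python lists and concatenating once at the end (a sketch using the same dictionaries t, y and yerr as above):

import numpy as np

# Collect per-instrument arrays first...
times_list, y_list, yerr_list, inst_list = [], [], [], []
for instrument in instruments:
    times_list.append(t[instrument])
    y_list.append(y[instrument])
    yerr_list.append(yerr[instrument])
    inst_list.append(np.repeat(instrument, len(t[instrument])))

# ...then concatenate once, so no array is copied repeatedly.
all_times = np.concatenate(times_list)
all_y = np.concatenate(y_list)
all_yerr = np.concatenate(yerr_list)
all_instruments = np.concatenate(inst_list)

Either way, the cost becomes linear in the total number of data points.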

Cheers,
Jayshil
