Add a fast path for evenly-spaced abscissae #198

Merged: 14 commits into SciML:master on Jul 31, 2024
Conversation

@DaniGlez (Contributor) commented Oct 27, 2023

Currently, all interpolation methods but Lagrangian perform a two-step procedure to determine the index of the data points / spline weights to use in the corresponding expression: first, a call to bracketstrictlymontonic to obtain a bracket of viable indices, and then a call to searchsortedfirst (respectively, searchsortedlast) using that bracket as an initial guess¹. bracketstrictlymontonic performs an accelerating expansion of the bracket from a size-1 integer interval towards the corresponding side; however, this behavior is effectively disabled right now, because the provided initial guess is 0, which triggers a bailout path in which the function essentially does nothing.

This PR enables that behavior with a linear guess. This is a "greedy" approach that aims to provide faster performance when the data is equispaced (many cases, in practice) or nearly so, with hopefully only a small penalty otherwise. The first goal is clearly met by the PR (puzzlingly, it's even faster than the range shortcut added in #192; I'm not sure why), while the second one is not met at the moment (see the benchmarks in the following comment).

¹ As an additional note, looking into the Base code, it seems that the bracket is indeed expected to contain the "correct" solution, which would require a stronger guarantee than the one given in the bracketstrictlymontonic docstring (its actual behavior might provide that guarantee, though; I'm not 100% sure).
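
For reference, a minimal sketch of the kind of linear guess this refers to (illustrative names and formula, not the exact code in the PR):

# Guess the index i such that t[i] <= tq <= t[i + 1], assuming the abscissae
# are roughly evenly spaced; clamped so it is always a valid left index.
function linear_guess(t::AbstractVector, tq)
    n = length(t)
    frac = (tq - first(t)) / (last(t) - first(t))
    clamp(1 + floor(Int, frac * (n - 1)), 1, n - 1)
end

The guess only needs to land near the correct index; the bracketed searchsortedfirst/searchsortedlast call then corrects it when the spacing is not exactly uniform.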

@DaniGlez (Contributor, Author)

Linearly equispaced data coordinates:

[benchmark result plots]

using BenchmarkTools, DataInterpolations, Statistics # Statistics provides mean

# Equispaced
begin
    exec_times = map(3:8) do x
        N = 10^x
        u = rand(N)
        tmax = exp(rand() * 10)
        t = (collect(1:N) .- 1) ./ (N - 1) .* tmax
        li = LinearInterpolation(u, t)
        b = @benchmark $li(rand() * $tmax)
        mean(b.times)
    end
    @show exec_times
end

# Log-distributed
begin
    logdist_times = map(3:8) do x
        N = 10^x
        u = rand(N)
        tmax = exp(rand() * 10)
        t = (exp.((collect(1:N) .- 1) ./ (N - 1)) .- 1)
        t .*= (tmax / t[end])
        li = LinearInterpolation(u, t)
        b = @benchmark $li(rand() * $tmax)
        mean(b.times)
    end
    @show logdist_times
end

# Irregularly distributed
begin
    randist_times = map(3:8) do x
        N = 10^x
        u = rand(N)
        tmax = exp(rand() * 10)
        tdiff = rand(N)
        t = cumsum(tdiff)       # random positive increments keep t sorted
        t .-= first(t)          # shift so t starts at 0
        t .*= (tmax / t[end])   # rescale to span [0, tmax]
        li = LinearInterpolation(u, t)
        b = @benchmark $li(rand() * $tmax)
        mean(b.times)
    end
    @show randist_times
end

# Ranges (t is an AbstractRange rather than a Vector)

begin
    range_times = map(3:8) do x
        N = 10^x
        u = rand(N)
        tmax = exp(rand() * 10)
        t = ((1:N) .- 1) ./ (N - 1) .* tmax
        li = LinearInterpolation(u, t)
        b = @benchmark $li(rand() * $tmax)
        mean(b.times)
    end
    @show range_times
end

@DaniGlez (Contributor, Author)

It's clear that the log-distributed case is quite bad for the PR with large amounts of data (or, potentially, with a smaller cache; I'm on a Ryzen 7800X3D, which is an outlier in that sense), probably because it has to look up the coordinate array during both the bracketing and the searchsorted call. Thus, a potential "best-of-both-worlds" alternative is to replace the bracketing with a simple check for the happy path, followed by a call to searchsorted* using the bisected side as the bracket otherwise.
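
A minimal sketch of that alternative, building on the linear_guess sketch above (illustrative; edge handling for queries outside [t[1], t[end]] is omitted):

# Happy path: if the linear guess already brackets tq, use it directly;
# otherwise fall back to a binary search restricted to the side containing tq.
function left_index(t::AbstractVector, tq)
    g = linear_guess(t, tq)
    if t[g] <= tq <= t[g + 1]
        return g
    elseif tq < t[g]
        return searchsortedlast(view(t, 1:g), tq)
    else
        return g + searchsortedlast(view(t, (g + 1):length(t)), tq)
    end
end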

@ChrisRackauckas (Member)

> [...] a potential "best-of-both-worlds" alternative is to replace the bracketing with a simple check for the happy path, followed by a call to searchsorted* using the bisected side as the bracket otherwise.

That would be good. It would also be nice if, at construction time, there was a type you can set to choose different search behaviors; that would be a way to balance it.

@DaniGlez DaniGlez marked this pull request as draft November 21, 2023 11:45
ChrisRackauckas added a commit to SciML/FindFirstFunctions.jl that referenced this pull request Jan 25, 2024
For benchmarks see:

* SciML/DataInterpolations.jl#198
* SciML/DataInterpolations.jl#147

> While the cost of constructing the interpolator does not change, tracking the last index results in a best-case speedup of ~2.7x for CubicSpline, when successive values are close together (and a little higher for simpler interpolators). In the worst case (where successive values are always on opposite ends of the vectors), it can result in a ~15% slowdown due to the unhelpful expanding binary search at the beginning. However, the original approach of not tracking the index at all is also still available; it now involves essentially one extra if statement, which seems to be lost in the timing noise.
@ChrisRackauckas (Member)

One note is that this PR should get updated to call the functionality which has moved to https://github.com/SciML/FindFirstFunctions.jl.

As a way to finish it, what about the following: calculate x = var(diff(t)); if t is close to linear, x is sufficiently small:

using Statistics
var(diff(1:6)) # 0.0
var(diff(exp.(1:6))) # 10773.646024936865

We would then just need a tunable cutoff point, and we can flip on "fast mode" if the constructed interpolation satisfies the criterion.
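
A sketch of what that construction-time check could look like (the helper name and default cutoff are placeholders, not the merged values):

using Statistics

# Flip on the fast path when the spacing variance is below a tunable cutoff;
# normalizing by the mean spacing makes the criterion independent of the scale of t.
function fastmode_flag(t::AbstractVector; cutoff = 1e-8)
    dt = diff(t)
    var(dt ./ mean(dt)) < cutoff
end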

@DaniGlez (Contributor, Author)

Sounds good (I was similarly thinking of using a threshold based on testing with randomly-spaced abscissae, though not one based on variance). I have some doubts regarding the implementation: do you think it would be best to add a field or type parameter recording that test to every interpolant cache, or do you envision some alternative?

@ChrisRackauckas (Member)

I think a field with a Bool is fine. Runtime-checking that is pretty cheap.

@DaniGlez (Contributor, Author)

Alright, I will try to reopen this soon.

@DaniGlez DaniGlez closed this Jul 13, 2024
@DaniGlez (Contributor, Author)

Revisiting this topic, I added a simple mechanism along the lines of what we discussed. In particular:

  • At interpolant creation time, we check for "quasilinearity" with the simple and cheap mechanism of comparing the middle element of the array of abscissae to the average of the first and last elements, with a relative threshold of 10%, and store the result in a flag (see the sketch below).
  • At runtime, if the flag is set, we use the linear index guess instead of the current mechanism with a cached index.
  • This seems to be Pareto-superior on the benchmarks in this thread, but note that those benchmarks use random accesses rather than sequential ones. So I'd like to ask @SouthEndMusic whether the outcome is any different in his application, since he added the current cached-index mechanism.

Right now I have only added this mechanism to the LinearInterpolation cache, but if this design is OK'd I'll add it to the other interpolants.
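
A minimal sketch of the creation-time check from the first bullet (the helper name and the exact normalization of the 10% threshold are assumptions):

# Cheap quasilinearity test: is the middle abscissa within 10% (relative to the
# total span) of the midpoint of the straight line through the endpoints?
function looks_quasilinear(t::AbstractVector; rtol = 0.1)
    mid = t[(firstindex(t) + lastindex(t)) ÷ 2]
    abs(mid - (first(t) + last(t)) / 2) <= rtol * abs(last(t) - first(t))
end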

[benchmark result plots]

(Note that these benchmarks were run on a different computer than the previous ones.)

@DaniGlez DaniGlez reopened this Jul 13, 2024
@DaniGlez DaniGlez marked this pull request as ready for review July 13, 2024 10:35
@DaniGlez DaniGlez changed the title from "Enable bracketstrictlymontonic" to "Add a fast path for evenly-spaced abscissae" Jul 13, 2024
@SouthEndMusic (Member)

@DaniGlez I like the concept, although I cannot make any assumptions about the distribution of t in my application, as it is user input with no restrictions on the distribution (apart from being sorted). I do feel the looks_quasilinear condition should be stricter, e.g. looking at the variance of the intervals between the t values as @ChrisRackauckas suggested, because right now something like this passes your test:

[figure: a strongly non-uniform t that nevertheless passes the check]

@DaniGlez (Contributor, Author)

Yeah, there are indeed false positives with this criterion; I felt that having minimal overhead with a cheap criterion is worth some false positives (particularly when you can manually opt out at construction time), but it's absolutely up for debate.

@ChrisRackauckas (Member)

For a switchable mode, passing quasiregular = 1e-2 would run a variance check (is the variance below 1e-2?) and set quasiregular to true/false accordingly, while passing quasiregular = true/false directly would skip the computation and just use the user's assumption. I think defaulting to the check is a good idea, because most cases I see would be fairly regular, so the optimization should help in "most cases"; with this, you just need to find a reasonable variance cutoff for the default.
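
A sketch of how such a keyword could be resolved at construction time (the name quasiregular follows the comment above and is hypothetical; the variance check reuses the fastmode_flag helper sketched earlier):

# A Bool skips the check and uses the user's assumption directly; a number is
# treated as the variance cutoff. Bool <: Real, so the Bool method takes priority.
resolve_quasiregular(flag::Bool, t) = flag
resolve_quasiregular(cutoff::Real, t) = fastmode_flag(t; cutoff = cutoff)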

@DaniGlez (Contributor, Author)

Alright, I made a new proposal. It's similar to the variance one, but I take the difference with respect to the straight line, which seems more directly relevant (you could build a contrived counterexample to a straightforward var(diff(t)) check with sawtooth increments that would nevertheless benefit from linear guessing). Let me know what you think about this one.
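
A sketch of that kind of criterion (illustrative; the threshold and exact normalization used in the PR may differ):

# Maximum deviation of t from the straight line through its endpoints, relative
# to the total span. Unlike var(diff(t)), sawtooth-like increments that stay
# close to the line (and thus still benefit from a linear guess) score well here.
function max_line_deviation(t::AbstractVector)
    n = length(t)
    line(i) = first(t) + (last(t) - first(t)) * (i - 1) / (n - 1)
    maximum(abs(t[i] - line(i)) for i in 1:n) / abs(last(t) - first(t))
end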

@ChrisRackauckas (Member)

Seems like the right path to me, though seems_linear and quasiregular should be unified under a single name. Other than that, this seems good.

@DaniGlez (Contributor, Author)

Ah, yeah, the flag name slipped through. I'll change it later and apply it to the other interpolant caches if everyone is happy with this.

@DaniGlez DaniGlez marked this pull request as draft July 28, 2024 22:41
@DaniGlez (Contributor, Author)

Alright, I finally got back to finishing this with the remaining caches and suggestions. The docstring component is maybe a bit too wordy, but everything seems to be working fine.

@DaniGlez DaniGlez marked this pull request as ready for review July 29, 2024 20:38
@sathvikbhagavan (Member) left a comment

A couple of comments. Can you add a few tests that demonstrate the speedup mentioned in the PR comments (with regularly spaced t), to make sure everything is working correctly?

Review threads (resolved): src/interpolation_caches.jl (3), src/interpolation_utils.jl
@DaniGlez (Contributor, Author) commented Jul 30, 2024

I've incorporated your suggestions, @sathvikbhagavan. I'm a bit unsure about the benchmark, because it's a bit expensive as a test (a few seconds) compared to most of the test suite, and CPU timings are non-deterministic (so the test might fail because of random noise).

I also have a question regarding the looks_linear docstring; I added it to the @docs block in the manual to stop CI from complaining (edit: it still complains — do the docs expect it to be exported as well?), but does that make it public API? In my view it should remain an internal, implementation-specific function.

@sathvikbhagavan (Member) left a comment

A couple of small comments.
I had a quick question: should this not be used in Lagrange interpolation, or would it not make much difference even if it were?
Last step: run the Julia formatter.

Review threads (resolved): docs/src/manual.md, src/interpolation_utils.jl
@DaniGlez (Contributor, Author)

> A couple of small comments. I had a quick question: should this not be used in Lagrange interpolation, or would it not make much difference even if it were? Last step: run the Julia formatter.

In Lagrange interpolation, each data point contributes a term towards the result, so there is no need to call get_idx; you always use all of your data to compute the interpolated value.
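
For illustration, a direct Lagrange evaluation makes this explicit (a self-contained sketch, not the package's implementation):

# Every data point contributes one basis term, so each evaluation uses the full
# dataset and no index search is needed.
function lagrange_eval(u::AbstractVector, t::AbstractVector, x)
    s = zero(float(eltype(u)))
    for i in eachindex(t)
        w = one(float(eltype(t)))           # basis weight ℓ_i(x)
        for j in eachindex(t)
            j == i && continue
            w *= (x - t[j]) / (t[i] - t[j])
        end
        s += u[i] * w
    end
    return s
end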

@DaniGlez (Contributor, Author)

I'm not sure what's wrong with the format of docs/src/manual.md; running JuliaFormatter 1.0.59 on it does not make any changes.

@sathvikbhagavan (Member)

> I'm not sure what's wrong with the format of docs/src/manual.md; running JuliaFormatter 1.0.59 on it does not make any changes.

Add a newline at the end of the file.

@DaniGlez (Contributor, Author)

I had thought that would not be an issue, since the live version https://github.com/SciML/DataInterpolations.jl/blob/master/docs/src/manual.md?plain=1 does not seem to have one either; let's see now.

@ChrisRackauckas ChrisRackauckas merged commit d909639 into SciML:master Jul 31, 2024
10 checks passed
@ChrisRackauckas (Member)

Amazing, thanks for seeing this through.

It would be great to see if we could reuse this as well with the diffeq interpolations, but that can wait.

@sathvikbhagavan (Member)

> It would be great to see if we could reuse this as well with the diffeq interpolations, but that can wait.

Do you mean using DataInterpolations in DiffEqBase for interpolations? If so, are there any methods missing, or is it just a matter of trying it out?

@ChrisRackauckas (Member)

I mean that, for the special interpolations in the ODE solvers, there is a similar search phase to find where to interpolate, and that just uses a naive findfirst. There are now a lot of optimizations used here to find the right time more quickly, and it would be nice to get those used there as well.

@SouthEndMusic (Member)

Maybe move the functionality to FindFirstFunctions?

@ChrisRackauckas (Member)

Possibly, yeah, since this is really a smarter findfirst.

@sathvikbhagavan (Member)

Yeah, abstracting this into findfirst would be better.
