
Merge validate_spatial and validate_series #76

Merged
intarga merged 8 commits into trunk from unify-spatial-series on Sep 30, 2024

Conversation

intarga
Member

@intarga intarga commented Sep 17, 2024

No description provided.

@intarga intarga added the tech debt internal/maintainability improvements label Sep 17, 2024
@intarga intarga self-assigned this Sep 17, 2024
@intarga intarga mentioned this pull request Sep 25, 2024
5 tasks
@intarga intarga linked an issue Sep 25, 2024 that may be closed by this pull request
5 tasks
@intarga intarga requested a review from Lun4m September 25, 2024 10:44
@intarga intarga marked this pull request as ready for review September 25, 2024 10:44
@Lun4m
Collaborator

Lun4m commented Sep 26, 2024

It's going to take a while 😅 In the meantime, a question I've had since we merged the two caches: how expensive is it to build the R*-tree? We build it every time we call fetch_data, and some pipelines may not need to run spatial tests at all, so I was wondering whether DataCache should instead hold an Option<SpatialTree>, perhaps with a constructor that leverages the new SpaceSpec. Or does it not matter?
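A minimal sketch of the lazy-build idea in the question above, assuming a heavily simplified DataCache. The field names, the has_rtree helper, and the SpatialTree stand-in (a sorted list of points instead of a real R*-tree) are illustrative assumptions, not the project's actual types:

```rust
/// Hypothetical stand-in for the real R*-tree. Sorting the points
/// mimics an O(n log n) build step without pulling in a crate.
struct SpatialTree {
    points: Vec<(f32, f32)>,
}

impl SpatialTree {
    fn build(lats: &[f32], lons: &[f32]) -> Self {
        let mut points: Vec<(f32, f32)> =
            lats.iter().copied().zip(lons.iter().copied()).collect();
        points.sort_by(|a, b| a.0.total_cmp(&b.0).then(a.1.total_cmp(&b.1)));
        SpatialTree { points }
    }

    fn len(&self) -> usize {
        self.points.len()
    }
}

/// Simplified cache (assumed layout): the tree starts as None and is
/// only built the first time a spatial test asks for it.
struct DataCache {
    lats: Vec<f32>,
    lons: Vec<f32>,
    rtree: Option<SpatialTree>,
}

impl DataCache {
    fn new(lats: Vec<f32>, lons: Vec<f32>) -> Self {
        DataCache { lats, lons, rtree: None }
    }

    /// Builds the tree on first access, then reuses it.
    fn rtree(&mut self) -> &SpatialTree {
        let (lats, lons) = (&self.lats, &self.lons);
        self.rtree
            .get_or_insert_with(|| SpatialTree::build(lats, lons))
    }

    /// True once the tree has been built.
    fn has_rtree(&self) -> bool {
        self.rtree.is_some()
    }
}
```

With this shape, a pipeline that only runs series tests never calls rtree() and never pays for the build. The cost is that the accessor needs &mut self; a std::cell::OnceCell field would be one way around that.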

@intarga
Member Author

intarga commented Sep 26, 2024

> how expensive is it to build the R*-tree?

The algorithm is O(n log n), so I don't think it's a concern.

I do agree there may be a need to optimise the case of simple pipelines, as the fresh pipelines will likely form the vast bulk of our request load. If we're going to push to optimise those, though, I think we should be more aggressive in our optimisation strategy. I'm thinking of a much simpler DataCache that removes the spatial elements and the outer vec entirely, along with a cache of the recent data (a hashmap keyed by timeseries id, containing ring buffers of data), which would bypass the need to query the DB and make a network call entirely. That was the idea with #77.
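A rough sketch of the recent-data cache described above: a hashmap keyed by timeseries id, with each entry holding a fixed-capacity ring buffer. The RecentCache name, the String id type, the Option<f32> observation type, and the capacity parameter are all assumptions for illustration; the thread does not specify them:

```rust
use std::collections::{HashMap, VecDeque};

/// Hypothetical in-memory cache of the most recent observations
/// per timeseries, so fresh pipelines can skip the DB round trip.
struct RecentCache {
    capacity: usize,
    series: HashMap<String, VecDeque<Option<f32>>>,
}

impl RecentCache {
    fn new(capacity: usize) -> Self {
        assert!(capacity > 0, "capacity must be at least 1");
        RecentCache {
            capacity,
            series: HashMap::new(),
        }
    }

    /// Appends an observation, evicting the oldest one when the
    /// buffer for this timeseries is already full.
    fn push(&mut self, id: &str, value: Option<f32>) {
        let cap = self.capacity;
        let buf = self
            .series
            .entry(id.to_string())
            .or_insert_with(|| VecDeque::with_capacity(cap));
        if buf.len() == cap {
            buf.pop_front();
        }
        buf.push_back(value);
    }

    /// Returns the buffered observations, oldest first, or None
    /// if nothing has been cached for this timeseries.
    fn recent(&self, id: &str) -> Option<Vec<Option<f32>>> {
        self.series.get(id).map(|buf| buf.iter().copied().collect())
    }
}
```

Here VecDeque plays the role of the ring buffer: pushing to the back and popping from the front keeps each series bounded at capacity entries. How the cache would be kept in sync with incoming data is left open.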

It would be good to see how the more general validate function performs with real pipelines, a representative data flow, and a real DB connector, so we can see how much optimisation it needs and have a target to aim for.

Collaborator

@Lun4m Lun4m left a comment


Looks good, I only have minor comments

met_connectors/src/frost/fetch.rs (outdated)
met_connectors/src/frost/fetch.rs
met_connectors/src/frost/fetch.rs (outdated)
met_connectors/src/frost/fetch.rs (outdated)
src/data_switch.rs (outdated)
src/lib.rs (outdated)
src/lib.rs (outdated)
src/scheduler.rs (outdated)
src/scheduler.rs (outdated)
met_connectors/src/frost/fetch.rs (outdated)
intarga added a commit that referenced this pull request Sep 30, 2024
@intarga intarga requested a review from Lun4m September 30, 2024 12:25
@intarga intarga merged commit 74f8711 into trunk Sep 30, 2024
1 check passed
@intarga intarga deleted the unify-spatial-series branch September 30, 2024 13:33
Labels
tech debt internal/maintainability improvements
2 participants