Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Seen / unseen observations: performance improvements #273

Open
niconoe opened this issue Dec 11, 2023 · 2 comments
Open

Seen / unseen observations: performance improvements #273

niconoe opened this issue Dec 11, 2023 · 2 comments
Assignees
Labels

Comments

@niconoe
Copy link
Member

niconoe commented Dec 11, 2023

As noted by @damianooldoni and others, some parts of the application (where seen/unseen observations must be counted, for example the "my alerts" page), are getting slower and slower over time.

A big part of the issue is that we add a new entry in the database each time a user mark an observation as seen. So 500.000 observation * 70 users (roughly what we have now on the RIPARIAS early alert website) leads to a table with 35.000.000 rows, which has a performance impact (also for the nightly import, the "mark all has seen" feature, ...). The implementation is similar to what's described at https://dba.stackexchange.com/questions/52355/database-design-for-holding-read-unread-content-state

I think it would be better to reverse the implementation and only create entries for unseen observations.

Pros:

  • better performance in many parts of the application, especially for users that have little "unseen" observations
  • This would be even better if, at user creation time, we consider all existing observations as already seen (I think we agreed this would be something nice to have in all cases)
  • Finally, we could imagine another mechanism that automatically marks observations as seen if a user ignored it for a long time (6+ months for example), so the total number of entries in the "unseen" table stays low => I think with all this those perfs/scalability issues will definitely be solved for now

Cons:

  • We would lose the ability to show to a user the first time they have seen a given observation for the first time (I guess it's a very minor feature?)
  • It'll take some development time (I estimate 3-4 days). I'd like to solve that soon but it's not easy to find this time in my agenda for now.

Some smaller mitigation approaches can be tried in the meantime, such as:

  • Check if the database access can be improved with the current schema (indexes, raw SQL, ...)
  • My alert page (and similiar): do the "unread counting" via AJAX requests so the whole page loading is not blocked.
@niconoe niconoe added enhancement New feature or request Cleanup / robustness labels Dec 11, 2023
@niconoe niconoe self-assigned this Dec 11, 2023
@damianooldoni
Copy link
Collaborator

Hi 👋
Some users are complaining about the performance of the LIFE RIPARIAS instance of gbif-alert (alert.riparias.be). They need to use our tool with externals and I can understand it's not easy to show a tool which takes tens of seconds to show the selected locations and the observations falling in it.

@niconoe: no idea how much you can work on this in the next days/weeks/months... And do you think that reducing the number of observations would help on the short term? Now we load data of the last 10 years. But again, an alert system doesn't need such historical data. If we set year >= 2020 in the data download query, we will move from 625k occurrences to 370k. What do you think about it?

@niconoe
Copy link
Member Author

niconoe commented May 14, 2024

@damianooldoni: sorry I only notice your message now! Indeed, the performance is getting slower for the reason stated above, in the initial ticket description.

I'd like to solve that cleanly, but it's not a small task and I'm not sure if I could tackle it before July.

In the meantime, having less observations in the system as you suggest should do the trick!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants