Skip to content

Feat - cron queue for master commits #2163

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 10 commits into from
Jun 19, 2025

Conversation

Jamesbarford
Copy link
Contributor

  • Toggle the cron running using RUN_CRON environment variable
  • Change the frequency of the cron using environment variable QUEUE_UPDATE_INTERVAL_SECONDS
  • Every 30 seconds by default would populate the benchmark_requests table with master commits

Copy link
Contributor

@Kobzol Kobzol left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you! I haven't tested it locally yet because it didn't compile (some things need to be shuffled around/fixed I think), but left a couple of comments. I'll test it then. Nevermind, that was some Cargo bug.

@Kobzol
Copy link
Contributor

Kobzol commented Jun 18, 2025

called `Result::unwrap()` on an `Err` value: Error { kind: ToSql(1), cause: Some(WrongType { postgres: Int4, rust: "u32" }) }

Yup, guessed it right 😆

@Jamesbarford Jamesbarford force-pushed the feat/queue-cron branch 2 times, most recently from fe73bd9 to 4fca54f Compare June 19, 2025 09:05
let ctxt = site_ctxt.clone();
let mut interval = time::interval(Duration::from_secs(seconds));

if let Some(ctxt_clone) = {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This condition needs to be inside the loop. First we should tick the interval, than check if the ctxt is filled, and then run the cron_enqueue_jobs function. Both because the context is missing when cron_main is called (so the cron doesn't run at all), and also because if we took the context here, we would be using an old context for the whole duration of the cron job.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah! I think this commit should handle that; a145daa

I .clone() outside of the loop which I think makes sense to increment the reference count. Then do as you say inside the loop and have moved the interval.tick() to be the first call to be invoked in the loop

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The clone is not necessary as you already hold a value of the Arc from the parameter, but it doesn't really matter.

Copy link
Contributor

@Kobzol Kobzol left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: the created_at column is currently set to the date of master commits. I guess it doesn't matter that much, as in most cases it should be inserted into the table relatively soon after being merged. But it's a bit weird, since we probably treat created_at as the time when the actual entry in the table was created.

That also reminded me that we will want to show the total duration of the benchmark run on the website. This duration was simple before, but it gets more complicated with the job queue and backfilling. Possibly later we might want to add a column to benchmark_request that will store the duration of the whole benchmark? Essentially the time from the start of the first job to the completion of the last job, I suppose. But maybe we can reuse artifact_collection_duration for that. Anyway, we can deal with that later.

@Kobzol
Copy link
Contributor

Kobzol commented Jun 19, 2025

I'm fine with merging the current state, although I wouldn't enable it yet on production (it seems to work fine locally). Because currently we take all master commits from June onwards, and try to insert them all into the DB. This does in fact do 100+ (with 5 more each day) insert queries to the DB, every 30 seconds, which seems a bit wasteful for the production server for now.

I think that before enabling it on production, we should at least check that the inserted SHA is not in site_ctxt.index.commits, or create our own index of completed benchmark requests, because in the new scheme checking just the artifacts table is not enough.

@Jamesbarford
Copy link
Contributor Author

I'm fine with merging the current state, although I wouldn't enable it yet on production (it seems to work fine locally). Because currently we take all master commits from June onwards, and try to insert them all into the DB. This does in fact do 100+ (with 5 more each day) insert queries to the DB, every 30 seconds, which seems a bit wasteful for the production server for now.

I think that before enabling it on production, we should at least check that the inserted SHA is not in site_ctxt.index.commits, or create our own index of completed benchmark requests, because in the new scheme checking just the artifacts table is not enough.

We should be shielded from anything happening from the RUN_CRON environment variable. To make sure I am following you're suggesting doing something like;

let commits: Vec<Commits> = ctxt.index.commits();

if !commits.find(...)  {
   // insert master commit
}

I'm happy with either, I think as we have a test which shows that items can get inserted in the database and we at present only have a simple insert operation; waiting to enable it in production seems ok?

Copy link
Contributor

@Kobzol Kobzol left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, it shouldn't run by default, I just wanted to mention that I'll only enable the environment variable or make the cron job run by default once we can filter the existing commits quickly, to avoid the useless queries.

Thank you!

@Kobzol Kobzol merged commit 39fab1a into rust-lang:master Jun 19, 2025
11 checks passed
@Jamesbarford Jamesbarford deleted the feat/queue-cron branch June 19, 2025 13:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants