Feat - cron queue for master commits #2163
Conversation
Thank you! I haven't tested it locally yet because it didn't compile (some things need to be shuffled around/fixed, I think), but I left a couple of comments. I'll test it then. Never mind, that was some Cargo bug.

Yup, guessed it right 😆
Force-pushed from c112c30 to 4b373d2
Force-pushed from fe73bd9 to 4fca54f
Force-pushed from 4fca54f to cac2e28
site/src/job_queue.rs (outdated)

```rust
let ctxt = site_ctxt.clone();
let mut interval = time::interval(Duration::from_secs(seconds));

if let Some(ctxt_clone) = {
```
This condition needs to be inside the loop. First we should tick the interval, then check if the ctxt is filled, and then run the `cron_enqueue_jobs` function. Both because the context is missing when `cron_main` is called (so the cron doesn't run at all), and also because if we took the context here, we would be using an old context for the whole duration of the cron job.
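As a hedged sketch of the shape being suggested, assuming a tokio runtime and that the shared context lives in something like an `Arc<RwLock<Option<Arc<SiteCtxt>>>>` (the PR's exact types may differ; `SiteCtxt` and `cron_enqueue_jobs` are stubbed here):

```rust
use std::{sync::Arc, time::Duration};
use tokio::{sync::RwLock, time};

struct SiteCtxt; // stub for the PR's site context type

async fn cron_enqueue_jobs(_ctxt: &SiteCtxt) -> Result<(), String> {
    Ok(()) // stub for the PR's enqueue function
}

async fn cron_main(site_ctxt: Arc<RwLock<Option<Arc<SiteCtxt>>>>, seconds: u64) {
    let mut interval = time::interval(Duration::from_secs(seconds));
    loop {
        // Tick first, so the loop keeps waiting even while the context
        // has not been filled in yet.
        interval.tick().await;
        // Re-read the context on every iteration; taking it once before
        // the loop would pin an old context for the cron job's lifetime.
        let ctxt = site_ctxt.read().await.clone();
        if let Some(ctxt) = ctxt {
            if let Err(e) = cron_enqueue_jobs(&ctxt).await {
                eprintln!("cron job failed: {e}");
            }
        }
    }
}
```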
Ah! I think this commit should handle that: a145daa. I `.clone()` outside of the loop, which I think makes sense to increment the reference count. Then, as you say, inside the loop I moved `interval.tick()` to be the first call invoked.
The clone is not necessary, as you already hold a value of the `Arc` from the parameter, but it doesn't really matter.
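In other words (an illustrative sketch, not the PR's actual signature): a function that receives the `Arc` by value already owns a handle, so an extra clone only bumps the reference count.

```rust
use std::sync::Arc;

fn consume(ctxt: Arc<Vec<u32>>) {
    // `ctxt` is already an owned Arc handle; no clone is needed to use it.
    println!("{} entries", ctxt.len());
}

fn main() {
    let shared = Arc::new(vec![1, 2, 3]);
    consume(Arc::clone(&shared)); // the caller clones when handing one over
}
```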
Note: the `created_at` column is currently set to the date of the master commit. I guess it doesn't matter that much, as in most cases the row should be inserted into the table relatively soon after the commit is merged. But it's a bit weird, since we probably treat `created_at` as the time when the actual entry in the table was created.

That also reminded me that we will want to show the total duration of the benchmark run on the website. This duration was simple before, but it gets more complicated with the job queue and backfilling. Possibly later we might want to add a column to `benchmark_request` that will store the duration of the whole benchmark? Essentially the time from the start of the first job to the completion of the last job, I suppose. But maybe we can reuse `artifact_collection_duration` for that. Anyway, we can deal with that later.
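As a hedged sketch of the "first job start to last job completion" idea, assuming each request's jobs carry hypothetical `started_at`/`completed_at` timestamps (illustrative names, not the PR's schema):

```rust
use std::time::{Duration, SystemTime};

// Hypothetical job record standing in for the PR's queued-job rows.
struct Job {
    started_at: SystemTime,
    completed_at: SystemTime,
}

/// Duration of a whole benchmark request: from the start of its first
/// job to the completion of its last job. Returns `None` for no jobs.
fn total_duration(jobs: &[Job]) -> Option<Duration> {
    let start = jobs.iter().map(|j| j.started_at).min()?;
    let end = jobs.iter().map(|j| j.completed_at).max()?;
    end.duration_since(start).ok()
}
```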
Force-pushed from 8930a3b to a145daa
I'm fine with merging the current state, although I wouldn't enable it yet on production (it seems to work fine locally), because currently we take all master commits from June onwards and try to insert them all into the DB. This in fact issues 100+ insert queries (with 5 more each day) to the DB every 30 seconds, which seems a bit wasteful for the production server for now. I think that before enabling it on production, we should at least check that the inserted SHA is not in …
We should be shielded from anything happening by a check like:

```rust
let commits: Vec<Commits> = ctxt.index.commits();
if commits.iter().find(|c| ...).is_none() {
    // insert master commit
}
```

I'm happy with either. I think that, since we have a test showing items can get inserted into the database and at present we only have a simple insert operation, waiting to enable it in production seems OK?
Yeah, it shouldn't run by default. I just wanted to mention that I'll only enable the environment variable (or make the cron job run by default) once we can filter the existing commits quickly, to avoid the useless queries.
Thank you!
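A hedged sketch of the kind of fast filtering being discussed, assuming the set of already-known SHAs can be collected up front (types and names here are illustrative, not the PR's):

```rust
use std::collections::HashSet;

// Hypothetical stand-in for the PR's master-commit data.
struct MasterCommit {
    sha: String,
}

/// Keep only commits whose SHA is not already known, so the cron job
/// issues inserts only for genuinely new master commits instead of
/// re-inserting the whole backlog every interval.
fn new_commits(all: Vec<MasterCommit>, known_shas: &HashSet<String>) -> Vec<MasterCommit> {
    all.into_iter()
        .filter(|c| !known_shas.contains(&c.sha))
        .collect()
}
```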
With the `RUN_CRON` environment variable set, a cron job running every `QUEUE_UPDATE_INTERVAL_SECONDS` (30 seconds by default) would populate the `benchmark_requests` table with master commits.
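For illustration, a hedged sketch of how those variables might be read at startup; the variable names come from the PR, but the parsing code here is an assumption, not the PR's implementation:

```rust
// Assumed behavior: cron disabled unless RUN_CRON is set, and a
// 30-second interval unless QUEUE_UPDATE_INTERVAL_SECONDS overrides it.
fn cron_config() -> (bool, u64) {
    let run_cron = std::env::var("RUN_CRON").is_ok();
    let seconds = std::env::var("QUEUE_UPDATE_INTERVAL_SECONDS")
        .ok()
        .and_then(|s| s.parse::<u64>().ok())
        .unwrap_or(30);
    (run_cron, seconds)
}
```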