
Rockset migration - move upload for queue_times_historical from rockset to s3 #5398

Merged
merged 4 commits on Jul 9, 2024
1 change: 1 addition & 0 deletions torchci/rockset/metrics/__sql/queued_jobs_by_label.sql
@@ -48,6 +48,7 @@ SELECT
   COUNT(*) AS count,
   MAX(queue_s) AS avg_queue_s,
   machine_type,
+  CURRENT_TIMESTAMP() AS _event_time
Contributor
Maybe a comment on why we need to use CURRENT_TIMESTAMP as _event_time here. I guess that this is because Rockset uses the time when it ingests the data as _event_time. So, re-importing records from S3 will set them all to the same _event_time?

Contributor Author

Yes, _event_time is the time records get added to Rockset if it is empty: https://docs.rockset.com/documentation/docs/special-fields#the-_event_time-field

I decided to make a different field entirely so there's less confusion between the two.
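The distinction the thread is drawing can be sketched as follows. This is a hypothetical helper, not code from this PR: it stamps each record with an explicit snapshot time, so the timestamp travels with the data itself instead of being assigned at ingestion time (which is what would collapse re-imported records onto a single `_event_time`):

```javascript
// Hypothetical helper (not part of this PR): stamp each record with the
// snapshot time explicitly, so the timestamp survives a re-import from S3
// and does not depend on when a downstream store ingests the data.
function stampSnapshot(records, snapshotTime = new Date()) {
  const iso = snapshotTime.toISOString();
  return records.map((r) => ({ ...r, snapshot_time: iso }));
}

// Example: two queued-job rows share the same explicit snapshot time.
const stamped = stampSnapshot(
  [{ machine_type: "linux.4xlarge" }, { machine_type: "macos-m1" }],
  new Date("2024-07-09T00:00:00Z")
);
console.log(stamped[0].snapshot_time); // "2024-07-09T00:00:00.000Z"
```

The field name `snapshot_time` is illustrative; the PR itself reuses the name `_event_time` in the query output.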

 FROM
   queued_jobs
 GROUP BY
2 changes: 1 addition & 1 deletion torchci/rockset/prodVersions.json
@@ -64,7 +64,7 @@
     "master_commit_red_percent_groups": "601949da23f80a28",
     "master_jobs_red_avg": "7df76d4b0d79e067",
     "number_of_force_pushes": "7c12c25f00d85d5d",
-    "queued_jobs_by_label": "9526771e44a48db3",
+    "queued_jobs_by_label": "acedda4f886e2e32",
     "queued_jobs": "2a1fce1642bb412d",
     "reverts": "f5bc84a10c4065a3",
     "top_reds": "f1a1f5012d419fc2",
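For context on why this file changes at all: prodVersions.json pins each named Rockset query lambda to a specific version hash, so editing the SQL requires bumping the corresponding hash here. A minimal sketch of how such a pin map might be consumed (the helper and call shape are hypothetical, not verified against the torchci codebase):

```javascript
// Hypothetical sketch: a version map like prodVersions.json pins each query
// lambda name to the version hash that production should execute.
const prodVersions = {
  queued_jobs_by_label: "acedda4f886e2e32", // bumped by this PR
  queued_jobs: "2a1fce1642bb412d",
};

// A caller would look up the pinned version before executing the lambda,
// failing loudly if the query was never registered.
function pinnedVersion(name) {
  const version = prodVersions[name];
  if (!version) throw new Error(`no pinned version for ${name}`);
  return version;
}

console.log(pinnedVersion("queued_jobs_by_label")); // acedda4f886e2e32
```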
29 changes: 22 additions & 7 deletions torchci/scripts/updateQueueTimes.mjs
Original file line number Diff line number Diff line change
@@ -1,10 +1,23 @@
 // We compute queue times by looking at a snapshot of jobs in CI that are
 // currently queued and seeing how long they've existed. This approach doesn't
-// give us historical data, so write our snapshot regularly to another Rockset
-// collection so we can get a view of the queue over time.
+// give us historical data, so write our snapshot regularly to s3 so we can get
+// a view of the queue over time.
+import { PutObjectCommand, S3Client } from "@aws-sdk/client-s3";
 import rockset from "@rockset/client";
 import { promises as fs } from "fs";
 
+export function getS3Client() {
+  return new S3Client({
+    region: "us-east-1",
+    credentials: {
+      accessKeyId: process.env.OUR_AWS_ACCESS_KEY_ID,
+      secretAccessKey: process.env.OUR_AWS_SECRET_ACCESS_KEY,
+    },
+  });
+}
+
+const s3client = getS3Client();
+
 async function readJSON(path) {
   const rawData = await fs.readFile(path);
   return JSON.parse(rawData);
@@ -19,8 +32,10 @@ const response = await client.queryLambdas.executeQueryLambda(
   {}
 );
 
-console.log(response);
-
-await client.documents.addDocuments("metrics", "queue_times_historical", {
-  data: response.results,
-});
+s3client.send(
+  new PutObjectCommand({
+    Bucket: "ossci-raw-job-status",
+    Key: `queue_times_historical/${response.results[0]._event_time}`,
+    Body: JSON.stringify(response.results),
+  })
+);
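Two things worth noting when reusing this pattern: the script sends the upload without awaiting the returned promise, so in other contexts one would likely `await s3client.send(...)` to surface failures, and the object key is derived from the first result's `_event_time`. A sketch of that key scheme (the helper name is hypothetical, not part of this PR):

```javascript
// Hypothetical helper: derive the S3 object key the same way the script does,
// namely "queue_times_historical/<_event_time of the first result>".
function snapshotKey(results) {
  if (!Array.isArray(results) || results.length === 0) {
    throw new Error("expected a non-empty results array");
  }
  return `queue_times_historical/${results[0]._event_time}`;
}

const key = snapshotKey([
  { _event_time: "2024-07-09T12:00:00Z", machine_type: "linux.4xlarge" },
]);
console.log(key); // queue_times_historical/2024-07-09T12:00:00Z
```

Guarding against an empty result set matters here because `results[0]._event_time` would otherwise throw mid-upload.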