Me/dpc 4545 fix aggregation #2485

Open: wants to merge 6 commits into base: main

MEspositoE14s
Contributor

🎫 Ticket

https://jira.cms.gov/browse/DPC-4545

🛠 Changes

  • Updated completePartialBatch so that it updates the job_queue_batch table after each patient is processed.
  • Temporarily increased the stuck-batch timeout from 5 minutes to 15 minutes.

ℹ️ Context

We noticed that jobs were getting stuck in the queue because update_time in the job_queue_batch table wasn't being set frequently enough. Sometimes the cause is an issue in our DB code, and sometimes patients simply take a long time to process. These changes should address both cases.
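
To illustrate the reasoning, here is a minimal, self-contained sketch of the staleness check, not the actual DPC code. It models update_time with a simulated clock and applies the same comparison as the query in restartStuckBatches (update_time older than now minus the timeout). The class name, method names, and per-patient durations are hypothetical and chosen only for illustration.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical simulation: nothing here touches the real job_queue_batch table.
public class BatchHeartbeatSketch {

    // Mirrors the staleness condition used to find stuck batches:
    // update_time < current_timestamp - interval '<timeout>'
    static boolean looksStuck(Instant updateTime, Instant now, Duration timeout) {
        return updateTime.isBefore(now.minus(timeout));
    }

    // Simulates one batch and reports whether it would ever have been
    // flagged as stuck at the moment a patient finishes processing.
    static boolean everLooksStuck(int patients, Duration perPatient,
                                  Duration timeout, boolean heartbeatPerPatient) {
        Instant now = Instant.EPOCH;   // simulated clock
        Instant updateTime = now;      // set when the batch is claimed
        boolean flagged = false;
        for (int i = 0; i < patients; i++) {
            now = now.plus(perPatient);                      // patient i finishes
            flagged |= looksStuck(updateTime, now, timeout); // what the checker would see
            if (heartbeatPerPatient) {
                updateTime = now;                            // refresh after each patient
            }
        }
        return flagged;
    }

    public static void main(String[] args) {
        Duration oneMinute = Duration.ofMinutes(1);
        Duration sixMinutes = Duration.ofMinutes(6);
        Duration oldTimeout = Duration.ofMinutes(5);
        Duration newTimeout = Duration.ofMinutes(15);

        // Ten patients at one minute each, with and without a per-patient refresh.
        System.out.println(everLooksStuck(10, oneMinute, oldTimeout, false));
        System.out.println(everLooksStuck(10, oneMinute, oldTimeout, true));

        // A single slow patient, under the old and new timeouts.
        System.out.println(everLooksStuck(1, sixMinutes, oldTimeout, true));
        System.out.println(everLooksStuck(1, sixMinutes, newTimeout, true));
    }
}
```

In this model, refreshing update_time after each patient keeps a healthy batch from looking stale, while the longer timeout covers the case where a single patient takes longer than the old 5-minute window.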

🧪 Validation

  • Added a test to ensure update_time is set after each patient is processed.
  • Deployed to test env and watched the DB as the smoke tests ran.

I verified that the code does what I expected, but the only way to know for sure whether this fixes our problem is to deploy it to prod and periodically check the logs to see if jobs are still getting stuck.

commit 2be580b
Author: Ashley Weaver <[email protected]>
Date:   Thu Feb 27 13:36:07 2025 -0500

    [DPC-4406] Upgrade JDK and Dropwizard (#2368)

    ## 🎫 Ticket

    https://jira.cms.gov/browse/DPC-4406

    ## 🛠 Changes

    Upgrades to JDK 17 and Dropwizard 4

    ## ℹ️ Context

    ## 🧪 Validation

    Deploy is
    [passing](https://github.com/CMSgov/dpc-app/actions/runs/13445174778).

    ---------

    Co-authored-by: MEspositoE14s <[email protected]>
MEspositoE14s requested a review from a team on February 28, 2025 20:36
@@ -174,7 +174,7 @@ public Optional<JobQueueBatch> claimBatch(UUID aggregatorID) {
     @SuppressWarnings("unchecked")
     private void restartStuckBatches(Session session) {
         // Find stuck batches
-        List<String> stuckBatchIDs = session.createNativeQuery("SELECT Cast(batch_id as varchar) batch_id FROM job_queue_batch WHERE status = 1 AND update_time < current_timestamp - interval '5 minutes' FOR UPDATE SKIP LOCKED")
+        List<String> stuckBatchIDs = session.createNativeQuery("SELECT Cast(batch_id as varchar) batch_id FROM job_queue_batch WHERE status = 1 AND update_time < current_timestamp - interval '15 minutes' FOR UPDATE SKIP LOCKED")
Contributor Author

We have DPC-4548 to figure out what this timeout should be set to. In the meantime, I bumped it up to keep jobs from getting stuck until that ticket is completed.
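
Since the right value is still an open question, one possible approach is to build the query from a single timeout value instead of hard-coding the interval. The sketch below is an assumption about how that might look and is not code from this PR; the class and method names are hypothetical. The query text itself is copied from the diff above.

```java
import java.time.Duration;

// Hypothetical helper: builds the stuck-batch query from a configurable timeout.
public class StuckBatchQuerySketch {

    static String stuckBatchQuery(Duration timeout) {
        long minutes = timeout.toMinutes();
        if (minutes < 1) {
            // Guard against zero, negative, or sub-minute values.
            throw new IllegalArgumentException("timeout must be at least one minute");
        }
        // minutes is a long derived from a Duration, so concatenating it
        // cannot inject arbitrary SQL into the interval literal.
        return "SELECT Cast(batch_id as varchar) batch_id FROM job_queue_batch "
             + "WHERE status = 1 AND update_time < current_timestamp - interval '"
             + minutes + " minutes' "
             + "FOR UPDATE SKIP LOCKED";
    }

    public static void main(String[] args) {
        // Prints the same query string as the updated line in the diff.
        System.out.println(stuckBatchQuery(Duration.ofMinutes(15)));
    }
}
```

With something like this, the outcome of DPC-4548 could be applied by changing one configured value rather than editing the SQL string.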
