Flyteadmin digest comparison should rely on database semantics #6058

Open
wants to merge 4 commits into
base: master

@popojk popojk commented Nov 29, 2024

Tracking issue

Closes #4780

Why are the changes needed?

In the current TaskManager CreateTask code, FlyteAdmin first checks whether a task with the same ID already exists in the database. If it does, FlyteAdmin compares the digest of the task being registered with the digest of the existing task. If no task with the same ID is found, FlyteAdmin proceeds to create the task in the database.

However, this check-then-create approach has a race condition that can skip the digest comparison for two identical tasks. For example, consider two identical tasks (same ID and same digest), A and B, being registered with FlyteAdmin simultaneously. Both requests can run the existence check before either has written the task, so the digest check is skipped for both. Consequently, one task is created in the database and the other fails with a primary key conflict. (Refer to the diagram below for a better understanding.)

[Screenshot 2024-11-29 2.36.48 PM]
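The interleaving described above can be replayed by hand. The following is only a minimal sketch, not FlyteAdmin code: `table`, `interleave`, and `errPrimaryKeyConflict` are hypothetical names, and a plain map stands in for the tasks table. It shows that when both requests run the existence check before either insert, the digest comparison never runs and the second request surfaces a raw primary key conflict.

```go
package main

import (
	"errors"
	"fmt"
)

// errPrimaryKeyConflict is a hypothetical stand-in for the database's
// unique-key violation.
var errPrimaryKeyConflict = errors.New("primary key conflict")

// table is a toy, single-threaded stand-in for the tasks table
// (task ID -> digest).
type table map[string]string

func (t table) exists(id string) bool {
	_, ok := t[id]
	return ok
}

func (t table) insert(id, digest string) error {
	if t.exists(id) {
		return errPrimaryKeyConflict
	}
	t[id] = digest
	return nil
}

// interleave replays the problematic schedule explicitly: both requests
// run their existence check before either one inserts.
func interleave(t table, id, digest string) (errA, errB error, digestCompared bool) {
	aSeesTask := t.exists(id) // request A: existence check
	bSeesTask := t.exists(id) // request B: existence check
	// The digest comparison only happens if a check found an existing task.
	digestCompared = aSeesTask || bSeesTask
	errA = t.insert(id, digest) // request A: insert
	errB = t.insert(id, digest) // request B: insert
	return errA, errB, digestCompared
}

func main() {
	errA, errB, compared := interleave(table{}, "my-task", "digest-1")
	fmt.Println("A:", errA)
	fmt.Println("B:", errB)
	fmt.Println("digest compared:", compared)
}
```

In this schedule request B fails even though it is identical to A, which is the behavior the PR sets out to fix.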

What changes were proposed in this pull request?

1. Do the digest check in a transactional way:

The procedure for creating a task should be: 1. create the task -> 2. if the task ID already exists (primary key conflict) -> 3. do the digest check. The pseudocode could look like:

in the transaction:
  try:
    create a task with the given primary key
  except primary key already exists:
    get the existing entry with the identical primary key
    if digest of existing == new entry's digest -> return `NewTaskExistsIdenticalStructureError`
    else -> return `NewTaskExistsDifferentStructureError`

This way the task digest is still checked even when two identical tasks are registered at the same time. Refer to the diagram below for a better understanding.

[Screenshot 2024-11-22 5.28.50 PM]
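The pseudocode above could be sketched in Go roughly as follows. This is an illustration under stated assumptions, not the code in this PR: `taskStore` is a hypothetical in-memory stand-in for the tasks table, its mutex plays the role of the database's primary key constraint, and the two sentinel errors stand in for what `NewTaskExistsIdenticalStructureError` and `NewTaskExistsDifferentStructureError` would return.

```go
package main

import (
	"bytes"
	"errors"
	"fmt"
	"sync"
)

// Hypothetical sentinel errors used only for this sketch.
var (
	errPrimaryKeyConflict = errors.New("primary key conflict")
	ErrIdenticalStructure = errors.New("task exists with identical structure")
	ErrDifferentStructure = errors.New("task exists with different structure")
)

// taskStore is an in-memory stand-in for the tasks table (task ID -> digest).
type taskStore struct {
	mu    sync.Mutex
	tasks map[string][]byte
}

func newTaskStore() *taskStore {
	return &taskStore{tasks: map[string][]byte{}}
}

// insert behaves like an INSERT guarded by a primary key: it fails if the
// ID is already present.
func (s *taskStore) insert(id string, digest []byte) error {
	s.mu.Lock()
	defer s.mu.Unlock()
	if _, ok := s.tasks[id]; ok {
		return errPrimaryKeyConflict
	}
	s.tasks[id] = digest
	return nil
}

func (s *taskStore) get(id string) ([]byte, bool) {
	s.mu.Lock()
	defer s.mu.Unlock()
	d, ok := s.tasks[id]
	return d, ok
}

// createTask inserts first and compares digests only after a conflict,
// so the comparison no longer depends on an earlier existence check.
func createTask(s *taskStore, id string, digest []byte) error {
	err := s.insert(id, digest)
	if err == nil {
		return nil
	}
	if !errors.Is(err, errPrimaryKeyConflict) {
		return err
	}
	existing, ok := s.get(id)
	if !ok {
		return fmt.Errorf("task %q not found after conflict", id)
	}
	if bytes.Equal(existing, digest) {
		return ErrIdenticalStructure
	}
	return ErrDifferentStructure
}

func main() {
	s := newTaskStore()
	fmt.Println(createTask(s, "my-task", []byte("digest-1"))) // first registration
	fmt.Println(createTask(s, "my-task", []byte("digest-1"))) // identical re-registration
	fmt.Println(createTask(s, "my-task", []byte("digest-2"))) // same ID, different digest
}
```

Because the insert itself decides who wins, every losing request reaches the digest comparison regardless of timing.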

2. Write the Task to the DB before writing the Description in the TaskRepo Create method:

In the current TaskRepo Create method, the task description is created before the task. However, if TaskManager catches a primary key conflict error from the task description creation and then tries to fetch the existing task from the DB for the digest check, a task-not-found error could occur because the task has not yet been created in the DB, which does not make sense to the user. This PR therefore proposes writing the Task to the DB before writing the Description in the TaskRepo Create method.

How was this patch tested?

Set up a simple workflow with 2 tasks:

[Screenshot 2024-11-29 3.23.27 PM]

Write a shell script that requests task registration 10 times at the same time to simulate a high-concurrency situation. Each task is expected to be registered successfully exactly once; every other response should show AlreadyExists.

[Screenshot 2024-11-29 3.31.06 PM]
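The shell script itself is only available as a screenshot. As a rough Go equivalent of the same idea, the sketch below fires N identical registrations at once and counts the outcomes. It makes several assumptions: `fakeAdmin` is a hypothetical in-memory stand-in for the registration endpoint (it does not call FlyteAdmin), and `errAlreadyExists` stands in for the AlreadyExists response.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// errAlreadyExists is a hypothetical stand-in for the AlreadyExists response.
var errAlreadyExists = errors.New("AlreadyExists")

// fakeAdmin is an in-memory stand-in for the registration endpoint.
type fakeAdmin struct {
	tasks sync.Map // task ID -> digest
}

// register stores the task atomically; only the first caller for an ID wins.
func (a *fakeAdmin) register(id, digest string) error {
	if _, loaded := a.tasks.LoadOrStore(id, digest); loaded {
		return errAlreadyExists
	}
	return nil
}

// registerConcurrently releases n identical registrations at the same
// moment and counts how many were created versus rejected.
func registerConcurrently(a *fakeAdmin, id, digest string, n int) (created, alreadyExists int) {
	var (
		wg    sync.WaitGroup
		mu    sync.Mutex
		start = make(chan struct{})
	)
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			<-start // wait so that all goroutines register together
			err := a.register(id, digest)
			mu.Lock()
			defer mu.Unlock()
			if err == nil {
				created++
			} else if errors.Is(err, errAlreadyExists) {
				alreadyExists++
			}
		}()
	}
	close(start)
	wg.Wait()
	return created, alreadyExists
}

func main() {
	created, exists := registerConcurrently(&fakeAdmin{}, "my-task", "digest-1", 10)
	fmt.Printf("created=%d alreadyExists=%d\n", created, exists)
}
```

Against a real deployment, the body of `register` would be replaced by an actual registration request; the counting logic stays the same.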

The result shows that each task was registered only once, as expected:

[Screenshot 2024-11-28 11.55.20 AM] [Screenshot 2024-11-28 11.55.46 AM]

Then, we wrote a shell script to register 2 groups of tasks with the same ID but different digests at the same time. TaskExistsDifferentStructureError is expected to be shown in the response.

[Screenshot 2024-11-29 3.40.48 PM]

The error is shown as expected:

[Screenshot 2024-11-29 11.51.21 AM]
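The two-group scenario can be sketched the same way. Everything here is an assumption made for illustration: `digestOf` models the digest as a SHA-256 hash of a task definition string (how FlyteAdmin actually computes the digest is not shown in this PR), and `fakeAdmin` is a hypothetical in-memory stand-in for the registration endpoint. With two groups racing on one ID, exactly one request creates the task, the rest of its group sees an identical structure, and the whole other group sees a different structure.

```go
package main

import (
	"crypto/sha256"
	"fmt"
	"sync"
)

// digestOf is an assumption for this sketch: a content hash of the definition.
func digestOf(definition string) [32]byte {
	return sha256.Sum256([]byte(definition))
}

// fakeAdmin is an in-memory stand-in for the registration endpoint.
type fakeAdmin struct {
	tasks sync.Map // task ID -> [32]byte digest
}

// register inserts first and classifies the outcome by comparing digests
// only when the ID was already taken.
func (a *fakeAdmin) register(id, definition string) string {
	d := digestOf(definition)
	existing, loaded := a.tasks.LoadOrStore(id, d)
	switch {
	case !loaded:
		return "created"
	case existing.([32]byte) == d:
		return "identical"
	default:
		return "different"
	}
}

// raceTwoGroups registers perGroup copies of each definition under the
// same ID, all released at once, and tallies the outcomes.
func raceTwoGroups(a *fakeAdmin, id, defA, defB string, perGroup int) map[string]int {
	var (
		wg     sync.WaitGroup
		mu     sync.Mutex
		counts = map[string]int{}
		start  = make(chan struct{})
	)
	for _, def := range []string{defA, defB} {
		for i := 0; i < perGroup; i++ {
			wg.Add(1)
			go func(def string) {
				defer wg.Done()
				<-start
				outcome := a.register(id, def)
				mu.Lock()
				counts[outcome]++
				mu.Unlock()
			}(def)
		}
	}
	close(start)
	wg.Wait()
	return counts
}

func main() {
	counts := raceTwoGroups(&fakeAdmin{}, "my-task", "definition v1", "definition v2", 5)
	fmt.Println(counts)
}
```

Which group wins the race is not deterministic, but the tallies are: one creation, `perGroup-1` identical conflicts, and `perGroup` different-structure conflicts.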

Check all the applicable boxes

  • I updated the documentation accordingly.
  • All new and existing tests passed.
  • All commits are signed-off.

…vent TaskManager CreateTask method Task not found issue

Signed-off-by: Alex Wu <[email protected]>

codecov bot commented Nov 29, 2024

Codecov Report

Attention: Patch coverage is 63.63636% with 8 lines in your changes missing coverage. Please review.

Project coverage is 37.11%. Comparing base (0585fba) to head (28481a0).
Report is 5 commits behind head on master.

Files with missing lines Patch % Lines
flyteadmin/pkg/manager/impl/task_manager.go 60.00% 6 Missing and 2 partials ⚠️
Additional details and impacted files
@@           Coverage Diff           @@
##           master    #6058   +/-   ##
=======================================
  Coverage   37.10%   37.11%           
=======================================
  Files        1318     1318           
  Lines      132326   132337   +11     
=======================================
+ Hits        49099    49112   +13     
+ Misses      78955    78952    -3     
- Partials     4272     4273    +1     
Flag Coverage Δ
unittests-datacatalog 51.58% <ø> (ø)
unittests-flyteadmin 54.12% <63.63%> (+0.01%) ⬆️
unittests-flytecopilot 30.99% <ø> (ø)
unittests-flytectl 62.33% <ø> (+0.04%) ⬆️
unittests-flyteidl 7.23% <ø> (-0.01%) ⬇️
unittests-flyteplugins 53.82% <ø> (ø)
unittests-flytepropeller 42.63% <ø> (ø)
unittests-flytestdlib 57.59% <ø> (+0.01%) ⬆️


Signed-off-by: Alex Wu <[email protected]>
@popojk popojk force-pushed the Flyteadmin_digest_comparison_should_rely_on_database_semantics branch from e717145 to 28481a0 on December 5, 2024 10:07
@@ -30,12 +30,12 @@ func (r *TaskRepo) Create(ctx context.Context, input models.Task, descriptionEnt
}
return nil
}
- tx := r.db.WithContext(ctx).Omit("id").Create(descriptionEntity)
+ tx := r.db.WithContext(ctx).Omit("id").Create(&input)

I think this change is technically not necessary since this is all wrapped in a transaction, and if any insert fails then the whole transaction should be rolled back.

@katrogan katrogan left a comment

Nice! Thank you so much for taking on these changes.

For the testing:

> Then, we make a shell script to register 2 groups of tasks with same ID but different digest at the same time.

Did you modify the task definition to force a different task digest? I didn't quite follow from the description.

Development

Successfully merging this pull request may close these issues.

[Housekeeping] Flyteadmin digest comparison should rely on database semantics
2 participants