[BUG] rgw/sfs: metadata update calls don't honor retry_raced_bucket_write mechanism #637

Open
giubacc opened this issue Jul 26, 2023 · 0 comments · May be fixed by aquarist-labs/ceph#240
Labels
area/rgw-sfs RGW & SFS related kind/bug Something isn't working triage/next-candidate This could be moved to the next milestone

giubacc commented Jul 26, 2023

Executing two metadata update calls, for example put_bucket_object_lock and put_bucket_tags, interleaved as follows:

2023-07-25T16:17:36.321+0200 7f0f77baf6c0  5 req 0 0.003333382s s3:put_bucket_object_lock NOTICE: call to do_aws4_auth_completion
2023-07-25T16:17:36.321+0200 7f0f77baf6c0 10 req 0 0.003333382s s3:put_bucket_object_lock v4 auth ok -- do_aws4_auth_completion
2023-07-25T16:17:36.324+0200 7f0f77baf6c0  2 req 0 0.006666763s s3:put_bucket_object_lock completing
2023-07-25T16:17:36.324+0200 7f0f77baf6c0  2 req 0 0.010000145s s3:put_bucket_tags op status=0
2023-07-25T16:17:36.324+0200 7f0f77baf6c0  2 req 0 0.010000145s s3:put_bucket_tags http status=200
2023-07-25T16:17:36.324+0200 7f0f77baf6c0  1 ====== req done req=0x7f0f7732c6e0 op status=0 http_status=200 latency=0.010000145s ======
2023-07-25T16:17:36.324+0200 7f0f77baf6c0  1 beast: 0x7f0f7732c6e0: 127.0.0.1 - testid [25/Jul/2023:16:17:36.314 +0200] "PUT /cf8e3cfc-361b-466b-bcf9-c48e782f0cf1?tagging HTTP/1.1" 200 0 - "Botocore/1.27.59 Python/3.10.12 Linux/6.4.3-1-default" - latency=0.010000145s
2023-07-25T16:17:36.324+0200 7f0f77baf6c0  2 req 0 0.006666763s s3:put_bucket_object_lock op status=0
2023-07-25T16:17:36.324+0200 7f0f77baf6c0  2 req 0 0.006666763s s3:put_bucket_object_lock http status=200
2023-07-25T16:17:36.324+0200 7f0f77baf6c0  1 ====== req done req=0x7f0f772ab6e0 op status=0 http_status=200 latency=0.006666763s ======

As a result, the modifications made by put_bucket_tags are lost.
This is likely because, in this scenario, put_bucket_object_lock operates on stale data.

To reproduce this, it is sufficient to issue put_bucket_tags and put_bucket_object_lock in parallel on the same bucket.

The same calls executed serially behave as expected.
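
The lost update can be illustrated with a small, self-contained model. This is only a sketch, not the rgw/sfs code: `Store`, `BucketHandle`, and the attribute names are hypothetical, and the two requests are interleaved by hand rather than with threads so that the outcome is deterministic.

```python
import copy


class Store:
    """Hypothetical stand-in for the persistent store: one bundled record per bucket."""

    def __init__(self):
        self.record = {"tags": None, "object_lock": None}

    def get_info(self):
        return copy.deepcopy(self.record)

    def put_info(self, info):
        # Unconditional write: the whole bundle is replaced, with no staleness check.
        self.record = copy.deepcopy(info)


class BucketHandle:
    """Hypothetical per-request bucket reference holding its own cached view."""

    def __init__(self, store):
        self.store = store
        self.info = store.get_info()

    def set_attr(self, key, value):
        self.info[key] = value
        self.store.put_info(self.info)


def interleaved_updates():
    store = Store()
    # Both requests load the bucket before either one writes.
    lock_req = BucketHandle(store)
    tags_req = BucketHandle(store)
    tags_req.set_attr("tags", {"env": "test"})
    # lock_req still holds the view taken before the tags were written,
    # so writing its bundle overwrites the tags with the old (empty) value.
    lock_req.set_attr("object_lock", "enabled")
    return store.record


print(interleaved_updates())  # {'tags': None, 'object_lock': 'enabled'}
```

In this model the tags written by the first request disappear because the second request writes back its whole cached bundle, which matches the symptom described above.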

Analysis:

put_bucket_tags and put_bucket_object_lock store their data the same way and are allowed to race each other.
Metadata updates are "bundled": they are written to the db together, not individually.
There is a retry mechanism, retry_raced_bucket_write, which expects the update to fail if the caller doesn't have the latest data.
We currently don't have a conditional update in put_info that detects a stale view and fails the op so that the retry can pick it up.
We also don't implement try_refresh_info, which the retry mechanism uses.
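
A minimal sketch of how the two missing pieces could fit together, under stated assumptions: the version counter, the `-ECANCELED` return code, the retry bound, and every class and method signature below are illustrative choices for this model and are not taken from the rgw/sfs sources. Only the names retry_raced_bucket_write, put_info, and try_refresh_info come from the analysis above.

```python
import copy
import errno

MAX_RETRIES = 15  # assumed bound for this sketch


class VersionedStore:
    """Hypothetical store whose bundled record carries a version counter."""

    def __init__(self):
        self.version = 0
        self.record = {"tags": None, "object_lock": None}

    def get_info(self):
        return self.version, copy.deepcopy(self.record)

    def put_info(self, expected_version, info):
        # Conditional update: reject the write if the caller's view is stale.
        if expected_version != self.version:
            return -errno.ECANCELED
        self.record = copy.deepcopy(info)
        self.version += 1
        return 0


class BucketHandle:
    """Hypothetical per-request bucket reference with a cached, versioned view."""

    def __init__(self, store):
        self.store = store
        self.version, self.info = store.get_info()

    def try_refresh_info(self):
        # Reload the latest version and record from the store.
        self.version, self.info = self.store.get_info()
        return 0

    def put_attr(self, key, value):
        self.info[key] = value
        r = self.store.put_info(self.version, self.info)
        if r == 0:
            self.version += 1  # keep the cached version in step with the store
        return r


def retry_raced_bucket_write(bucket, op):
    """Run op; while it reports a stale view, refresh the handle and try again."""
    r = op()
    for _ in range(MAX_RETRIES):
        if r != -errno.ECANCELED:
            break
        r = bucket.try_refresh_info()
        if r >= 0:
            r = op()
    return r


def interleaved_updates_with_retry():
    store = VersionedStore()
    lock_req = BucketHandle(store)
    tags_req = BucketHandle(store)
    retry_raced_bucket_write(
        tags_req, lambda: tags_req.put_attr("tags", {"env": "test"}))
    # The first attempt fails as stale; the retry refreshes and reapplies it.
    retry_raced_bucket_write(
        lock_req, lambda: lock_req.put_attr("object_lock", "enabled"))
    return store.record


print(interleaved_updates_with_retry())
# {'tags': {'env': 'test'}, 'object_lock': 'enabled'}
```

With the same interleaving as in the report, the stale write is rejected once, the handle is refreshed, and both updates survive. Without the conditional check in put_info, the retry loop would never be triggered, which is why both pieces are needed.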

@github-actions github-actions bot added the triage/waiting Waiting for triage label Jul 26, 2023
@giubacc giubacc added kind/bug Something isn't working area/rgw-sfs RGW & SFS related labels Jul 26, 2023
@jhmarina jhmarina added priority/1 Should be fixed for next release and removed triage/waiting Waiting for triage labels Sep 7, 2023
@giubacc giubacc added this to the v0.22.0 milestone Oct 4, 2023
@giubacc giubacc modified the milestones: v0.22.0, v0.23.0 Oct 19, 2023
giubacc referenced this issue in giubacc/ceph Oct 26, 2023
Updating bucket's metadata concurrently by two or more threads is allowed in radosgw.
There is a retry mechanism: retry_raced_bucket_write(), that expects the bucket references to fetch the latest data from the persistent store.
rgw/sfs driver didn't implement try_refresh_info() in its bucket class definition; this could cause two references to the same bucket to potentially lead to partial metadata updates.

Fixes: https://github.com/aquarist-labs/s3gw/issues/637
Signed-off-by: Giuseppe Baccini <[email protected]>
@giubacc giubacc linked a pull request Oct 26, 2023 that will close this issue
11 tasks
giubacc referenced this issue in giubacc/ceph Oct 26, 2023
giubacc referenced this issue in giubacc/ceph Oct 27, 2023
giubacc referenced this issue in giubacc/ceph Nov 2, 2023
@giubacc giubacc added triage/needs-information Further information is requested and removed triage/needs-information Further information is requested labels Nov 17, 2023
giubacc referenced this issue in giubacc/ceph Nov 17, 2023
@jecluis jecluis modified the milestones: v0.23.0, v0.24.0 Nov 26, 2023
@jecluis jecluis removed this from the v0.24.0 milestone Mar 21, 2024
@jecluis jecluis added triage/next-candidate This could be moved to the next milestone and removed priority/1 Should be fixed for next release labels Mar 21, 2024
Projects
Status: Backlog
3 participants