-
Notifications
You must be signed in to change notification settings - Fork 6.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Enable reading un-synced data in db stress test #12752
Conversation
@hx235 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
utilities/fault_injection_fs.cc
Outdated
@@ -803,7 +803,7 @@ IOStatus FaultInjectionTestFS::GetFileSize(const std::string& f, | |||
// Need to report flushed size, not synced size | |||
MutexLock l(&mutex_); | |||
auto it = db_file_state_.find(f); | |||
if (it != db_file_state_.end()) { | |||
if (it != db_file_state_.end() && it->second.pos_ > 0) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Without a comment, it's not clear why this check should be here.
The core problem seems to be that we aren't simulating unsynced data loss with direct_io, and because of that, we aren't setting pos_. Why should this code be quietly entangled with that deficit of other code (rather than fixing the other code)?
It seems like it would be less code to treat use_direct_io the same way as !use_direct_io and either not use direct_io in target_ or create an aligned buffer when we do append to target_.
Or easy to update pos_ with use_direct_io.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The core problem seems to be that we aren't simulating unsynced data loss with direct_io, and because of that, we aren't setting pos_. Why should this code be quietly entangled with that deficit of other code (rather than fixing the other code)?
This is fair.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems like it would be less code to treat use_direct_io the same way as !use_direct_io and either not use direct_io in target_ or create an aligned buffer when we do append to target_.
Or easy to update pos_ with use_direct_io.
Let me think about this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the direct IO is meant for skipping the page cache and therefore we don't simulate un-synced data loss like loss in page cache for direct IO.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After more discussion, we agreed upon direct IO skipping page cache doesn't necessarily there is no chance for data loss upon power loss. So we still want to simulate data loss for direct IO write in the future.
Fixed by "update pos_ with use_direct_io" for now.
ce27169
to
7bc57b7
Compare
@hx235 has updated the pull request. You must reimport the pull request before landing. |
7bc57b7
to
e522971
Compare
@hx235 has updated the pull request. You must reimport the pull request before landing. |
@hx235 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
Need to fix more places apparently https://github.com/facebook/rocksdb/actions/runs/9474358341/job/26103793559?pr=12752 |
I am just passing by but it appears to be PositionedAppend() considering it is direct I/O, and PositionedAppend() does not update the |
Looks good once that is resolved! |
e522971
to
a34ceb6
Compare
@hx235 has updated the pull request. You must reimport the pull request before landing. |
Fixed - turns out that we also truncate under direct IO so need to update the pos (now renamed to post_at_last_append) for better naming. |
@hx235 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
utilities/fault_injection_fs.cc
Outdated
@@ -16,6 +16,8 @@ | |||
|
|||
#include "utilities/fault_injection_fs.h" | |||
|
|||
#include <sys/types.h> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems odd. Not sure how this got in here, and maybe not portable
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
make format somehow did that hmmm let me check
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not make format
but VSCode probably
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
a34ceb6
to
3c03992
Compare
@hx235 has updated the pull request. You must reimport the pull request before landing. |
@hx235 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
Context/Summary:
There are a few blockers to enabling reading un-synced data in db stress test
(1) GetFileSize() will always return 0 for file written under direct IO because we don't track the last flushed position for
TestFSWritableFile
under direct IO. So it will surface as(2) A couple minor FIXME in left in #12729.
This PR fixed (1) and (2) and enabled reading un-synced data in stress test.
Test: