-
Notifications
You must be signed in to change notification settings - Fork 75
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(core-clp): Defer each archive's global metadata updates until it has been completed (resolves #685). #705
Changes from 12 commits
1c13138
faecdf9
5d7aa3b
91f868f
719b97b
4f1e6c6
962d61f
957edbf
3b0b37a
8e59e74
8aaff6f
363cd56
48b0779
846062b
920cfb8
0628636
410ce5d
a83a420
0431259
b46f4e5
fccbd5f
c38ab47
b611b6d
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -157,10 +157,6 @@ void Archive::open(UserConfig const& user_config) { | |
|
||
m_global_metadata_db = user_config.global_metadata_db; | ||
|
||
m_global_metadata_db->open(); | ||
m_global_metadata_db->add_archive(m_id_as_string, *m_local_metadata); | ||
m_global_metadata_db->close(); | ||
|
||
m_file = nullptr; | ||
|
||
// Open log-type dictionary | ||
|
@@ -238,6 +234,8 @@ void Archive::close() { | |
|
||
m_metadata_file_writer.close(); | ||
|
||
update_global_metadata(); | ||
|
||
m_global_metadata_db = nullptr; | ||
|
||
m_metadata_db.close(); | ||
|
@@ -557,8 +555,6 @@ void Archive::persist_file_metadata(vector<File*> const& files) { | |
|
||
m_metadata_db.update_files(files); | ||
|
||
m_global_metadata_db->update_metadata_for_files(m_id_as_string, files); | ||
|
||
// Mark files' metadata as clean | ||
for (auto file : files) { | ||
file->mark_metadata_as_clean(); | ||
|
@@ -593,16 +589,12 @@ void Archive::close_segment_and_persist_file_metadata( | |
|
||
for (auto file : files) { | ||
file->mark_as_in_committed_segment(); | ||
m_files_written.emplace_back(file); | ||
} | ||
|
||
m_global_metadata_db->open(); | ||
persist_file_metadata(files); | ||
update_metadata(); | ||
m_global_metadata_db->close(); | ||
|
||
for (auto file : files) { | ||
delete file; | ||
} | ||
files.clear(); | ||
} | ||
|
||
|
@@ -635,8 +627,6 @@ void Archive::update_metadata() { | |
m_metadata_file_writer.seek_from_begin(0); | ||
m_local_metadata->write_to_file(m_metadata_file_writer); | ||
|
||
m_global_metadata_db->update_archive_metadata(m_id_as_string, *m_local_metadata); | ||
|
||
if (m_print_archive_stats_progress) { | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This block should probably move into There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Are you talking about the There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The progress output is being used by the package to keep track of when new archives have been generated. The package expects that the archive exists in the global metadata database when it receives the progress update. Progress is probably a poor name for this option at this point since its use is not really for humans per se. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Lol this is sketch, but I guess we don't have a good alternative at the moment. I looked through this code and it looks like it only uses the last progress update? As such, there is no difference between moving it to close and leaving as it is now? If we do what you are suggesting, we are essentially breaking the progress update feature in the cli. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. True, it does use only the last update. For the sake of safety, can you add a comment indicating that the stats are printed even though the archive doesn't exist in the global metadata database? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Okay. The ordering now is slightly different. Previously we added to global DB before printing, now we print then add to globalDB. I assume this will be okay. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. If you want to be more safe. I could potential add a duplicate stat printing when the archive is closed (maybe with a new field like complete) lmk There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Unrelated, but the old method was potentially unsafe in another way. The archive wasn't actually closed before it was uploaded to s3 (like write_dir_snapshot()) may have not been called, if there was only one archive There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Good idea. Sgtm. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. |
||
nlohmann::json json_msg; | ||
json_msg["id"] = m_id_as_string; | ||
|
@@ -647,6 +637,19 @@ void Archive::update_metadata() { | |
} | ||
} | ||
|
||
auto Archive::update_global_metadata() -> void { | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. How about renaming |
||
m_global_metadata_db->open(); | ||
if (false == m_local_metadata.has_value()) { | ||
throw OperationFailed(ErrorCode_Failure, __FILENAME__, __LINE__); | ||
} | ||
m_global_metadata_db->add_archive(m_id_as_string, m_local_metadata.value()); | ||
m_global_metadata_db->update_metadata_for_files(m_id_as_string, m_files_written); | ||
for (auto* file : m_files_written) { | ||
delete file; | ||
} | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Just looking at the function name, one may not expect the function to call delete on m_files_written. I feel it would be more clear to put these 3 lines out side of the function, or maybe update the funciton name to include the "delete" part There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I moved the delete out of the function |
||
m_global_metadata_db->close(); | ||
} | ||
|
||
// Explicitly declare template specializations so that we can define the template methods in this | ||
// file | ||
template void Archive::write_log_event_ir<eight_byte_encoded_variable_t>( | ||
|
Original file line number | Diff line number | Diff line change | ||||
---|---|---|---|---|---|---|
|
@@ -279,6 +279,11 @@ class Archive { | |||||
*/ | ||||||
void update_metadata(); | ||||||
|
||||||
/** | ||||||
* Updates metadata in the global metadata database. | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. ditto comment for deleting m_files_written. You might also need to mention about exceptions if m_global_metadata_db's api throws any. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Removed the delete out of the function. None of the functions I am calling have any documented exceptions in the .hpp file. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Suggested change
And we should probably update |
||||||
*/ | ||||||
auto update_global_metadata() -> void; | ||||||
|
||||||
// Variables | ||||||
boost::uuids::uuid m_id; | ||||||
std::string m_id_as_string; | ||||||
|
@@ -316,6 +321,10 @@ class Archive { | |||||
std::vector<File*> m_files_with_timestamps_in_segment; | ||||||
std::vector<File*> m_files_without_timestamps_in_segment; | ||||||
|
||||||
// Data for all files in this collection has been deallocated, and should only | ||||||
// contain metadata. | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. without reading the rest of the code outside this PR, the comment is not very easy to follow. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The function
This function If you want i can modify the comment to say
We could also just remove the comment... There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The comment makes sense to me. maybe we can rename the variables? something like m_file_metadata_for_global_update. |
||||||
std::vector<File*> m_files_written; | ||||||
|
||||||
size_t m_target_segment_uncompressed_size; | ||||||
Segment m_segment_for_files_with_timestamps; | ||||||
ArrayBackedPosIntSet<logtype_dictionary_id_t> | ||||||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just wonder, can we use something like m_files_written.insert(files.begin(), files.end())?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could, but I figured the loop was already there. Not sure if better to use existing loop, or do what you proposed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can't say for sure which one is better, so I am fine to keep the emplace.
I do notice that emplace construct the object in place and insert copies the object. but in our case, I guess emplace will also somehow have to copy via the copy constructor.
Anyway, I don't think the performance really matter here.
@LinZhihao-723 do you have any insights?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
actually, can we directly just do
m_files_written = files
?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No this will be executed multiple times per archive. That would overwrite the previous call
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ture, you are right. my bad