Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add serialization and API changes for post_array_schema_from_rest. #5237

Merged
merged 11 commits into from
Aug 27, 2024

Conversation

shaunrd0
Copy link
Contributor

@shaunrd0 shaunrd0 commented Aug 13, 2024

This factors out serialization and API changes in #5181 that are required for the HandleGetArraySchema route. These changes will need to be available on REST before we can enable the new route for loading the array schema.

There is a quick summary of the changes required in SC-52877.


TYPE: IMPROVEMENT
DESC: Add serialization and API changes for post_array_schema_from_rest.

@shaunrd0 shaunrd0 requested a review from KiterLuc August 13, 2024 20:23
@shaunrd0 shaunrd0 requested a review from a team as a code owner August 13, 2024 20:23
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, thanks!

@shaunrd0 shaunrd0 force-pushed the smr/sc-52877/serialization-update-load-schema branch from 30d4a81 to 1829491 Compare August 14, 2024 17:14
@robertbindar robertbindar self-requested a review August 15, 2024 10:16
@teo-tsirpanis teo-tsirpanis self-requested a review August 16, 2024 15:52
Copy link
Contributor

@robertbindar robertbindar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left an inline comment which we'll need to triple check before merging.

@@ -1264,14 +1264,14 @@ struct LoadEnumerationsResponse {
struct LoadArraySchemaRequest {
config @0 :Config;
# Config

includeEnumerations @1 :Bool;
Copy link
Contributor

@robertbindar robertbindar Aug 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately I think you might risk crashing older clients if you do this. Please check

return LoadArraySchemaRequest(reader.getIncludeEnumerations());

which is read whithout checking if the field exists on the message :(

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah bummer, it was exposed in #4399. I added it back and used it for this option, I documented that future options like this should be stored in the serialized Config.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As a general note, whenever we see a field accessed without an if (has) wrapped around it, we should at the minimum ask a question about it, these things bite hard and lock us into supporting fields forever.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! I added the hasConfig check during deserialization. I reran tests this morning in #5181 with these changes and everything is passing, this is ready for review when you have some time.

* @param array_schema The array schema to be retrieved, or `NULL` upon error.
* @return `TILEDB_OK` for success and `TILEDB_OOM` or `TILEDB_ERR` for error.
*/
TILEDB_EXPORT int32_t tiledb_array_schema_load_with_options(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tiledb_array_schema_load_with_config or load_v2 sounds like a better name. The word options does not appear in other APIs.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Comment on lines 606 to 607
* Retrieves the schema of an array from the disk, creating an array schema
* struct.
* struct. The schema retrieved will always be the latest schema for the array.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could be Retrieves the latest schema of an array from the disk..

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Comment on lines +233 to 235
ss << "rest.load_enumerations_on_array_open false\n";
ss << "rest.load_metadata_on_array_open true\n";
ss << "rest.load_non_empty_domain_on_array_open true\n";
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are these options restricted to REST only? Especially the first two have use cases outside of it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah also crossed my mind while working on this.. I didn't introduce any new config options in this PR but maybe sm.load_X_on_array_open or sm.array_open.load_X would be more fitting for these? I think they were each introduced to tune REST behavior and that's probably where the names originated.

I opened SC-53109 to consider renaming them, probably best to handle the renaming separately.

Comment on lines 71 to 72
return config_.get<bool>(
"rest.load_enumerations_on_array_open", config_.must_find);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about we read this value on construction? There is no reason to copy a whole config.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

LOG_STATUS_NO_RETURN_VALUE(st);
save_error(ctx, st);
return TILEDB_OOM;
throw CAPIStatusException("Failed to allocate TileDB array schema object");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can throw bad_alloc, which will map to TILEDB_OOM in the C API.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

auto&& [st, array_schema_response] =
rest_client.get_array_schema_from_rest(uri);
throw_if_not_ok(st);
return std::move(array_schema_response.value());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
return std::move(array_schema_response.value());
return std::move(array_schema_response).value();

Nit: This will have the same behavior and use the rvalue reference-qualified overload of value().

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, nice!

@@ -123,7 +123,7 @@ shared_ptr<ArraySchema> ArrayDirectory::load_array_schema_latest(
auto&& array_schema = load_array_schema_from_uri(
resources_.get(), schema_uri, encryption_key, memory_tracker);

array_schema->set_array_uri(uri_);
array_schema->set_array_uri(uri_.remove_trailing_slash());

return std::move(array_schema);
Copy link
Member

@teo-tsirpanis teo-tsirpanis Aug 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had written a return std::move(x); in the past and CI had failed with a warning-turned-error. I'm surprised it does not fail here.

@shaunrd0 shaunrd0 force-pushed the smr/sc-52877/serialization-update-load-schema branch from 279b18f to a588737 Compare August 19, 2024 20:10
@shaunrd0 shaunrd0 force-pushed the smr/sc-52877/serialization-update-load-schema branch from 748eaea to 3b206ae Compare August 20, 2024 16:42
#include "tiledb/sm/enums/encryption_type.h"
#include "tiledb/sm/rest/rest_client.h"

namespace tiledb::api {

capi_return_t tiledb_array_schema_load(
int32_t tiledb_array_schema_load(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
int32_t tiledb_array_schema_load(
capi_return_t tiledb_array_schema_load(

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

return TILEDB_OK;
}

int32_t tiledb_array_schema_load_with_config(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
int32_t tiledb_array_schema_load_with_config(
capi_return_t tiledb_array_schema_load_with_config(

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

* @param ctx TileDB Context.
* @param uri The URI of the array to load schema.
* @param config TileDB Config. If null, the context config will be used.
*/
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bekadavis9 do you think this belongs here? I'm not sure we want to add the dependency to array directory if we don't have to.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this is fine. This file is intended to separate out the I/O-based ArraySchema operations and will undoubtedly rely on ArrayDirectory at some point; it just hasn't needed to yet. The array CAPI Handle will already need to link against capi_array_schema_stub to make use of its internals, so there's no additional overhead here that I can think of. It could be argued that it's a potentially unnecessary change, and the body could stay as-is in the C API, but I have no real qualms about the change.

Copy link
Member

@teo-tsirpanis teo-tsirpanis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

Copy link
Contributor

@robertbindar robertbindar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, great work @shaunrd0!

@@ -1850,6 +1850,8 @@ void load_array_schema_request_to_capnp(
const LoadArraySchemaRequest& req) {
auto config_builder = builder.initConfig();
throw_if_not_ok(config_to_capnp(config, &config_builder));
// This boolean is only serialized to support clients using TileDB < 2.26.
// Future options should only be serialized within the Config object above.
builder.setIncludeEnumerations(req.include_enumerations());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is excellent, thank you.

@KiterLuc KiterLuc changed the title Serialization and API changes for post_array_schema_from_rest. Add serialization and API changes for post_array_schema_from_rest. Aug 27, 2024
@KiterLuc KiterLuc merged commit 9b4e5ea into dev Aug 27, 2024
68 checks passed
@KiterLuc KiterLuc deleted the smr/sc-52877/serialization-update-load-schema branch August 27, 2024 19:32
@KiterLuc
Copy link
Contributor

/backport to release-2.26

Copy link
Contributor

Started backporting to release-2.26: https://github.com/TileDB-Inc/TileDB/actions/runs/10584654319

Copy link
Contributor

@KiterLuc backporting to release-2.26 failed, the patch most likely resulted in conflicts:

$ git am --3way --ignore-whitespace --keep-non-patch changes.patch

Applying: Serialization and C API handler changes.
Applying: Changes to loading enumerations.
Applying: Array schema load changes.
Applying: Add config docs to cpp_api/config.h
Applying: Fix CI failures.
Applying: Add test for loading enumerations for all schemas.
Applying: Changes from review.
Applying: Check hasConfig during deserialization.
Applying: Fix C API return types.
error: sha1 information is lacking or useless (tiledb/api/c_api/array/array_api.cc).
error: could not build fake ancestor
hint: Use 'git am --show-current-patch=diff' to see the failed patch
hint: When you have resolved this problem, run "git am --continue".
hint: If you prefer to skip this patch, run "git am --skip" instead.
hint: To restore the original branch and stop patching, run "git am --abort".
hint: Disable this message with "git config advice.mergeConflict false"
Patch failed at 0009 Fix C API return types.
Error: The process '/usr/bin/git' failed with exit code 128

Please backport manually!

Copy link
Contributor

@KiterLuc an error occurred while backporting to release-2.26, please check the run log for details!

Error: git am failed, most likely due to a merge conflict.

KiterLuc pushed a commit that referenced this pull request Aug 28, 2024
…ray_schema_from_rest. (#5237) (#5261)

Backport of #5237 to release-2.26

---
TYPE: IMPROVEMENT
DESC: Add serialization and API changes for post_array_schema_from_rest.
KiterLuc added a commit that referenced this pull request Sep 10, 2024
… post_array_schema_from_rest. (#5237) (#5261)"

This reverts commit 6460a8b.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants