fix(c++): Add FinalizeS3 function to release S3 resource (#571)
Signed-off-by: acezen <[email protected]>
acezen authored Aug 5, 2024
1 parent 1b6556e commit e6b95c8
Showing 4 changed files with 52 additions and 1 deletion.
11 changes: 11 additions & 0 deletions cpp/src/graphar/filesystem.cc
@@ -313,6 +313,17 @@ Result<std::shared_ptr<FileSystem>> FileSystemFromUriOrPath(
return std::make_shared<FileSystem>(arrow_fs);
}

Status InitializeS3() {
RETURN_NOT_ARROW_OK(
arrow::fs::InitializeS3(arrow::fs::S3GlobalOptions::Defaults()));
return Status::OK();
}

Status FinalizeS3() {
RETURN_NOT_ARROW_OK(arrow::fs::FinalizeS3());
return Status::OK();
}

/// template specialization for std::string
template Result<IdType> FileSystem::ReadFileToValue<IdType>(
const std::string&) const noexcept;
23 changes: 23 additions & 0 deletions cpp/src/graphar/filesystem.h
@@ -153,4 +153,27 @@ class FileSystem {
Result<std::shared_ptr<FileSystem>> FileSystemFromUriOrPath(
const std::string& uri, std::string* out_path = nullptr);

/**
* @brief Initialize the S3 APIs.
*
* It is required to call this function at least once before using S3
* FileSystem. Once this function is called you MUST call FinalizeS3 before the
* end of the application in order to avoid a segmentation fault at shutdown.
*
* This function calls arrow::fs::InitializeS3() internally.
*
*/
Status InitializeS3();

/**
* @brief Shutdown the S3 APIs.
*
* This function should be called before the program exits to ensure that
* all S3 resources are properly released.
*
* This function calls arrow::fs::FinalizeS3() internally.
*
*/
Status FinalizeS3();

} // namespace graphar
7 changes: 6 additions & 1 deletion cpp/test/test_info.cc
@@ -774,7 +774,10 @@ version: gar/v1
}
}

TEST_CASE_METHOD(GlobalFixture, "LoadFromS3", "[.hidden]") {
TEST_CASE_METHOD(GlobalFixture, "LoadFromS3") {
// explicitly call InitializeS3 to initialize the S3 APIs before using
// the S3 file system.
InitializeS3();
std::string path =
"s3://graphar/ldbc/ldbc.graph.yml"
"?endpoint_override=graphscope.oss-cn-beijing.aliyuncs.com";
@@ -787,5 +790,7 @@ TEST_CASE_METHOD(GlobalFixture, "LoadFromS3", "[.hidden]") {
const auto& edge_infos = graph_info->GetEdgeInfos();
REQUIRE(vertex_infos.size() == 8);
REQUIRE(edge_infos.size() == 23);
// explicitly call FinalizeS3 to avoid memory leak
FinalizeS3();
}
} // namespace graphar
12 changes: 12 additions & 0 deletions docs/libraries/cpp/getting-started.md
@@ -276,3 +276,15 @@ is used to write the results to new generated data chunks.

Please refer to [more examples](examples/out-of-core.md) to learn
about the other available case studies utilizing GraphAr.

### Working with Cloud Storage (S3, OSS)

GraphAr can read data from and write data to cloud storage, including
AWS S3 and Alibaba Cloud OSS.

To read data from cloud storage, specify the path of the data files as a URI
with the `s3` scheme, e.g., `s3://bucket-name/path/to/data` or, with credentials embedded, `s3://access-key:secret-key@bucket-name/path/to/data`.
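
The test in this commit reaches an OSS bucket by appending an `endpoint_override` query parameter to an `s3://` URI. A minimal sketch of assembling such a URI is shown here; `MakeS3Uri` is a hypothetical helper, not part of GraphAr, and it assumes the inputs need no percent-encoding.

```cpp
#include <string>

// Hypothetical helper (not a GraphAr API): builds an s3:// URI and, when an
// endpoint is supplied, appends it as the endpoint_override query parameter.
// Assumption: bucket, key, and endpoint contain no characters that would
// need percent-encoding.
std::string MakeS3Uri(const std::string& bucket, const std::string& key,
                      const std::string& endpoint_override = "") {
  std::string uri = "s3://" + bucket + "/" + key;
  if (!endpoint_override.empty()) {
    uri += "?endpoint_override=" + endpoint_override;
  }
  return uri;
}
```

Calling `MakeS3Uri("graphar", "ldbc/ldbc.graph.yml", "graphscope.oss-cn-beijing.aliyuncs.com")` reproduces the path string used in the `LoadFromS3` test.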

This [code example](https://github.com/apache/incubator-graphar/blob/main/cpp/test/test_info.cc#L777-L792) demonstrates how to read data from S3.

Note that when you use cloud storage, you must call `graphar::InitializeS3()` to initialize the S3 APIs before starting work and call `graphar::FinalizeS3()` to shut them down after the work finishes.
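
One way to make sure the finalize call is never skipped is to pair the two calls in a small RAII guard. The sketch below is only an illustration of that pattern: `Status`, `InitializeS3`, and `FinalizeS3` are stand-ins defined locally so the snippet compiles without GraphAr or Arrow, and `S3Session`, `WithS3`, and `CallLog` are hypothetical names that GraphAr does not provide. In real code the guard would call `graphar::InitializeS3()` and `graphar::FinalizeS3()` instead.

```cpp
#include <string>
#include <utility>
#include <vector>

// Stand-in for graphar::Status, reduced to the one query this sketch needs.
struct Status {
  bool ok_;
  bool ok() const { return ok_; }
};

// Records the order of calls so the pairing can be observed.
std::vector<std::string>& CallLog() {
  static std::vector<std::string> log;
  return log;
}

// Stand-ins for graphar::InitializeS3 / graphar::FinalizeS3 (assumption:
// the real functions return a Status in the same way).
Status InitializeS3() {
  CallLog().push_back("init");
  return Status{true};
}

Status FinalizeS3() {
  CallLog().push_back("finalize");
  return Status{true};
}

// Hypothetical RAII guard: initializes S3 on construction and finalizes it
// on destruction, but only if initialization succeeded.
class S3Session {
 public:
  S3Session() : initialized_(InitializeS3().ok()) {}
  ~S3Session() {
    if (initialized_) {
      (void)FinalizeS3();  // a destructor cannot report the Status
    }
  }
  S3Session(const S3Session&) = delete;
  S3Session& operator=(const S3Session&) = delete;

  bool ok() const { return initialized_; }

 private:
  bool initialized_;
};

// Runs `work` between the initialize and finalize calls. Returns false
// without running `work` if initialization failed.
template <typename Work>
bool WithS3(Work&& work) {
  S3Session session;
  if (!session.ok()) {
    return false;
  }
  std::forward<Work>(work)();
  return true;
}
```

A caller would wrap all S3 access in a single `WithS3([] { /* load graph info, read chunks */ });` call near the top of `main`, so that the finalize call runs even if the work returns early. The guard should be created once per process rather than around each individual read.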
