[MINOR] improvement(docs): Polish fileset related document (#5483)
### What changes were proposed in this pull request?

Polish and refine the document about fileset catalog.

### Why are the changes needed?

For a better user experience.


### Does this PR introduce _any_ user-facing change?

N/A.

### How was this patch tested?

N/A.

Co-authored-by: Qi Yu <[email protected]>
github-actions[bot] and yuqi1129 authored Nov 6, 2024
1 parent a174615 commit a176856
Showing 3 changed files with 15 additions and 4 deletions.
10 changes: 6 additions & 4 deletions docs/hadoop-catalog.md
@@ -77,13 +77,13 @@ In the meantime, you need to place the corresponding bundle jar [`gravitino-gcp-
In the meantime, you need to place the corresponding bundle jar [`gravitino-aliyun-bundle-${version}.jar`](https://repo1.maven.org/maven2/org/apache/gravitino/aliyun-bundle/) in the directory `${GRAVITINO_HOME}/catalogs/hadoop/libs`.

:::note
- Gravitino contains builtin file system providers for local file system(`builtin-local`) and HDFS(`builtin-hdfs`), that is to say if `filesystem-providers` is not set, Gravitino will still support local file system and HDFS. Apart from that, you can set the `filesystem-providerss` to support other file systems like S3, GCS, OSS or custom file system.
- Gravitino contains built-in file system providers for the local file system (`builtin-local`) and HDFS (`builtin-hdfs`); that is, even if `filesystem-providers` is not set, Gravitino still supports the local file system and HDFS. Apart from that, you can set `filesystem-providers` to support other file systems such as S3, GCS, OSS, or a custom file system.
- `default-filesystem-provider` sets the default file system provider for the Hadoop catalog. If a user does not specify a scheme in the URI, Gravitino uses the default file system provider to access the fileset. For example, if the default file system provider is set to `builtin-local`, the prefix `file://` can be omitted in the location (see the sketch after this note).
:::
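
As a minimal sketch of how these two properties combine (the property keys come from the note above; the values and variable names are illustrative placeholders, written in the same snippet style as the other Java examples in these docs):

```java
// Illustrative only: enable S3 alongside the built-in providers and make
// builtin-local the default, so locations without a scheme resolve through file://.
Map<String, String> properties = new HashMap<>();
properties.put("filesystem-providers", "s3");
properties.put("default-filesystem-provider", "builtin-local");
// With this configuration, a fileset location such as "/tmp/fileset1"
// is treated as "file:///tmp/fileset1".
```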

#### How to customize your own HCFS file system fileset?

Developers and users can custom their own HCFS file system fileset by implementing the `FileSystemProvider` interface in the jar [gravitino-catalog-hadoop](https://repo1.maven.org/maven2/org/apache/gravitino/catalog-hadoop/) . The `FileSystemProvider` interface is defined as follows:
Developers and users can customize their own HCFS file system fileset by implementing the `FileSystemProvider` interface provided in the jar [gravitino-catalog-hadoop](https://repo1.maven.org/maven2/org/apache/gravitino/catalog-hadoop/). The `FileSystemProvider` interface is defined as follows:

```java

@@ -97,13 +97,15 @@ Developers and users can custom their own HCFS file system fileset by implementi

// Name of the file system provider. 'builtin-local' for Local file system, 'builtin-hdfs' for HDFS,
// 's3' for AWS S3, 'gcs' for GCS, 'oss' for Aliyun OSS.

// You need to set catalog properties `filesystem-providers` to support this file system.
String name();
```

After implementing the `FileSystemProvider` interface, you need to put the jar file into the `$GRAVITINO_HOME/catalogs/hadoop/libs` directory. Then you can set the `filesystem-providers` property to use your custom file system provider.
Gravitino uses the Java SPI mechanism to load the custom file system provider: you need to create a file named `org.apache.gravitino.catalog.fs.FileSystemProvider` in the `META-INF/services` directory of the jar file, whose content is the full class name of the custom file system provider.
For example, the content of this file for `S3FileSystemProvider` is as follows:
![img.png](assets/fileset/custom-filesystem-provider.png)

After implementing the `FileSystemProvider` interface, you need to put the jar file into the `$GRAVITINO_HOME/catalogs/hadoop/libs` directory. Then you can set the `filesystem-providers` property to use your custom file system provider.
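
To make this workflow concrete, here is a hedged sketch of a custom provider. Only `name()` appears in the interface excerpt above; the other members (`getFileSystem`, `scheme`), their signatures, and the package, class, and scheme names used here are assumptions for illustration, so check the `FileSystemProvider` interface in the `gravitino-catalog-hadoop` jar for the exact contract.

```java
package com.example.gravitino;

import java.io.IOException;
import java.util.Map;

import org.apache.gravitino.catalog.fs.FileSystemProvider; // FQN taken from the SPI file name above
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Hypothetical provider for an HCFS implementation exposed under the "myfs" scheme.
public class MyFileSystemProvider implements FileSystemProvider {

  // Assumed method: builds the Hadoop FileSystem for a given path from the catalog/fileset properties.
  public FileSystem getFileSystem(Path path, Map<String, String> config) throws IOException {
    Configuration configuration = new Configuration();
    // Copy the Gravitino properties into the Hadoop configuration used to open the file system.
    config.forEach(configuration::set);
    return FileSystem.get(path.toUri(), configuration);
  }

  // Assumed method: the URI scheme this provider handles.
  public String scheme() {
    return "myfs";
  }

  // Shown in the excerpt above: the short name referenced by the `filesystem-providers` catalog property.
  public String name() {
    return "myfs";
  }
}
```

The jar would then carry a `META-INF/services/org.apache.gravitino.catalog.fs.FileSystemProvider` resource whose single line is `com.example.gravitino.MyFileSystemProvider`, and the catalog property `filesystem-providers` would include `myfs` so that Gravitino can load it.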

### Authentication for Hadoop Catalog

9 changes: 9 additions & 0 deletions docs/manage-fileset-metadata-using-gravitino.md
@@ -68,6 +68,9 @@ curl -X POST -H "Accept: application/vnd.gravitino.v1+json" \
"filesystem-providers": "s3"
}
}' http://localhost:8090/api/metalakes/metalake/catalogs

# For other HCFS implementations such as GCS and OSS, set the properties accordingly; please refer to
# the following link for the corresponding catalog properties.
```

</TabItem>
@@ -106,6 +109,9 @@ Catalog s3Catalog = gravitinoClient.createCatalog("catalog",
"This is a S3 fileset catalog",
s3Properties);
// ...

// For other HCFS implementations such as GCS and OSS, set the properties accordingly; please refer to
// the following link for the corresponding catalog properties (see the sketch below).
```
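
As a hedged illustration of the comment above, the following sketch swaps the S3 properties for a GCS-backed catalog. Only `filesystem-providers` is taken from this page; the `location` value, catalog name, and the omitted credential properties are placeholders, and the `Catalog.Type.FILESET` and `"hadoop"` provider arguments are assumed to match the full S3 example in this document. The exact GCS/OSS property names come from the Hadoop catalog documentation linked above.

```java
// Sketch only: a GCS-backed fileset catalog. Credential/endpoint properties are omitted here;
// take their exact names from the Hadoop catalog property tables.
Map<String, String> gcsProperties = new HashMap<>();
gcsProperties.put("location", "gs://bucket/root");    // example root location
gcsProperties.put("filesystem-providers", "gcs");     // use "oss" for Aliyun OSS, "s3" for AWS S3

Catalog gcsCatalog = gravitinoClient.createCatalog("gcs_catalog",
    Catalog.Type.FILESET,
    "hadoop",
    "This is a GCS fileset catalog",
    gcsProperties);
```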

</TabItem>
@@ -132,6 +138,9 @@ s3_catalog = gravitino_client.create_catalog(name="catalog",
provider="hadoop",
comment="This is a S3 fileset catalog",
properties=s3_properties)

# For other HCFS implementations such as GCS and OSS, set the properties accordingly; please refer to
# the following link for the corresponding catalog properties.
```

</TabItem>
