[BUG] Data folder is not getting listed with storage type FILE #166

Open
1 task done
shraddhagrawal opened this issue Aug 20, 2024 · 1 comment
Labels
bug Something isn't working


@shraddhagrawal

Is this a possible security vulnerability?

  • This is NOT a possible security vulnerability

Describe the bug

With storage type FILE, we cannot see the data folder for a created table. The metadata folder contains only metadata files; the manifest list and manifest files are not listed either.
I am still able to query the table, connected as described at:
https://github.com/polaris-catalog/polaris?tab=readme-ov-file#connecting-from-an-engine

To Reproduce

  1. Build and run Polaris: https://github.com/polaris-catalog/polaris?tab=readme-ov-file#building-and-running
  2. Connect from an engine: https://github.com/polaris-catalog/polaris?tab=readme-ov-file#connecting-from-an-engine
  3. Create a namespace and a table using Spark.
  4. Check the temp folder and the files in the metadata folder.
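
Step 4 can be scripted. The sketch below is a minimal, hypothetical helper (`classify_table_dir` is not part of Polaris or Iceberg) that groups the files under a table directory by kind. It assumes Iceberg's usual file naming: `*.metadata.json` for table metadata, `snap-*.avro` for manifest lists, and `*-m<N>.avro` for manifests. Hidden files are skipped on the assumption that they are checksum sidecars.

```python
import re
from pathlib import Path

# Assumed naming conventions; adjust the patterns if your layout differs.
PATTERNS = {
    "table_metadata": re.compile(r".*\.metadata\.json$"),
    "manifest_list": re.compile(r"^snap-.*\.avro$"),
    "manifest": re.compile(r".*-m\d+\.avro$"),
}


def classify_table_dir(table_dir):
    """Group files under <table_dir>/data and <table_dir>/metadata by kind."""
    table_dir = Path(table_dir)
    found = {
        "data_files": [],
        "table_metadata": [],
        "manifest_list": [],
        "manifest": [],
        "other": [],
    }

    # Data files may sit in partition subdirectories, so walk recursively.
    data_dir = table_dir / "data"
    if data_dir.is_dir():
        found["data_files"] = sorted(
            p.name
            for p in data_dir.rglob("*")
            if p.is_file() and not p.name.startswith(".")
        )

    # Metadata files are classified by the first pattern that matches.
    meta_dir = table_dir / "metadata"
    if meta_dir.is_dir():
        for p in sorted(meta_dir.iterdir()):
            if not p.is_file() or p.name.startswith("."):
                continue
            kind = next(
                (k for k, rx in PATTERNS.items() if rx.match(p.name)), "other"
            )
            found[kind].append(p.name)

    return found
```

Call it as `classify_table_dir("<base-location>/<namespace>/<table>")` and inspect the returned dictionary. A freshly created table with no rows may legitimately have only a `*.metadata.json` file, so empty `data_files`, `manifest_list`, and `manifest` entries are not by themselves evidence of a bug.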

Actual Behavior

The data folder is missing for the table.

Expected Behavior

The data folder, manifest list, and manifest files should be listed and visible.

Additional context

No response

System information

macOS

@shraddhagrawal shraddhagrawal added the bug Something isn't working label Aug 20, 2024
@MonkeyCanCode
Contributor

MonkeyCanCode commented Aug 27, 2024

Hello,

I was able to run this locally. Here is what I did:

# (on terminal 1) start the application
./gradlew runApp

# (on terminal 2) create catalog, principal, principal role, catalog role, grants, and namespace
export CLIENT_ID=xxxxx
export CLIENT_SECRET=xxxxx
./polaris catalogs create --storage-type file --default-base-location file:///tmp/test quickstart_catalog
./polaris principals create quickstart_user
./polaris principal-roles create quickstart_user_role
./polaris catalog-roles create --catalog quickstart_catalog quickstart_catalog_role
./polaris principal-roles grant --principal quickstart_user quickstart_user_role
./polaris catalog-roles grant --catalog quickstart_catalog --principal-role quickstart_user_role quickstart_catalog_role
./polaris privileges catalog grant --catalog quickstart_catalog --catalog-role quickstart_catalog_role CATALOG_MANAGE_CONTENT
./polaris namespaces create --catalog quickstart_catalog quickstart_namespace

# (on terminal 2) set up Spark and the default Spark conf (sample conf below, using the credential of principal quickstart_user obtained in the previous step)
spark.sql.catalog.spark_catalog org.apache.iceberg.spark.SparkSessionCatalog
spark.jars.packages org.apache.iceberg:iceberg-spark-runtime-3.5_2.13:1.5.0,org.apache.hadoop:hadoop-aws:3.4.0,software.amazon.awssdk:bundle:2.23.19,software.amazon.awssdk:url-connection-client:2.23.19
spark.sql.iceberg.vectorization.enabled false
spark.sql.catalog.polaris.type rest
spark.sql.catalog.polaris org.apache.iceberg.spark.SparkCatalog
spark.sql.catalog.polaris.uri http://localhost:8181/api/catalog
spark.sql.catalog.polaris.token-refresh-enabled true
spark.sql.catalog.polaris.credential xxxxx:xxxxx
spark.sql.catalog.polaris.warehouse quickstart_catalog
spark.sql.catalog.polaris.scope PRINCIPAL_ROLE:ALL
spark.sql.catalog.polaris.header.X-Iceberg-Access-Delegation true
spark.sql.catalog.polaris.io-impl org.apache.iceberg.io.ResolvingFileIO

# (on terminal 2) create table via pyspark
>>> spark.sql("use polaris")
DataFrame[]
>>> spark.sql("use quickstart_namespace")
DataFrame[]
>>> spark.sql("create table test_tbl (id int, value int)")
DataFrame[]

# (on terminal 3) check fs path
xxxxx@DESKTOP:~/polaris(main)$ ls -l /tmp/test
total 4
drwxr-xr-x 3 yong yong 4096 Aug 26 21:30 quickstart_namespace
xxxxx@DESKTOP:~/polaris(main)$ ls -l /tmp/test/quickstart_namespace/
total 4
drwxr-xr-x 3 yong yong 4096 Aug 26 21:30 test_tbl
xxxxx@DESKTOP:~/polaris(main)$ ls -l /tmp/test/quickstart_namespace/test_tbl/
total 4
drwxr-xr-x 2 yong yong 4096 Aug 26 21:30 metadata

# (on terminal 2) insert dummy record
>>> spark.sql("insert into test_tbl values(1,2)")
DataFrame[]

# (on terminal 3) check fs path
xxxxx@DESKTOP:~/polaris(main)$ ls -l /tmp/test/quickstart_namespace/test_tbl/data/
total 4
-rw-r--r-- 1 yong yong 608 Aug 26 21:31 00000-0-063831d9-208d-4cc0-9cae-f316edff15c1-0-00001.parquet
xxxxx@DESKTOP:~/polaris(main)$ ls -l /tmp/test/quickstart_namespace/test_tbl/metadata/
total 24
-rw-r--r-- 1 yong yong 1028 Aug 26 21:30 00000-16559417-6d2b-481e-a4df-72cf8a17f9f6.metadata.json
-rw-r--r-- 1 yong yong 2111 Aug 26 21:31 00001-5124a0c6-6ebd-420f-a6e7-5c3b78b2f3c2.metadata.json
-rw-r--r-- 1 yong yong 6669 Aug 26 21:31 8a607974-9597-48ea-9471-3299891dd0a0-m0.avro
-rw-r--r-- 1 yong yong 4234 Aug 26 21:31 snap-7058842161523378569-1-8a607974-9597-48ea-9471-3299891dd0a0.avro

@shraddhagrawal can you try the steps above and see if you still see the issue?
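
To retry the steps without editing `spark-defaults.conf`, the sample conf can be turned into `--conf key=value` arguments for `pyspark`. This is a minimal sketch, assuming whitespace-separated `key value` lines as in the sample above; `conf_to_cli_args` and `build_pyspark_command` are hypothetical helper names, not part of Spark.

```python
import shlex


def conf_to_cli_args(conf_text):
    """Turn 'key value' lines into ['--conf', 'key=value', ...].

    Blank lines and lines starting with '#' are ignored.
    """
    args = []
    for raw in conf_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 1)  # split on the first run of whitespace
        if len(parts) != 2:
            raise ValueError(f"expected 'key value', got: {line!r}")
        key, value = parts
        args += ["--conf", f"{key}={value.strip()}"]
    return args


def build_pyspark_command(conf_text):
    """Return a shell-quoted pyspark command line for the given conf text."""
    return shlex.join(["pyspark"] + conf_to_cli_args(conf_text))
```

Paste the conf lines into a string, pass it to `build_pyspark_command`, and run the printed command in a shell. Keeping the credential out of shell history is left to the reader.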
