
Conversation

@sfc-gh-prsingh (Owner) commented Nov 7, 2025

About the change

Presently, Horizon (the Snowflake IRC catalog) returns a 403 if a table has FGAC policies defined. The idea is that for tables where IRC returns a 403, we should use the spark-snowflake connector to read the table and produce the result.
A user configures both the IRC and JDBC configs in their Spark job, and this change enables flipping between the two.

A couple of things to note:
Spark's built-in analyzer rule ResolveRelations halts execution when it sees a 403, so the only way to bypass this is for the catalog to throw a 404 instead; in that case Spark doesn't halt its execution.
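
As a rough, hypothetical sketch of the 403-to-404 translation (the class name matches the PR, but the DelegatingCatalogExtension wiring and the exact exception handling here are assumptions, not the PR's actual code):

```scala
import org.apache.iceberg.exceptions.ForbiddenException
import org.apache.spark.sql.catalyst.analysis.NoSuchTableException
import org.apache.spark.sql.connector.catalog.{DelegatingCatalogExtension, Identifier, Table}

// Hypothetical sketch, not the PR's actual code: wrap the delegate catalog
// and translate the REST 403 into a 404 so ResolveRelations keeps going
// instead of halting analysis.
class SnowflakeFallbackCatalog extends DelegatingCatalogExtension {
  override def loadTable(ident: Identifier): Table =
    try super.loadTable(ident)
    catch {
      // Iceberg's REST client surfaces an HTTP 403 as ForbiddenException.
      case _: ForbiddenException => throw new NoSuchTableException(ident)
    }
}
```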

This PR introduces a couple of classes:

- SnowflakeFallbackCatalog: wraps Iceberg's Spark catalog as a delegating catalog and overrides the behaviour observed from the server, i.e. turns the 403 into a 404.
- ResolveSnowflakeRelations: an analyzer rule that flips an UnresolvedRelation to a Snowflake relation, so that the Spark Snowflake connector can come into effect and push parts of the query fragment down to the configured Snowflake warehouse (sketched below).
- SnowflakeSparkSessionExtensions: the entry point through which we inject the analyzer rule into Spark.
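
A minimal sketch of how such a rule and extension could be wired; the helpers shouldFallback and snowflakeRelationFor are illustrative assumptions, not the PR's code:

```scala
import org.apache.spark.sql.{SparkSession, SparkSessionExtensions}
import org.apache.spark.sql.catalyst.analysis.UnresolvedRelation
import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
import org.apache.spark.sql.catalyst.rules.Rule

// Hypothetical sketch: flip relations that the IRC catalog reported as
// missing (the 403 turned into a 404) to the Snowflake V1 source, so
// query fragments are pushed down over JDBC.
case class ResolveSnowflakeRelations(spark: SparkSession) extends Rule[LogicalPlan] {

  override def apply(plan: LogicalPlan): LogicalPlan = plan.resolveOperatorsUp {
    case u: UnresolvedRelation if shouldFallback(u.multipartIdentifier) =>
      snowflakeRelationFor(u.multipartIdentifier)
    case other => other
  }

  // Illustrative helper: decide whether IRC refused this table
  // (e.g. by asking the fallback catalog, which saw the 403).
  private def shouldFallback(name: Seq[String]): Boolean = ???

  // Illustrative helper: build the V1 Snowflake relation; in the real change
  // the spark.snowflake.* session confs would supply sfURL, sfUser, etc.
  private def snowflakeRelationFor(name: Seq[String]): LogicalPlan =
    spark.read
      .format("net.snowflake.spark.snowflake")
      .option("dbtable", name.last)
      .load()
      .queryExecution.analyzed
}

// Entry point passed via spark.sql.extensions; injects the rule at analysis time.
class SnowflakeSparkSessionExtensions extends (SparkSessionExtensions => Unit) {
  override def apply(extensions: SparkSessionExtensions): Unit =
    extensions.injectResolutionRule(ResolveSnowflakeRelations)
}
```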

All these classes will be packaged in the existing runtime jar of the Spark Snowflake connector and work end to end.

Overall, a sample Spark job would look like this:

```sh
pyspark \
  --packages org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:1.9.2,org.apache.iceberg:iceberg-aws-bundle:1.9.2 \
  --jars spark-snowflake_2.12-3.1.4.jar,snowflake-jdbc-3.24.0.jar \
  --conf "spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions,org.apache.spark.sql.snowflake.extensions.analyzer.SnowflakeSparkSessionExtensions" \
  --conf "spark.sql.defaultCatalog=COTOPAXI_CATALOG" \
  --conf "spark.sql.catalog.COTOPAXI_CATALOG=org.apache.spark.sql.snowflake.catalog.SnowflakeFallbackCatalog" \
  --conf "spark.sql.catalog.COTOPAXI_CATALOG.catalog-impl=org.apache.iceberg.spark.SparkCatalog" \
  --conf "spark.sql.catalog.COTOPAXI_CATALOG.type=rest" \
  --conf "spark.sql.catalog.COTOPAXI_CATALOG.uri=<CATALOG-URI>" \
  --conf "spark.sql.catalog.COTOPAXI_CATALOG.credential=<PAT>" \
  --conf "spark.sql.catalog.COTOPAXI_CATALOG.header.X-Iceberg-Access-Delegation=vended-credentials" \
  --conf "spark.sql.catalog.COTOPAXI_CATALOG.warehouse=COTOPAXI_CATALOG" \
  --conf "spark.sql.catalog.COTOPAXI_CATALOG.scope=session:role:COTOPAXI_ROLE" \
  --conf "spark.sql.iceberg.vectorization.enabled=false" \
  --conf "spark.snowflake.sfURL=<URL>" \
  --conf "spark.snowflake.sfUser=COTOPAXI_USER" \
  --conf "spark.snowflake.sfPassword=<>" \
  --conf "spark.snowflake.sfDatabase=COTOPAXI_CATALOG" \
  --conf "spark.snowflake.sfSchema=COTOPAXI_NAMESPACE" \
  --conf "spark.snowflake.sfRole=COTOPAXI_ROLE" \
  --conf "spark.snowflake.sfWarehouse=REGRESS" \
  --conf "spark.snowflake.extensions.fgacJdbcFallback.enabled=true"
```

Review comment on this hunk of the analyzer rule:

```scala
    case _ => u
}

case i: InsertIntoStatement =>
```


Do we need to change the write path?

@sfc-gh-prsingh (Owner, Author) replied:

Ideally yes. Snowflake's FGAC for writes is not that comprehensive: Snowflake only allows reading the filtered data (post FGAC enforcement), but it allows the caller to write if it has grants for the same. Hence, just redirecting to the JDBC source, which already supports writes, should take care of it, IMHO. WDYT?
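
For illustration only, a hypothetical extra case for the rule sketched above that would retarget inserts the same way; it reuses that sketch's assumed helpers, so this is a fragment of its match, not standalone code:

```scala
import org.apache.spark.sql.catalyst.plans.logical.InsertIntoStatement

// Hypothetical extra case for the ResolveSnowflakeRelations sketch above:
// retarget the INSERT at the Snowflake JDBC source, where write grants are
// enforced by Snowflake itself.
case i: InsertIntoStatement => i.table match {
  case u: UnresolvedRelation if shouldFallback(u.multipartIdentifier) =>
    i.copy(table = snowflakeRelationFor(u.multipartIdentifier))
  case _ => i
}
```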

@sfc-gh-prsingh (Owner, Author) commented:

While experimenting I realized we don't need the extension, especially now that we have the catalog client; we can get rid of the extension completely.

Details:
https://github.com/sfc-gh-prsingh/spark-snowflake/pull/2/files#diff-71b9c228a35d1e1b665bd01825e11838f45dc78b3d8837a9e9cd18f61f262734R85

Please let me know what you think.
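
As a rough illustration of the V1Table idea (hypothetical names and wiring, loosely inferred from the linked diff, and assuming Spark resolves the returned V1Table through its V1 code path):

```scala
package org.apache.spark.sql.snowflake.catalog

import org.apache.spark.sql.catalyst.TableIdentifier
import org.apache.spark.sql.catalyst.catalog.{CatalogStorageFormat, CatalogTable, CatalogTableType}
import org.apache.spark.sql.connector.catalog.{Identifier, Table, V1Table}
import org.apache.spark.sql.types.StructType

// Hypothetical sketch of the V1Table fallback: instead of throwing a 404 and
// relying on an analyzer rule, loadTable can answer the 403 with a V1 table
// description pointing at the Snowflake source, letting Spark's normal
// resolution take over with no session extension.
object SnowflakeV1Fallback {
  def snowflakeV1Table(ident: Identifier, sfOptions: Map[String, String]): Table =
    V1Table(CatalogTable(
      identifier = TableIdentifier(ident.name, ident.namespace.lastOption),
      tableType = CatalogTableType.EXTERNAL,
      storage = CatalogStorageFormat.empty.copy(
        properties = sfOptions + ("dbtable" -> ident.name)),
      schema = new StructType(),                        // resolved later by the source
      provider = Some("net.snowflake.spark.snowflake")  // spark-snowflake V1 provider
    ))
}
```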


V1Table fallback is cleaner and removes the need for an extension. It is indeed a better solution!
