Add Iceberg Filter Pushdown Optimizer Rule for execution with Velox #20501
Conversation
@yingsu00 As we discussed, this PR includes the changes required for implementing the new Filter Pushdown optimizer rule compatible with Prestissimo/Velox.
Force-pushed from 065ce0f to e65a80e
I realize this is a draft, but one early piece of feedback: I wouldn't find it advisable to encode the worker type directly in the connector. If anything, this is an engine concept, not an Iceberg connector concept.
One design point of Aria scan that I wish we had addressed is that the connector isn't really the proper place to extract expressions for filter pushdown. This is something that ought to be handled in the engine. In lieu of that major change, I wonder if there is any way to extract common logic from the Hive connector so it doesn't need to be completely redone from scratch.
Yes, sure, this was one of the questions I was going to ask. I was also planning to include this in a common config class and system session properties.
return workerType;
}

@Config("iceberg.execution.worker.type")
I think it would be better to add "pushdown-filter-enabled" instead of testing the worker type. If it's enabled, then the rule will be executed and the TableLayout will contain the relevant information.
I thought it might be better to add worker type, since there might be other worker-type-specific changes we add later. With the current changes, filter pushdown will only work with native workers; to support it for Java workers, we would need to make additional changes. So if we add a pushdown-filter-enabled config now, I think it might be a little confusing until we also support filter pushdown with Java workers.
To add some detail: the filter pushdown changes in this PR may not work as-is with Java workers, since no corresponding changes have been made to the Java-side Iceberg connector worker code for the filter pushdown feature.
Yes, and we need to make sure this pushdown-filter-enabled is set to false by default
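For reference, a minimal sketch of such a config property, assuming the airlift-style @Config binding the Hive connector uses. The property name, description, and class shape here are illustrative, not the final implementation:

```java
// Hypothetical sketch only: mirrors Hive's pushdown-filter-enabled config,
// defaulting to false so existing Java-worker deployments are unaffected.
public class IcebergConfig
{
    private boolean pushdownFilterEnabled; // boolean fields default to false

    public boolean isPushdownFilterEnabled()
    {
        return pushdownFilterEnabled;
    }

    @Config("iceberg.pushdown-filter-enabled")
    @ConfigDescription("Experimental: enable filter pushdown (currently effective only with native workers)")
    public IcebergConfig setPushdownFilterEnabled(boolean pushdownFilterEnabled)
    {
        this.pushdownFilterEnabled = pushdownFilterEnabled;
        return this;
    }
}
```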
private final RowExpression remainingPredicate;
private final Map<String, IcebergColumnHandle> predicateColumns;
private final TupleDomain<ColumnHandle> partitionColumnPredicate;
private final Optional<Set<IcebergColumnHandle>> requestedColumns;
private final IcebergTableHandle table;
private final TupleDomain<ColumnHandle> tupleDomain;
We need to either remove the original tupleDomain or subclass. This is used by Iceberg to filter files. Can you check how Hive does it when filter pushdown is set to false (in HiveMetadata.java, line 2742)? One possible way is to reconstruct it from the new fields when filtering the files. We just need to make sure the transformation is equivalent.
@yingsu00 Yes we can remove the original tupleDomain. It can be constructed by transforming domainPredicate field.
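As a rough illustration of the equivalence argument (not the actual Presto types): if the domain predicate is keyed by subfield path, the entire-column TupleDomain can be recovered by keeping only the top-level entries. The map-based stand-in below is hypothetical and only sketches the transformation:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

public class DomainPredicateSketch {
    // Stand-in for TupleDomain: column/subfield path -> allowed values.
    // Entries containing '.' represent nested subfields (e.g. "address.zip");
    // only entire-column entries contribute to the reconstructed tupleDomain
    // used for file filtering.
    static Map<String, Set<Integer>> toEntireColumnDomain(Map<String, Set<Integer>> domainPredicate) {
        Map<String, Set<Integer>> entireColumns = new HashMap<>();
        for (Map.Entry<String, Set<Integer>> entry : domainPredicate.entrySet()) {
            if (!entry.getKey().contains(".")) {
                entireColumns.put(entry.getKey(), entry.getValue());
            }
        }
        return entireColumns;
    }
}
```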
Yes, I agree. This should be a "pushdown-filter-enabled" property, not worker type.
@tdcmeehan I think there was some debate over this. On one hand, it makes sense to handle this in the engine, but on the other hand, the engine does not know what the connectors can do. If we want the engine to handle this, the engine needs to ask each connector: can you do this? can you do that? There would have to be a pre-defined superset of operations covering what all connectors can do, and the engine logic becomes complex. I think this is what Trino chose to do. But Presto didn't do it this way; instead, it allows the connector to modify the plan shape because it knows what it can and cannot do. This way is more flexible and has better separation of concerns. Both ways have their pros and cons, and so far the Presto implementation seems to be working well.
We actually talked about this a while ago, but @imjalpreet found some hurdles to just calling directly into the Hive rule. Jalpreet, can you remind us what the issue was?
.transform(subfield -> !isEntireColumn(subfield) ? subfield : null)
.getDomains()
.orElse(ImmutableMap.of()))
.build());
Why is this part not present?
if (currentLayoutHandle.isPresent()) {
entireColumnDomain = entireColumnDomain.intersect(((HiveTableLayoutHandle) (currentLayoutHandle.get())).getPartitionColumnPredicate());
}
This will be added as part of the changes that I am working on for partitioned tables since this is only needed for partitioned tables.
}
}

org.apache.iceberg.Table icebergTable;
icebergTable is not used?
I might have included this change by mistake; it is part of the changes for partitioned tables, where icebergTable will be used to get partition columns.
I will remove this variable from the current commit, and it will be added back as part of the partitioned table changes.
ConstraintEvaluator evaluator = new ConstraintEvaluator(rowExpressionService, session, columnHandles, deterministicPredicate);
constraint = new Constraint<>(entireColumnDomain, evaluator::isCandidate);
}
}
Most of the lines are the same between Hive and Iceberg. Consider extracting them out as one or several utility functions in Hive.
@@ -78,6 +78,7 @@ public final class IcebergSessionProperties
private static final String NESSIE_REFERENCE_HASH = "nessie_reference_hash";
public static final String READ_MASKED_VALUE_ENABLED = "read_null_masked_parquet_encrypted_value_enabled";
public static final String PARQUET_DEREFERENCE_PUSHDOWN_ENABLED = "parquet_dereference_pushdown_enabled";
public static final String WORKER_TYPE = "worker_type";
Use PUSHDOWN_FILTER_ENABLED, similar to Hive.
public static final String PUSHDOWN_FILTER_ENABLED = "pushdown_filter_enabled";
Done, as requested by you.
Force-pushed from 6e56595 to 975b59b
Yes, we saw some issues in our initial implementation. But the current implementation can be refactored by extracting common logic between the Hive and Iceberg implementations. It's currently in progress in one of the draft commits in this PR.
Force-pushed from 1147612 to 8934ae5
@imjalpreet Is this PR ready for review?
I see you still have the commit "Add iceberg.execution.worker.type config property to Iceberg Connector". Could you please sort out the commits so that each one handles a sub-problem? For example, the above commit should be removed, or squashed with the later updates.
@yingsu00 Yes, it is ready for review. I had planned to refine and re-order the commits after this review. For now I kept separate commits so it would be easier to review the changes done since your last review. Yes, the commit you mentioned will get squashed and removed. If you want me to re-order and squash unnecessary commits before the review, I can do that; please let me know.
@imjalpreet Yes, please re-order and squash unnecessary commits before the review. The commits and their messages are part of the review targets.
@imjalpreet Can you please also resolve the conflicts?
Force-pushed from 8934ae5 to d27662e
presto-iceberg/src/main/java/com/facebook/presto/iceberg/IcebergAbstractMetadata.java
presto-iceberg/src/main/java/com/facebook/presto/iceberg/IcebergAbstractMetadata.java
presto-iceberg/src/main/java/com/facebook/presto/iceberg/optimizer/IcebergFilterPushdown.java
RowExpression filter,
Optional<ConnectorTableLayoutHandle> currentLayoutHandle)
{
Result result = checkConstantBooleanExpression(rowExpressionService, functionResolution, filter, currentLayoutHandle, metadata, session);
Lines 127 to 146 are still duplicated from HiveFilterPushdown. Is it possible to move them to the util class after the partitioned table support is added?
Sure, I will try to see if it can be extracted out to the util class in the partitioned table PR.
@@ -0,0 +1,532 @@
/*
@imjalpreet I actually think it's better to subclass HiveFilterPushdown than to create a util class. The functions in this util class are all static, so you had to pass a lot of parameters to them, which were member fields of HiveFilterPushdown. If we subclass HiveFilterPushdown, these common fields can be used by the derived IcebergFilterPushdown, and the code can be a lot smaller. For that, we can change the metadata parameter to the ConnectorMetadata type. What do you think?
@yingsu00 I had thought about it as well, but I saw a few drawbacks. HiveFilterPushdown has some class fields like HiveTransactionManager, HivePartitionManager, etc. which are not required in IcebergFilterPushdown, since it has its own versions like IcebergTransactionManager. Also, in HiveFilterPushdown's pushdownFilter method we have HiveMetadata, whereas in IcebergFilterPushdown we have implementations of IcebergAbstractMetadata.
IMO directly extending the HiveFilterPushdown class might not be ideal. If we want, I think we can try creating a new base class extracting the common fields between Hive and Iceberg and then derive both HiveFilterPushdown and IcebergFilterPushdown from that base class.
Let me know what you think.
@imjalpreet Making a base class is ok. I think most of the content can be in the base class and only very minimal difference between the two derived classes.
In filterPushdown(), metadata is only used at two places: metadata.getColumnHandles(session, tableHandle) and metadata.getTableLayout(...) and both are on ConnectorMetadata. So it's ok to change the metadata parameter from HiveMetadata and IcebergAbstractMetadata to ConnectorMetadata.
Now let's look at transactionManager and icebergTransactionManager. They are only used to get the Metadata in getMetadata(TableHandle tableHandle) method:
protected IcebergAbstractMetadata getMetadata(TableHandle tableHandle)
{
ConnectorMetadata metadata = icebergTransactionManager.get(tableHandle.getTransaction());
checkState(metadata instanceof IcebergAbstractMetadata, "metadata must be IcebergAbstractMetadata");
return (IcebergAbstractMetadata) metadata;
}
The downcast was only to make sure the metadata is IcebergAbstractMetadata, but icebergTransactionManager would always return an IcebergXXXMetadata, so returning IcebergAbstractMetadata instead of ConnectorMetadata is not really necessary. If we make it just return a ConnectorMetadata, then the getMetadata() method can have the same signature and be moved to the base class.
partitionManager was also only used once.
HivePartitionResult hivePartitionResult = partitionManager.getPartitions(metastore, tableHandle, constraint, session);
Similar to getMetadata(), we can have an abstract method in the base class, depending on how you express the Iceberg partition result. If that way doesn't work we can also break filterPushdown into multiple sections, with most sections having the same content.
So please go ahead and extract common base class.
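A minimal sketch of the proposed extraction, with the shared flow in an abstract base and each connector overriding only the metadata lookup. The class names and the String-based plumbing below are placeholders, not the actual Presto SPI:

```java
import java.util.Map;

// Illustrative template-method sketch: the common pushdown flow lives once
// in the base class; each connector resolves metadata from its own
// transaction manager.
abstract class BaseFilterPushdown {
    // Shared flow, identical for Hive and Iceberg.
    public final String pushdownFilter(String tableName, String filter) {
        Object metadata = getMetadata(tableName); // ConnectorMetadata in Presto
        return metadata + ": pushed down [" + filter + "]";
    }

    // The only connector-specific hook.
    protected abstract Object getMetadata(String tableName);
}

class IcebergFilterPushdown extends BaseFilterPushdown {
    private final Map<String, Object> icebergTransactionManager;

    IcebergFilterPushdown(Map<String, Object> transactions) {
        this.icebergTransactionManager = transactions;
    }

    @Override
    protected Object getMetadata(String tableName) {
        return icebergTransactionManager.get(tableName);
    }
}
```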
@yingsu00 I have refactored the filter pushdown and introduced a new abstract class.
presto-iceberg/src/main/java/com/facebook/presto/iceberg/optimizer/IcebergFilterPushdown.java
presto-iceberg/src/main/java/com/facebook/presto/iceberg/IcebergConfig.java
presto-hive/src/main/java/com/facebook/presto/hive/util/FilterPushdownUtils.java
@yingsu00 I have pushed all the changes. Update: additional test cases have also been added.
Force-pushed from e177934 to 2e689fe
Force-pushed from 2e689fe to 4d98884
protected final IcebergResourceFactory resourceFactory;
protected final HdfsEnvironment hdfsEnvironment;
protected final TypeManager typeManager;
protected IcebergTransactionManager icebergTransactionManager;
Why are these fields protected?
Suggested change (make icebergTransactionManager final as well):
protected final IcebergResourceFactory resourceFactory;
protected final HdfsEnvironment hdfsEnvironment;
protected final TypeManager typeManager;
protected final IcebergTransactionManager icebergTransactionManager;
My bad, just realized we can keep them private as well. I have made the change for all the protected fields.
protected final IcebergResourceFactory resourceFactory;
protected final HdfsEnvironment hdfsEnvironment;
protected final TypeManager typeManager;
protected IcebergTransactionManager icebergTransactionManager;
Why protected?
This has been removed
protected final IcebergResourceFactory resourceFactory;
protected final HdfsEnvironment hdfsEnvironment;
protected final TypeManager typeManager;
protected IcebergTransactionManager icebergTransactionManager;
It seems icebergTransactionManager doesn't need to be persisted as a class member.
We would need to persist it here since we need to pass it to SubfieldExtractionRewriter
protected final IcebergResourceFactory resourceFactory;
protected final HdfsEnvironment hdfsEnvironment;
protected final TypeManager typeManager;
protected IcebergTransactionManager icebergTransactionManager;
Ditto. icebergTransactionManager doesn't need to be persisted as a class member.
I agree, I have removed the class member here.
…riter Co-authored-by: Tim Meehan <[email protected]>
Filter Pushdown is only supported with Native Worker
…Filter Pushdown
1. Add new fields in IcebergTableLayoutHandle and IcebergColumnHandle required for Filter Pushdown
2. Remove tupleDomain from IcebergTableLayoutHandle and instead use domainPredicate
3. Refactor IcebergTableLayoutHandle and IcebergColumnHandle to extend the Base classes
4. Add new utility methods for computing partition columns
Force-pushed from 4d98884 to 89edaf4
A couple of other nits, otherwise LGTM.
{
Table icebergTable;
if (metadata instanceof IcebergHiveMetadata) {
ExtendedHiveMetastore metastore = ((IcebergHiveMetadata) metadata).getMetastore();
Please add upfront validation that these casts are correct via checkArgument
Added a validation for both metadata and tableHandle
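The validation pattern looks roughly like the following. The types below are stand-ins for the real Presto/Iceberg classes, and checkArgument mirrors Guava's Preconditions:

```java
// Hypothetical stand-ins for the real Presto/Iceberg types.
class ConnectorMetadata {}
class IcebergHiveMetadata extends ConnectorMetadata {}

class CastValidation {
    static void checkArgument(boolean ok, String message) {
        if (!ok) {
            throw new IllegalArgumentException(message);
        }
    }

    // Fail fast with a descriptive message before downcasting, instead of
    // letting a ClassCastException surface deep inside the optimizer rule.
    static IcebergHiveMetadata asIcebergHiveMetadata(ConnectorMetadata metadata) {
        checkArgument(metadata instanceof IcebergHiveMetadata,
                "metadata must be IcebergHiveMetadata but is " + metadata.getClass().getSimpleName());
        return (IcebergHiveMetadata) metadata;
    }
}
```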
presto-iceberg/src/main/java/com/facebook/presto/iceberg/optimizer/IcebergFilterPushdown.java
presto-iceberg/src/main/java/com/facebook/presto/iceberg/optimizer/IcebergFilterPushdown.java
private final IcebergResourceFactory resourceFactory;
private final HdfsEnvironment hdfsEnvironment;
private final TypeManager typeManager;
private final IcebergTransactionManager icebergTransactionManager;
icebergTransactionManager is still here. Is it not removable?
@yingsu00 yes, it is being used in the method com.facebook.presto.iceberg.optimizer.IcebergFilterPushdown#optimize
Co-authored-by: Reetika Agrawal <[email protected]> Co-authored-by: Tim Meehan <[email protected]>
Force-pushed from 89edaf4 to c1ec0ff
@yingsu00 @tdcmeehan Folks, we are seeing build failures in Meta, and after fixing these we see crashes. Wondering if you could summarize the changes to help us figure out what's going on.
@mbasmanova the idea behind this change is we need similar filter pushdown logic for the Iceberg catalog as what we currently have in Hive. Because the code is rather large in the Hive connector, the idea is to extract it so it may be used for other Hive-adjacent connectors in the future, such as Delta and Hudi (the alternative would be to do such filter extraction and pushdown in the engine). Can you share a more detailed error message?
@tdcmeehan Tim, thank you for these additional details. Here is all I have in terms of an error message:
The last lines seem to repeat a million times.
Would it be possible to share the implementation of
@tdcmeehan Sure. Let me know if you want to see the whole file.
Seems like
@tdcmeehan This is a good point. CC: @shrinidhijoshi |
Co-authors: @imjalpreet, @agrawalreetika and @tdcmeehan
Description
This PR introduces the following changes in the Iceberg Connector in an ongoing effort to make it compatible with execution on Velox:
Motivation and Context
facebookincubator/velox#5977
Impact
Test Plan
These changes have been tested with Iceberg Catalog type Hive/Glue and Hadoop.
Contributor checklist
Release Notes
Please follow release notes guidelines and fill in the release notes below.