
Add Iceberg Filter Pushdown Optimizer Rule for execution with Velox #20501

Merged: 8 commits merged into prestodb:master on Dec 4, 2023

Conversation

@imjalpreet (Member) commented Aug 5, 2023:

Co-authors: @imjalpreet, @agrawalreetika and @tdcmeehan

Description

This PR introduces the following changes to the Iceberg connector as part of an ongoing effort to make it compatible with execution on Velox:

  • Add a new config property and session property to control the behavior of features such as filter pushdown, depending on whether Java or native workers are in use (a sketch of the property registration follows this list)
  • Add new fields in IcebergTableLayoutHandle to implement filter pushdown
  • Add Iceberg Filter Pushdown Optimizer Rule for execution with Velox
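
For reference, a minimal sketch of how the new config property might be registered, following the standard airlift @Config pattern used across Presto connectors; the property name comes from the release notes below, while the class placement, accessor names, and description are assumptions:

    // Hypothetical fragment of IcebergConfig; the field defaults to false so
    // the new optimizer rule stays off unless explicitly enabled.
    private boolean pushdownFilterEnabled;

    public boolean isPushdownFilterEnabled()
    {
        return pushdownFilterEnabled;
    }

    @Config("iceberg.pushdown-filter-enabled")
    @ConfigDescription("Enable the filter pushdown optimizer rule for the Iceberg connector")
    public IcebergConfig setPushdownFilterEnabled(boolean pushdownFilterEnabled)
    {
        this.pushdownFilterEnabled = pushdownFilterEnabled;
        return this;
    }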

Motivation and Context

facebookincubator/velox#5977

Impact

Test Plan

These changes have been tested with the Iceberg catalog types Hive, Glue, and Hadoop.

Contributor checklist

  • Please make sure your submission complies with our development, formatting, commit message, and attribution guidelines.
  • PR description addresses the issue accurately and concisely. If the change is non-trivial, a GitHub Issue is referenced.
  • Documented new properties (with their default values), SQL syntax, functions, or other functionality.
  • If release notes are required, they follow the release notes guidelines.
  • Adequate tests were added if applicable.
  • CI passed.

Release Notes

Please follow release notes guidelines and fill in the release notes below.

== RELEASE NOTES ==

Iceberg Changes
* Add ``iceberg.pushdown-filter-enabled`` config property to the Iceberg connector. This config property controls the behavior of filter pushdown in the Iceberg connector.
* Add Iceberg Filter Pushdown Optimizer Rule for execution with Velox

@imjalpreet requested a review from a team as a code owner August 5, 2023 00:53
@imjalpreet self-assigned this Aug 5, 2023
@imjalpreet requested a review from presto-oss August 5, 2023 00:53
@imjalpreet marked this pull request as draft August 5, 2023 00:53
@imjalpreet (Member, Author):

@yingsu00 As we discussed, this PR includes the changes required for implementing the new Filter Pushdown optimizer rule compatible with Prestissimo/Velox.

@imjalpreet requested a review from yingsu00 August 5, 2023 00:58
@imjalpreet force-pushed the icebergFilterPushdown branch from 065ce0f to e65a80e on August 6, 2023 09:47
@tdcmeehan (Contributor):

I realize this is a draft, but one early point of feedback is I wouldn't really find it advisable to encode the worker type directly in the connector. If anything this is an engine concept, not an Iceberg connector concept.

@tdcmeehan (Contributor):

One design point of Aria scan that I wish we had addressed is that the connector is not really the proper place to extract expressions for filter pushdown. This is something that ought to be handled in the engine.

In lieu of that major change, I wonder if there is any way to extract common logic from the Hive connector so it doesn't need to be completely redone from scratch.

@imjalpreet (Member, Author):

> I realize this is a draft, but one early point of feedback is I wouldn't really find it advisable to encode the worker type directly in the connector. If anything this is an engine concept, not an Iceberg connector concept.

Yes sure, this was one of the questions I was going to ask. I was also planning to include this in a common config class and system session properties.

return workerType;
}

@Config("iceberg.execution.worker.type")
Contributor:

I think it would be better to add "pushdown-filter-enabled" instead of testing the worker type. If it's enabled, then the rule will be executed and the TableLayout will contain the relevant information.

Member (Author):

I thought it might be better to add worker type, since there might be other worker-type-specific changes we add as well. With the current changes, filter pushdown will only work with native workers; to support it for Java workers, we would need to make additional changes. So if we add the pushdown-filter-enabled config, I think it might be a little confusing until we also support filter pushdown with Java workers.

Member:

To add some details: the changes for the filter pushdown feature in this PR may not work as-is with Java workers, since no corresponding changes have been made to the Iceberg connector worker code on the Java side.

Contributor:

Yes, and we need to make sure this pushdown-filter-enabled is set to false by default.

private final RowExpression remainingPredicate;
private final Map<String, IcebergColumnHandle> predicateColumns;
private final TupleDomain<ColumnHandle> partitionColumnPredicate;
private final Optional<Set<IcebergColumnHandle>> requestedColumns;
private final IcebergTableHandle table;
private final TupleDomain<ColumnHandle> tupleDomain;
Contributor:

We need to either remove the original tupleDomain or subclass. This is used by Iceberg to filter files. Can you check how Hive does it when filter pushdown is set to false (in HiveMetadata.java, line 2742)? One possible way is to reconstruct it from the new fields when filtering the files. We just need to make sure the transformation is equivalent.

Member (Author):

@yingsu00 Yes, we can remove the original tupleDomain. It can be constructed by transforming the domainPredicate field, along the lines of the sketch below.
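
For illustration, a minimal sketch of that reconstruction, mirroring the pattern the Hive connector uses; it assumes domainPredicate is a TupleDomain<Subfield>, predicateColumns maps root column names to handles, and that TupleDomain.transform drops entries mapped to null (as in the snippet quoted later in this review):

    // Keep only whole-column subfields, map each to its root column name,
    // then resolve names to column handles and widen to ColumnHandle.
    TupleDomain<ColumnHandle> tupleDomain = domainPredicate
            .transform(subfield -> isEntireColumn(subfield) ? subfield.getRootName() : null)
            .transform(predicateColumns::get)
            .transform(ColumnHandle.class::cast);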

@yingsu00 (Contributor):

> I realize this is a draft, but one early point of feedback is I wouldn't really find it advisable to encode the worker type directly in the connector. If anything this is an engine concept, not an Iceberg connector concept.

Yes, I agree. This should be a "pushdown-filter-enabled" property, not worker type.

@yingsu00 (Contributor) commented Aug 17, 2023:

> One design point of Aria scan that I wish we had addressed is that the connector is not really the proper place to extract expressions for filter pushdown. This is something that ought to be handled in the engine.

@tdcmeehan I think there was some debate over this. On one hand, it makes sense to handle this in the engine; on the other hand, the engine does not know what the connectors can do. If we want the engine to handle this, the engine needs to ask each connector: can you do this? Can you do that? There would have to be a pre-defined superset of the operations all connectors can do, and the engine logic becomes complex. I think this is what Trino chose to do. Presto didn't do it this way; instead, it allows the connector to modify the plan shape, because the connector knows what it can and cannot do. This way is more flexible and has better separation of concerns. Both approaches have their pros and cons, and so far the Presto implementation seems to be working well.

> In lieu of that major change, I wonder if there is any way to extract common logic from the Hive connector so it doesn't need to be completely redone from scratch.

We actually talked about this a while ago, but @imjalpreet found some hurdles to calling directly into the Hive rule. Jalpreet, can you remind us what the issue was?

.transform(subfield -> !isEntireColumn(subfield) ? subfield : null)
.getDomains()
.orElse(ImmutableMap.of()))
.build());
Contributor:

Why is this part not present?

        if (currentLayoutHandle.isPresent()) {
            entireColumnDomain = entireColumnDomain.intersect(((HiveTableLayoutHandle) (currentLayoutHandle.get())).getPartitionColumnPredicate());
        }

Member (Author):

This will be added as part of the changes I am working on for partitioned tables, since it is only needed for partitioned tables.

}
}

org.apache.iceberg.Table icebergTable;
Contributor:

icebergTable is not used?

Member (Author):

I might have included this change by mistake; it is part of the changes for partitioned tables. icebergTable will be used to get partition columns.

I will remove this variable from the current commit and it will be added back as part of the partition table changes.

ConstraintEvaluator evaluator = new ConstraintEvaluator(rowExpressionService, session, columnHandles, deterministicPredicate);
constraint = new Constraint<>(entireColumnDomain, evaluator::isCandidate);
}
}
Contributor:

Most of the lines are the same between Hive and Iceberg. Consider extracting them into one or several utility functions in Hive.

@@ -78,6 +78,7 @@ public final class IcebergSessionProperties
private static final String NESSIE_REFERENCE_HASH = "nessie_reference_hash";
public static final String READ_MASKED_VALUE_ENABLED = "read_null_masked_parquet_encrypted_value_enabled";
public static final String PARQUET_DEREFERENCE_PUSHDOWN_ENABLED = "parquet_dereference_pushdown_enabled";
public static final String WORKER_TYPE = "worker_type";
Contributor:

Use PUSHDOWN_FILTER_ENABLED, similar to Hive.

public static final String PUSHDOWN_FILTER_ENABLED = "pushdown_filter_enabled";
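
For illustration, a hedged sketch of how that session property might be registered in IcebergSessionProperties, assuming the standard PropertyMetadata.booleanProperty helper from the Presto SPI and a config accessor like the one sketched in the description above (both names are assumptions):

    // Hypothetical entry in the IcebergSessionProperties property list; the
    // default mirrors the connector config value, which defaults to false.
    booleanProperty(
            PUSHDOWN_FILTER_ENABLED,
            "Enable the filter pushdown optimizer rule",
            icebergConfig.isPushdownFilterEnabled(),
            false)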

Member (Author):

Done, as requested.

@imjalpreet force-pushed the icebergFilterPushdown branch 3 times, most recently from 6e56595 to 975b59b, on August 22, 2023 21:44
@imjalpreet (Member, Author):

> In lieu of that major change, I wonder if there is any way to extract common logic from the Hive connector so it doesn't need to be completely redone from scratch.

> We actually talked about this a while ago, but @imjalpreet found some hurdles to calling directly into the Hive rule. Jalpreet, can you remind us what the issue was?

Yes, we saw some issues in our initial implementation, but the current implementation can be refactored by extracting the logic common to the Hive and Iceberg implementations. This is currently in progress in one of the draft commits in this PR.

@imjalpreet force-pushed the icebergFilterPushdown branch 3 times, most recently from 1147612 to 8934ae5, on August 25, 2023 14:41
@yingsu00 (Contributor) left a comment:

@imjalpreet Is this PR ready for review? I see you still have the commit "Add iceberg.execution.worker.type config property to Iceberg Connector". Could you please sort out the commits so that each one handles a sub-problem? For example, the above commit should be removed, or squashed with the later updates.

@imjalpreet (Member, Author):

@yingsu00 yes, it is ready for review. I had planned to refine and re-order the commits after this review. For now, I had kept separate commits so that it would be easier to review the changes made since your last review.

Yes, the commit you have mentioned will get squashed and removed.

If you want me to re-order and squash unnecessary commits before the review, I can do that; please let me know.

@yingsu00 (Contributor):

> @yingsu00 yes, it is ready for review. I had planned to refine and re-order the commits after this review. For now, I had kept separate commits so that it would be easier to review the changes made since your last review.
>
> Yes, the commit you have mentioned will get squashed and removed.
>
> If you want me to re-order and squash unnecessary commits before the review, I can do that; please let me know.

@imjalpreet Yes, please re-order and squash unnecessary commits before the review. The commits and their messages are part of what is reviewed.

@yingsu00 (Contributor):

@imjalpreet Can you please also resolve the conflicts?

@imjalpreet force-pushed the icebergFilterPushdown branch from 8934ae5 to d27662e on August 30, 2023 22:06
@imjalpreet marked this pull request as ready for review August 30, 2023 22:21
@imjalpreet requested a review from yingsu00 August 30, 2023 22:22
RowExpression filter,
Optional<ConnectorTableLayoutHandle> currentLayoutHandle)
{
Result result = checkConstantBooleanExpression(rowExpressionService, functionResolution, filter, currentLayoutHandle, metadata, session);
Contributor:

Lines 127 to 146 are still duplicated from HiveFilterPushdown. Is it possible to move them to the util class after the partitioned table support is added?

Member (Author):

Sure, I will try to see if it can be extracted out to the util class in the partitioned table PR.

@@ -0,0 +1,532 @@
/*
Contributor:

@imjalpreet I actually think it's better to subclass HiveFilterPushdown than to create a util class. The functions in this util class are all static, so you had to pass them a lot of parameters that were class member fields of HiveFilterPushdown. If we subclass HiveFilterPushdown, these common class fields can be used by the derived IcebergFilterPushdown, and there can be a lot less code. For that, we can change the metadata parameter to the ConnectorMetadata type. What do you think?

Member (Author):

@yingsu00 I had thought about it as well, but I saw a few drawbacks. HiveFilterPushdown has class fields like HiveTransactionManager, HivePartitionManager, etc., which are not required in IcebergFilterPushdown since it has its own versions, like IcebergTransactionManager. Also, in HiveFilterPushdown's pushdownFilter method we have HiveMetadata, whereas in IcebergFilterPushdown we have implementations of IcebergAbstractMetadata.

IMO, directly extending the HiveFilterPushdown class might not be ideal. If we want, I think we can try creating a new base class that extracts the fields common to Hive and Iceberg, and then derive both HiveFilterPushdown and IcebergFilterPushdown from that base class.

Let me know what you think.

Contributor:

@imjalpreet Making a base class is OK. I think most of the content can live in the base class, with only very minimal differences between the two derived classes.

In filterPushdown(), metadata is only used in two places, metadata.getColumnHandles(session, tableHandle) and metadata.getTableLayout(...), both of which are on ConnectorMetadata. So it's OK to change the metadata parameter from HiveMetadata and IcebergAbstractMetadata to ConnectorMetadata.

Now let's look at transactionManager and icebergTransactionManager. They are only used to get the metadata in the getMetadata(TableHandle tableHandle) method:

    protected IcebergAbstractMetadata getMetadata(TableHandle tableHandle)
    {
        ConnectorMetadata metadata = icebergTransactionManager.get(tableHandle.getTransaction());
        checkState(metadata instanceof IcebergAbstractMetadata, "metadata must be IcebergAbstractMetadata");
        return (IcebergAbstractMetadata) metadata;
    }

The downcast was only to make sure the metadata is IcebergAbstractMetadata, but icebergTransactionManager would always return IcebergXXXMetadata, so returning IcebergAbstractMetadata instead of ConnectorMetadata is not really necessary. If we make it just return a ConnectorMetadata, then the getMetadata() method can have the same signature and be put in the base class.

partitionManager was also only used once.

HivePartitionResult hivePartitionResult = partitionManager.getPartitions(metastore, tableHandle, constraint, session);

Similar to getMetadata(), we can have an abstract method in the base class, depending on how you express the Iceberg partition result. If that doesn't work, we can also break filterPushdown into multiple sections, with most sections having the same content.

So please go ahead and extract common base class.
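
For illustration, a minimal sketch of what the shared base-class lookup could look like under this proposal; the transactionManagerLookup field is an assumption standing in for whatever abstraction ends up covering both HiveTransactionManager and IcebergTransactionManager:

    // Hypothetical base-class method returning ConnectorMetadata so both the
    // Hive and Iceberg rules can share it; subclasses supply the lookup.
    protected ConnectorMetadata getMetadata(TableHandle tableHandle)
    {
        ConnectorMetadata metadata = transactionManagerLookup.apply(tableHandle.getTransaction());
        checkState(metadata != null, "no metadata found for the transaction");
        return metadata;
    }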

Member (Author):

@yingsu00 I have refactored the filter pushdown and introduced a new abstract class.

@imjalpreet (Member, Author) commented Nov 27, 2023:

@yingsu00 I have pushed all the changes apart from the test cases you just mentioned. The branch is also rebased on master now, merge conflicts have been resolved.

Update: Additional test cases have also been added.

@imjalpreet force-pushed the icebergFilterPushdown branch 2 times, most recently from e177934 to 2e689fe, on November 27, 2023 18:06
@imjalpreet force-pushed the icebergFilterPushdown branch from 2e689fe to 4d98884 on November 28, 2023 16:10
Comment on lines 119 to 122
protected final IcebergResourceFactory resourceFactory;
protected final HdfsEnvironment hdfsEnvironment;
protected final TypeManager typeManager;
protected IcebergTransactionManager icebergTransactionManager;
Contributor:

Why are these fields protected?

Contributor:

Suggested change:

-    protected IcebergTransactionManager icebergTransactionManager;
+    protected final IcebergTransactionManager icebergTransactionManager;

Member (Author):

My bad, just realized we can keep them private as well. I have made the change for all the protected fields.

Comment on lines 75 to 78
protected final IcebergResourceFactory resourceFactory;
protected final HdfsEnvironment hdfsEnvironment;
protected final TypeManager typeManager;
protected IcebergTransactionManager icebergTransactionManager;
Contributor:

Why protected?

@imjalpreet (Member, Author), Nov 29, 2023:

This has been removed

protected final IcebergResourceFactory resourceFactory;
protected final HdfsEnvironment hdfsEnvironment;
protected final TypeManager typeManager;
protected IcebergTransactionManager icebergTransactionManager;
Contributor:

It seems icebergTransactionManager doesn't need to be persisted as a class member.

Member (Author):

We would need to persist it here, since we need to pass it to SubfieldExtractionRewriter.

protected final IcebergResourceFactory resourceFactory;
protected final HdfsEnvironment hdfsEnvironment;
protected final TypeManager typeManager;
protected IcebergTransactionManager icebergTransactionManager;
Contributor:

Ditto. icebergTransactionManager doesn't need to be persisted as a class member.

Member (Author):

I agree, I have removed the class member here.

imjalpreet and others added 7 commits November 29, 2023 11:13
Filter Pushdown is only supported with Native Worker
…Filter Pushdown

1. Add new fields in IcebergTableLayoutHandle and IcebergColumnHandle required for Filter Pushdown
2. Remove tupleDomain from IcebergTableLayoutHandle and instead use domainPredicate
3. Refactor IcebergTableLayoutHandle and IcebergColumnHandle to extend the Base classes
4. Add new utility methods for computing partition columns
@imjalpreet force-pushed the icebergFilterPushdown branch from 4d98884 to 89edaf4 on November 29, 2023 05:53
@tdcmeehan (Contributor) left a comment:

A couple of other nits; otherwise LGTM.

{
Table icebergTable;
if (metadata instanceof IcebergHiveMetadata) {
ExtendedHiveMetastore metastore = ((IcebergHiveMetadata) metadata).getMetastore();
Contributor:

Please add upfront validation via checkArgument that these casts are correct.

Member (Author):

Added validation for both metadata and tableHandle, along the lines of the sketch below.
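
For reference, a minimal sketch of that kind of upfront validation using Guava's checkArgument; the exact messages, and the assumption that the table handle is an IcebergTableHandle, are illustrative:

    import static com.google.common.base.Preconditions.checkArgument;

    // Validate the runtime types before casting, so a misconfigured caller
    // fails fast with a clear message instead of a ClassCastException.
    checkArgument(metadata instanceof IcebergAbstractMetadata, "metadata must be IcebergAbstractMetadata");
    checkArgument(tableHandle instanceof IcebergTableHandle, "tableHandle must be IcebergTableHandle");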

private final IcebergResourceFactory resourceFactory;
private final HdfsEnvironment hdfsEnvironment;
private final TypeManager typeManager;
private final IcebergTransactionManager icebergTransactionManager;
Contributor:

icebergTransactionManager is still here. Is it not removable?

Member (Author):

@yingsu00 yes, it is being used in the method com.facebook.presto.iceberg.optimizer.IcebergFilterPushdown#optimize

@imjalpreet force-pushed the icebergFilterPushdown branch from 89edaf4 to c1ec0ff on December 4, 2023 05:56
@yingsu00 merged commit d865192 into prestodb:master on Dec 4, 2023
56 checks passed
@mbasmanova (Contributor):

@yingsu00 @tdcmeehan Folks, we are seeing build failures in Meta, and after fixing those we see crashes. Wondering if you could summarize the changes to help us figure out what's going on.

Stack: [0x00007f0344226000,0x00007f0344326000],  sp=0x00007f0344239ff8,  free space=79k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
J 8020 C2 java.util.regex.Pattern$LastNode.match(Ljava/util/regex/Matcher;ILjava/lang/CharSequence;)Z (45 bytes) @ 0x00007f032e3eea00 [0x00007f032e3ee9e0+0x20]
C  0x0000000000000000

@tdcmeehan (Contributor):

@mbasmanova The idea behind this change is that we need filter pushdown logic for the Iceberg catalog similar to what we currently have in Hive. Because the code in the Hive connector is rather large, the idea is to extract it so it can be reused by other Hive-adjacent connectors in the future, such as Delta and Hudi (the alternative would be to do such filter extraction and pushdown in the engine).

Can you share a more detailed error message?

@mbasmanova (Contributor):

@tdcmeehan Tim, thank you for these additional details. Here is all I have in terms of an error message:

com.facebook.presto.spi.PrestoException: statement is too large (stack overflow during analysis)
	at com.facebook.presto.execution.SqlQueryExecution.createLogicalPlanAndOptimize(SqlQueryExecution.java:576)
	at com.facebook.presto.execution.SqlQueryExecution.start(SqlQueryExecution.java:451)
	at com.facebook.presto.$gen.Presto_0_286_SNAPSHOT_8fb73fc__0_286_20231219_012521_61____20231219_025901_1.run(Unknown Source)
	at com.facebook.presto.execution.SqlQueryManager.createQuery(SqlQueryManager.java:306)
	at com.facebook.presto.dispatcher.LocalDispatchQuery.lambda$startExecution$8(LocalDispatchQuery.java:211)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
	at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: java.lang.StackOverflowError
	at java.base/java.util.regex.Pattern$GroupHead.match(Pattern.java:4804)
	at java.base/java.util.regex.Pattern$Branch.match(Pattern.java:4749)
	at java.base/java.util.regex.Pattern$Branch.match(Pattern.java:4747)
	at java.base/java.util.regex.Pattern$Branch.match(Pattern.java:4747)
	at java.base/java.util.regex.Pattern$BranchConn.match(Pattern.java:4713)
	at java.base/java.util.regex.Pattern$GroupTail.match(Pattern.java:4863)
	at java.base/java.util.regex.Pattern$BmpCharPropertyGreedy.match(Pattern.java:4344)
	at java.base/java.util.regex.Pattern$GroupHead.match(Pattern.java:4804)
	at java.base/java.util.regex.Pattern$Branch.match(Pattern.java:4749)
	at java.base/java.util.regex.Pattern$Branch.match(Pattern.java:4747)
	at java.base/java.util.regex.Pattern$BmpCharProperty.match(Pattern.java:3964)
	at java.base/java.util.regex.Pattern$Start.match(Pattern.java:3619)
	at java.base/java.util.regex.Matcher.search(Matcher.java:1729)
	at java.base/java.util.regex.Matcher.find(Matcher.java:773)
	at java.base/java.util.Formatter.parse(Formatter.java:2702)
	at java.base/java.util.Formatter.format(Formatter.java:2655)
	at java.base/java.util.Formatter.format(Formatter.java:2609)
	at java.base/java.lang.String.format(String.java:2897)
	at com.facebook.presto.hive.HiveColumnHandle.<init>(HiveColumnHandle.java:99)
	at com.facebook.presto.hive.HiveColumnHandle.<init>(HiveColumnHandle.java:119)
	at com.facebook.presto.hive.HiveUtil.getRegularColumnHandles(HiveUtil.java:974)
	at com.facebook.presto.hive.HiveUtil.hiveColumnHandles(HiveUtil.java:949)
	at com.facebook.presto.hive.HiveMetadata.getColumnHandles(HiveMetadata.java:811)
	at com.facebook.presto.hive.rule.BaseSubfieldExtractionRewriter.pushdownFilter(BaseSubfieldExtractionRewriter.java:243)
	at com.facebook.presto.hive.PrismFilterPushdown$SubfieldExtractionRewriter.getConnectorPushdownFilterResult(PrismFilterPushdown.java:125)

@mbasmanova (Contributor):

The same stack trace continues, with this pair of frames repeating:

	at com.facebook.presto.hive.rule.BaseSubfieldExtractionRewriter.pushdownFilter(BaseSubfieldExtractionRewriter.java:259)
	at com.facebook.presto.hive.PrismFilterPushdown$SubfieldExtractionRewriter.getConnectorPushdownFilterResult(PrismFilterPushdown.java:125)

The last lines repeat a million times, it seems.

@tdcmeehan (Contributor):

Would it be possible to share the implementation of PrismFilterPushdown?

@mbasmanova (Contributor):

@tdcmeehan Sure. Let me know if you want to see the whole file.

        @Override
        public ConnectorPushdownFilterResult getConnectorPushdownFilterResult(
                Map<String, ColumnHandle> columnHandles,
                ConnectorMetadata connectorMetadata,
                ConnectorSession session,
                RemainingExpressions remainingExpressions,
                DomainTranslator.ExtractionResult<Subfield> decomposedFilter,
                RowExpression optimizedRemainingExpression,
                Constraint<ColumnHandle> constraint,
                Optional<ConnectorTableLayoutHandle> currentLayoutHandle,
                ConnectorTableHandle tableHandle)
        {
            try {
                BaseSubfieldExtractionRewriter.ConnectorPushdownFilterResult connectorPushdownFilterResult = super.pushdownFilter(
                        session,
                        connectorMetadata,
                        tableHandle,
                        optimizedRemainingExpression,
                        currentLayoutHandle);
                return pushdownFilterWithBucketInfo(connectorPushdownFilterResult, session, metadata, tableHandle);
            }
            catch (RuntimeException e) {
                throw rewriteMetastoreException(e);
            }
        }

@tdcmeehan (Contributor):

Seems like super.pushdownFilter dispatches back into the overridden getConnectorPushdownFilterResult, which re-enters BaseSubfieldExtractionRewriter.pushdownFilter, causing infinite recursion (a minimal sketch of the cycle follows). Does the Prism connector need to override this behavior at all, or can it just use Hive's?
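
A minimal, self-contained sketch of the cycle described above; the class and method names are illustrative stand-ins for BaseSubfieldExtractionRewriter and the Prism override, not the actual Presto classes:

    // The base method virtually dispatches to getResult(); the override calls
    // back into the base method, so the pair recurses until StackOverflowError.
    abstract class BaseRewriter
    {
        Object pushdownFilter()
        {
            return getResult(); // dispatches to the subclass override below
        }

        abstract Object getResult();
    }

    class PrismRewriter
            extends BaseRewriter
    {
        @Override
        Object getResult()
        {
            return super.pushdownFilter(); // re-enters the base method, which dispatches here again
        }
    }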

@mbasmanova (Contributor):

@tdcmeehan This is a good point.

CC: @shrinidhijoshi

@wanglinsong mentioned this pull request Feb 12, 2024