Conversation

soumyakanti3578 (Contributor) commented Sep 8, 2025

What changes were proposed in this pull request?

Added a deserializer to convert JSON plans to logical plans (RelNodes)

Why are the changes needed?

While we can serialize a plan to JSON with explain cbo formatted, we don't have a deserializer to convert the JSON back to a RelNode.

Does this PR introduce any user-facing change?

No

How was this patch tested?

mvn test -pl ql -Dtest=org.apache.hadoop.hive.ql.optimizer.calcite.TestRelPlanParser

soumyakanti3578 changed the title from "[WIP] - DO NOT REVIEW - Deserializer hive 28197" to "HIVE-28197: Add deserializer to convert JSON plans to RelNodes" on Sep 24, 2025
soumyakanti3578 marked this pull request as ready for review on September 24, 2025, 17:46
zabetak (Member) left a comment

I haven't fully gone through the changes yet, but I'm sending a first batch of comments so as not to lose them. Please let me finalize the review before you start making code changes. For comments that simply require an answer, feel free to share your thoughts.

String enable = pk.isEnable_cstr()? "ENABLE": "DISABLE";
String validate = pk.isValidate_cstr()? "VALIDATE": "NOVALIDATE";
String rely = pk.isRely_cstr()? "RELY": "NORELY";
enableValidateRely.put(pk.getNn_name(), ImmutableList.of(enable, validate, rely));

Why is this change necessary?

Comment on lines +56 to +57
this(input.getCluster(), input.getTraitSet(), input.getInput(),
input.getBitSet("group"), input.getBitSetList("groups"), input.getAggregateCalls("aggs"));

Can the following work?

Suggested change
- this(input.getCluster(), input.getTraitSet(), input.getInput(),
-     input.getBitSet("group"), input.getBitSetList("groups"), input.getAggregateCalls("aggs"));
+ super(input);

If yes then can I do the same on the other RelNodes?
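
To make the question concrete, here is a minimal sketch of the delegation pattern being suggested. All class names below (NodeInput, BaseAggregate, EngineAggregate) are hypothetical stand-ins invented for illustration; they are not Calcite's RelInput or Aggregate, nor Hive's operator classes. The sketch assumes the base class already offers a constructor that reads its own fields from the input.

```java
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for a deserialization input; not Calcite's RelInput. */
final class NodeInput {
  private final Map<String, Object> attrs;

  NodeInput(Map<String, Object> attrs) {
    this.attrs = attrs;
  }

  Object get(String key) {
    return attrs.get(key);
  }
}

/** Hypothetical base class that knows how to read its own fields from the input. */
class BaseAggregate {
  final List<Integer> group;
  final List<String> aggs;

  BaseAggregate(List<Integer> group, List<String> aggs) {
    this.group = group;
    this.aggs = aggs;
  }

  /** Unpacks the attributes once, in the base class. */
  @SuppressWarnings("unchecked")
  BaseAggregate(NodeInput input) {
    this((List<Integer>) input.get("group"), (List<String>) input.get("aggs"));
  }
}

/** Hypothetical subclass: it delegates instead of repeating the unpacking. */
final class EngineAggregate extends BaseAggregate {
  EngineAggregate(NodeInput input) {
    super(input);
  }

  public static void main(String[] args) {
    NodeInput input = new NodeInput(
        Map.of("group", List.of(0, 1), "aggs", List.of("COUNT")));
    EngineAggregate agg = new EngineAggregate(input);
    System.out.println(agg.group + " " + agg.aggs);
  }
}
```

Delegating this way is only equivalent when the subclass's other constructor does no extra work (validation, trait adjustment, and so on), which would need to be checked per operator.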

Comment on lines +118 to +129
public HiveMultiJoin(RelInput input) {
  this(
      input.getCluster(),
      input.getInputs(),
      input.getExpression("condition"),
      input.getRowType("rowType"),
      (List<Pair<Integer, Integer>>) input.get("getJoinInputsForHiveMultiJoin"),
      (List<JoinRelType>) input.get("getJoinTypesForHiveMultiJoin"),
      input.getExpressionList("filters")
  );
}


Why do we need to modify this class? Normally we shouldn't need to serialize/deserialize MultiJoin expressions because they never appear in the final plan.

Comment on lines +30 to +37
static Stream<RelNode> stream(RelNode node) {
  return Stream.concat(
      Stream.of(node),
      node.getInputs()
          .stream()
          .flatMap(HiveRelNode::stream)
  );
}

If we keep this we should add appropriate Javadoc. In addition, putting static methods in interfaces is not a good pattern; it is better to move it to a utility class.

Other than that, the most common way to traverse a RelNode tree is via visitors and shuttles, so I'm not sure whether this kind of Stream-based traversal will be well adopted.
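
As a rough illustration of the first suggestion (a documented helper in a utility class rather than a static method on the interface), the sketch below uses hypothetical stand-in types. PlanNode, SimpleNode, and PlanNodes are invented names, not Calcite or Hive classes; only the traversal logic mirrors the snippet under review.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical stand-in for a plan node; not Calcite's RelNode. */
interface PlanNode {
  String name();

  List<PlanNode> getInputs();
}

/** Simple immutable node used only for this illustration. */
final class SimpleNode implements PlanNode {
  private final String name;
  private final List<PlanNode> inputs;

  SimpleNode(String name, PlanNode... inputs) {
    this.name = name;
    this.inputs = List.of(inputs);
  }

  @Override public String name() {
    return name;
  }

  @Override public List<PlanNode> getInputs() {
    return inputs;
  }
}

/** Hypothetical utility class holding the traversal helper. */
final class PlanNodes {
  private PlanNodes() {
  }

  /**
   * Returns a stream over the given node and all of its descendants in
   * pre-order: the node itself first, then each input subtree from left to right.
   */
  static Stream<PlanNode> stream(PlanNode node) {
    return Stream.concat(
        Stream.of(node),
        node.getInputs().stream().flatMap(PlanNodes::stream));
  }

  public static void main(String[] args) {
    PlanNode plan = new SimpleNode("Join",
        new SimpleNode("Filter", new SimpleNode("ScanA")),
        new SimpleNode("ScanB"));
    System.out.println(
        stream(plan).map(PlanNode::name).collect(Collectors.toList()));
  }
}
```

Whether such a helper is preferable to a visitor or shuttle is exactly the open question in the comment above; the sketch only shows where the method and its Javadoc could live.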

* @param t
* @return
*/
private long getMaxNulls(RexCall call, HiveTableScan t) {

Why are we changing the selectivity estimator?

RelPlanParser parser = new RelPlanParser(cluster, conf);
RelNode deserializedPlan = parser.parse(jsonPlan);
// Apply partition pruning to compute partition list in HiveTableScan
deserializedPlan = applyPartitionPruning(conf, deserializedPlan, cluster, planner);

Why do we need the partition list? Can't we deserialize the plan without it?

// Apply partition pruning to compute partition list in HiveTableScan
deserializedPlan = applyPartitionPruning(conf, deserializedPlan, cluster, planner);
if (LOG.isDebugEnabled()) {
LOG.debug("Deserialized plan: \n{}", RelOptUtil.toString(deserializedPlan));

Consider removing logging from this API, for the same reasons as mentioned before.

return null;
}

return HiveRelEnumTypes.toEnum(enumName);

The use of HiveRelEnumTypes seems like overkill. Can't we simply create the instance directly and drop the entire RelEnumTypes copy?

Suggested change
- return HiveRelEnumTypes.toEnum(enumName);
+ return HiveTableScanTrait.valueOf(enumName);
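
For reference, a minimal sketch of how a direct valueOf lookup behaves, including the null guard from the surrounding code. ScanTrait and its constants are hypothetical and invented for illustration; they are not the real HiveTableScanTrait values.

```java
/** Hypothetical trait enum; the constant names are invented for illustration. */
enum ScanTrait {
  FETCH_ALL,
  FETCH_DELETED
}

/** Hypothetical helper that resolves a serialized enum name. */
final class ScanTraitParser {
  private ScanTraitParser() {
  }

  /** Returns the matching constant, or null when no name was serialized. */
  static ScanTrait parse(String enumName) {
    if (enumName == null) {
      return null;
    }
    // Enum.valueOf requires an exact, case-sensitive match and throws
    // IllegalArgumentException for unknown names.
    return ScanTrait.valueOf(enumName);
  }

  public static void main(String[] args) {
    System.out.println(parse("FETCH_ALL"));
    System.out.println(parse(null));
  }
}
```

One behavioral difference to keep in mind is that valueOf rejects unknown or differently cased names with an exception, so the serialized form has to match the constant name exactly.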

Comment on lines +368 to +370
if (enumName == null) {
return null;
}

Are there cases where we don't serialize the trait? Can we ever have null here?

}

JSONObject outJSONObject = new JSONObject(new LinkedHashMap<>());
outJSONObject.put("CBOPlan", serializeWithPlanWriter(plan, new HiveRelJsonImpl()));

I don't think we need the extra wrapping attribute for "CBOPlan".
