-
Notifications
You must be signed in to change notification settings - Fork 4.8k
HIVE-28197: Add deserializer to convert JSON plans to RelNodes #6067
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
HIVE-28197: Add deserializer to convert JSON plans to RelNodes #6067
Conversation
a63b1f9
to
39b0ec0
Compare
8dad604
to
92b45f8
Compare
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Didn't fully went through the changes but sending a first batch of comments in order not to lose them. Let me finalize the review before starting making code changes. For comments that simply require an answer feel free to share your thoughts.
String enable = pk.isEnable_cstr()? "ENABLE": "DISABLE"; | ||
String validate = pk.isValidate_cstr()? "VALIDATE": "NOVALIDATE"; | ||
String rely = pk.isRely_cstr()? "RELY": "NORELY"; | ||
enableValidateRely.put(pk.getNn_name(), ImmutableList.of(enable, validate, rely)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this change necessary?
this(input.getCluster(), input.getTraitSet(), input.getInput(), | ||
input.getBitSet("group"), input.getBitSetList("groups"), input.getAggregateCalls("aggs")); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can the following work?
this(input.getCluster(), input.getTraitSet(), input.getInput(), | |
input.getBitSet("group"), input.getBitSetList("groups"), input.getAggregateCalls("aggs")); | |
super(input); |
If yes then can I do the same on the other RelNodes?
public HiveMultiJoin(RelInput input) { | ||
this( | ||
input.getCluster(), | ||
input.getInputs(), | ||
input.getExpression("condition"), | ||
input.getRowType("rowType"), | ||
(List<Pair<Integer, Integer>>) input.get("getJoinInputsForHiveMultiJoin"), | ||
(List<JoinRelType>) input.get("getJoinTypesForHiveMultiJoin"), | ||
input.getExpressionList("filters") | ||
); | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we to modify this class? Normally we shouldn't need to serialize/deserialize MultiJoin
expressions cause they never appear in the final plan.
static Stream<RelNode> stream(RelNode node) { | ||
return Stream.concat( | ||
Stream.of(node), | ||
node.getInputs() | ||
.stream() | ||
.flatMap(HiveRelNode::stream) | ||
); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we keep this we should add appropriate Javadoc. In addition, putting static methods in interfaces is not a good pattern; it is better to move it to a utility class.
Other than that the most common way to traverse RelNode tree is via visitor and shuttles so not sure if this kind of Stream based traversal is something that will be well adopted.
* @param t | ||
* @return | ||
*/ | ||
private long getMaxNulls(RexCall call, HiveTableScan t) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are we changing the selectivity estimator?
RelPlanParser parser = new RelPlanParser(cluster, conf); | ||
RelNode deserializedPlan = parser.parse(jsonPlan); | ||
// Apply partition pruning to compute partition list in HiveTableScan | ||
deserializedPlan = applyPartitionPruning(conf, deserializedPlan, cluster, planner); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we need the partition list? Can't we deserialize the plan without it?
// Apply partition pruning to compute partition list in HiveTableScan | ||
deserializedPlan = applyPartitionPruning(conf, deserializedPlan, cluster, planner); | ||
if (LOG.isDebugEnabled()) { | ||
LOG.debug("Deserialized plan: \n{}", RelOptUtil.toString(deserializedPlan)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider removing logging from this API. Same reasons as the one mentioned before.
return null; | ||
} | ||
|
||
return HiveRelEnumTypes.toEnum(enumName); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The use of HiveRelEnumTypes
seems a bit of an overkill. Can't we simply create the instance directly and drop the entire RelEnumTypes
copy?
return HiveRelEnumTypes.toEnum(enumName); | |
return HiveTableScanTrait.valueOf(enumName); |
if (enumName == null) { | ||
return null; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are there cases where we don't serialize the trait? Can we ever have null here?
} | ||
|
||
JSONObject outJSONObject = new JSONObject(new LinkedHashMap<>()); | ||
outJSONObject.put("CBOPlan", serializeWithPlanWriter(plan, new HiveRelJsonImpl())); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we need the extra wrapping attribute for "CBOPlan".
What changes were proposed in this pull request?
Added a deserializer to convert JSON plans to logical plans (RelNodes)
Why are the changes needed?
While we can serialize a plan to JSON with
explain cbo formatted
, we didn't have a deserializer to convert back to a RelNode.Does this PR introduce any user-facing change?
No
How was this patch tested?