This repository has been archived by the owner on May 18, 2022. It is now read-only.
Physical Plans (Java operators VS Scala operators VS raw SQL queries) #15
Comments
These query plans are from a group by that counts the number of lines for each level of Severity
With Java Operators
=> A condition appears in the plan because our API AggregateQuery() accepts an optional condition. If the user gives none, a True condition is generated by default, and we can see here that this True condition generates extra steps.
With Scala Operators
=> We can see that Spark was able to optimize the query by removing the useless filter
With a raw SQL query
=> We can see that the physical plan is exactly the same as the one with Scala operators
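
For anyone who wants to reproduce this comparison, here is a minimal sketch of how the two optimized variants can be printed with `explain()`. It assumes a local Spark session and a small hypothetical `logs` dataset; only the `Severity` column name comes from this issue, and our `AggregateQuery()` API is not reproduced here. The explicit `lit(true)` filter stands in for the default True condition.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.lit

object GroupByPlans {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("group-by-plans")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical sample data: only the Severity column name comes from the issue.
    val logs = Seq((1, "disk ok"), (4, "disk full"), (4, "timeout"))
      .toDF("Severity", "Message")
    logs.createOrReplaceTempView("logs")

    // Column operators, with an explicit always-true condition
    // standing in for the default True condition.
    logs.filter(lit(true)).groupBy("Severity").count().explain()

    // Raw SQL query counting the lines for each Severity level.
    spark.sql("SELECT Severity, COUNT(*) AS nb_lines FROM logs GROUP BY Severity").explain()

    spark.stop()
  }
}
```

Calling `explain(true)` instead of `explain()` also prints the parsed, analyzed and optimized logical plans, which helps to see at which stage a filter disappears.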
We can look at the physical plans generated by the transformations applied to our dataframes and see that the result differs considerably depending on the operators we use. This may explain the performance gap that we observe in the benchmarks.
These query plans are from a filter performing the operation Severity = 4
With Java Operators
=> We can see that the condition is not clearly understood by Spark, so no optimization is available
With Scala Operators
=> We can see that the condition is clearly understood by Spark
With a raw SQL query
=> We get exactly the same result as with the Scala operators
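
As a rough illustration of the filter comparison, the sketch below prints the physical plan for `Severity = 4` written three ways. The session setup and the sample `logs` data are hypothetical. The third variant, a typed lambda, is only one known way for a predicate to become opaque to the optimizer; whether our Java operators end up in the same situation is an assumption, not something this issue confirms.

```scala
import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.functions.col

object FilterPlans {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("filter-plans")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical sample data: only the Severity column name comes from the issue.
    val logs = Seq((1, "disk ok"), (4, "disk full"), (4, "timeout"))
      .toDF("Severity", "Message")
    logs.createOrReplaceTempView("logs")

    // Column operators: the condition is an expression tree Spark can analyze.
    logs.filter(col("Severity") === 4).explain()

    // Raw SQL query expressing the same condition.
    spark.sql("SELECT * FROM logs WHERE Severity = 4").explain()

    // Typed lambda (illustrative assumption): the predicate is an arbitrary
    // JVM function, so the optimizer cannot look inside it.
    val isSeverity4: Row => Boolean = row => row.getAs[Int]("Severity") == 4
    logs.filter(isSeverity4).explain()

    spark.stop()
  }
}
```

Comparing the three printed plans side by side should make it easier to tell whether a condition reaches Spark as an analyzable expression or as a black box.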