Preparation for the coming Kudo support #11667
Some code refactoring to make it easy to add Kudo support to the shuffle coalesce and join execs.
Signed-off-by: Firestarman <[email protected]>
…alesceExec.scala Co-authored-by: Renjie Liu <[email protected]>
Signed-off-by: Firestarman <[email protected]>
build
Mostly nits, but I would like a few more sets of eyes on this.
object CoalesceReadOption {
  def apply(conf: RapidsConf): CoalesceReadOption = {
    // TODO get the value from conf
Do we need to do this?
It is better to have. It makes it easy to add new fields that are read from the conf, and you do not need to update every place where CoalesceReadOption(conf) is called.
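The point above can be illustrated with a minimal sketch. `FakeConf` and the key `example.kudo.enabled` are hypothetical stand-ins (the real `RapidsConf` API and key name are not shown in this thread); only the shape of the companion `apply` follows the snippet under review.

```scala
object ReadOptionSketch {
  // Hypothetical stand-in for RapidsConf; not the plugin's real class.
  final case class FakeConf(settings: Map[String, String]) {
    def getBoolean(key: String, default: Boolean): Boolean =
      settings.get(key).map(_.toBoolean).getOrElse(default)
  }

  final case class CoalesceReadOption(kudoEnabled: Boolean)

  object CoalesceReadOption {
    // The only place that knows how options are derived from the conf.
    // The key name is invented for illustration.
    def apply(conf: FakeConf): CoalesceReadOption =
      CoalesceReadOption(conf.getBoolean("example.kudo.enabled", default = false))
  }
}
```

Adding another option later would only touch the case class and this `apply`; callers that write `CoalesceReadOption(conf)` would stay unchanged.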
    metricsMap: Map[String, GpuMetric],
    prefetchFirstBatch: Boolean = false): Iterator[ColumnarBatch] = {
  val hostIter = if (readOption.kudoEnabled) {
    // TODO replace with the actual Kudo host iterator
Can we please throw an exception instead? If this is true we have problems and making the data disappear feels wrong to me.
Good suggestion, done
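A minimal sketch of the suggested change, assuming the placeholder branch previously produced no data. `ReadOption`, `hostIterator`, and the exception message are hypothetical; the actual fix in the PR is not shown in this thread.

```scala
object HostIterSketch {
  // Hypothetical stand-in for CoalesceReadOption.
  final case class ReadOption(kudoEnabled: Boolean)

  // Fail loudly on the unimplemented path instead of silently dropping the input.
  def hostIterator[T](readOption: ReadOption, input: Iterator[T]): Iterator[T] =
    if (readOption.kudoEnabled) {
      throw new UnsupportedOperationException("Kudo is not supported yet")
    } else {
      input
    }
}
```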
def getCoalescedBufferSize(concated: AnyRef): Long = concated match {
  // TODO add the Kudo case
  case c: HostConcatResult => c.getTableHeader.getDataLen
  case g => GpuColumnVector.getTotalDeviceMemoryUsed(g.asInstanceOf[ColumnarBatch])
nit: can we please have some better error messages here or something? I don't like that we have g.asInstanceOf, when the match should do that for us. It also means that the error message if something goes wrong will just be a class cast exception instead of a cleaner message.
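One way to address this nit, sketched with hypothetical stand-in types (`HostResult` and `DeviceBatch` replace `HostConcatResult` and `ColumnarBatch` so the example is self-contained): let the match do the type test and add an explicit fallback case with a descriptive message.

```scala
object BufferSizeSketch {
  // Hypothetical stand-ins for HostConcatResult and ColumnarBatch.
  final case class HostResult(dataLen: Long)
  final case class DeviceBatch(deviceBytes: Long)

  def getCoalescedBufferSize(concated: AnyRef): Long = concated match {
    case h: HostResult => h.dataLen
    case b: DeviceBatch => b.deviceBytes
    case other =>
      // Option(...) guards against a null input when building the message.
      val typeName = Option(other).map(_.getClass.getName).getOrElse("null")
      throw new IllegalArgumentException(s"Unsupported coalesced type: $typeName")
  }
}
```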
This function is removed.
Thanks for the review, but I will drop this PR since it is not necessary for only the table operator part. @liurenjie1024 It seems it would be better to have a single PR for the Kudo support.
Signed-off-by: Firestarman <[email protected]>
build
lgtm, but it would be good to have @revans2 have a second look.
Signed-off-by: Firestarman <[email protected]>
build
Generally LGTM, just a nit
@revans2 Could you help take a look? Thanks.
contributes to #11590
This is a code refactor to make it easy to add Kudo support to the shuffle coalesce and join execs. The main idea is to abstract the batch operations (SerializedTableOperator) from the coalesce read process.
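A rough sketch of that idea, under stated assumptions: only the name `SerializedTableOperator` comes from this PR. The trait's methods, `ToyTable`, `ToyOperator`, and `coalesce` are all hypothetical and are not the plugin's actual API. The coalescing loop depends only on the operator, so another serialization format could be supported by supplying another operator.

```scala
import scala.collection.mutable.ArrayBuffer

object TableOperatorSketch {
  // Hypothetical: the batch operations the coalesce reader needs for a
  // serialized table type T that concatenates into a result type C.
  trait SerializedTableOperator[T, C] {
    def getDataLen(table: T): Long
    def concatOnHost(tables: Seq[T]): C
  }

  // Toy format used only for illustration: 4 bytes per row.
  final case class ToyTable(rows: Seq[Int])

  object ToyOperator extends SerializedTableOperator[ToyTable, ToyTable] {
    def getDataLen(t: ToyTable): Long = t.rows.size.toLong * 4L
    def concatOnHost(ts: Seq[ToyTable]): ToyTable = ToyTable(ts.flatMap(_.rows))
  }

  // Format-agnostic coalescing: group tables until adding the next one
  // would exceed targetBytes, always taking at least one table per group.
  def coalesce[T, C](
      input: Iterator[T],
      op: SerializedTableOperator[T, C],
      targetBytes: Long): Iterator[C] = new Iterator[C] {
    private val buffered = input.buffered
    def hasNext: Boolean = buffered.hasNext
    def next(): C = {
      if (!buffered.hasNext) throw new NoSuchElementException("no more tables")
      val group = ArrayBuffer.empty[T]
      var bytes = 0L
      while (buffered.hasNext &&
          (group.isEmpty || bytes + op.getDataLen(buffered.head) <= targetBytes)) {
        val t = buffered.next()
        bytes += op.getDataLen(t)
        group += t
      }
      op.concatOnHost(group.toSeq)
    }
  }
}
```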