Skip to content
Soup edited this page May 8, 2017 · 13 revisions

Welcome to the tispark wiki!

For now, our skeleton code bypassed part of the original datasource API for aggressive pushdown calculations into tikv's coprocessor and we don't uses stable api of Filters and prunes traits since they seems too narrow for coprocessor.

For calculation pushdown, what we do is direct matching subplan for pattern like Aggregates->Filter/Projection->TiRelation (for now, limit and sort not considered at least for May for simplicity)

We extract Filter/Projections and translate it to

case AttributeReference -> ColumnReference for Coprocessor results

case Filter and Projection Expression -> test if supported by tikv and translate to proto expressions, for the remaining part generate a supplementary plan on the top of TiRelation

case Aggregate -> split into partial and spark aggregate while partial part is translated into proto request, and spark one is prepend on the top of Filter/Projection (if any, otherwise TiRelation directly); for any aggregates that cannot pushed, we need to fall all the way back to no aggregate-pushdown logic

Spark aggregates itself should have two phase as in original spark since our data is not partitioned via any hash key and we still need to have a shuffle stage. But one thing need to consider and might get optimizer involve is type promotion: for example, we need to promote sum(int) into sum(long) since coprocessor result is likely promoted avoiding overflow. This likely to be done in optimizer phase or we bypass original planning and do it our own.

Clone this wiki locally