Unnecessary RDD repartition if RDD is already indexed. [Improvement] #76
Comments
I agree with you on this ticket. It requires a physical planning strategy to handle this case. I think it would be fine to add some heuristics in |
Since I cannot fork the standalone branch, I cannot send a PR for this feature. Can you release the standalone version? |
How is your testing of the current standalone version going? Do you think it is ready to become the master branch? By the way, I think you can just fork the repo and it will fork all branches for you; you just need to switch to them. If you already have a forked repo, one solution is to add the main repo as a remote, pull from it, and then push to your fork. |
We are still testing the standalone version, so yes, we need to postpone this
until later.
Yep.
The standard workflow, as in Spark, is that each developer forks the project,
pushes to the fork, and then sends a PR. Because the fork does not include
the standalone version, things are a little different here.
On Wed, Dec 7, 2016 at 4:48 PM, Dong Xie wrote:
How about your test on the current standalone version? Do you think it is
ready to stand as the master branch?
BTW, I think you can just fork the repo and it will fork all branches for
you. You just need to switch them back. If you already have a forked repo,
one solution is to set the main repo as a remote and pull from it then push
it back to your forked repo.
|
For a distance join such as RDJSpark, the left RDD is always repartitioned using STRPartition.
However, if the left RDD is already indexed and partitioned, this repartition is redundant and costly. How about adding a function inside STRPartition that checks whether an index partitioner already exists? This would avoid the unnecessary shuffle cost.
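To make the proposal concrete, here is a minimal sketch of the check in plain Python. It is not Spark or Simba code: `GridPartitioner`, `PartitionedData`, `repartition_if_needed`, and the `stats` counter are all hypothetical stand-ins, and the partitioner here only splits on x ranges rather than doing real STR packing. The idea is just that the data remembers which partitioner produced its layout, and the repartition step returns early when that partitioner equals the requested one.

```python
import bisect
from dataclasses import dataclass
from typing import List, Optional, Tuple

Point = Tuple[float, float]


@dataclass(frozen=True)
class GridPartitioner:
    """Hypothetical stand-in for an STR-style partitioner (not Simba's class)."""
    x_bounds: Tuple[float, ...]  # sorted split points along the x axis

    @property
    def num_partitions(self) -> int:
        return len(self.x_bounds) + 1

    def partition_of(self, p: Point) -> int:
        # Index of the x range that contains the point.
        return bisect.bisect_right(self.x_bounds, p[0])


@dataclass
class PartitionedData:
    """Hypothetical stand-in for an RDD that remembers its partitioner."""
    partitions: List[List[Point]]
    partitioner: Optional[GridPartitioner] = None


# Counts how many times data was actually moved between partitions.
stats = {"shuffles": 0}


def repartition_if_needed(data: PartitionedData,
                          target: GridPartitioner) -> PartitionedData:
    # Frozen dataclasses compare by value, so an equal partitioner means the
    # data is already laid out the way the join expects: skip the shuffle.
    if data.partitioner == target:
        return data

    # Otherwise redistribute every point to the partition chosen by target.
    stats["shuffles"] += 1
    buckets: List[List[Point]] = [[] for _ in range(target.num_partitions)]
    for part in data.partitions:
        for p in part:
            buckets[target.partition_of(p)].append(p)
    return PartitionedData(buckets, target)


target = GridPartitioner((1.0, 2.0))
raw = PartitionedData([[(0.5, 0.0), (2.5, 1.0)], [(1.5, 3.0)]])

indexed = repartition_if_needed(raw, target)    # shuffles once
again = repartition_if_needed(indexed, target)  # returns the same object
```

In this sketch the second call returns `indexed` unchanged and `stats["shuffles"]` stays at 1. A real implementation would also have to decide what counts as "the same" partitioner (for example, identical bounds and partition count), which is where the heuristics mentioned in the comments would come in.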