-
Notifications
You must be signed in to change notification settings - Fork 62
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
polygon and point join [new feature] #75
Comments
I think it is a crucial feature, and I try to take some time think about it. The core problem in this context is actually how to partition a group of polygons. If there is no assumption on the data set, polygons can be very general and easily overflow the heap memory of an executor. Advanced load balancing strategy need to be designed. Besides, current polygon utilities are relied on JTS 1.14, which probably too heavy to deal with something as simple as |
consider the USA county example, polygons are not heavy skew distributions and without strange polygons cover whole place , the load balance would not be that bad. we can rewrite the contains and intersect functions as magellan. |
One thing I think we can do for this feature is implementing a predicate called intersects, and then do Cartesian product + filtering. It is not fast but at least it will work. What do you think? |
yes, we can do it via this way, but the cartesian product is too bad in the spark as magellan. one thing come to mind is following. |
Spatial join is a high priority on my ToDo list. once we are done with the
trajectory feature that we are currently working on, we will start working
on Spatial join (of arbitrary geometry shapes).
…On Tue, Dec 20, 2016 at 12:01 PM, Mingjie Tang ***@***.***> wrote:
yes, we can do it via this way, but the cartesian product is too bad in
the spark as magellan.
one thing come to mind is following.
suppose we have two tables
(1) outer table is the polygon (2) inner table is the points.
when we do this join,
(a) partition the space based on the inner table
(b) use the partitioner of the step 1 to duplicate the outer table based
on overlapping
(c) zip operation and nest loop.
(d) reduce step to filter the duplicate results.
this is very basic, could be better than the cartesian.
—
You are receiving this because you are subscribed to this thread.
Reply to this email directly, view it on GitHub
<#75 (comment)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AEX3p4tKitGXGqCOj27C59vB2GlQwVzBks5rKCX_gaJpZM4LHMt9>
.
|
Great!
I also can come to this task later, because I have other emergency things
to do it right now. Once I have done, I will ping you, then we can allocate
the workload evenly.
On Tue, Dec 20, 2016 at 11:36 PM, Feifei Li <[email protected]>
wrote:
… Spatial join is a high priority on my ToDo list. once we are done with the
trajectory feature that we are currently working on, we will start working
on Spatial join (of arbitrary geometry shapes).
On Tue, Dec 20, 2016 at 12:01 PM, Mingjie Tang ***@***.***>
wrote:
> yes, we can do it via this way, but the cartesian product is too bad in
> the spark as magellan.
>
> one thing come to mind is following.
> suppose we have two tables
> (1) outer table is the polygon (2) inner table is the points.
> when we do this join,
> (a) partition the space based on the inner table
> (b) use the partitioner of the step 1 to duplicate the outer table based
> on overlapping
> (c) zip operation and nest loop.
> (d) reduce step to filter the duplicate results.
> this is very basic, could be better than the cartesian.
>
> —
> You are receiving this because you are subscribed to this thread.
> Reply to this email directly, view it on GitHub
> <#75 (comment)>,
> or mute the thread
> <https://github.com/notifications/unsubscribe-auth/
AEX3p4tKitGXGqCOj27C59vB2GlQwVzBks5rKCX_gaJpZM4LHMt9>
> .
>
—
You are receiving this because you authored the thread.
Reply to this email directly, view it on GitHub
<#75 (comment)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/ABXY-TSlQLt0vIgDKcDyUp_BoYpuyEacks5rKNcYgaJpZM4LHMt9>
.
|
That sounds great! We expect to start working on this in early Spring.
On Wed, Dec 21, 2016 at 12:54 AM, Mingjie Tang <[email protected]>
wrote:
… Great!
I also can come to this task later, because I have other emergency things
to do it right now. Once I have done, I will ping you, then we can allocate
the workload evenly.
On Tue, Dec 20, 2016 at 11:36 PM, Feifei Li ***@***.***>
wrote:
> Spatial join is a high priority on my ToDo list. once we are done with
the
> trajectory feature that we are currently working on, we will start
working
> on Spatial join (of arbitrary geometry shapes).
>
> On Tue, Dec 20, 2016 at 12:01 PM, Mingjie Tang ***@***.***
>
> wrote:
>
> > yes, we can do it via this way, but the cartesian product is too bad in
> > the spark as magellan.
> >
> > one thing come to mind is following.
> > suppose we have two tables
> > (1) outer table is the polygon (2) inner table is the points.
> > when we do this join,
> > (a) partition the space based on the inner table
> > (b) use the partitioner of the step 1 to duplicate the outer table
based
> > on overlapping
> > (c) zip operation and nest loop.
> > (d) reduce step to filter the duplicate results.
> > this is very basic, could be better than the cartesian.
> >
> > —
> > You are receiving this because you are subscribed to this thread.
> > Reply to this email directly, view it on GitHub
> > <#75 (comment)
>,
> > or mute the thread
> > <https://github.com/notifications/unsubscribe-auth/
> AEX3p4tKitGXGqCOj27C59vB2GlQwVzBks5rKCX_gaJpZM4LHMt9>
> > .
>
> >
>
> —
> You are receiving this because you authored the thread.
> Reply to this email directly, view it on GitHub
> <#75 (comment)>,
> or mute the thread
> <https://github.com/notifications/unsubscribe-
auth/ABXY-TSlQLt0vIgDKcDyUp_BoYpuyEacks5rKNcYgaJpZM4LHMt9>
> .
>
—
You are receiving this because you commented.
Reply to this email directly, view it on GitHub
<#75 (comment)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AEX3pwaCEdLdkyp3R6q6T4KN7ICS8b5Rks5rKNswgaJpZM4LHMt9>
.
|
Hi, did you start working on this feature?. I am working on a comparative study of spatial data management tools based on the spatial join operation between points and polygons (within, contains, ...) and I would like to add Simba to this work. |
Sorry for the late response. We did not implement it yet in the system. We can implement a simple version use the same algorithms (which is most likely SJMR) adopted by other major systems. I believe the main difference on performance then will be on detailed system design choices. |
Thank you for your answer. I agree with you, system design choices can make the difference. Even implementation choices can do it. |
Magellan project support the join between polygons and points, and join relationship can be inside, intersect or other. Users are interest in this feature, since this polygon can represent one county, and points can be the ubers, the join results can show which county is more visited and etc.
The current Simba does not support this, does any plan for this function? or any proposal or issue a JERA to track this feature?
The text was updated successfully, but these errors were encountered: