polygon and point join [new feature] #75

merlintang · 2016-12-07T22:29:27Z

The Magellan project supports joins between polygons and points, where the join relationship can be inside, intersects, or others. Users are interested in this feature: for example, a polygon can represent a county and the points can be Uber cars, so the join result shows which counties are visited most.

Simba does not currently support this. Is there any plan for this function? Or is there a proposal, or should we file a JIRA issue to track this feature?

dongx-psu · 2016-12-07T23:54:47Z

I think it is a crucial feature, and I will try to take some time to think about it.

The core problem in this context is how to partition a group of polygons. If we make no assumptions about the data set, polygons can be arbitrarily general and can easily overflow the heap memory of an executor. An advanced load-balancing strategy needs to be designed.

Besides, the current polygon utilities rely on JTS 1.14, which is probably too heavy for something as simple as the contains and intersects predicates.

merlintang · 2016-12-08T00:33:15Z

Consider the USA county example: the polygons are not heavily skewed in their distribution, and there are no strange polygons covering the whole space, so the load balance would not be that bad.

We can rewrite the contains and intersects functions, as Magellan does.
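To illustrate how small a hand-written contains predicate can be, here is a minimal sketch of even-odd ray casting for a point against a simple polygon ring. It is plain Python rather than Simba or Magellan code; the name `point_in_polygon` and the list-of-tuples representation are assumptions made for the example, and points lying exactly on the boundary are not handled specially.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def point_in_polygon(p: Point, ring: List[Point]) -> bool:
    """Even-odd ray casting: cast a ray to the right of p and count edge crossings.

    `ring` lists the polygon vertices in order; the closing edge back to the
    first vertex is implied. Holes and multi-polygons are out of scope here.
    """
    x, y = p
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Only edges that straddle the horizontal line through p can be crossed.
        if (y1 > y) != (y2 > y):
            # x coordinate at which this edge meets the horizontal line through p.
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


if __name__ == "__main__":
    square = [(0.0, 0.0), (4.0, 0.0), (4.0, 4.0), (0.0, 4.0)]
    print(point_in_polygon((2.0, 2.0), square))
    print(point_in_polygon((5.0, 2.0), square))
```

A production predicate would also need a bounding-box pre-check and a defined policy for boundary points, which is part of what a library such as JTS provides.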

dongx-psu · 2016-12-20T18:49:40Z

One thing I think we can do for this feature is to implement a predicate called intersects and then do a Cartesian product followed by filtering. It is not fast, but at least it will work. What do you think?
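As a rough sketch of what "Cartesian product + filtering" means, the following plain-Python example pairs every left record with every right record and keeps the pairs that satisfy a predicate. The names `cartesian_join` and `rect_contains` are hypothetical, and axis-aligned rectangles stand in for real polygons to keep the example short; this is not Simba's API.

```python
from itertools import product
from typing import Callable, Iterable, List, Tuple, TypeVar

A = TypeVar("A")
B = TypeVar("B")

Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)
Point = Tuple[float, float]


def cartesian_join(
    left: Iterable[A], right: Iterable[B], predicate: Callable[[A, B], bool]
) -> List[Tuple[A, B]]:
    """Evaluate the predicate on every (left, right) pair: O(|left| * |right|)."""
    return [(a, b) for a, b in product(left, right) if predicate(a, b)]


def rect_contains(rect: Rect, pt: Point) -> bool:
    """Stand-in spatial predicate: is the point inside the closed rectangle?"""
    xmin, ymin, xmax, ymax = rect
    x, y = pt
    return xmin <= x <= xmax and ymin <= y <= ymax


if __name__ == "__main__":
    rects = [(0.0, 0.0, 1.0, 1.0), (2.0, 2.0, 3.0, 3.0)]
    pts = [(0.5, 0.5), (2.5, 2.5), (5.0, 5.0)]
    for pair in cartesian_join(rects, pts, rect_contains):
        print(pair)
```

Every pair is tested regardless of how far apart the two geometries are, which is why this approach works as a baseline but scales poorly.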

merlintang · 2016-12-20T19:01:18Z

Yes, we can do it this way, but the Cartesian product performs very poorly in Spark, as it does in Magellan.

One thing that comes to mind is the following.
Suppose we have two tables:
(1) the outer table is the polygons, (2) the inner table is the points.
When we do this join:
(a) partition the space based on the inner table;
(b) use the partitioner from step (a) to duplicate the outer table based on overlap;
(c) zip the partitions and run a nested loop within each one;
(d) add a reduce step to filter out duplicate results.
This is very basic, but it could be better than the Cartesian product.
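Steps (a) to (d) can be sketched on a single machine as follows. This is only an illustration under stated assumptions: a uniform grid over the points' extent stands in for the partitioner, polygons are replicated by bounding box, and the names `partitioned_join`, `cells_per_side`, and `point_in_polygon` are hypothetical rather than Simba or Spark APIs.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Point = Tuple[float, float]
Polygon = List[Point]
Cell = Tuple[int, int]


def point_in_polygon(p: Point, ring: Polygon) -> bool:
    """Even-odd ray casting for a simple ring (boundary cases not handled)."""
    x, y = p
    inside = False
    for i in range(len(ring)):
        (x1, y1), (x2, y2) = ring[i], ring[(i + 1) % len(ring)]
        if (y1 > y) != (y2 > y) and x < x1 + (y - y1) * (x2 - x1) / (y2 - y1):
            inside = not inside
    return inside


def partitioned_join(
    polygons: List[Polygon], points: List[Point], cells_per_side: int = 4
) -> Set[Tuple[int, int]]:
    """Return (polygon index, point index) pairs where the polygon contains the point."""
    if not polygons or not points:
        return set()

    # (a) Partition the space based on the points: a uniform grid over their extent.
    xs, ys = [x for x, _ in points], [y for _, y in points]
    xmin, ymin = min(xs), min(ys)
    w = (max(xs) - xmin) / cells_per_side or 1.0  # avoid zero-width cells
    h = (max(ys) - ymin) / cells_per_side or 1.0

    def idx(v: float, origin: float, size: float) -> int:
        # Clamp so geometry outside the points' extent maps to an edge cell.
        return min(max(int((v - origin) // size), 0), cells_per_side - 1)

    point_cells: Dict[Cell, List[int]] = defaultdict(list)
    for j, (x, y) in enumerate(points):
        point_cells[(idx(x, xmin, w), idx(y, ymin, h))].append(j)

    # (b) Duplicate each polygon into every cell its bounding box overlaps.
    poly_cells: Dict[Cell, List[int]] = defaultdict(list)
    for i, ring in enumerate(polygons):
        px, py = [x for x, _ in ring], [y for _, y in ring]
        for cx in range(idx(min(px), xmin, w), idx(max(px), xmin, w) + 1):
            for cy in range(idx(min(py), ymin, h), idx(max(py), ymin, h) + 1):
                poly_cells[(cx, cy)].append(i)

    # (c) Pair up matching cells and run a nested loop inside each one.
    # (d) Collect into a set so any duplicate pairs would be dropped.
    result: Set[Tuple[int, int]] = set()
    for cell, point_ids in point_cells.items():
        for i in poly_cells.get(cell, []):
            for j in point_ids:
                if point_in_polygon(points[j], polygons[i]):
                    result.add((i, j))
    return result
```

In this sketch each point lands in exactly one cell, so a (polygon, point) pair cannot be produced twice and the set in step (d) is only a safeguard; duplicate elimination becomes necessary once both inputs are replicated, for example in a polygon-polygon join.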

ricosfeifei · 2016-12-21T07:36:56Z

Spatial join is a high priority on my to-do list. Once we are done with the trajectory feature that we are currently working on, we will start working on spatial join (of arbitrary geometry shapes).


merlintang · 2016-12-21T07:54:22Z

Great! I can also come to this task later, because I have other urgent things to do right now. Once I am done, I will ping you, and then we can split the workload evenly.


ricosfeifei · 2016-12-21T08:03:41Z

That sounds great! We expect to start working on this in early spring.


mithdrann · 2017-12-22T10:08:49Z

Hi, did you start working on this feature? I am working on a comparative study of spatial data management tools based on the spatial join operation between points and polygons (within, contains, ...), and I would like to add Simba to this work.

dongx-psu · 2017-12-28T01:03:27Z

Sorry for the late response. We have not implemented it in the system yet. We can implement a simple version using the same algorithm (most likely SJMR) adopted by other major systems. I believe the main difference in performance will then come from detailed system design choices.

mithdrann · 2017-12-28T12:29:12Z

Thank you for your answer. I agree with you: system design choices can make the difference, and even implementation choices can.
It would be nice to have this simple version, but I only have 3 weeks to finish this work. I wonder whether you could implement it before this deadline.

merlintang changed the title ~~polygon and point join~~ polygon and point join [new feature] Dec 7, 2016
