convert regular DF to simba supported geometries #87

Open
geoHeil opened this issue Mar 20, 2017 · 9 comments

Comments

@geoHeil

geoHeil commented Mar 20, 2017

Does Simba have some UDF to support creating a Simba DataFrame from a regular DataFrame, i.e. something like Magellan's df.withColumn("point", point('x, 'y))?

If I am required to manually map all points / polygons to Simba Geometry, how can I represent additional fields?
val ps = (0 until 10000).map(x => PointData(Point(Array(x.toDouble, x.toDouble)), x + 1)).toDS

How can I parse WKT polygons into a geometry format supported by Simba?

@dongx-psu
Member

Theoretically, you can do anything with a Simba DataFrame that Spark SQL supports on a regular DataFrame, since the Simba DataFrame inherits from the Spark SQL one.

To represent additional fields, you simply add them to your structure. For example, you can define:

case class PointData(x: Point, payload: Int, tag: String)

Simba will then detect the fields automatically and build the data frame, giving you a schema like:

-- DataFrame
|----- x : ShapeType
|----- payload : Integer
|----- tag : String
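
The mapping from flat rows to a structure that carries a geometry plus extra fields can be sketched in plain Scala. This is only an illustration under stated assumptions: Point below is a hypothetical stand-in for Simba's Point type so that the snippet runs without Simba or Spark, and Row and toPointData are names made up for the example.

```scala
object PointRows {
  // Hypothetical stand-in for Simba's Point, so the sketch runs without Simba or Spark.
  final case class Point(coord: Array[Double])

  // Geometry plus the extra (non-spatial) columns carried alongside it.
  final case class PointData(x: Point, payload: Int, tag: String)

  // Flat input row, as it might come out of a regular DataFrame (hypothetical shape).
  final case class Row(x: Double, y: Double, payload: Int, tag: String)

  // Wrap the two coordinate columns in a Point and pass the other fields through.
  def toPointData(r: Row): PointData =
    PointData(Point(Array(r.x, r.y)), r.payload, r.tag)

  def main(args: Array[String]): Unit = {
    val rows = Seq(Row(1.0, 2.0, 7, "a"), Row(3.0, 4.0, 8, "b"))
    rows.map(toPointData).foreach { p =>
      println(s"${p.x.coord.mkString("(", ", ", ")")} payload=${p.payload} tag=${p.tag}")
    }
  }
}
```

With the real library, the same map would presumably be applied to a Dataset before calling toDS or show, as in the ps example from the original question.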

@geoHeil
Author

geoHeil commented Mar 20, 2017

I see. And what about polygons? You seem to use Polygon.apply(Array(Point(Array(-1.0, -1.0)), Point(Array(1.0, -1.0)), and so on. If I have WKT polygon strings, how could these be converted?

@geoHeil
Author

geoHeil commented Mar 20, 2017

So, assuming a DataFrame with polygons like the one below:

case class MyClass(a: String, b: Int, wktString: String)
val df = Seq(
  MyClass("a", 1, "POLYGON ((30 10, 40 40, 20 40, 10 20, 30 10))"),
  MyClass("b", 2, "POLYGON ((30 1, 40 40, 20 50, 10 20, 30 1))")
).toDS()
val dfGeom = df.map(x => Polygon.fromWKB(x.wktString.toCharArray.map(_.toByte)))

Is this how the conversion is supposed to work?
For me, this fails with a code generator exception when calling dfGeom.show:

17/03/20 20:26:50 ERROR CodeGenerator: failed to compile: org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 102, Column 31: Assignment conversion not possible from type "org.apache.spark.sql.simba.spatial.Shape" to type "org.apache.spark.sql.simba.spatial.Polygon"

@dongx-psu
Member

I think you can try this:

case class MyClass(a: String, b: Int, wktString: Polygon)
val df = Seq(
  ("a", 1, "POLYGON ((30 10, 40 40, 20 40, 10 20, 30 10))"),
  ("b", 2, "POLYGON ((30 1, 40 40, 20 50, 10 20, 30 1))")
).map(x => MyClass(x._1, x._2, Polygon.fromWKB(x._3.toCharArray.map(_.toByte)))).toDS()
df.show()

I don't know whether it will work, but it is worth a try.

@geoHeil
Author

geoHeil commented Mar 20, 2017

This fails with com.vividsolutions.jts.io.ParseException: Unknown WKB type 71 as soon as it tries to parse the WKT.
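
The number 71 in that message has a plausible explanation: the bytes of the WKT text are being read as if they were a binary WKB header. The sketch below assumes a reader that takes byte 0 as the byte-order flag, treats any flag other than 1 as big-endian, reads the next four bytes as the geometry type, and keeps only the low byte. That is an assumption about how the JTS reader behaves, not a copy of its code, and WkbHeaderDemo and apparentWkbType are names invented for the example. Under that assumption, the text "POLYGON" yields 71, the ASCII code of 'G'.

```scala
import java.nio.{ByteBuffer, ByteOrder}
import java.nio.charset.StandardCharsets

object WkbHeaderDemo {
  // Interprets the first five bytes as a WKB header:
  // byte 0 is the byte-order flag, bytes 1-4 are the geometry type.
  // Assumption: any flag other than 1 means big-endian, and only the
  // low byte of the type integer is used as the type code.
  def apparentWkbType(bytes: Array[Byte]): Int = {
    val buf = ByteBuffer.wrap(bytes)
    val flag = buf.get()
    buf.order(if (flag == 1) ByteOrder.LITTLE_ENDIAN else ByteOrder.BIG_ENDIAN)
    buf.getInt() & 0xff
  }

  def main(args: Array[String]): Unit = {
    val wkt = "POLYGON ((30 10, 40 40, 20 40, 10 20, 30 10))"
    // 'P' is the flag, "OLYG" is read as the type, and 'G' is its low byte.
    println(apparentWkbType(wkt.getBytes(StandardCharsets.US_ASCII))) // prints 71
  }
}
```

If this reading is right, the data is fine and the wrong entry point is being used: WKT text needs a WKT parser rather than a WKB one.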

@dongx-psu
Member

Well, I think this is a JTS parsing problem, which is outside my scope for now. Just a reminder: support for general geometric objects, including polygons, is still under development.

@geoHeil
Author

geoHeil commented Mar 20, 2017

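What about:

import com.vividsolutions.jts.geom.Polygon
import com.vividsolutions.jts.io.WKTReader
// Assumed alias, so that Simba's Polygon does not clash with the JTS one.
import org.apache.spark.sql.simba.spatial.{Polygon => SPolygon}

// Parse a WKT string with JTS, attach user data, and convert to a Simba polygon.
def toPolygon(s: String, u: String): SPolygon = {
  val reader = new WKTReader()
  reader.read(s) match {
    case poly: Polygon =>
      poly.setUserData(u)
      SPolygon.fromJTSPolygon(poly)
    case other =>
      throw new IllegalArgumentException(s"Expected a POLYGON, got ${other.getGeometryType}")
  }
}

val df = Seq(
  ("a", 1, "POLYGON ((30 10, 40 40, 20 40, 10 20, 30 10))"),
  ("b", 2, "POLYGON ((30 1, 40 40, 20 50, 10 20, 30 1))")
).map(x => MyClass(x._1, x._2, toPolygon(x._3, "foobar"))).toDS
df.show

I'm not sure whether it will join later on, but df.show works.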
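
For readers who want to see what the WKT-to-vertices step involves without pulling in JTS, here is a minimal sketch in plain Scala. It is not how JTS or Simba parse WKT; it is a hypothetical helper (WktSketch.outerRing) that handles only a single-ring "POLYGON ((x y, ...))" string and rejects everything else, including holes, Z/M ordinates, and EMPTY.

```scala
object WktSketch {
  // Extracts the outer ring of a simple "POLYGON ((x y, x y, ...))" string.
  // Anything with more than one ring or other parentheses is rejected.
  def outerRing(wkt: String): Seq[(Double, Double)] = {
    val Pattern = """(?i)\s*POLYGON\s*\(\(([^()]*)\)\)\s*""".r
    wkt match {
      case Pattern(body) =>
        body.split(",").toSeq.map { pair =>
          pair.trim.split("\\s+") match {
            case Array(x, y) => (x.toDouble, y.toDouble)
            case _ => throw new IllegalArgumentException(s"Bad coordinate: '$pair'")
          }
        }
      case _ =>
        throw new IllegalArgumentException(s"Not a simple POLYGON: $wkt")
    }
  }

  def main(args: Array[String]): Unit = {
    // Prints the five vertices of the first polygon used in this thread.
    outerRing("POLYGON ((30 10, 40 40, 20 40, 10 20, 30 10))").foreach(println)
  }
}
```

The resulting vertex list could then be handed to a point-array constructor such as the Polygon.apply(Array(Point(Array(...)), ...)) form mentioned earlier in the thread, though going through JTS as above is likely the more robust route.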

@dongx-psu
Member

dongx-psu commented Mar 20, 2017

df.show() should work. There must be something wrong with my fromWKT function.

Nevertheless, I don't think it will work for joins, since our current join algorithm does not support polygons. Technically, this is because there is no partitioner for polygons, and the join assumes that the join keys evaluate to Point. This comes from legacy hacks in the original prototype, which was designed only for points. I still consider partitioning general geometry objects a research problem.
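
Why a point-only partitioner does not carry over to polygons can be illustrated with a toy uniform grid. This is explicitly not Simba's partitioner; Grid, cellOf, and cellsOf are hypothetical names, and the polygon is approximated by its bounding box. A point falls into exactly one cell and so has one partition key, while a polygon can overlap many cells, which forces a choice between duplicating it, picking a representative cell, or using a different scheme altogether.

```scala
object GridSketch {
  // Toy uniform grid; NOT Simba's partitioner, only an illustration.
  final case class Grid(cellSize: Double) {
    private def cellIndex(v: Double): Int = math.floor(v / cellSize).toInt

    // A point maps to exactly one cell, so it has a single partition key.
    def cellOf(x: Double, y: Double): (Int, Int) = (cellIndex(x), cellIndex(y))

    // A polygon is approximated by its bounding box, which may cover many cells.
    def cellsOf(vertices: Seq[(Double, Double)]): Set[(Int, Int)] = {
      val xs = vertices.map(_._1)
      val ys = vertices.map(_._2)
      (for {
        cx <- cellIndex(xs.min) to cellIndex(xs.max)
        cy <- cellIndex(ys.min) to cellIndex(ys.max)
      } yield (cx, cy)).toSet
    }
  }

  def main(args: Array[String]): Unit = {
    val grid = Grid(10.0)
    val polygon = Seq((30.0, 10.0), (40.0, 40.0), (20.0, 40.0), (10.0, 20.0), (30.0, 10.0))
    println(grid.cellOf(30.0, 10.0))     // one cell for a point
    println(grid.cellsOf(polygon).size)  // many cells for the polygon's bounding box
  }
}
```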
