ValueError on tile_to_layout for old SpaceTime data #705

Open
gauchm opened this issue Mar 10, 2019 · 0 comments

I create a RasterLayer of type SPACETIME like so:

```python
# extent, crs, var_data_at_instant and no_data_value are defined earlier (omitted here)
temporal_projected_extent = gps.TemporalProjectedExtent(extent=extent, proj4=crs, instant=datetime.datetime(1955, 1, 4))
tile = gps.Tile.from_numpy_array(var_data_at_instant, no_data_value)
tiles = [(temporal_projected_extent, tile)]

rdd = spark_ctx.parallelize(tiles)
raster_layer = gps.RasterLayer.from_numpy_rdd(layer_type=gps.LayerType.SPACETIME, numpy_rdd=rdd)
```

When running

```python
tiled_raster_layer = raster_layer.tile_to_layout(gps.LocalLayout(y, x))
```

I get an exception:

```
2019-03-10 17:05:43 ERROR Executor:91 - Exception in task 2.0 in stage 2.0 (TID 10)
org.apache.spark.api.python.PythonException: Traceback (most recent call last):
  File "C:\Users\me\spark\python\lib\pyspark.zip\pyspark\worker.py", line 376, in main
  File "C:\Users\me\spark\python\lib\pyspark.zip\pyspark\worker.py", line 371, in process
  File "C:\Users\me\spark\python\lib\pyspark.zip\pyspark\serializers.py", line 142, in dump_stream
    self._write_with_length(obj, stream)
  File "C:\Users\me\spark\python\lib\pyspark.zip\pyspark\serializers.py", line 152, in _write_with_length
    serialized = self.dumps(obj)
  File "C:\Users\me\AppData\Local\Programs\Python\Python36\lib\site-packages\geopyspark\geotrellis\protobufserializer.py", line 75, in dumps
    return self._dumps(obj)
  File "C:\Users\me\AppData\Local\Programs\Python\Python36\lib\site-packages\geopyspark\geotrellis\protobufserializer.py", line 56, in _dumps
    return self.encoding_method(obj)
  File "C:\Users\me\AppData\Local\Programs\Python\Python36\lib\site-packages\geopyspark\geotrellis\protobufcodecs.py", line 650, in tuple_encoder
    tup.temporalProjectedExtent.CopyFrom(to_pb_temporal_projected_extent(obj[0]))
  File "C:\Users\me\AppData\Local\Programs\Python\Python36\lib\site-packages\geopyspark\geotrellis\protobufcodecs.py", line 553, in to_pb_temporal_projected_extent
    tpex.instant = _convert_to_unix_time(obj.instant)
ValueError: Value out of range: -473126400000
	at org.apache.spark.api.python.BasePythonRunner$ReaderIterator.handlePythonException(PythonRunner.scala:452)
	at org.apache.spark.api.python.PythonRunner$$anon$1.read(PythonRunner.scala:588)
	at org.apache.spark.api.python.PythonRunner$$anon$1.read(PythonRunner.scala:571)
	at org.apache.spark.api.python.BasePythonRunner$ReaderIterator.hasNext(PythonRunner.scala:406)
	at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
	at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:409)
	at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:409)
	at org.apache.spark.util.collection.ExternalSorter.insertAll(ExternalSorter.scala:191)
	at org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:62)
	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:55)
	at org.apache.spark.scheduler.Task.run(Task.scala:121)
	at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:402)
	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:408)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
	at java.lang.Thread.run(Unknown Source)
```

The problem seems to be that geopyspark converts the date to milliseconds since the Unix epoch (1970-01-01), which gives -473126400000 for this instant. Since the date predates the epoch, the value is negative, which is apparently out of range for the protobuf `instant` field.
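For reference, the conversion can be sketched like this (`to_unix_ms` is my own stand-in to illustrate the arithmetic, not geopyspark's actual `_convert_to_unix_time` implementation):

```python
from datetime import datetime, timezone

def to_unix_ms(dt):
    """Milliseconds since the Unix epoch (1970-01-01); negative for earlier instants."""
    epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)
    return int((dt.replace(tzinfo=timezone.utc) - epoch).total_seconds() * 1000)

print(to_unix_ms(datetime(1955, 1, 4)))  # -473126400000, the value in the ValueError
print(to_unix_ms(datetime(1980, 1, 1)))  # 315532800000, non-negative, serializes fine
```

So any pre-1970 instant will produce a negative value at this point in the serialization.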

Running the same code on the same RDD, but with an instant in e.g. 1980 (i.e. after the epoch), works just fine.
