Update pyramid tile image #685
Hey, @mingnet! As I mentioned in your other issue, I'm really sorry about responding to your issue so late 🙁 The reason you're getting that error is because […]. The most straightforward way around this would be to read all of your files at once and then save them together as one layer. I know you said that you're working with a small cluster, but if you can show me the script you're using, as well as give me some info about your cluster, I may be able to point out places where you could improve the performance. I think we should try this first before going to the next alternative (which is more involved/complicated).
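The suggestion above, reading everything first and writing a single layer, can be sketched with a toy model. None of these helpers (`read_tiles`, `union`, `key_bounds`) exist in geopyspark; they only model a layer as a dict of `(col, row)` keys to tiles, to illustrate why one combined write produces KeyBounds that cover all of the inputs:

```python
# Toy model of "read all files at once, then write one layer".
# A "layer" is modeled as a dict mapping (col, row) keys to tile data.
# These helpers are hypothetical stand-ins, not the geopyspark API.

def read_tiles(path, keys):
    """Pretend to read a file: return a layer covering the given keys."""
    return {key: f"tile-from-{path}" for key in keys}

def union(layers):
    """Combine several layers into one (later layers win on overlap)."""
    combined = {}
    for layer in layers:
        combined.update(layer)
    return combined

def key_bounds(layer):
    """KeyBounds of a layer: min and max (col, row) over its keys."""
    cols = [c for c, _ in layer]
    rows = [r for _, r in layer]
    return (min(cols), min(rows)), (max(cols), max(rows))

# Two "files" covering different parts of the layout.
first = read_tiles("a.tif", [(0, 0), (1, 0)])
second = read_tiles("b.tif", [(5, 4), (6, 4)])

# Writing them separately would give each write narrow bounds; combining
# first gives one write whose bounds span all the data.
combined = union([first, second])
print(key_bounds(combined))  # ((0, 0), (6, 4))
```

The point of the sketch is only that the bounds (and hence the key index) of a single combined write cover every key, which is what per-batch writes lose.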
I have tried another solution, but I still have some problems; maybe you are interested to know. I tried to generate a global KeyBounds at the beginning, by writing another function in ./geopyspark-backend/geotrellis/src/main/scala/geopyspark/geotrellis/io/LayerWriterWrapper.scala:

```scala
def writeSpatialGlobal(
  layerName: String,
  spatialRDD: TiledRasterLayer[SpatialKey],
  indexStrategy: String
): Unit = {
  val id =
    spatialRDD.zoomLevel match {
      case Some(zoom) => LayerId(layerName, zoom)
      case None => LayerId(layerName, 0)
    }
  val indexKeyBounds = KeyBounds[SpatialKey](
    SpatialKey(0, 0),
    SpatialKey(
      spatialRDD.rdd.metadata.layout.layoutCols,
      spatialRDD.rdd.metadata.layout.layoutRows))
  val indexMethod = getSpatialIndexMethod(indexStrategy)
  val keyIndex = indexMethod.createIndex(indexKeyBounds)
  layerWriter.write(id, spatialRDD.rdd, keyIndex)
}
```
I plan to call this function when processing the first batch, so that the layer has a global KeyBounds, and then update it with the data from the other batches. But this function is very, very slow to execute; as a result, it was very difficult for me to finish even the first batch. Because I don't know enough about GeoTrellis, I don't understand why. It just generates a different index, so I think it should be as fast as the writeSpatial function.
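For background on what the global bounds changes: GeoTrellis's Z-order index maps each SpatialKey to a position on a space-filling curve over the KeyBounds, so computing the index of any one key is cheap, but the span of the index space grows with the bounds. A minimal Morton (Z-order) encoder in Python, an illustration of the general technique rather than GeoTrellis's implementation (note also that the maximum key of a layout with layoutCols columns is usually (layoutCols - 1, layoutRows - 1), so the bounds in the Scala snippet above may be off by one):

```python
def z_order(col, row, bits=16):
    """Interleave the bits of col and row into one Morton (Z-order) index."""
    index = 0
    for i in range(bits):
        index |= ((col >> i) & 1) << (2 * i)
        index |= ((row >> i) & 1) << (2 * i + 1)
    return index

# The index of a key depends only on the key itself, so per-key index
# computation is equally cheap for local and global bounds...
assert z_order(0, 0) == 0
assert z_order(1, 0) == 1
assert z_order(0, 1) == 2
assert z_order(1, 1) == 3

# ...but the span of the index space is set by the bounds: global
# KeyBounds over a 4096 x 4096 layout span ~16.7M index values even if
# only a handful of keys hold data, which may matter for backends that
# scan or partition by index ranges.
global_span = z_order(4095, 4095) + 1
local_span = z_order(7, 7) + 1
print(global_span, local_span)  # 16777216 64
```

Whether that larger index space is actually the cause of the slowdown here is a guess; it is one plausible mechanism, not a confirmed diagnosis.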
@mingnet I see. Based on the work you showed, it looks like everything should work okay. What backend are you trying to write to? There can be a lot of I/O involved for some of them, which could greatly increase the running time. Other than what I just mentioned, there could be other causes for the slowdown, but I won't be able to say for sure without seeing your Python code.
I am trying to ingest a batch of large TIFF images, and my Spark cluster doesn't have a lot of memory and resources, so I tried to ingest the images in multiple batches.
I plan to generate a pyramid of the first TIFF image and write it to disk, then generate a pyramid of the second TIFF image and update it into the same directory.
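A pitfall this plan can hit: a layer written from only the first image gets KeyBounds (and a key index built at write time) covering only that image, and later updates whose keys fall outside those bounds can be rejected. A toy sketch of that behavior in plain Python (the `Catalog` class here is hypothetical, not the GeoTrellis API, and its lexicographic bounds check is a simplification of real 2D key bounds):

```python
class Catalog:
    """Toy catalog: a write fixes the layer's key bounds, and updates
    must stay inside them (mimicking an index built at write time)."""

    def __init__(self):
        self.layers = {}

    def write(self, name, tiles):
        # Bounds by tuple order: a simplification of 2D KeyBounds.
        bounds = (min(tiles), max(tiles))
        self.layers[name] = {"bounds": bounds, "tiles": dict(tiles)}

    def update(self, name, tiles):
        lo, hi = self.layers[name]["bounds"]
        for key in tiles:
            if not (lo <= key <= hi):
                raise ValueError(f"key {key} outside layer bounds {lo}..{hi}")
        self.layers[name]["tiles"].update(tiles)

catalog = Catalog()
catalog.write("pyramid-0", {(0, 0): "a", (1, 1): "a"})
catalog.update("pyramid-0", {(1, 0): "b"})      # inside bounds: accepted
try:
    catalog.update("pyramid-0", {(5, 4): "b"})  # outside bounds: rejected
except ValueError as err:
    print(err)
```

This is why the maintainer's suggestion above (combine the inputs and write once, so the bounds cover everything) sidesteps the problem, although whether an out-of-bounds update fails, and how, depends on the backend and GeoTrellis version.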
I am trying to add update code in these files:
./geopyspark-backend/geotrellis/src/main/scala/geopyspark/geotrellis/io/LayerWriterWrapper.scala
./geopyspark/geotrellis/catalog.py
Then I ingest the data like this.
Then I run it and get an error.
What should I do? Do you have any good advice?