Python: Perf thoughts #72
Comments
If the OGR bindings are available, I'm exporting the geometry as WKB (exactextract/python/src/exactextract/feature.py, lines 15 to 16 in 53309e4), which is then read by GEOS (exactextract/python/src/pybindings/feature_bindings.cpp, lines 152 to 156 in 53309e4).
I hadn't looked at it, but it appears that you can use it like this:
This isn't something I know much about, but I can look into it. Having a benchmark would be helpful.
I'd be happy to discuss it. To be clear, the conversion happening in this package would only be OGR Geometry -> WKB -> GEOS, unless the user has decided not to use OGR. I didn't spend much time looking at conversion options for fiona and GeoPandas because it hasn't been a significant part of runtime in my testing. I'm guessing it's somehow unsafe to just grab the GEOS geometry pointers directly from a GeoPandas dataframe?
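For readers unfamiliar with WKB, here is a minimal sketch of what travels across the OGR -> WKB -> GEOS boundary described above. It is not exactextract code: in the package, OGR writes the bytes and GEOS parses them. The helper names `polygon_to_wkb` and `polygon_from_wkb` are hypothetical, and the sketch only handles 2D polygons.

```python
import struct


def polygon_to_wkb(rings):
    """Encode a 2D polygon (list of rings, each a list of (x, y)) as little-endian WKB."""
    # Header: byte order flag (1 = little endian) and geometry type (3 = Polygon).
    parts = [struct.pack("<BI", 1, 3)]
    parts.append(struct.pack("<I", len(rings)))       # number of rings
    for ring in rings:
        parts.append(struct.pack("<I", len(ring)))    # number of points in this ring
        for x, y in ring:
            parts.append(struct.pack("<dd", x, y))    # two 64-bit floats per point
    return b"".join(parts)


def polygon_from_wkb(buf):
    """Decode a 2D polygon from WKB back into a list of rings."""
    endian = "<" if buf[0] == 1 else ">"
    (geom_type,) = struct.unpack_from(endian + "I", buf, 1)
    if geom_type != 3:
        raise ValueError(f"expected Polygon (type 3), got type {geom_type}")
    offset = 5
    (n_rings,) = struct.unpack_from(endian + "I", buf, offset)
    offset += 4
    rings = []
    for _ in range(n_rings):
        (n_points,) = struct.unpack_from(endian + "I", buf, offset)
        offset += 4
        ring = []
        for _ in range(n_points):
            ring.append(struct.unpack_from(endian + "dd", buf, offset))
            offset += 16
        rings.append(ring)
    return rings


# A unit square: one closed ring of five points.
square = [[(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0), (0.0, 0.0)]]
wkb = polygon_to_wkb(square)
decoded = polygon_from_wkb(wkb)
```

The payload is a flat copy of the coordinates plus a few counters, which is consistent with the conversion not dominating runtime, but it is still a full copy per geometry.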
I think this works on conda but maybe not PyPI... I think PyGEOS used to do it with Shapely before the two merged. Shapely has a C API for this but I don't know the details of how/when to use it from another package.
I believe this is only valid when you can ensure that the underlying GEOS libraries are exactly the same, but in general you can't know what version the other library is using. PyGEOS and Shapely used to check whether their GEOS versions matched and, when they did not, warn that conversions between the two would be slower. Presumably that meant serializing to WKB and deserializing.
👋 Feel free to ignore me but was just reading through the code
You're currently using the OGR bindings or Fiona to load a GeoJSON FeatureCollection to Python. Have you looked at pyogrio at all? For loading entire files it can be easily 5-10x faster than fiona because it's vectorized instead of a python loop.
In my experience, I've found that serializing GeoJSON between Python objects and native code can be really slow. What would you think about an API that (optionally) returned indices into the passed-in objects instead of the features themselves so that users can skip the overhead of serializing the data back to Python?
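To make the index-returning idea concrete, here is a minimal sketch. Everything in it is hypothetical: `summarize_by_index`, the `cells` key, and the feature layout are made up for illustration and are not part of exactextract's API.

```python
def summarize_by_index(features, values_for):
    """Return (index, mean) pairs that refer back to positions in `features`.

    Features with no values are skipped, so the caller uses the index,
    not the position in the result list, to join results to inputs.
    """
    results = []
    for i, feature in enumerate(features):
        values = values_for(feature)
        if values:
            results.append((i, sum(values) / len(values)))
    return results


# Hypothetical inputs: each feature carries the raster cell values it covers.
features = [
    {"id": "a", "cells": [1.0, 3.0]},
    {"id": "b", "cells": []},
    {"id": "c", "cells": [2.0, 4.0, 6.0]},
]

stats = summarize_by_index(features, lambda f: f["cells"])

# The caller still owns the original objects and can attach results in place,
# so no feature is serialized back from native code.
for index, mean in stats:
    features[index]["mean"] = mean
```

Only small integers and floats cross the boundary in this design; the geometries and properties never leave the caller's side.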
It would be great to release the GIL while in pure C++ code, especially if you're able to pass a whole chunk of rasters/vectors at once to C++, so you aren't acquiring and releasing the GIL on every iteration of the loop. Looking at the pybind docs it says
so I think your code currently always holds the GIL?
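The practical effect of releasing the GIL can be sketched from pure Python with a stand-in. `time.sleep` releases the GIL while it waits, much as a pybind11 function bound with `py::call_guard<py::gil_scoped_release>()` would while running C++. The function `fake_native_call` is hypothetical and does no real work; it only models a long call that does not hold the GIL.

```python
import threading
import time


def fake_native_call(seconds):
    """Stand-in for a C++ routine that releases the GIL while it runs."""
    time.sleep(seconds)  # time.sleep releases the GIL for the duration of the wait


def run_in_threads(n_threads, seconds):
    """Run the stand-in call on several threads and return the wall-clock time."""
    threads = [
        threading.Thread(target=fake_native_call, args=(seconds,))
        for _ in range(n_threads)
    ]
    start = time.perf_counter()
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return time.perf_counter() - start


# Four calls of 0.2 s each overlap because none of them holds the GIL,
# so the total is close to 0.2 s rather than 0.8 s.
elapsed = run_in_threads(4, 0.2)
```

If the call held the GIL for its whole duration, the four threads would run one after another, which is the situation the question above is asking about.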
If you had interest, I'd love to discuss how exactextract could use GeoArrow. It's a prime example of why I think GeoArrow has so much potential: existing binary data in GEOS objects in GeoPandas or Shapely has to be serialized to GeoJSON, and the GEOS -> GeoJSON -> exactextract -> GeoJSON -> GEOS conversion is really slow. There's ongoing work to handle GeoArrow <--> GEOS interop in C/C++, which would simplify getting geometries back into GEOS for use in core exactextract.