Excessive memory usage with prequantization enabled #213

Zaczero · 2023-11-30T23:33:33Z

I am primarily posting this issue for future people facing a similar problem.

In my case, when the prequantize option is enabled (which is the default setting), the toposimplify method consumes 25GB of memory. However, when I disable the prequantize option, memory usage peaks at just 5GB. I utilize shapely for simplification.

Reproduction steps

1. Download both parts of the archive: countries1.zip and countries2.zip.
2. Combine the archives:

cat countries1.zip countries2.zip > countries.zip

3. Unzip the combined archive.
4. Execute the following Python code snippet:

import json
import topojson as tp
from shapely.geometry import shape

with open('countries.geojson', 'rb') as f:
    features = json.load(f)['features']
countries_geoms = [shape(f['geometry']) for f in features]
topo = tp.Topology(countries_geoms)  # prequantize is enabled by default
topo.toposimplify(0.00001, inplace=True)

5. Monitor memory usage.

To resolve the issue, replace the construction of topo with:

topo = tp.Topology(countries_geoms, prequantize=False)
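For the "monitor memory usage" step, one option is to measure the peak from inside Python. The following is a minimal sketch using the standard-library tracemalloc module; the helper name peak_python_memory_mib is hypothetical and not part of topojson. Note that tracemalloc only sees allocations made through Python's allocators, so memory allocated natively (for example inside GEOS) will not be counted, and the process-level figures reported above may be higher.

```python
import tracemalloc


def peak_python_memory_mib(func, *args, **kwargs):
    """Run func and return (result, peak MiB of Python-tracked allocations).

    Hypothetical helper for illustration; it is not part of topojson.
    """
    tracemalloc.start()
    try:
        result = func(*args, **kwargs)
        # get_traced_memory() returns (current, peak) in bytes.
        _, peak_bytes = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, peak_bytes / (1024 * 1024)


if __name__ == "__main__":
    # Stand-in workload so the sketch runs without the countries archive.
    _, peak = peak_python_memory_mib(lambda: bytearray(8 * 1024 * 1024))
    print(f"peak: {peak:.1f} MiB")
```

With the reproduction data loaded, the same helper could wrap each variant, for example `peak_python_memory_mib(lambda: tp.Topology(countries_geoms, prequantize=False).toposimplify(0.00001))`, and the two peaks compared.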

By the way, should prequantization be enabled by default? I personally find it odd that the library performs certain calculations by default, even if they don't apply to my use case and don't provide any benefit. I can only understand such default behavior if it benefits everyone. Otherwise, this should be an opt-in operation (the same as simplification is opt-in).


mattijn · 2023-12-02T10:36:49Z

Thank you for raising the issue, and it is great to see that you find this package useful for your needs!
Until now, speed has been the main bottleneck, but if we can reduce the memory footprint, that would be great too.
It would be worth profiling the code to find the main culprit that is causing the memory to blow up.

Zaczero · 2023-12-02T11:52:25Z

🙂! If you are interested, I use this package to run https://github.com/Zaczero/osm-countries-geojson. It finally resolved the issue with overlaps/gaps produced during the simplification process. And now it's perfect!

mattijn · 2023-12-02T23:28:30Z

Thanks for showing your package! May I ask how the directed graph of networkx is being utilised for your use-case? That seems interesting!

I was looking at your referenced geojson and noticed at least two things that you might check:

1. It seems there is a (part of a) country missing near Morocco:

2. Something odd is going on in the south of the Netherlands:

Again, thanks for reaching out!

Zaczero · 2023-12-02T23:51:40Z

1. This is simply the nature of OSM data. In regions of conflict, it's common to encounter such situations. Sometimes, you might even come across two countries at the same time:
2. This appears to be a bug with GitHub's GeoJSON visualizer. They seem to apply their own simplification for rendering. Here's how this location appears on OSM:

And this is how it looks when rendered locally (which is acceptable for such a high level of simplification):

I understand that the documentation for the countries generator is lacking. Essentially, the directed graph is used to reconstruct country polygons efficiently from split and randomly ordered line segments. OSM data does not store countries as predefined shapes but rather as a collection of lines. The directed graph (compared to an undirected one) improves performance by reducing the number of paths that the simple-cycles search has to traverse. Each node represents an intersection (a line endpoint), and each edge represents a line segment.
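To illustrate the idea of rebuilding rings from unordered segments, here is a minimal standard-library sketch. It is not the code used in osm-countries-geojson and it does not use networkx: instead of enumerating simple cycles, it follows a successor map, which only works under the simplifying assumptions that every segment is already oriented along its ring and that every endpoint has exactly one outgoing segment. The function name rings_from_segments is hypothetical.

```python
def rings_from_segments(segments):
    """Rebuild closed rings from directed segments given in arbitrary order.

    Each segment is a (start, end) pair of hashable endpoints (graph nodes);
    the segment itself is a directed edge. Assumes each node has exactly one
    outgoing edge, so rings do not share nodes.
    """
    successor = {}
    for start, end in segments:
        if start in successor:
            raise ValueError(f"node {start!r} has more than one outgoing segment")
        successor[start] = end

    rings = []
    visited = set()
    for first in successor:
        if first in visited:
            continue
        ring = [first]
        visited.add(first)
        node = successor[first]
        # Follow directed edges until we return to the starting node.
        while node != first:
            if node not in successor or node in visited:
                raise ValueError("segments do not form closed rings")
            ring.append(node)
            visited.add(node)
            node = successor[node]
        ring.append(first)  # repeat the first node to close the ring
        rings.append(ring)
    return rings


if __name__ == "__main__":
    # A unit square whose four edges arrive in shuffled order.
    shuffled = [((1, 0), (1, 1)), ((0, 0), (1, 0)), ((0, 1), (0, 0)), ((1, 1), (0, 1))]
    print(rings_from_segments(shuffled))
```

Because each edge has a fixed direction, every ring is found once by a single walk. In an undirected graph the same ring could be walked in either direction from any node, which is the extra work the directed representation avoids.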

mattijn · 2023-12-03T00:05:38Z

Interesting! Halfway through the computation of a topology, the line segments are also split, and their order is not always clear. In the hashmap step I use a _hash_order() to determine the order. Maybe I could have used a directed graph there as well.
Regarding 1), I can understand a single place being claimed by multiple countries, but I didn't expect a place to be claimed by no country at all.
Regarding 2), the OSM location seems to be OK; the border is a bit messy there. Maybe it's a glitch when zooming out.
