Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Geometry Mismatch on World import in placex table #533

Open
aferrar opened this issue Feb 28, 2024 · 5 comments
Open

Geometry Mismatch on World import in placex table #533

aferrar opened this issue Feb 28, 2024 · 5 comments
Labels

Comments

@aferrar
Copy link

aferrar commented Feb 28, 2024

Describe the bug

While attempting to load world osm data into the placex table of a Nominatim docker installation using an external postgresql database, the process fails due to a mismatch between the SRID of the geometry being loaded and the SRID expected by the column in the table. The error reported is Geometry SRID (5350) does not match column SRID (4326), indicating that the geometries being imported are in a different spatial reference system than what the placex table expects.

Traceback (most recent call last):
File "/usr/local/bin/nominatim", line 12, in
exit(cli.nominatim(module_dir='/usr/local/lib/nominatim/module',
File "/usr/local/lib/nominatim/lib-python/nominatim/cli.py", line 225, in nominatim
return get_set_parser().run(**kwargs)
File "/usr/local/lib/nominatim/lib-python/nominatim/cli.py", line 121, in run
return args.command.run(args)
File "/usr/local/lib/nominatim/lib-python/nominatim/clicmd/setup.py", line 121, in run
database_import.load_data(args.config.get_libpq_dsn(), num_threads)
File "/usr/local/lib/nominatim/lib-python/nominatim/tools/database_import.py", line 222, in load_data
conn.wait()
File "/usr/local/lib/nominatim/lib-python/nominatim/db/async_connection.py", line 128, in wait
wait_select(self.conn)
File "/usr/lib/python3/dist-packages/psycopg2/extras.py", line 762, in wait_select
state = conn.poll()
psycopg2.errors.InvalidParameterValue: Geometry SRID (5350) does not match column SRID (4326)
CONTEXT: PL/pgSQL function placex_insert() line 15 at assignment

To Reproduce

docker run
-d
--name='Nominatim'
--net='bridge'
-e TZ="America/Time_Zone"
-e HOST_OS="Unraid"
-e HOST_HOSTNAME="Tower"
-e HOST_CONTAINERNAME="Nominatim"
-e 'PBF_PATH'='/gis/osm/planet/planet-latest.osm.pbf' <-- Taken from here
-e 'NOMINATIM_TOKENIZER'='icu'
-e 'NOMINATIM_DATABASE_DSN'='pgsql:dbname=nominatim;host=192.168.1.2;user=postgres;password=password;port=25432'
-e 'PGHOST'='192.168.1.2'
-e 'PGDATABASE'='nominatim'
-e 'PGUSER'='postgres'
-e 'PGPASSWORD'='password'
-e 'PGPORT'='25432'
-e 'REPLICATION_URL'='https://ftp5.gwdg.de/pub/misc/openstreetmap/planet.openstreetmap.org/replication/day/'
-e 'IMPORT_US_POSTCODES'='true'
-e 'IMPORT_WIKIPEDIA'='true'
-e 'IMPORT_TIGER_ADDRESSES'='true'
-l net.unraid.docker.managed=dockerman
-l net.unraid.docker.webui='http://[IP]:[PORT:1865]'
-l net.unraid.docker.icon='https://avatars.githubusercontent.com/u/19649492?s=48&v=4'
-p '1865:8080/tcp'
-v '/mnt/user/gis/':'/gis':'rw'
-v '/mnt/user/gis/nominatim/flatnode':'/nominatim/flatnode':'rw'
--shm-size=64G 'mediagis/nominatim:4.3'

Expected behavior

The expected behavior was for the data to be loaded successfully into the placex table without encountering a SRID mismatch error. Geometries should align with the expected SRID (4326) of the placex table's geometry column.

Screenshots & Logs

Not applicable as no screenshots or additional logs were provided. The error message detailed in the bug report serves as the primary diagnostic output.

Desktop / Server (please complete the following information):

  • OS & Version: UnRAID 12.8
  • Docker Version: 24.0.9, build 2936816
  • Nominatim Version: 4.3

Additional context

The error indicates a likely issue with the data being imported or a possible misconfiguration of the database schema. Further investigation into the source data's SRID and the database schema setup for the spatial_reference table show no existing 5350 SRID.

Update

Thanks to mtmail, we were able to determine the issue was a corrupted pbf file. Thank you, mtmail!

@aferrar aferrar added the bug label Feb 28, 2024
@mtmail
Copy link
Contributor

mtmail commented Feb 28, 2024

Can you provide steps to reproduce including how you generated or where you got the data you imported?

Skip the AI generated output please.

@aferrar
Copy link
Author

aferrar commented Feb 28, 2024

Certainly, thank you for the reply and sorry for the AI stuff. I think this would summarize it nicely, please let me know if I can provide any additional information and thank you again for the support and this awesome project!

Please see below:

docker run
-d
--name='Nominatim'
--net='bridge'
-e TZ="America/Time_Zone"
-e HOST_OS="Unraid"
-e HOST_HOSTNAME="Tower"
-e HOST_CONTAINERNAME="Nominatim"
-e 'PBF_PATH'='/gis/osm/planet/planet-latest.osm.pbf' <-- Taken from here
-e 'NOMINATIM_TOKENIZER'='icu'
-e 'NOMINATIM_DATABASE_DSN'='pgsql:dbname=nominatim;host=192.168.1.2;user=postgres;password=password;port=25432'
-e 'PGHOST'='192.168.1.2'
-e 'PGDATABASE'='nominatim'
-e 'PGUSER'='postgres'
-e 'PGPASSWORD'='password'
-e 'PGPORT'='25432'
-e 'REPLICATION_URL'='https://ftp5.gwdg.de/pub/misc/openstreetmap/planet.openstreetmap.org/replication/day/'
-e 'IMPORT_US_POSTCODES'='true'
-e 'IMPORT_WIKIPEDIA'='true'
-e 'IMPORT_TIGER_ADDRESSES'='true'
-l net.unraid.docker.managed=dockerman
-l net.unraid.docker.webui='http://[IP]:[PORT:1865]'
-l net.unraid.docker.icon='https://avatars.githubusercontent.com/u/19649492?s=48&v=4'
-p '1865:8080/tcp'
-v '/mnt/user/gis/':'/gis':'rw'
-v '/mnt/user/gis/nominatim/flatnode':'/nominatim/flatnode':'rw'
--shm-size=64G 'mediagis/nominatim:4.3'

@mtmail
Copy link
Contributor

mtmail commented Feb 28, 2024

This error shouldn't happen and it's the first time I've seen a projection issue anywhere in Nominatim. What this import step does is read rows from the place table and inserts them into the placex table. In the database there's a trigger attached to the placex table that runs some logic to reformat the place, lookup it's hierarchy. If you noticed that your CPU(s) go to 100% that's the logic that causes this. You can find it in https://github.com/osm-search/Nominatim/blob/master/lib-sql/functions/placex_triggers.sql

But both place and placex table only have geometry columns with SRID 4326 set. You can verify that by looking at the table schemas

\d place
                          Table "public.place"
   Column    |          Type           | Collation | Nullable | Default
-------------+-------------------------+-----------+----------+---------
 osm_type    | character(1)            |           | not null |
 osm_id      | bigint                  |           | not null |
[...]
 geometry    | geometry(Geometry,4326) |           | not null |

\d placex
                             Table "public.placex"
     Column      |            Type             | Collation | Nullable | Default
-----------------+-----------------------------+-----------+----------+---------
 place_id        | bigint                      |           | not null |
 parent_place_id | bigint                      |           |          |
 linked_place_id | bigint                      |           |          |
[...]
 geometry        | geometry(Geometry,4326)     |           | not null |
[...]
 centroid        | geometry(Geometry,4326)     |           |          |

I think we can rule out that you manually changed the table schema. Does a 5350 SRID even exist? I can't find any information and SELECT * FROM spatial_ref_sys WHERE srid = 5350; returns empty for me.

What do the following queries return for you?

select count(*) from place; -- expect over 300 million
select count(*) from placex; -- unclear, depends where the import stopped

Possible the input data is corrupted. As first step check if the file size matches that on https://planet.openstreetmap.org/pbf/ As second step you can also compare the md5 checksum against the .pbf.md5 file. For example:

md5sum planet-240219.osm.pbf
fa4fce44b6de5b9d9fad9ad6e0bbf354

curl -L https://planet.openstreetmap.org/pbf/planet-240219.osm.pbf.md5
fa4fce44b6de5b9d9fad9ad6e0bbf354  planet-240219.osm.pbf

It's not impossible, just unlikely, that the database is corrupted somehow. I mean the data on disk.

@aferrar
Copy link
Author

aferrar commented Feb 28, 2024

Thank you for this awesome response. I've checked and 5350 isn't a valid SRID. I ran your count on the place table and returned 0 records. I assume placex will also yield zero. I'm currently running md5sum on my local pbf file and will compare it to the external one I curled from.

It could be the data got corrupted while curling? At this point, after confirming the checksum (and if it's valid), should I just blow away the nominatim db in my postgres instance and just try re-running the server? It took 36 hours to get to this error so it's quite a pain troubleshooting if it'll take the same amount of time on each troubleshooting step.

Thank you again, really appreciate your help!

Edit: Looks like a checksum mismatch so I assume the data got corrupted when I curled. I'll have to figured out what caused that. Is there a recommended curl command I should execute for the pbf download?

image

@mtmail
Copy link
Contributor

mtmail commented Feb 29, 2024

Indeed looks like the file got corrupted so best to start complete fresh. You said Monaco (I'm guessing 0.1% of the world) imported fine but I would try another country or US state (1-5% of the world) first before trying the world. While curl or the internet connection wouldn't been at fault it's also possible it's the harddrive.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants