First commit
PierreMesure committed Apr 5, 2024
0 parents commit 2057bb3
Showing 9 changed files with 22,185 additions and 0 deletions.
41 changes: 41 additions & 0 deletions .github/workflows/download_osm.yml
@@ -0,0 +1,41 @@
name: Download OSM data and extract postcode data
on:
  workflow_dispatch:
  schedule:
    - cron: '0 1 1 * *' # 1AM every 1st of the month
jobs:
  download:
    runs-on: ubuntu-latest
    steps:
      - name: Install osmium
        run: |
          sudo apt-get update
          sudo apt-get install -y osmium-tool
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: '3.12'
          cache: 'pip'
      - run: pip install -r requirements.txt
      - name: Download OSM data
        run: |
          mkdir -p ./data
          wget http://download.openstreetmap.fr/extracts/europe/sweden-latest.osm.pbf -O ./data/sweden-latest.osm.pbf
      - name: Filter OSM data
        run: |
          osmium tags-filter ./data/sweden-latest.osm.pbf nwr/addr:postcode -o ./data/extract.osm.pbf
      - name: Convert the data
        run: python convert_osm.py
      - name: Commit the data
        uses: nick-fields/retry@v2
        with:
          timeout_seconds: 10
          max_attempts: 5
          command: |
            git config --global user.name 'Pierre Mesure (Github Actions)'
            git config --global user.email '[email protected]'
            git config --global rebase.autoStash true
            git pull --rebase
            git add ./data/osm_codes.csv
            git commit -am "Update the data"
            git push
30 changes: 30 additions & 0 deletions .github/workflows/download_postnord.yml
@@ -0,0 +1,30 @@
name: Download Postnord data
on:
  workflow_dispatch:
  schedule:
    - cron: '0 1 1 * *' # 1AM every 1st of the month
jobs:
  download:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: '3.12'
          cache: 'pip'
      - run: pip install -r requirements.txt
      - name: Fetch the data
        run: python download_postnord.py
      - name: Commit the data
        uses: nick-fields/retry@v2
        with:
          timeout_seconds: 10
          max_attempts: 5
          command: |
            git config --global user.name 'Pierre Mesure (Github Actions)'
            git config --global user.email '[email protected]'
            git config --global rebase.autoStash true
            git pull --rebase
            git add .
            git commit -am "Update the data"
            git push
2 changes: 2 additions & 0 deletions .gitignore
@@ -0,0 +1,2 @@
data/*.pbf

60 changes: 60 additions & 0 deletions README.md
@@ -0,0 +1,60 @@
# pOSMkod

pOSMkod is a set of scripts to download Swedish postal code (*postkod*) data from OpenStreetMap (OSM, hence the name).

## What does it do?

A [first script](./download_postnord.py) downloads an updated list of all Swedish postal codes from one of Postnord's private APIs. This script is run every month using a [GitHub Action](.github/workflows/download_postnord.yml) and the data is pushed to the file [postnord_codes.csv](data/postnord_codes.csv).

[Another GitHub Action](.github/workflows/download_osm.yml) downloads a dump of all Swedish data on OpenStreetMap and extracts all the objects that have the `addr:postcode` tag. It is run every month in the same way and the data is saved to the file [osm_codes.csv](data/osm_codes.csv).

These two lists are compared in order to see what proportion of the postal codes are present on OSM.
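
The comparison step is not included in this commit. A minimal sketch of how it could look, assuming both CSV files hold a single column of integer codes: the `postcodes` header is what `convert_osm.py` writes, while the layout of `postnord_codes.csv` is an assumption, and the helper names `read_codes` and `osm_coverage` are hypothetical.

```python
import csv
from pathlib import Path


def read_codes(path, column="postcodes"):
    """Read one column of integer postal codes from a CSV file into a set.

    The default column name matches osm_codes.csv; the Postnord file may
    use a different header (assumption), so pass it explicitly if needed.
    """
    with Path(path).open(newline="", encoding="utf-8") as f:
        return {int(row[column]) for row in csv.DictReader(f) if row[column]}


def osm_coverage(postnord_codes, osm_codes):
    """Return the share of Postnord codes that also appear in the OSM list."""
    postnord = set(postnord_codes)
    if not postnord:
        return 0.0  # avoid dividing by zero on an empty reference list
    return len(postnord & set(osm_codes)) / len(postnord)
```

Calling `osm_coverage(read_codes("./data/postnord_codes.csv"), read_codes("./data/osm_codes.csv"))` would then give the proportion as a float between 0 and 1. Codes that exist only in OSM are ignored, since the Postnord list is treated as the reference.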

In the future, another script will attempt to draw boundaries for each postal code based on the objects associated with it.

## Prerequisites

This code has been designed to run on GitHub Actions, but it can also be run locally.
You will need a recent version of Python (at least 3.9). You can install all its dependencies using:

```bash
pip install -r requirements.txt
```

In order to extract objects from an OSM data dump, you will need osmium, which can be installed on Linux with:

```bash
apt install osmium-tool
```

## How to use

The first script can then be run using:

```bash
python download_postnord.py
```

To download OSM data, you can simply use wget:

```bash
wget https://download.geofabrik.de/europe/sweden-latest.osm.pbf -O ./data/sweden-latest.osm.pbf
```

When that is done, you can use [osmium](https://osmcode.org/osmium-tool/) to extract only the objects we need:

```bash
osmium tags-filter ./data/sweden-latest.osm.pbf nwr/addr:postcode -o ./data/extract.osm.pbf
```

Finally, the second Python script will extract data from the .pbf file and save it as a table.

```bash
python convert_osm.py
```
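
The postcode clean-up done by `convert_osm.py` can be sketched in plain Python as follows. The bounds (10000 and 98700) and the removal of spaces come from the script; skipping malformed values instead of raising an error is an addition made here for illustration, and the function name `normalize_postcodes` is hypothetical.

```python
def normalize_postcodes(raw_values, low=10000, high=98700):
    """Return sorted unique postcodes strictly between low and high."""
    codes = set()
    for value in raw_values:
        if value is None:
            continue  # objects without a postcode
        digits = str(value).replace(" ", "")  # "111 20" -> "11120"
        if not digits.isdecimal():
            continue  # assumption: ignore malformed values rather than fail
        code = int(digits)
        if low < code < high:
            codes.add(code)
    return sorted(codes)
```

Using a set removes duplicates before sorting, so many buildings sharing one postcode contribute a single entry.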

## License

This code is licensed under AGPLv3.

The postal code data extracted from OpenStreetMap is licensed under [ODbL](https://www.openstreetmap.org/copyright). The license for Postnord's data is unclear.
24 changes: 24 additions & 0 deletions convert_osm.py
@@ -0,0 +1,24 @@
from pyrosm import OSM
import polars as pl

print('Loading OSM file...')
osm = OSM(filepath='./data/extract.osm.pbf')

print('Loading objects...')
# Only building objects are read from the filtered extract.
buildings = osm.get_buildings()

# Keep the address-related columns.
extract = buildings[[
    'name',
    'addr:street',
    'addr:housenumber',
    'addr:postcode',
    'addr:city',
    'osm_type',
    'geometry'
]]

print('Creating a list of unique postcodes')
# Strip spaces ("123 45" -> "12345"), drop missing values and deduplicate.
osm_codes = extract['addr:postcode'].str.replace(' ', '').dropna().unique()
osm_codes = sorted([int(code) for code in osm_codes])
# Keep only codes strictly between 10000 and 98700.
osm_codes = [code for code in osm_codes if 10000 < code < 98700]

print('Saving all objects as CSV...')
pl.DataFrame({'postcodes': osm_codes}).write_csv('./data/osm_codes.csv')