Add Overpass Turbo API as a source #54

Open
mnm-matin opened this issue Oct 6, 2024 · 2 comments
Labels
bug Something isn't working enhancement New feature or request

Comments

@mnm-matin
Member

mnm-matin commented Oct 6, 2024

We want to add Overpass Turbo as a source for retrieving OSM data.

The current approach downloads large pbf files from Geofabrik; however, in most cases less than 1% of the data is required.

There are three parts to this:

  • Generalize the current region index, which is built solely from the Geofabrik index.
  • Add a source toggle that lets users pick geofabrik or overpass, including the use of the intermediate file format.
  • Implement the Overpass API.

Here is a snippet of the intermediate json file:

  • Data is split into Node, Way and Relation
  • Node has "lonlat", whereas Way has a "refs" list that points to the relevant Nodes.
{
    "Metadata":{
        "filter_date":"2024-10-06T14:04:00.380499",
        "primary_feature":"power"
    },
    "Data":{
        "Node":{
            "2143713049":{
                "id":2143713049,
                "tags":{},
                "lonlat":[
                    2.591996599999996,
                    9.330352699999997
                ]
            },
            ... more nodes here ...
          },
        "Way":{
            "204361210":{
                "id":204361210,
                "tags":{
                    "location":"outdoor",
                    "power":"substation",
                    "substation":"transmission",
                    "voltage":"161000"
                },
                "refs":[
                    2143713049,
                    2143713052,
                    2143713054,
                    2143713053,
                    2143713049
                ]
            },
            .... more ways here ...
         },
        "Relation":{}
    }
}
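
To show how the Node and Way parts fit together, here is a minimal sketch that resolves a Way's "refs" into coordinates. The function name `way_coordinates` is hypothetical and not part of earth_osm. The sample dict is trimmed from the snippet above; since the other nodes are elided there, refs without a Node entry are simply skipped.

```python
from typing import List, Tuple


def way_coordinates(data: dict, way_id: int) -> List[Tuple[float, float]]:
    """Return the (lon, lat) pairs of a Way in ref order.

    Refs whose Node is absent from the file are skipped.
    """
    nodes = data["Data"]["Node"]
    way = data["Data"]["Way"][str(way_id)]  # JSON object keys are strings
    coords = []
    for ref in way["refs"]:
        node = nodes.get(str(ref))
        if node is not None:
            lon, lat = node["lonlat"]
            coords.append((lon, lat))
    return coords


# Trimmed copy of the snippet above (only the one Node that is shown there)
sample = {
    "Data": {
        "Node": {
            "2143713049": {
                "id": 2143713049,
                "tags": {},
                "lonlat": [2.591996599999996, 9.330352699999997],
            }
        },
        "Way": {
            "204361210": {
                "id": 204361210,
                "tags": {"power": "substation"},
                "refs": [2143713049, 2143713052, 2143713054, 2143713053, 2143713049],
            }
        },
        "Relation": {},
    }
}

coords = way_coordinates(sample, 204361210)
```

The first and last ref are the same node, which is how a closed outline such as a substation polygon is represented.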

@davide-f, there is currently a region index file ./earth-osm/earth_osm/data/gfk_index.csv which needs to align well with pypsa-earth regions as some of the regions are slightly complicated. The csv file is created by gfk_data.py and could be extended.

I have created a mock function in the new file overpass.py and added the relevant toggles. The command that needs to pass as a test is the following:

earth_osm extract power --regions benin --features substation line generator --source overpass

The above command also prints the args before raising an error.
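
For reference, the arguments of that command could be parsed roughly as follows. This is only a sketch built with `argparse` from the command line shown above; the parser layout, the argument name `primary` and the default source are assumptions and may not match the actual earth_osm CLI code.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the command shown above."""
    parser = argparse.ArgumentParser(prog="earth_osm")
    sub = parser.add_subparsers(dest="command", required=True)

    extract = sub.add_parser("extract")
    extract.add_argument("primary")  # e.g. "power"
    extract.add_argument("--regions", nargs="+", required=True)
    extract.add_argument("--features", nargs="+", default=[])
    # Assumption: geofabrik stays the default, overpass is opt-in
    extract.add_argument(
        "--source", choices=["geofabrik", "overpass"], default="geofabrik"
    )
    return parser


argv = (
    "extract power --regions benin "
    "--features substation line generator --source overpass"
).split()
args = build_parser().parse_args(argv)
print(vars(args))
```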

@bobbyxng is it feasible to convert the Overpass data into the format of the snippet above, using the args provided to the mock function get_overpass_data?

mnm-matin added the enhancement and bug labels on Oct 6, 2024 (the question label was added and then removed).
@davide-f
Member

davide-f commented Oct 6, 2024

That's great Matin!
FYI, regarding the csv file: in pypsa-earth we added a filtering of the regions to adapt to it.
A periodic check may still be a good idea, but a good draft is already implemented.

I've seen you pushed the proposal to main. Does the above already work with overpass?

@bobbyxng

Thanks @mnm-matin, and sorry for the late reply, I was away. It should be fairly easy, as retrieving the data through overpass turbo yields the structure above. In the API call, you can already filter for the correct component, so each of those jsons can be "homogeneous" in the sense that it only contains ways, only nodes, etc.
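
As far as I know, the raw JSON returned by the Overpass API is a flat "elements" list, and with `out body geom;` each way carries a "nodes" list plus a parallel "geometry" list. A small conversion step into the Node/Way/Relation layout could look like the sketch below. The function name `overpass_to_intermediate` is hypothetical, the layout of a Relation entry is an assumption (the snippet above only shows an empty "Relation"), and the sample response is hand-written for illustration, not real OSM data.

```python
from datetime import datetime


def overpass_to_intermediate(overpass_json: dict, primary_feature: str) -> dict:
    """Regroup Overpass elements into the Node/Way/Relation layout."""
    nodes, ways, relations = {}, {}, {}
    for el in overpass_json.get("elements", []):
        el_id, tags = el["id"], el.get("tags", {})
        if el["type"] == "node":
            nodes[str(el_id)] = {
                "id": el_id,
                "tags": tags,
                "lonlat": [el["lon"], el["lat"]],
            }
        elif el["type"] == "way":
            refs = el.get("nodes", [])
            ways[str(el_id)] = {"id": el_id, "tags": tags, "refs": refs}
            # "geometry" is aligned with "nodes"; use it to fill in
            # nodes that were not returned as separate elements.
            for ref, point in zip(refs, el.get("geometry", [])):
                if point is not None:
                    nodes.setdefault(
                        str(ref),
                        {"id": ref, "tags": {}, "lonlat": [point["lon"], point["lat"]]},
                    )
        elif el["type"] == "relation":
            # Assumption: the intermediate Relation layout is not shown above
            relations[str(el_id)] = {
                "id": el_id,
                "tags": tags,
                "members": el.get("members", []),
            }
    return {
        "Metadata": {
            "filter_date": datetime.now().isoformat(),
            "primary_feature": primary_feature,
        },
        "Data": {"Node": nodes, "Way": ways, "Relation": relations},
    }


# Hand-written sample response (illustrative ids and coordinates)
sample_response = {
    "elements": [
        {
            "type": "way",
            "id": 1,
            "nodes": [10, 11, 10],
            "geometry": [
                {"lat": 9.0, "lon": 2.0},
                {"lat": 9.1, "lon": 2.1},
                {"lat": 9.0, "lon": 2.0},
            ],
            "tags": {"power": "substation"},
        }
    ]
}
result = overpass_to_intermediate(sample_response, "power")
```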

This is how I implemented it in pypsa-eur, I think we can transfer most of it into this:

# -*- coding: utf-8 -*-
# SPDX-FileCopyrightText: : 2020-2024 The PyPSA-Eur Authors
#
# SPDX-License-Identifier: MIT
"""
Retrieve OSM data for the specified country using the overpass API and save it
to the specified output files.

Note that overpass requests are based on a fair
use policy. `retrieve_osm_data` is meant to be used in a way that respects this
policy by fetching the needed data once, only.
"""

import json
import logging
import os
import time

import requests
from _helpers import configure_logging, set_scenario_config

logger = logging.getLogger(__name__)


def retrieve_osm_data(
    country,
    output,
    features=[
        "cables_way",
        "lines_way",
        "routes_relation",
        "substations_way",
        "substations_relation",
    ],
):
    """
    Retrieve OSM data for the specified country and save it to the specified
    output files.

    Parameters
    ----------
    country : str
        The country code for which the OSM data should be retrieved.
    output : dict
        A dictionary mapping feature names to the corresponding output file
        paths. Saving the OSM data to .json files.
    features : list, optional
        A list of OSM features to retrieve. The default is [
            "cables_way",
            "lines_way",
            "routes_relation",
            "substations_way",
            "substations_relation",
            ].
    """
    # Overpass API endpoint URL
    overpass_url = "https://overpass-api.de/api/interpreter"

    features_dict = {
        "cables_way": 'way["power"="cable"]',
        "lines_way": 'way["power"="line"]',
        "routes_relation": 'relation["route"="power"]',
        "substations_way": 'way["power"="substation"]',
        "substations_relation": 'relation["power"="substation"]',
    }

    wait_time = 5

    for f in features:
        if f not in features_dict:
            logger.info(
                f"Invalid feature: {f}. Supported features: {list(features_dict.keys())}"
            )
            raise ValueError(
                f"Invalid feature: {f}. Supported features: {list(features_dict.keys())}"
            )

        retries = 3
        for attempt in range(retries):
            logger.info(
                f" - Fetching OSM data for feature '{f}' in {country} (Attempt {attempt+1})..."
            )

            # Build the overpass query
            op_area = f'area["ISO3166-1"="{country}"]'
            op_query = f"""
                [out:json];
                {op_area}->.searchArea;
                (
                {features_dict[f]}(area.searchArea);
                );
                out body geom;
            """
            response = None  # Stays None if the request itself fails
            try:
                # Send the request
                response = requests.post(overpass_url, data=op_query)
                response.raise_for_status()  # Raise HTTPError for bad responses
                data = response.json()

                filepath = output[f]
                parentfolder = os.path.dirname(filepath)
                if parentfolder:
                    os.makedirs(parentfolder, exist_ok=True)

                # Use a distinct name so the loop variable `f` is not shadowed
                with open(filepath, mode="w") as outfile:
                    json.dump(data, outfile, indent=2)
                logger.info(" - Done.")
                break  # Exit the retry loop on success
            except (json.JSONDecodeError, requests.exceptions.RequestException) as e:
                logger.error(f"Error for feature '{f}' in country {country}: {e}")
                # A Response is falsy for 4xx/5xx, so compare against None
                logger.debug(
                    f"Response text: {response.text if response is not None else 'No response'}"
                )
                if attempt < retries - 1:
                    wait_time += 15
                    logger.info(f"Waiting {wait_time} seconds before retrying...")
                    time.sleep(wait_time)
                else:
                    logger.error(
                        f"Failed to retrieve data for feature '{f}' in country {country} after {retries} attempts."
                    )
            except Exception as e:
                # For now, catch any other exceptions and log them. Treat this
                # the same as a RequestException and try to run again two times.
                logger.error(
                    f"Unexpected error for feature '{f}' in country {country}: {e}"
                )
                if attempt < retries - 1:
                    wait_time += 10
                    logger.info(f"Waiting {wait_time} seconds before retrying...")
                    time.sleep(wait_time)
                else:
                    logger.error(
                        f"Failed to retrieve data for feature '{f}' in country {country} after {retries} attempts."
                    )


if __name__ == "__main__":
    if "snakemake" not in globals():
        from _helpers import mock_snakemake

        snakemake = mock_snakemake("retrieve_osm_data", country="BE")
    configure_logging(snakemake)
    set_scenario_config(snakemake)

    # Retrieve the OSM data
    country = snakemake.wildcards.country
    output = snakemake.output

    retrieve_osm_data(country, output)

https://github.com/PyPSA/pypsa-eur/blob/master/scripts/retrieve_osm_data.py
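
For the test command in this issue (power features substation, line and generator in Benin), the query part could be adapted along these lines. This is a sketch only: `build_overpass_query` is a hypothetical helper, the use of `nwr` (nodes, ways and relations) instead of separate way/relation queries is an assumption, and the mapping from the region name "benin" to the ISO code "BJ" would still have to come from the region index.

```python
from typing import Iterable


def build_overpass_query(iso_code: str, primary: str, features: Iterable[str]) -> str:
    """Build one Overpass QL query covering several values of a primary tag."""
    statements = "\n".join(
        f'  nwr["{primary}"="{feature}"](area.searchArea);' for feature in features
    )
    return (
        "[out:json];\n"
        f'area["ISO3166-1"="{iso_code}"]->.searchArea;\n'
        "(\n"
        f"{statements}\n"
        ");\n"
        "out body geom;"
    )


query = build_overpass_query("BJ", "power", ["substation", "line", "generator"])
print(query)
```

Bundling all features into one request keeps the number of calls low, which helps with the fair use policy mentioned in the docstring above, at the cost of a larger single response.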
