address formatting

A quick example

Given a set of address parts

 house_number:  17
 road:          Rue du Médecin-Colonel Calbairac
 neighbourhood: Lafourguette
 suburb:        Toulouse Ouest
 postcode:      31000
 city:          Toulouse
 county:        Toulouse
 state:         Midi-Pyrénées
 country:       France
 country_code:  FR

you want to write logic to compile addresses in the format consumers expect

17 Rue du Médecin-Colonel Calbairac
31000 Toulouse
France

or perhaps simply

Rue du Médecin-Colonel Calbairac
Toulouse
France

This repository contains templates for various address formats used in territories around the world. It also contains test cases.
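As a rough illustration of the idea, here is a minimal Python sketch that turns the parts above into the two outputs shown. It is not the repository's template format or any existing processor: the `FR_LAYOUT` structure and the `format_address` function are hypothetical names invented for this example, and the layout is simply chosen to reproduce the French address above.

```python
# Hypothetical sketch: this is NOT the repository's Mustache template format,
# just an illustration of "parts in, formatted address out".
components = {
    "house_number": "17",
    "road": "Rue du Médecin-Colonel Calbairac",
    "postcode": "31000",
    "city": "Toulouse",
    "country": "France",
}

# Each inner list is one output line. This layout is an assumption chosen to
# reproduce the French example above.
FR_LAYOUT = [
    ["house_number", "road"],
    ["postcode", "city"],
    ["country"],
]


def format_address(components, layout):
    """Join the available components line by line, skipping missing ones."""
    lines = []
    for keys in layout:
        parts = [components[k] for k in keys if components.get(k)]
        if parts:
            lines.append(" ".join(parts))
    return "\n".join(lines)


print(format_address(components, FR_LAYOUT))
```

Because missing components are skipped, the same layout also yields the shorter form when `house_number` and `postcode` are absent, which is the kind of incomplete data a geocoder often returns.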

Which addresses we're talking about

The intended use-case is database or geocoding systems (forward, reverse, autocomplete) where we know both the country of the address and the language of the user/reader. The address is displayed to a consumer (for example in an app) and not used to print on an envelope for actual postal delivery. We use it to format output from the OpenCage Geocoder.

We have to deal with

  • incomplete data
  • anything with a name (peaks, bridges, bus stops)

Unlike physical post (office) mail we don't have to deal with

  • apartment/flat number, floor numbers
  • PO boxes
  • translating the language of the (destination) address. Whatever language is input is output.

Processing logic

Our goal with this repository is a series of (programming) language independent templates. Those templates can then be processed by whatever software you like.

We have written, and we use and maintain, a working implementation of a processor in Perl (see CPAN: Geo::Address::Formatter, GitHub repo). There is also an open-source implementation in PHP.

We would love there to be implementations in other languages. If you do write a processor, please let us know so we can list it here.

Coverage

As of Tue May 10 09:13:21 CEST 2016 coverage is:

We are aware of 249 territories 
We have tests for 249 (100%) territories
We have rules for 249 (100%) territories
0 (0%) territories have neither rules nor tests

A detailed breakdown of test and configuration coverage can be found by running bin/coverage.pl -d. A list of all known territories is in conf/country_codes.yaml. Note: the list is simply all officially assigned ISO 3166-1 alpha-2 codes, and is not a political statement on whether these territories are, or should be, political states.

File format

The files are in YAML format. The templates are written in Mustache. Both formats are human readable, strict, handle escaping and support comments. YAML allows references (called "anchors") to avoid copy and paste; Mustache allows sub-templates (called "partials").

How to add your country/territory

  1. add a .yaml testcase to the relevant file for the country/territory in testcases/countries. The file names correspond to the appropriate ISO 3166-1 alpha-2 code - see conf/country_codes.yaml
  • a good way to get sample data is:
    • find an addressed location (house, business, etc) in your target territory in OpenStreetMap
    • get the coordinates (lat, long) of the location
    • put the coordinates into the OpenCage Geocoder demo page
    • look at the resulting JSON in the Raw Response tab
  2. edit conf/countries/worldwide.yaml
  • Possibly your country/territory uses an existing generic format as defined at the top of the file. If so, great, just map your country_code to the generic template. You may still want to add clean-up code (see the entry for DE as an example).
  • If not, you need to define a new generic rule set
    • possibly you will need to define new state/region mappings in conf/state_codes.yaml
  3. to test, you will now need to process the .yaml test via a processor (see above) and ensure the input leads to the desired output.
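
The testing step boils down to "feed the components in, compare with the expected text". A minimal Python sketch of that loop follows. Everything in it is an assumption made for illustration: the field names `description`, `components` and `expected`, the `check_testcases` helper, and the `toy_formatter` stand-in are hypothetical, and the test cases are plain dictionaries rather than parsed .yaml files. A real run would use an actual processor and the repository's test files.

```python
# Hypothetical test cases standing in for entries parsed from a .yaml file.
# The field names below are assumptions made for this sketch.
TESTCASES = [
    {
        "description": "full address",
        "components": {
            "house_number": "17",
            "road": "Rue du Médecin-Colonel Calbairac",
            "postcode": "31000",
            "city": "Toulouse",
            "country": "France",
        },
        "expected": "17 Rue du Médecin-Colonel Calbairac\n31000 Toulouse\nFrance",
    },
    {
        "description": "no house number or postcode",
        "components": {
            "road": "Rue du Médecin-Colonel Calbairac",
            "city": "Toulouse",
            "country": "France",
        },
        "expected": "Rue du Médecin-Colonel Calbairac\nToulouse\nFrance",
    },
]


def toy_formatter(components):
    """Stand-in for a real processor; it hard-codes one simple layout."""
    candidate_lines = [
        " ".join(filter(None, [components.get("house_number"), components.get("road")])),
        " ".join(filter(None, [components.get("postcode"), components.get("city")])),
        components.get("country", ""),
    ]
    return "\n".join(line for line in candidate_lines if line)


def check_testcases(testcases, format_fn):
    """Return (description, expected, actual) for every failing test case."""
    failures = []
    for case in testcases:
        actual = format_fn(case["components"])
        if actual != case["expected"]:
            failures.append((case["description"], case["expected"], actual))
    return failures


failures = check_testcases(TESTCASES, toy_formatter)
print(f"{len(TESTCASES) - len(failures)} passed, {len(failures)} failed")
```

Returning the failures as data, rather than printing inside the loop, keeps the check easy to reuse from whatever test harness your processor's language provides.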

If in doubt, please get in touch via GitHub issues.

Formatting rules

Currently we support the following formatting rules:

  • replace: regex that operates on the input values, useful for removing bureaucratic cruft like "London Borough of ". Note: if the regex starts with a prefix of the form X=, for example city=, it operates only on the value with that key
  • postformat_replace: regex that operates on the final output
  • add_component: with a value of the form component=XXXX
  • change_country: change the country value of the input, useful for dependent territories. Can include a substitution like $state so that that component value is then inserted into the new country value. See testcases/countries/sh.yaml for an example.
  • use_country: use the formatting configuration of another country, useful for dependent territories to avoid duplicating configuration
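
To make the first two rules concrete, here is a minimal Python sketch of how a processor might apply them. It is only one possible reading of the descriptions above, not the behaviour of the Perl or PHP implementations: the function names `apply_replace` and `apply_postformat`, the representation of rules as (pattern, replacement) pairs, and the sample rules are all assumptions made for this example.

```python
import re

# A leading "key=" (word characters followed by "=") marks a rule that should
# touch only that component. This parsing is an assumption for the sketch.
KEY_PREFIX = re.compile(r"^(\w+)=(.*)$", re.DOTALL)


def apply_replace(components, rules):
    """Apply (pattern, replacement) rules to the input component values."""
    result = dict(components)
    for pattern, replacement in rules:
        keyed = KEY_PREFIX.match(pattern)
        if keyed:
            # "city=..." style rule: only the named component is rewritten.
            key, regex = keyed.groups()
            if key in result:
                result[key] = re.sub(regex, replacement, result[key])
        else:
            # Unprefixed rule: every component value is rewritten.
            for key in list(result):
                result[key] = re.sub(pattern, replacement, result[key])
    return result


def apply_postformat(text, rules):
    """Apply (pattern, replacement) rules to the final formatted output."""
    for pattern, replacement in rules:
        text = re.sub(pattern, replacement, text)
    return text


components = {
    "city": "London Borough of Islington",
    "county": "London Borough of Islington",
}
# Hypothetical rule: strip the prefix from the city value only.
cleaned = apply_replace(components, [("city=^London Borough of ", "")])
print(cleaned["city"])
print(cleaned["county"])
```

With the `city=` prefix only the city value is shortened and the county value is left alone. Dropping the prefix would rewrite both.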

The future

More tests! For every rule about addresses there are exceptions and edge cases to consider. More test cases are always needed.

Planned features

  • optionally shorten/abbreviate addresses, e.g. 'Hoover Str' instead of 'Hoover Street'
  • basic error checking, for example ignoring values which obviously cannot be postcodes
  • define rules for postcode format specifically

We welcome your pull requests. Together we can address the world!

Who are we?

We run the OpenCage Geocoder. Previously, before being spun off as a separate company, we were a division of Lokku, long-time supporters of OpenStreetMap and open data initiatives. We also run #geomob, a meetup of London location-based service developers where we do our best to highlight geoinnovation.

Further reading

Here's our blog post announcing this project and the motivations behind it.

You may enjoy Michael Tandy's Falsehoods Programmers Believe about Addresses.

If it's actual address data you're after, check out OpenAddresses.

If you want to turn longitude, latitude into addresses or placenames, well that's what a geocoder does. Check out ours: OpenCage Geocoder.

If all this convinces you addresses are evil, please check out what3words, which allows you to dispense with them entirely.