Commit

Fix error caused by multiple components; closes #11 (#12)
* add example address with duplicated components to cause test to fail

* fix problem with duplicated address components

* 0.1.3 release
cole-brokamp authored Oct 20, 2022
1 parent 23e0cc5 commit c8bb75d
Showing 8 changed files with 12 additions and 7 deletions.
2 changes: 1 addition & 1 deletion Dockerfile
@@ -2,7 +2,7 @@ FROM rocker/r-ver:4.1.3

# DeGAUSS container metadata
ENV degauss_name="postal"
-ENV degauss_version="0.1.2"
+ENV degauss_version="0.1.3"
ENV degauss_description="normalized and parsed addresses"
ENV degauss_argument="expand [default: '']"

8 changes: 4 additions & 4 deletions README.md
@@ -8,10 +8,10 @@
If `my_address_file.csv` is a file in the current working directory with an address column named `address`, then the [DeGAUSS command](https://degauss.org/using_degauss.html#DeGAUSS_Commands):

```sh
-docker run --rm -v $PWD:/tmp ghcr.io/degauss-org/postal:0.1.2 my_address_file.csv
+docker run --rm -v $PWD:/tmp ghcr.io/degauss-org/postal:0.1.3 my_address_file.csv
```

-will produce `my_address_file_postal_0.1.2.csv` with added columns:
+will produce `my_address_file_postal_0.1.3.csv` with added columns:

- **`cleaned_address`**: `address` with non-alphanumeric characters and excess whitespace removed (with `dht::clean_address()`)
- **`parsed.{address_component}`**: multiple columns, one for each [parsed address component](https://github.com/openvenues/libpostal#parser-labels) (e.g., `parsed.road`, `parsed.state`, `parsed.house_number`)
@@ -24,10 +24,10 @@ After parsing, the parsed addresses can be expanded into [several possible norma
If any value is provided as an argument (e.g., "expand"), then the [DeGAUSS command](https://degauss.org/using_degauss.html#DeGAUSS_Commands):

```sh
-docker run --rm -v $PWD:/tmp ghcr.io/degauss-org/postal:0.1.2 my_address_file.csv expand
+docker run --rm -v $PWD:/tmp ghcr.io/degauss-org/postal:0.1.3 my_address_file.csv expand
```

-will produce `my_address_file_postal_0.1.2_expand.csv` with the above columns *plus*:
+will produce `my_address_file_postal_0.1.3_expand.csv` with the above columns *plus*:

- **`expanded_addresses`**: the expanded addresses for `parsed_address`

6 changes: 4 additions & 2 deletions entrypoint.R
@@ -36,9 +36,11 @@ parsed_address_components <-
purrr::transpose() |>
purrr::modify(unlist) |>
purrr::modify(jsonlite::fromJSON) |>
-purrr::modify(tibble::as_tibble) |>
+purrr::modify(tibble::as_tibble, .name_repair = "unique") |>
dplyr::bind_rows() |>
-dplyr::rename_with(~ paste("parsed", .x, sep = "."))
+dplyr::select(-contains("...")) |>
+dplyr::rename_with(~ paste("parsed", .x, sep = ".")) |>
+suppressMessages()

d <- dplyr::bind_cols(d, parsed_address_components)

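The `entrypoint.R` change can be illustrated with a small, self-contained R sketch. It assumes, hypothetically, that for the duplicated test address libpostal labels `city` and `state` twice, so the JSON-parsed components form a named list with repeated names; the component values below are illustrative and not taken from real parser output. The sketch shows that the old `tibble::as_tibble()` call fails on repeated names, that `.name_repair = "unique"` renames every copy to `<name>...<position>`, and that selecting away names containing `"..."` then drops all renamed copies. It uses `dplyr::contains()` (namespace-qualified) where `entrypoint.R` calls `contains()` directly, and it needs R 4.1 or later for the native pipe, which the `rocker/r-ver:4.1.3` base image provides.

```r
# Minimal sketch of the duplicated-component fix (assumed input: one address
# whose parsed components repeat the "city" and "state" labels).
components <- list(
  house_number = "4506",
  road = "camberwell rd",
  city = "cincinnati",
  state = "oh",
  city = "cincinnati",  # hypothetical duplicate label
  state = "oh",         # hypothetical duplicate label
  postcode = "45209"
)

# Old behaviour: as_tibble() defaults to .name_repair = "check_unique",
# so duplicated names raise an error.
old_failed <- inherits(
  tryCatch(tibble::as_tibble(components), error = function(e) e),
  "error"
)

# New behaviour: "unique" repair renames every copy of a duplicated name,
# giving city...3, state...4, city...5 and state...6. The renaming typically
# prints a "New names:" message, silenced here with suppressMessages().
repaired <- suppressMessages(
  tibble::as_tibble(components, .name_repair = "unique")
)

# contains() matches the literal string "...", so every renamed copy is
# dropped before the "parsed." prefix is added.
parsed <- repaired |>
  dplyr::select(-dplyr::contains("...")) |>
  dplyr::rename_with(~ paste("parsed", .x, sep = "."))

print(old_failed)
print(names(parsed))
```

Because every copy of a duplicated label is renamed, neither copy survives the `select()` step. This is consistent with the expected test output in this commit, where the duplicated address row keeps the house number, road, and postcode but has `NA` in the two columns that hold `cincinnati` and `oh` for the original address.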
1 change: 1 addition & 0 deletions test/address.csv
@@ -17,6 +17,7 @@ id,address
87190048084," "
97124042024," "
5100020177,"4506 CAMBERWELL RD CINCINNATI, OH 45209"
+5100020177,"4506 CAMBERWELL RD CINCINNATI, OH CINCINNATI, OH 45209"
55000100212,"5585 FAIRWOOD RD GREEN TOWNSHIP, OH 45239"
51000810328,"6628 JULY CT COLERAIN TOWNSHIP, OH 45239"
61201400371,"5126 BRASHER AV BLUE ASH, OH 45242"

@@ -17,6 +17,7 @@ id,address,cleaned_address,parsed_address,parsed.house_number,parsed.road,parsed
87190048084,NA,NA,na,NA,NA,NA,na,NA,NA,NA,NA
97124042024,NA,NA,na,NA,NA,NA,na,NA,NA,NA,NA
5100020177,"4506 CAMBERWELL RD CINCINNATI, OH 45209",4506 CAMBERWELL RD CINCINNATI OH 45209,4506 camberwell rd cincinnati oh 45209,4506,camberwell rd,cincinnati,oh,45209,NA,NA,NA
+5100020177,"4506 CAMBERWELL RD CINCINNATI, OH CINCINNATI, OH 45209",4506 CAMBERWELL RD CINCINNATI OH CINCINNATI OH 45209,4506 camberwell rd 45209,4506,camberwell rd,NA,NA,45209,NA,NA,NA
55000100212,"5585 FAIRWOOD RD GREEN TOWNSHIP, OH 45239",5585 FAIRWOOD RD GREEN TOWNSHIP OH 45239,5585 fairwood rd green township oh 45239,5585,fairwood rd,green township,oh,45239,NA,NA,NA
51000810328,"6628 JULY CT COLERAIN TOWNSHIP, OH 45239",6628 JULY CT COLERAIN TOWNSHIP OH 45239,6628 july ct colerain township oh 45239,6628,july ct,colerain township,oh,45239,NA,NA,NA
61201400371,"5126 BRASHER AV BLUE ASH, OH 45242",5126 BRASHER AV BLUE ASH OH 45242,5126 brasher av blue ash oh 45242,5126,brasher av,blue ash,oh,45242,NA,NA,NA

@@ -42,6 +42,7 @@ id,address,cleaned_address,parsed_address,parsed.house_number,parsed.road,parsed
97124042024,NA,NA,na,NA,NA,NA,na,NA,NA,NA,NA,national association
5100020177,"4506 CAMBERWELL RD CINCINNATI, OH 45209",4506 CAMBERWELL RD CINCINNATI OH 45209,4506 camberwell rd cincinnati oh 45209,4506,camberwell rd,cincinnati,oh,45209,NA,NA,NA,4506 camberwell road cincinnati ohio 45209
5100020177,"4506 CAMBERWELL RD CINCINNATI, OH 45209",4506 CAMBERWELL RD CINCINNATI OH 45209,4506 camberwell rd cincinnati oh 45209,4506,camberwell rd,cincinnati,oh,45209,NA,NA,NA,4506 camberwell road cincinnati oh 45209
+5100020177,"4506 CAMBERWELL RD CINCINNATI, OH CINCINNATI, OH 45209",4506 CAMBERWELL RD CINCINNATI OH CINCINNATI OH 45209,4506 camberwell rd 45209,4506,camberwell rd,NA,NA,45209,NA,NA,NA,4506 camberwell road 45209
55000100212,"5585 FAIRWOOD RD GREEN TOWNSHIP, OH 45239",5585 FAIRWOOD RD GREEN TOWNSHIP OH 45239,5585 fairwood rd green township oh 45239,5585,fairwood rd,green township,oh,45239,NA,NA,NA,5585 fairwood road green township ohio 45239
55000100212,"5585 FAIRWOOD RD GREEN TOWNSHIP, OH 45239",5585 FAIRWOOD RD GREEN TOWNSHIP OH 45239,5585 fairwood rd green township oh 45239,5585,fairwood rd,green township,oh,45239,NA,NA,NA,5585 fairwood road green township oh 45239
51000810328,"6628 JULY CT COLERAIN TOWNSHIP, OH 45239",6628 JULY CT COLERAIN TOWNSHIP OH 45239,6628 july ct colerain township oh 45239,6628,july ct,colerain township,oh,45239,NA,NA,NA,6628 july ct colerain township ohio 45239
File renamed without changes.
File renamed without changes.
