Commit

first five of zipcode and RAM check; closes #5; closes #6 (#13)
cole-brokamp authored Dec 23, 2022
1 parent c8bb75d commit c4cf2d4
Showing 8 changed files with 285 additions and 275 deletions.
2 changes: 1 addition & 1 deletion Dockerfile
@@ -2,7 +2,7 @@ FROM rocker/r-ver:4.1.3

# DeGAUSS container metadata
ENV degauss_name="postal"
-ENV degauss_version="0.1.3"
+ENV degauss_version="0.1.4"
ENV degauss_description="normalized and parsed addresses"
ENV degauss_argument="expand [default: '']"

10 changes: 5 additions & 5 deletions README.md
@@ -8,14 +8,14 @@
If `my_address_file.csv` is a file in the current working directory with an address column named `address`, then the [DeGAUSS command](https://degauss.org/using_degauss.html#DeGAUSS_Commands):

```sh
-docker run --rm -v $PWD:/tmp ghcr.io/degauss-org/postal:0.1.3 my_address_file.csv
+docker run --rm -v $PWD:/tmp ghcr.io/degauss-org/postal:0.1.4 my_address_file.csv
```

-will produce `my_address_file_postal_0.1.3.csv` with added columns:
+will produce `my_address_file_postal_0.1.4.csv` with added columns:

- **`cleaned_address`**: `address` with non-alphanumeric characters and excess whitespace removed (with `dht::clean_address()`)
- **`parsed.{address_component}`**: multiple columns, one for each [parsed address component](https://github.com/openvenues/libpostal#parser-labels) (e.g., `parsed.road`, `parsed.state`, `parsed.house_number`)
-- **`parsed_address`**: a "parsed" address created by pasting together available `parsed.house_number`, `parsed.road`, `parsed.city`, `parsed.state`, `parsed.postcode` address components
+- **`parsed_address`**: a "parsed" address created by pasting together available `parsed.house_number`, `parsed.road`, `parsed.city`, `parsed.state`, and the *first five digits* of the `parsed.postcode` address components

### Optional Argument

@@ -24,10 +24,10 @@ After parsing, the parsed addresses can be expanded into [several possible norma
If any value is provided as an argument (e.g., "expand"), then the [DeGAUSS command](https://degauss.org/using_degauss.html#DeGAUSS_Commands):

```sh
-docker run --rm -v $PWD:/tmp ghcr.io/degauss-org/postal:0.1.3 my_address_file.csv expand
+docker run --rm -v $PWD:/tmp ghcr.io/degauss-org/postal:0.1.4 my_address_file.csv expand
```

-will produce `my_address_file_postal_0.1.3_expand.csv` with the above columns *plus*:
+will produce `my_address_file_postal_0.1.4_expand.csv` with the above columns *plus*:

- **`expanded_addresses`**: the expanded addresses for `parsed_address`

8 changes: 7 additions & 1 deletion entrypoint.R
@@ -2,6 +2,8 @@

dht::greeting()

+dht::check_ram(4)
+
doc <- "
Usage:
entrypoint.R <filename> [<expand>]
@@ -44,9 +46,13 @@ parsed_address_components <-

d <- dplyr::bind_cols(d, parsed_address_components)

+if (!is.null(d$parsed.postcode)) {
+  d$parsed.postcode_five <- substr(d$parsed.postcode, 1, 5)
+}
+
d <- tidyr::unite(d,
  col = "parsed_address",
-  tidyselect::any_of(paste0("parsed.", c("house_number", "road", "city", "state", "postcode"))),
+  tidyselect::any_of(paste0("parsed.", c("house_number", "road", "city", "state", "postcode_five"))),
  sep = " ", na.rm = TRUE, remove = FALSE)

## expanding addresses
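To illustrate the effect of the `entrypoint.R` change, here is a minimal base-R sketch of the same truncation applied to the two FAIRWOOD RD rows in `test/address.csv`. The parsed component values are assumed for illustration (they are not real libpostal output), and the `vapply()` step is a base-R stand-in for `tidyr::unite(..., sep = " ", na.rm = TRUE)`, not the code the container runs.

```r
# Hypothetical parsed components; values are assumed, not actual libpostal output
d <- data.frame(
  parsed.house_number = c("5585", "5585"),
  parsed.road = c("fairwood rd", "fairwood rd"),
  parsed.city = c("green township", "green township"),
  parsed.state = c("oh", "oh"),
  parsed.postcode = c("45239", "45239-8579"),
  stringsAsFactors = FALSE
)

# Same guard and truncation as in the commit: keep the first five characters
if (!is.null(d$parsed.postcode)) {
  d$parsed.postcode_five <- substr(d$parsed.postcode, 1, 5)
}

# Base-R stand-in for tidyr::unite(): paste whichever components exist,
# dropping NA values, separated by single spaces
components <- paste0(
  "parsed.",
  c("house_number", "road", "city", "state", "postcode_five")
)
present <- intersect(components, names(d))
d$parsed_address <- vapply(
  seq_len(nrow(d)),
  function(i) {
    x <- unlist(d[i, present])
    paste(x[!is.na(x)], collapse = " ")
  },
  character(1)
)

print(d[, c("parsed.postcode", "parsed.postcode_five", "parsed_address")])
```

Under these assumptions, the ZIP+4 row and the plain five-digit row end up with the same `parsed_address`. Note that `substr()` takes the first five *characters*, so a postcode shorter than five characters passes through unchanged and `NA` stays `NA`.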
1 change: 1 addition & 0 deletions test/address.csv
@@ -19,6 +19,7 @@ id,address
5100020177,"4506 CAMBERWELL RD CINCINNATI, OH 45209"
5100020177,"4506 CAMBERWELL RD CINCINNATI, OH CINCINNATI, OH 45209"
55000100212,"5585 FAIRWOOD RD GREEN TOWNSHIP, OH 45239"
+55000100212,"5585 FAIRWOOD RD GREEN TOWNSHIP, OH 45239-8579"
51000810328,"6628 JULY CT COLERAIN TOWNSHIP, OH 45239"
61201400371,"5126 BRASHER AV BLUE ASH, OH 45242"
19200650054,"3708 TAPPAN AV CINCINNATI, OH 45223"
137 changes: 69 additions & 68 deletions test/address_postal_0.1.2.csv → test/address_postal_0.1.4.csv

Large diffs are not rendered by default.

Large diffs are not rendered by default.

File renamed without changes.
File renamed without changes.
