Missing addresses not found in rejected list #371

Open
BK01 opened this issue Nov 24, 2023 · 2 comments
Labels
bug, data integration, geocoder

Comments

@BK01
Contributor

BK01 commented Nov 24, 2023

Scenario:
Geocoded in PROD with the October 2023 data vintage.

Searching for the following addressString (50 Clark Dr, Port Alice, BC) results in a BLOCK level match with a fault of STREET_NAME.spelledWrong. The resulting fullAddress includes an updated street name of Clarke Dr.
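For anyone reproducing this lookup, here is a minimal sketch of building the query and pulling out the fields mentioned above. The base URL and the shape of the `faults` objects are assumptions about the public BC Address Geocoder, and the sample response is hand-built from the result described in this issue rather than captured output.

```python
from urllib.parse import urlencode

# Assumed base URL for the public geocoder; adjust for your environment.
BASE_URL = "https://geocoder.api.gov.bc.ca/addresses.json"


def build_query(address_string: str) -> str:
    """Return a request URL for a single address lookup."""
    return f"{BASE_URL}?{urlencode({'addressString': address_string})}"


def summarise_match(feature_collection: dict) -> dict:
    """Pull the fields discussed in this issue out of the top-ranked match."""
    props = feature_collection["features"][0]["properties"]
    return {
        "fullAddress": props.get("fullAddress"),
        "matchPrecision": props.get("matchPrecision"),
        # Each fault is assumed to be an object with 'element' and 'fault' keys.
        "faults": [f"{f['element']}.{f['fault']}" for f in props.get("faults", [])],
    }


# Hand-built fragment mirroring the result described above (not captured output).
sample = {
    "features": [
        {
            "properties": {
                "fullAddress": "50 Clarke Dr, Port Alice, BC",
                "matchPrecision": "BLOCK",
                "faults": [{"element": "STREET_NAME", "fault": "spelledWrong"}],
            }
        }
    ]
}

print(build_query("50 Clark Dr, Port Alice, BC"))
print(summarise_match(sample))
```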

Task:
Investigate why the addresses on Clark Dr were not found in the rejected address SHPs.

Initial Review:

BCA, AddressBC, and the street sign shown in Google Street View all use the street name ‘Clark’. BCA's assessment search tool also lists ‘50 CLARK DR PORT ALICE’ with a PID, and that PID was confirmed in the PMBC and BCA input files. This street name discrepancy has been submitted to the road network team for review.

Based on the street name difference, it was no surprise that 50 Clark Dr, Port Alice, BC was not listed in the site_Hybrid_geocoder file. However, it was also not listed in the rejected address SHPs.

Next, the PMBC layer was examined and showed that this location is a strata lot. More specifically, there were 102 parcel polygons stacked at this location. 101 parcels had a unique PID and an owner type of Private. One parcel was unclassified and had no PID.

A within search of this parcel (in PROD) returned 69 sites at -127.4830565, 50.4290903. According to the site_Hybrid_geocoder file, each has a unique PID.

Regarding the address range, most addresses appear to be within the correct overall range. However, the road network shows that part of Clarke Dr only has addresses on the left side of the street (right side range is 0 - 0), while having addresses on both sides of the street in another section. This corresponds with the residential driveways seen in aerial photos and Google Street View.
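To illustrate how a one-sided block behaves, here is a simplified sketch that treats a 0 - 0 range as "no addresses on this side". The `BlockRange` type, the function names, and the example ranges are all hypothetical. The check also ignores odd/even parity, so it is not how the geocoder itself assigns civic numbers to blocks.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BlockRange:
    left_first: int
    left_last: int
    right_first: int
    right_last: int


def _contains(first: int, last: int, civic_number: int) -> bool:
    # A 0 - 0 range is taken to mean "no addresses on this side".
    if first == 0 and last == 0:
        return False
    low, high = sorted((first, last))
    return low <= civic_number <= high


def side_of_block(block: BlockRange, civic_number: int) -> Optional[str]:
    """Return 'left', 'right', or None if the number is outside both ranges."""
    if _contains(block.left_first, block.left_last, civic_number):
        return "left"
    if _contains(block.right_first, block.right_last, civic_number):
        return "right"
    return None


# Made-up ranges for illustration only; not taken from the road network.
left_only = BlockRange(left_first=2, left_last=98, right_first=0, right_last=0)
print(side_of_block(left_only, 50))
print(side_of_block(left_only, 150))
```

A civic number that returns None for every block on the street would correspond to the CIVIC_NUMBER.notInAnyBlock situation, under the assumptions above.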

Finally, as a test, the source addresses from AddressBC found within the associated parcel were batch geocoded. Of the 99 addresses, all 67 non-Clark Dr addresses were found with a matchPrecision of CIVIC_NUMBER. The 32 Clark Dr addresses matched only to the BLOCK level (30) or the STREET level (2, with fault CIVIC_NUMBER.notInAnyBlock).
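A tally like the one above can be produced with a few lines of Python. This is a sketch only: the `addressString` and `matchPrecision` keys are assumed column names for the batch output, and the rows below are synthetic placeholders sized to match the reported counts, not the real addresses.

```python
from collections import Counter


def tally_by_precision(rows):
    """Count results per (is_clark_dr, matchPrecision) pair."""
    counts = Counter()
    for row in rows:
        on_clark = "CLARK DR" in row["addressString"].upper()
        counts[(on_clark, row["matchPrecision"])] += 1
    return counts


# Synthetic rows sized to mirror the counts reported above (placeholder addresses).
rows = (
    [{"addressString": "1 Example St, Port Alice, BC", "matchPrecision": "CIVIC_NUMBER"}] * 67
    + [{"addressString": "50 Clark Dr, Port Alice, BC", "matchPrecision": "BLOCK"}] * 30
    + [{"addressString": "50 Clark Dr, Port Alice, BC", "matchPrecision": "STREET"}] * 2
)

for (on_clark, precision), n in sorted(tally_by_precision(rows).items()):
    label = "Clark Dr" if on_clark else "other streets"
    print(f"{label:>13}  {precision:<12}  {n}")
```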

BK01 added the bug, data integration, and geocoder labels Nov 24, 2023
@gleeming
Collaborator

gleeming commented Nov 24, 2023

Some notes to pass on from my investigation:

  • Bing, MapBox and the MWRD web map (which looks to be sourced from ITN) all use Clarke Dr. I don't have evidence to confirm the correct spelling either way; I'm just noting that both Clarke and Clark appear across multiple sources.
  • The D4 FME Site Prep process outputs "good" looking address candidates. It also writes unusable address candidates (e.g. malformed civic number, missing street name), as well as any replicate records, to the rejected_site_Hybrid shapefile. Since the Port Alice records from BCA/ABC look like they could be real addresses, they become candidates for the next steps and are not kicked out as rejects (yet).
  • After the batching (E1) and Java Site Loader Prep (E2) steps, any candidates with imperfect street name geocodes get rejected to a file called site_loader_prep_rejected.csv (see below for why this may have caused us all confusion). For your cases, I can see in our older runs that the Clark Dr addresses are indeed output here, having been penalized for [STREET_NAME.spelledWrong:2].
  • Because they are rejected at this stage, they weren't expected to make it to the E4 BAARG site_Hybrid_geocoder.tsv output, as you note.

TBD

  • Update OLS Prep Flow geocoder diagram -- change file name from rejected_geocode_Hybrid Shapefile to site_loader_prep_rejected.csv
  • Update geocoder data processing documentation section E2.5 last table -- change site_loader_prep_log.csv to site_loader_prep_rejected.csv

@BK01
Contributor Author

BK01 commented Nov 24, 2023

Thank you for looking into this further and providing notes on the remaining steps. By searching the contents of site_loader_prep_rejected.csv I can confirm that the addresses on Clark Drive are listed with a fault of STREET_NAME.spelledWrong.
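For reference, a check like this can be scripted. The sketch below assumes nothing about the column layout of site_loader_prep_rejected.csv: it matches on the joined text of each row. The function name and the two-column sample are hypothetical.

```python
import csv
import io


def find_rejected(csv_text, street, fault):
    """Return rows where some field mentions the street and some field mentions the fault."""
    hits = []
    for row in csv.reader(io.StringIO(csv_text)):
        joined = " | ".join(row).upper()
        if street.upper() in joined and fault.upper() in joined:
            hits.append(row)
    return hits


# Hypothetical two-column sample; the real file's columns may differ.
sample = (
    "address,faults\n"
    '"50 Clark Dr, Port Alice, BC",[STREET_NAME.spelledWrong:2]\n'
    '"1 Example St, Port Alice, BC",[]\n'
)

for row in find_rejected(sample, "Clark Dr", "STREET_NAME.spelledWrong"):
    print(row)
```

To run it against the real file, read the file contents into a string first (for example with `pathlib.Path(...).read_text()`) and pass that as `csv_text`.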
