
Feature/odbiz scripts #12

Open
Wants to merge 37 commits into base: feature/odbiz_scripts
Changes from 1 commit

Commits (37)
c699b4e
Rename folders for organization
Skye-Chen-CSBP-CPSE Jun 24, 2022
00959cf
Backup PreProcessing and OpenTab scripts
Skye-Chen-CSBP-CPSE Jun 24, 2022
383b680
Basic Merging script + helpers for ODBiz
Skye-Chen-CSBP-CPSE Jun 24, 2022
2f2dfc4
Merge branch 'feature/odbiz_scripts' of https://github.com/Skye-Chen-…
Skye-Chen-CSBP-CPSE Jun 24, 2022
644dbbc
Vancouver: Filtered out non-CAD businesses, fixed business_name
Skye-Chen-CSBP-CPSE Jul 1, 2022
b3f18ac
Merging scripts edited to fix Vancouver
Skye-Chen-CSBP-CPSE Jul 1, 2022
7c3247d
Created scripts to create and extract zip files for transferring file…
Skye-Chen-CSBP-CPSE Jul 1, 2022
7999935
Fixed `street_no` formatting
Skye-Chen-CSBP-CPSE Jul 9, 2022
e319a63
Added more dup_keys, cleaned up code
Skye-Chen-CSBP-CPSE Jul 9, 2022
0562b38
Helpful scripts from other projects
Skye-Chen-CSBP-CPSE Jul 9, 2022
6ef2b1b
Added more viewing options
Skye-Chen-CSBP-CPSE Jul 9, 2022
1b9342b
Notebook for verifying data formats
Skye-Chen-CSBP-CPSE Jul 9, 2022
2f18e7e
Added more data exploration analysis
Skye-Chen-CSBP-CPSE Jul 9, 2022
6d609a6
Merge branch 'feature/odbiz_scripts' of https://github.com/Skye-Chen-…
Skye-Chen-CSBP-CPSE Jul 9, 2022
03b6e3b
Removed redundant code
Skye-Chen-CSBP-CPSE Jul 15, 2022
557569e
Added script to remove invalid coordinates
Skye-Chen-CSBP-CPSE Jul 15, 2022
f683ebe
Facilitates weekly script backups
Skye-Chen-CSBP-CPSE Jul 15, 2022
ad1e36c
Merge branch 'feature/odbiz_scripts' of https://github.com/Skye-Chen-…
Skye-Chen-CSBP-CPSE Jul 15, 2022
933204a
Fix invalid coords, move pipeline to new server, dedup Vancouver
Skye-Chen-CSBP-CPSE Jul 23, 2022
2535064
Merge branch 'feature/odbiz_scripts' of https://github.com/Skye-Chen-…
Skye-Chen-CSBP-CPSE Jul 23, 2022
fb807d5
Update scripts for use in new server
Skye-Chen-CSBP-CPSE Jul 29, 2022
6da5204
Improved documentation and data integrity verification
Skye-Chen-CSBP-CPSE Aug 5, 2022
b1611d5
Improved documentation
Skye-Chen-CSBP-CPSE Aug 5, 2022
0279b27
Drop temp cols, copy over files to parsing
Skye-Chen-CSBP-CPSE Aug 5, 2022
738338e
Backup parsing scripts
Skye-Chen-CSBP-CPSE Aug 5, 2022
96d6541
Backup geocoding scripts
Skye-Chen-CSBP-CPSE Aug 5, 2022
f0f7317
Search for incorrectly parsed addresses
Skye-Chen-CSBP-CPSE Aug 12, 2022
79fe471
Added column alt_business_name
Skye-Chen-CSBP-CPSE Aug 20, 2022
c496550
Refactored code and Reapply libpostal to dashes_with_spaces problem
Skye-Chen-CSBP-CPSE Aug 20, 2022
f7848e0
Geocoding weekly backup
Skye-Chen-CSBP-CPSE Aug 20, 2022
d92c12e
Improved parsing for dashes with spaces and consolidate parsed addres…
Skye-Chen-CSBP-CPSE Aug 26, 2022
771668a
Weekly backup
Skye-Chen-CSBP-CPSE Aug 26, 2022
820b693
Rename preprocessing
Skye-Chen-CSBP-CPSE Sep 2, 2022
0df90e6
Rename readme files
Skye-Chen-CSBP-CPSE Sep 2, 2022
60e9acf
Rename files to make it easier to export function
Skye-Chen-CSBP-CPSE Sep 2, 2022
7eee891
Update documentation,
Skye-Chen-CSBP-CPSE Sep 2, 2022
b86c876
Weekly backup
Skye-Chen-CSBP-CPSE Sep 2, 2022
Added column alt_business_name
Skye-Chen-CSBP-CPSE committed Aug 20, 2022
commit 79fe47119d66f8fc14d30b4d07a9bd20f008d2e4
50 changes: 42 additions & 8 deletions scripts/Businesses/2-OpenTabulate/1-openTabulate.ipynb
@@ -30,8 +30,15 @@
},
{
"cell_type": "code",
"execution_count": 5,
"execution_count": 1,
"metadata": {
"execution": {
"iopub.execute_input": "2022-08-15T16:59:23.441518Z",
"iopub.status.busy": "2022-08-15T16:59:23.440438Z",
"iopub.status.idle": "2022-08-15T16:59:23.567847Z",
"shell.execute_reply": "2022-08-15T16:59:23.566868Z",
"shell.execute_reply.started": "2022-08-15T16:59:23.441412Z"
},
"tags": []
},
"outputs": [
@@ -129,6 +136,7 @@
" 'business_sector',\n",
" 'business_subsector',\n",
" 'business_description',\n",
" 'alt_business_name',\n",
" 'business_id_no',\n",
" 'licence_number',\n",
" 'licence_type',\n",
@@ -239,15 +247,23 @@
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"execution_count": 2,
"metadata": {
"execution": {
"iopub.execute_input": "2022-08-15T16:59:23.577060Z",
"iopub.status.busy": "2022-08-15T16:59:23.576612Z",
"iopub.status.idle": "2022-08-15T17:00:02.613353Z",
"shell.execute_reply": "2022-08-15T17:00:02.612294Z",
"shell.execute_reply.started": "2022-08-15T16:59:23.577033Z"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Beginning data processing.\n",
"Completed processing in 0.000790267251431942 seconds.\n"
"Completed processing in 37.817878804169595 seconds.\n"
]
}
],
@@ -266,8 +282,17 @@
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"execution_count": 4,
"metadata": {
"execution": {
"iopub.execute_input": "2022-08-15T17:00:25.847534Z",
"iopub.status.busy": "2022-08-15T17:00:25.846232Z",
"iopub.status.idle": "2022-08-15T17:00:26.494861Z",
"shell.execute_reply": "2022-08-15T17:00:26.493978Z",
"shell.execute_reply.started": "2022-08-15T17:00:25.847455Z"
},
"tags": []
},
"outputs": [
{
"name": "stdout",
@@ -280,6 +305,7 @@
"source": [
"# transfer files directly from OpenTab/data/output to Merging/input\n",
"import shutil\n",
"import os\n",
"src = '../2-OpenTabulate/data/output'\n",
"dst = '../3-Merging/input'\n",
"\n",
@@ -301,8 +327,16 @@
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"execution_count": null,
"metadata": {
"execution": {
"iopub.status.busy": "2022-08-15T17:00:03.109630Z",
"iopub.status.idle": "2022-08-15T17:00:03.109958Z",
"shell.execute_reply": "2022-08-15T17:00:03.109818Z",
"shell.execute_reply.started": "2022-08-15T17:00:03.109802Z"
},
"tags": []
},
"outputs": [],
"source": [
"# '''\n",
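In this hunk, the notebook's transfer cell is cut off right after `src` and `dst` are defined. The newly added `import os` hints that the cell walks the output directory, but the actual loop is not shown. The following is a minimal sketch of such a step. It assumes the cell copies every regular file from `src` into `dst`, and the helper name `transfer_outputs` is hypothetical:

```python
import os
import shutil
from typing import List


def transfer_outputs(src: str, dst: str) -> List[str]:
    """Copy every regular file in src into dst (hypothetical helper).

    Returns the sorted names of the files that were copied.
    """
    os.makedirs(dst, exist_ok=True)  # create the destination folder if it is missing
    copied = []
    for name in sorted(os.listdir(src)):
        path = os.path.join(src, name)
        if os.path.isfile(path):  # skip subdirectories
            # copy2 copies file contents and also preserves modification times
            shutil.copy2(path, os.path.join(dst, name))
            copied.append(name)
    return copied
```

Calling `transfer_outputs('../2-OpenTabulate/data/output', '../3-Merging/input')` would use the same paths as the cell. Note that `shutil.copy2` overwrites any file of the same name already present in `dst`.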
1 change: 1 addition & 0 deletions scripts/Businesses/2-OpenTabulate/opentab.conf
@@ -73,6 +73,7 @@ biz = ('business_name',
'business_sector',
'business_subsector',
'business_description',
+'alt_business_name',
'business_id_no',
'licence_number',
'licence_type',
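This commit adds `alt_business_name` in three separate places: the notebook's field list, the `biz` tuple in `opentab.conf`, and `col_order` in `MergingBiz.py`. A quick consistency check can catch a schema that was missed. The sketch below is illustrative only. `missing_fields` is a hypothetical helper and is not part of the repository, and the field lists contain only the names visible in these hunks, not the full schemas:

```python
from typing import Dict, List, Sequence


def missing_fields(schemas: Dict[str, Sequence[str]],
                   required: Sequence[str]) -> Dict[str, List[str]]:
    """Map each schema name to the required fields it lacks (hypothetical helper)."""
    report: Dict[str, List[str]] = {}
    for name, fields in schemas.items():
        absent = [f for f in required if f not in fields]
        if absent:  # only report schemas that are missing something
            report[name] = absent
    return report


# Partial field lists, as far as they are visible in this diff
opentab_biz = ('business_name', 'business_sector', 'business_subsector',
               'business_description', 'alt_business_name', 'business_id_no',
               'licence_number', 'licence_type')
merging_col_order = ['idx', 'localfile', 'business_name', 'alt_business_name',
                     'business_sector', 'business_subsector', 'business_description']

print(missing_fields({'opentab.conf': opentab_biz, 'MergingBiz.py': merging_col_order},
                     required=['business_name', 'alt_business_name']))  # → {}
```

An empty dict means every schema already contains the required fields. A schema missing the field would appear as a key listing the absent names.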
1 change: 1 addition & 0 deletions scripts/Businesses/3-Merging/MergingBiz.py
@@ -114,6 +114,7 @@ def main():
col_order = [ 'idx',
'localfile',
'business_name',
+'alt_business_name',
'business_sector',
'business_subsector',
'business_description',
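`col_order` fixes the column order of the merged output, and `alt_business_name` now sits directly after `business_name`. The rest of `main()` is not shown. The sketch below shows one way to apply such an order using only the standard `csv` module, and the real script may do this differently (for example with pandas). `csv.DictWriter` writes the columns in `fieldnames` order, `restval=''` leaves `alt_business_name` empty for sources that do not provide it, and `extrasaction='ignore'` drops any column not listed. `reorder_columns` is a hypothetical name, and the sample row is made up for illustration:

```python
import csv
import io
from typing import Iterable, Mapping, Sequence


def reorder_columns(rows: Iterable[Mapping[str, str]], col_order: Sequence[str]) -> str:
    """Serialize rows as CSV with columns in col_order (hypothetical helper).

    Columns missing from a row are written as empty strings;
    columns not in col_order are dropped.
    """
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=list(col_order), restval='',
                            extrasaction='ignore', lineterminator='\n')
    writer.writeheader()
    for row in rows:
        writer.writerow(row)
    return out.getvalue()


# Abbreviated col_order from the hunk above
col_order = ['idx', 'localfile', 'business_name', 'alt_business_name',
             'business_sector']

# Made-up sample row: no alt_business_name, plus an extra column to be dropped
rows = [{'idx': '0', 'localfile': 'a.csv', 'business_name': 'Acme Ltd',
         'business_sector': 'Retail', 'temp_col': 'x'}]

print(reorder_columns(rows, col_order))
```

This prints a header line `idx,localfile,business_name,alt_business_name,business_sector` followed by `0,a.csv,Acme Ltd,,Retail`. Sources without an alternate name therefore still line up under the new column.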