Feature/odbiz scripts #12

Open
wants to merge 37 commits into
base: feature/odbiz_scripts
37 commits
c699b4e
Rename folders for organization
Skye-Chen-CSBP-CPSE Jun 24, 2022
00959cf
Backup PreProcessing and OpenTab scripts
Skye-Chen-CSBP-CPSE Jun 24, 2022
383b680
Basic Merging script + helpers for ODBiz
Skye-Chen-CSBP-CPSE Jun 24, 2022
2f2dfc4
Merge branch 'feature/odbiz_scripts' of https://github.com/Skye-Chen-…
Skye-Chen-CSBP-CPSE Jun 24, 2022
644dbbc
Vancouver: Filtered out non-CAD businesses, fixed business_name
Skye-Chen-CSBP-CPSE Jul 1, 2022
b3f18ac
Merging scripts edited to fix Vancouver
Skye-Chen-CSBP-CPSE Jul 1, 2022
7c3247d
Created scripts to create and extract zip files for transferring file…
Skye-Chen-CSBP-CPSE Jul 1, 2022
7999935
Fixed `street_no` formatting
Skye-Chen-CSBP-CPSE Jul 9, 2022
e319a63
Added more dup_keys, cleaned up code
Skye-Chen-CSBP-CPSE Jul 9, 2022
0562b38
Helpful scripts from other projects
Skye-Chen-CSBP-CPSE Jul 9, 2022
6ef2b1b
Added more viewing options
Skye-Chen-CSBP-CPSE Jul 9, 2022
1b9342b
Notebook for verifying data formats
Skye-Chen-CSBP-CPSE Jul 9, 2022
2f18e7e
Added more data exploration analysis
Skye-Chen-CSBP-CPSE Jul 9, 2022
6d609a6
Merge branch 'feature/odbiz_scripts' of https://github.com/Skye-Chen-…
Skye-Chen-CSBP-CPSE Jul 9, 2022
03b6e3b
Removed redundant code
Skye-Chen-CSBP-CPSE Jul 15, 2022
557569e
Added script to remove invalid coordinates
Skye-Chen-CSBP-CPSE Jul 15, 2022
f683ebe
Facilitates weekly script backups
Skye-Chen-CSBP-CPSE Jul 15, 2022
ad1e36c
Merge branch 'feature/odbiz_scripts' of https://github.com/Skye-Chen-…
Skye-Chen-CSBP-CPSE Jul 15, 2022
933204a
Fix invalid coords, move pipeline to new server, dedup Vancouver
Skye-Chen-CSBP-CPSE Jul 23, 2022
2535064
Merge branch 'feature/odbiz_scripts' of https://github.com/Skye-Chen-…
Skye-Chen-CSBP-CPSE Jul 23, 2022
fb807d5
Update scripts for use in new server
Skye-Chen-CSBP-CPSE Jul 29, 2022
6da5204
Improved documentation and data integrity verification
Skye-Chen-CSBP-CPSE Aug 5, 2022
b1611d5
Improved documentation
Skye-Chen-CSBP-CPSE Aug 5, 2022
0279b27
Drop temp cols, copy over files to parsing
Skye-Chen-CSBP-CPSE Aug 5, 2022
738338e
Backup parsing scripts
Skye-Chen-CSBP-CPSE Aug 5, 2022
96d6541
Backup geocoding scripts
Skye-Chen-CSBP-CPSE Aug 5, 2022
f0f7317
Search for incorrectly parsed addresses
Skye-Chen-CSBP-CPSE Aug 12, 2022
79fe471
Added column alt_business_name
Skye-Chen-CSBP-CPSE Aug 20, 2022
c496550
Refactored code and Reapply libpostal to dashes_with_spaces problem
Skye-Chen-CSBP-CPSE Aug 20, 2022
f7848e0
Geocoding weekly backup
Skye-Chen-CSBP-CPSE Aug 20, 2022
d92c12e
Improved parsing for dashes with spaces and consolidate parsed addres…
Skye-Chen-CSBP-CPSE Aug 26, 2022
771668a
Weekly backup
Skye-Chen-CSBP-CPSE Aug 26, 2022
820b693
Rename preprocessing
Skye-Chen-CSBP-CPSE Sep 2, 2022
0df90e6
Rename readme files
Skye-Chen-CSBP-CPSE Sep 2, 2022
60e9acf
Rename files to make it easier to export function
Skye-Chen-CSBP-CPSE Sep 2, 2022
7eee891
Update documentation,
Skye-Chen-CSBP-CPSE Sep 2, 2022
b86c876
Weekly backup
Skye-Chen-CSBP-CPSE Sep 2, 2022
2 changes: 2 additions & 0 deletions .gitignore
Original file line number Diff line number Diff line change
@@ -0,0 +1,2 @@
# Ignore the following directories
.ipynb_checkpoints*
69 changes: 69 additions & 0 deletions scripts/Businesses/1-PreProcessing/BackupRawFiles.ipynb
Original file line number Diff line number Diff line change
@@ -0,0 +1,69 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# BackupRawFiles.ipynb\n",
"This notebook compresses all non-archived files under `1-PreProcessing/raw` into the `odbiz_raw_backup.zip` file.\n",
"\n",
"This provides a convenient way for us to backup our raw datasets. "
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {
"tags": []
},
"outputs": [
{
"data": {
"text/plain": [
"'/home/jovyan/ODBiz/1-PreProcessing/odbiz_raw_backup.zip'"
]
},
"execution_count": 12,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"import shutil\n",
"\n",
"dir_to_compress = '/home/jovyan/ODBiz/1-PreProcessing/raw'\n",
"output_filename = '/home/jovyan/ODBiz/1-PreProcessing/odbiz_raw_backup'\n",
"\n",
"shutil.make_archive(output_filename, 'zip', dir_to_compress)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python [conda env:root] *",
"language": "python",
"name": "conda-root-py"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.12"
}
},
"nbformat": 4,
"nbformat_minor": 4
}
20 changes: 20 additions & 0 deletions scripts/Businesses/1-PreProcessing/README_preprocessing.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,20 @@
# 1-PreProcessing
This step performs basic data cleaning, particularly for datasets that the rest of our pipeline has trouble handling.

## `preprocessing_main.ipynb`
Hitting "Run all" on this Jupyter notebook will run all the necessary scripts in the correct order. Most of the documentation already lives in `preprocessing_main.ipynb` itself, so it is not repeated here.

## `ODBizSources.csv`
The file `ODBizSources.csv` stores metadata about our source files, including links to the original sources. Our source files are stored in `/home/jovyan/ODBiz/1-PreProcessing/raw`.

---

## Dropped rows
The scripts below drop entries that meet conditions we consider out of scope for this project.

### `process_vancouver.py`
The Vancouver dataset re-lists every approved business year after year. As a result, about 80% of its entries are duplicates. This script detects all duplicate sets based on the following columns: `['BusinessName', 'Province', 'BusinessType', 'Unit', 'House', 'Street', 'City', 'Province', 'PostalCode', 'Country']`

The script keeps the most recent entry and removes all the other duplicates.

In addition, this script removes entries that do not have a valid Canadian province code (i.e., they are located in states/provinces outside of Canada).
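
The deduplication and province filter described above could look roughly like the sketch below. It is not the contents of `process_vancouver.py`: the function name `dedupe_and_filter`, the date column name `IssuedDate`, and the use of two-letter province codes are assumptions made for illustration. The key list is the one given above, with the repeated `'Province'` listed once.

```python
import pandas as pd

# Duplicate-detection keys from the list above ('Province' listed once).
DUP_KEYS = ['BusinessName', 'BusinessType', 'Unit', 'House', 'Street',
            'City', 'Province', 'PostalCode', 'Country']

# Two-letter codes for the Canadian provinces and territories.
CANADIAN_PROVINCE_CODES = {'AB', 'BC', 'MB', 'NB', 'NL', 'NS', 'NT',
                           'NU', 'ON', 'PE', 'QC', 'SK', 'YT'}


def dedupe_and_filter(df, date_col='IssuedDate'):
    """Keep the most recent row per duplicate set, then keep Canadian rows.

    `date_col` is a hypothetical column name; use whichever column
    records when the licence entry was issued or last updated.
    """
    out = df.copy()
    out[date_col] = pd.to_datetime(out[date_col], errors='coerce')
    # Oldest first (missing dates earliest), so keep='last' retains the newest.
    out = out.sort_values(date_col, na_position='first', kind='stable')
    out = out.drop_duplicates(subset=DUP_KEYS, keep='last')
    # Drop rows whose province code is not Canadian.
    out = out[out['Province'].isin(CANADIAN_PROVINCE_CODES)]
    return out.reset_index(drop=True)
```

Sorting before `drop_duplicates(keep='last')` is what makes "most recent" well defined; without the sort, the surviving row would depend on file order.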
Original file line number Diff line number Diff line change
Expand Up @@ -2,14 +2,14 @@
"cells": [
{
"cell_type": "code",
"execution_count": 71,
"execution_count": 1,
"metadata": {
"execution": {
"iopub.execute_input": "2022-03-09T16:13:37.195853Z",
"iopub.status.busy": "2022-03-09T16:13:37.195419Z",
"iopub.status.idle": "2022-03-09T16:13:37.200410Z",
"shell.execute_reply": "2022-03-09T16:13:37.199323Z",
"shell.execute_reply.started": "2022-03-09T16:13:37.195810Z"
"iopub.execute_input": "2022-08-05T19:38:43.114168Z",
"iopub.status.busy": "2022-08-05T19:38:43.113840Z",
"iopub.status.idle": "2022-08-05T19:38:43.121707Z",
"shell.execute_reply": "2022-08-05T19:38:43.120987Z",
"shell.execute_reply.started": "2022-08-05T19:38:43.114075Z"
},
"tags": []
},
Expand All @@ -21,18 +21,30 @@
},
{
"cell_type": "code",
"execution_count": 1,
"execution_count": 2,
"metadata": {
"execution": {
"iopub.execute_input": "2022-03-10T20:36:04.766397Z",
"iopub.status.busy": "2022-03-10T20:36:04.765776Z",
"iopub.status.idle": "2022-03-10T20:36:05.801784Z",
"shell.execute_reply": "2022-03-10T20:36:05.800840Z",
"shell.execute_reply.started": "2022-03-10T20:36:04.766275Z"
"iopub.execute_input": "2022-08-05T19:38:45.841340Z",
"iopub.status.busy": "2022-08-05T19:38:45.840958Z",
"iopub.status.idle": "2022-08-05T19:38:46.985150Z",
"shell.execute_reply": "2022-08-05T19:38:46.983742Z",
"shell.execute_reply.started": "2022-08-05T19:38:45.841289Z"
},
"tags": []
},
"outputs": [],
"outputs": [
{
"ename": "ModuleNotFoundError",
"evalue": "No module named 'geopandas'",
"output_type": "error",
"traceback": [
"\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
"\u001b[0;31mModuleNotFoundError\u001b[0m Traceback (most recent call last)",
"Input \u001b[0;32mIn [2]\u001b[0m, in \u001b[0;36m<cell line: 2>\u001b[0;34m()\u001b[0m\n\u001b[1;32m 1\u001b[0m \u001b[38;5;28;01mimport\u001b[39;00m \u001b[38;5;21;01mpandas\u001b[39;00m \u001b[38;5;28;01mas\u001b[39;00m \u001b[38;5;21;01mpd\u001b[39;00m\n\u001b[0;32m----> 2\u001b[0m \u001b[38;5;28;01mimport\u001b[39;00m \u001b[38;5;21;01mgeopandas\u001b[39;00m \u001b[38;5;28;01mas\u001b[39;00m \u001b[38;5;21;01mgpd\u001b[39;00m\n\u001b[1;32m 3\u001b[0m \u001b[38;5;28;01mimport\u001b[39;00m \u001b[38;5;21;01mfolium\u001b[39;00m\n",
"\u001b[0;31mModuleNotFoundError\u001b[0m: No module named 'geopandas'"
]
}
],
"source": [
"import pandas as pd\n",
"import geopandas as gpd\n",
Expand Down Expand Up @@ -191,7 +203,7 @@
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
Expand Down
132 changes: 132 additions & 0 deletions scripts/Businesses/1-PreProcessing/burnaby_york.ipynb
Original file line number Diff line number Diff line change
@@ -0,0 +1,132 @@
{
"cells": [
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"execution": {
"iopub.execute_input": "2022-08-09T19:33:50.959950Z",
"iopub.status.busy": "2022-08-09T19:33:50.959370Z",
"iopub.status.idle": "2022-08-09T19:33:52.917541Z",
"shell.execute_reply": "2022-08-09T19:33:52.916006Z",
"shell.execute_reply.started": "2022-08-09T19:33:50.959879Z"
},
"tags": []
},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"ERROR 1: PROJ: proj_create_from_database: Open of /opt/conda/envs/odbiz/share/proj failed\n"
]
},
{
"ename": "DriverError",
"evalue": "/home/jovyan/ODBiz/1-PreProcessing/raw/shapefiles/Business_Licences.geojson: No such file or directory",
"output_type": "error",
"traceback": [
"\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
"\u001b[0;31mCPLE_OpenFailedError\u001b[0m Traceback (most recent call last)",
"File \u001b[0;32mfiona/_shim.pyx:83\u001b[0m, in \u001b[0;36mfiona._shim.gdal_open_vector\u001b[0;34m()\u001b[0m\n",
"File \u001b[0;32mfiona/_err.pyx:291\u001b[0m, in \u001b[0;36mfiona._err.exc_wrap_pointer\u001b[0;34m()\u001b[0m\n",
"\u001b[0;31mCPLE_OpenFailedError\u001b[0m: /home/jovyan/ODBiz/1-PreProcessing/raw/shapefiles/Business_Licences.geojson: No such file or directory",
"\nDuring handling of the above exception, another exception occurred:\n",
"\u001b[0;31mDriverError\u001b[0m Traceback (most recent call last)",
"Input \u001b[0;32mIn [1]\u001b[0m, in \u001b[0;36m<cell line: 12>\u001b[0;34m()\u001b[0m\n\u001b[1;32m 9\u001b[0m name \u001b[38;5;241m=\u001b[39m \u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mBC_Burnaby_shapefile\u001b[39m\u001b[38;5;124m\"\u001b[39m\n\u001b[1;32m 10\u001b[0m fp \u001b[38;5;241m=\u001b[39m \u001b[38;5;124mf\u001b[39m\u001b[38;5;124m\"\u001b[39m\u001b[38;5;132;01m{\u001b[39;00mraw_shp_dir\u001b[38;5;132;01m}\u001b[39;00m\u001b[38;5;124m/Business_Licences.geojson\u001b[39m\u001b[38;5;124m\"\u001b[39m\n\u001b[0;32m---> 12\u001b[0m city \u001b[38;5;241m=\u001b[39m \u001b[43mgpd\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mread_file\u001b[49m\u001b[43m(\u001b[49m\u001b[43mfp\u001b[49m\u001b[43m)\u001b[49m\n\u001b[1;32m 14\u001b[0m \u001b[38;5;66;03m#city = city.set_crs(, allow_override=True)\u001b[39;00m\n\u001b[1;32m 15\u001b[0m \u001b[38;5;66;03m#city = city.to_crs()\u001b[39;00m\n\u001b[1;32m 17\u001b[0m city\u001b[38;5;241m.\u001b[39mexplore()\n",
"File \u001b[0;32m/opt/conda/envs/odbiz/lib/python3.10/site-packages/geopandas/io/file.py:253\u001b[0m, in \u001b[0;36m_read_file\u001b[0;34m(filename, bbox, mask, rows, engine, **kwargs)\u001b[0m\n\u001b[1;32m 250\u001b[0m path_or_bytes \u001b[38;5;241m=\u001b[39m filename\n\u001b[1;32m 252\u001b[0m \u001b[38;5;28;01mif\u001b[39;00m engine \u001b[38;5;241m==\u001b[39m \u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mfiona\u001b[39m\u001b[38;5;124m\"\u001b[39m:\n\u001b[0;32m--> 253\u001b[0m \u001b[38;5;28;01mreturn\u001b[39;00m \u001b[43m_read_file_fiona\u001b[49m\u001b[43m(\u001b[49m\n\u001b[1;32m 254\u001b[0m \u001b[43m \u001b[49m\u001b[43mpath_or_bytes\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mfrom_bytes\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mbbox\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mbbox\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mmask\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mmask\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mrows\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mrows\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[38;5;241;43m*\u001b[39;49m\u001b[38;5;241;43m*\u001b[39;49m\u001b[43mkwargs\u001b[49m\n\u001b[1;32m 255\u001b[0m \u001b[43m \u001b[49m\u001b[43m)\u001b[49m\n\u001b[1;32m 256\u001b[0m \u001b[38;5;28;01melif\u001b[39;00m engine \u001b[38;5;241m==\u001b[39m \u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mpyogrio\u001b[39m\u001b[38;5;124m\"\u001b[39m:\n\u001b[1;32m 257\u001b[0m \u001b[38;5;28;01mreturn\u001b[39;00m _read_file_pyogrio(\n\u001b[1;32m 258\u001b[0m path_or_bytes, bbox\u001b[38;5;241m=\u001b[39mbbox, mask\u001b[38;5;241m=\u001b[39mmask, rows\u001b[38;5;241m=\u001b[39mrows, \u001b[38;5;241m*\u001b[39m\u001b[38;5;241m*\u001b[39mkwargs\n\u001b[1;32m 259\u001b[0m )\n",
"File \u001b[0;32m/opt/conda/envs/odbiz/lib/python3.10/site-packages/geopandas/io/file.py:294\u001b[0m, in \u001b[0;36m_read_file_fiona\u001b[0;34m(path_or_bytes, from_bytes, bbox, mask, rows, **kwargs)\u001b[0m\n\u001b[1;32m 291\u001b[0m reader \u001b[38;5;241m=\u001b[39m fiona\u001b[38;5;241m.\u001b[39mopen\n\u001b[1;32m 293\u001b[0m \u001b[38;5;28;01mwith\u001b[39;00m fiona_env():\n\u001b[0;32m--> 294\u001b[0m \u001b[38;5;28;01mwith\u001b[39;00m \u001b[43mreader\u001b[49m\u001b[43m(\u001b[49m\u001b[43mpath_or_bytes\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[38;5;241;43m*\u001b[39;49m\u001b[38;5;241;43m*\u001b[39;49m\u001b[43mkwargs\u001b[49m\u001b[43m)\u001b[49m \u001b[38;5;28;01mas\u001b[39;00m features:\n\u001b[1;32m 295\u001b[0m \n\u001b[1;32m 296\u001b[0m \u001b[38;5;66;03m# In a future Fiona release the crs attribute of features will\u001b[39;00m\n\u001b[1;32m 297\u001b[0m \u001b[38;5;66;03m# no longer be a dict, but will behave like a dict. So this should\u001b[39;00m\n\u001b[1;32m 298\u001b[0m \u001b[38;5;66;03m# be forwards compatible\u001b[39;00m\n\u001b[1;32m 299\u001b[0m crs \u001b[38;5;241m=\u001b[39m (\n\u001b[1;32m 300\u001b[0m features\u001b[38;5;241m.\u001b[39mcrs[\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124minit\u001b[39m\u001b[38;5;124m\"\u001b[39m]\n\u001b[1;32m 301\u001b[0m \u001b[38;5;28;01mif\u001b[39;00m features\u001b[38;5;241m.\u001b[39mcrs \u001b[38;5;129;01mand\u001b[39;00m \u001b[38;5;124m\"\u001b[39m\u001b[38;5;124minit\u001b[39m\u001b[38;5;124m\"\u001b[39m \u001b[38;5;129;01min\u001b[39;00m features\u001b[38;5;241m.\u001b[39mcrs\n\u001b[1;32m 302\u001b[0m \u001b[38;5;28;01melse\u001b[39;00m features\u001b[38;5;241m.\u001b[39mcrs_wkt\n\u001b[1;32m 303\u001b[0m )\n\u001b[1;32m 305\u001b[0m \u001b[38;5;66;03m# handle loading the bounding box\u001b[39;00m\n",
"File \u001b[0;32m/opt/conda/envs/odbiz/lib/python3.10/site-packages/fiona/env.py:408\u001b[0m, in \u001b[0;36mensure_env_with_credentials.<locals>.wrapper\u001b[0;34m(*args, **kwargs)\u001b[0m\n\u001b[1;32m 405\u001b[0m \u001b[38;5;129m@wraps\u001b[39m(f)\n\u001b[1;32m 406\u001b[0m \u001b[38;5;28;01mdef\u001b[39;00m \u001b[38;5;21mwrapper\u001b[39m(\u001b[38;5;241m*\u001b[39margs, \u001b[38;5;241m*\u001b[39m\u001b[38;5;241m*\u001b[39mkwargs):\n\u001b[1;32m 407\u001b[0m \u001b[38;5;28;01mif\u001b[39;00m local\u001b[38;5;241m.\u001b[39m_env:\n\u001b[0;32m--> 408\u001b[0m \u001b[38;5;28;01mreturn\u001b[39;00m \u001b[43mf\u001b[49m\u001b[43m(\u001b[49m\u001b[38;5;241;43m*\u001b[39;49m\u001b[43margs\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[38;5;241;43m*\u001b[39;49m\u001b[38;5;241;43m*\u001b[39;49m\u001b[43mkwargs\u001b[49m\u001b[43m)\u001b[49m\n\u001b[1;32m 409\u001b[0m \u001b[38;5;28;01melse\u001b[39;00m:\n\u001b[1;32m 410\u001b[0m \u001b[38;5;28;01mif\u001b[39;00m \u001b[38;5;28misinstance\u001b[39m(args[\u001b[38;5;241m0\u001b[39m], \u001b[38;5;28mstr\u001b[39m):\n",
"File \u001b[0;32m/opt/conda/envs/odbiz/lib/python3.10/site-packages/fiona/__init__.py:264\u001b[0m, in \u001b[0;36mopen\u001b[0;34m(fp, mode, driver, schema, crs, encoding, layer, vfs, enabled_drivers, crs_wkt, **kwargs)\u001b[0m\n\u001b[1;32m 261\u001b[0m path \u001b[38;5;241m=\u001b[39m parse_path(fp)\n\u001b[1;32m 263\u001b[0m \u001b[38;5;28;01mif\u001b[39;00m mode \u001b[38;5;129;01min\u001b[39;00m (\u001b[38;5;124m'\u001b[39m\u001b[38;5;124ma\u001b[39m\u001b[38;5;124m'\u001b[39m, \u001b[38;5;124m'\u001b[39m\u001b[38;5;124mr\u001b[39m\u001b[38;5;124m'\u001b[39m):\n\u001b[0;32m--> 264\u001b[0m c \u001b[38;5;241m=\u001b[39m \u001b[43mCollection\u001b[49m\u001b[43m(\u001b[49m\u001b[43mpath\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mmode\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mdriver\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mdriver\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mencoding\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mencoding\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m 265\u001b[0m \u001b[43m \u001b[49m\u001b[43mlayer\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mlayer\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43menabled_drivers\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43menabled_drivers\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[38;5;241;43m*\u001b[39;49m\u001b[38;5;241;43m*\u001b[39;49m\u001b[43mkwargs\u001b[49m\u001b[43m)\u001b[49m\n\u001b[1;32m 266\u001b[0m \u001b[38;5;28;01melif\u001b[39;00m mode \u001b[38;5;241m==\u001b[39m \u001b[38;5;124m'\u001b[39m\u001b[38;5;124mw\u001b[39m\u001b[38;5;124m'\u001b[39m:\n\u001b[1;32m 267\u001b[0m \u001b[38;5;28;01mif\u001b[39;00m schema:\n\u001b[1;32m 268\u001b[0m \u001b[38;5;66;03m# Make an ordered dict of schema properties.\u001b[39;00m\n",
"File \u001b[0;32m/opt/conda/envs/odbiz/lib/python3.10/site-packages/fiona/collection.py:162\u001b[0m, in \u001b[0;36mCollection.__init__\u001b[0;34m(self, path, mode, driver, schema, crs, encoding, layer, vsi, archive, enabled_drivers, crs_wkt, ignore_fields, ignore_geometry, **kwargs)\u001b[0m\n\u001b[1;32m 160\u001b[0m \u001b[38;5;28;01mif\u001b[39;00m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39mmode \u001b[38;5;241m==\u001b[39m \u001b[38;5;124m'\u001b[39m\u001b[38;5;124mr\u001b[39m\u001b[38;5;124m'\u001b[39m:\n\u001b[1;32m 161\u001b[0m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39msession \u001b[38;5;241m=\u001b[39m Session()\n\u001b[0;32m--> 162\u001b[0m \u001b[38;5;28;43mself\u001b[39;49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43msession\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mstart\u001b[49m\u001b[43m(\u001b[49m\u001b[38;5;28;43mself\u001b[39;49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[38;5;241;43m*\u001b[39;49m\u001b[38;5;241;43m*\u001b[39;49m\u001b[43mkwargs\u001b[49m\u001b[43m)\u001b[49m\n\u001b[1;32m 163\u001b[0m \u001b[38;5;28;01melif\u001b[39;00m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39mmode \u001b[38;5;129;01min\u001b[39;00m (\u001b[38;5;124m'\u001b[39m\u001b[38;5;124ma\u001b[39m\u001b[38;5;124m'\u001b[39m, \u001b[38;5;124m'\u001b[39m\u001b[38;5;124mw\u001b[39m\u001b[38;5;124m'\u001b[39m):\n\u001b[1;32m 164\u001b[0m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39msession \u001b[38;5;241m=\u001b[39m WritingSession()\n",
"File \u001b[0;32mfiona/ogrext.pyx:540\u001b[0m, in \u001b[0;36mfiona.ogrext.Session.start\u001b[0;34m()\u001b[0m\n",
"File \u001b[0;32mfiona/_shim.pyx:90\u001b[0m, in \u001b[0;36mfiona._shim.gdal_open_vector\u001b[0;34m()\u001b[0m\n",
"\u001b[0;31mDriverError\u001b[0m: /home/jovyan/ODBiz/1-PreProcessing/raw/shapefiles/Business_Licences.geojson: No such file or directory"
]
}
],
"source": [
"import os\n",
"import geopandas as gpd\n",
"import pandas as pd\n",
"\n",
"raw_shp_dir = '/home/jovyan/ODBiz/1-PreProcessing/raw/shapefiles'\n",
"out_dir = '/home/jovyan/ODBiz/1-PreProcessing' # Default\n",
"\n",
"\n",
"name = \"BC_Burnaby_shapefile\"\n",
"fp = f\"{raw_shp_dir}/Business_Licences.geojson\"\n",
"\n",
"city = gpd.read_file(fp)\n",
"\n",
"#city = city.set_crs(, allow_override=True)\n",
"#city = city.to_crs()\n",
"\n",
"city.explore()\n"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"execution": {
"iopub.execute_input": "2022-06-27T14:45:18.340157Z",
"iopub.status.busy": "2022-06-27T14:45:18.339952Z",
"iopub.status.idle": "2022-06-27T14:45:18.365361Z",
"shell.execute_reply": "2022-06-27T14:45:18.343352Z",
"shell.execute_reply.started": "2022-06-27T14:45:18.340131Z"
},
"tags": []
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"BC_Burnaby_shapefile\n",
"epsg:4326\n"
]
}
],
"source": [
"print(name)\n",
"print(city.crs)\n",
"#city = city.to_crs(epsg=4326)\n",
"#print(city.crs)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
" # sub_city = city.head(500)\n",
"\n",
"city['lon'] = city.centroid.x\n",
"city['lat'] = city.centroid.y"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python [conda env:root] *",
"language": "python",
"name": "conda-root-py"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.12"
}
},
"nbformat": 4,
"nbformat_minor": 4
}