README for Location Data Cleanup Project
This project is written in Python 3.6 and uses the Google Places API and MySQL.
This project takes location information (address, name, or phone number), checks it against Google Places to get the most accurate data, and updates a specified database table with that information.
Information required to run:
- Database Host
- Database Username
- Database Password
- Google Places API Key
- inputtype for query: textquery or phonenumber (information on required phonenumber formatting can be found here)
- Name of CSV file with location/phone number information
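As a minimal sketch of checking these inputs before a run, the snippet below validates a config dictionary. The key names are hypothetical and chosen only for illustration; the actual keys are whatever location_data_cleanup_config.json in the project defines.

```python
import json

# Hypothetical key names, for illustration only. The real keys are
# whatever location_data_cleanup_config.json in the project defines.
REQUIRED_KEYS = [
    "database_host",
    "database_username",
    "database_password",
    "google_places_api_key",
    "inputtype",
    "csv_file",
]
VALID_INPUT_TYPES = {"textquery", "phonenumber"}


def validate_config(config):
    """Return a list of problems found in a config dict (empty if none)."""
    problems = [
        f"missing or empty: {key}" for key in REQUIRED_KEYS if not config.get(key)
    ]
    input_type = config.get("inputtype")
    if input_type and input_type not in VALID_INPUT_TYPES:
        problems.append(f"inputtype must be one of {sorted(VALID_INPUT_TYPES)}")
    return problems


# Placeholder values; note that JSON requires the double quotes.
example = json.loads("""
{
    "database_host": "localhost",
    "database_username": "cleanup_user",
    "database_password": "example-password",
    "google_places_api_key": "YOUR_API_KEY",
    "inputtype": "textquery",
    "csv_file": "locations.csv"
}
""")
print(validate_config(example))
```

An empty list means every required value is present and inputtype is one of the two supported options.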
On execution:
- Optional: location_data_cleanup_seed.py creates tables with the required schema. It also deletes existing tables with the same name, so exercise caution here.
- location_data_cleanup_config_reader.py and location_data_cleanup_csv_reader.py read and prepare user input from location_data_cleanup_config.json and the CSV file specified in the configuration.
- location_data_cleanup_database_prep.py takes the prepared data from location_data_cleanup_csv_reader.py and inserts it into RelationshipTable in preparation for making Places API calls.
- location_data_cleanup_find_places.py runs through all unprocessed entries in RelationshipTable and makes a GET request to the Google Places API for each one. Processed entries are flagged to prevent redundant API requests, so the user can add rows to the table and rerun the script at any time without paying for repeat requests. The request returns Google's top location suggestion based on the CSV place data, with which the script updates the record in RelationshipTable.
- Also, if the location returned from the GET request has never been returned before (it has a unique place Id), the place Id is inserted into DetailsTable, which contains all unique places that have been searched and all available detailed information about each one (hours, lat, lng, phone number, address, website, etc.).
- location_data_cleanup_details_update.py runs through all records in DetailsTable and updates the details for each place through a GET request to the Places API. Note that it only makes a request if the Details field is empty or the DateUpdated is earlier than Google's last update of the place.
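The textquery and phonenumber input types described above match the Places API "Find Place" request, so the initial lookup can be sketched as below. This is an illustration, not the project's actual code: the function name and the default fields are assumptions, and the URL is only built here, not sent.

```python
from urllib.parse import urlencode

# Public endpoint of the Places API "Find Place" request.
FIND_PLACE_URL = "https://maps.googleapis.com/maps/api/place/findplacefromtext/json"


def build_find_place_url(query, input_type, api_key,
                         fields=("place_id", "name", "formatted_address")):
    """Build the GET URL for one lookup (hypothetical helper).

    The default fields are an assumption; request whichever fields
    the table columns actually need.
    """
    if input_type not in ("textquery", "phonenumber"):
        raise ValueError("input_type must be 'textquery' or 'phonenumber'")
    params = {
        "input": query,
        "inputtype": input_type,
        "fields": ",".join(fields),
        "key": api_key,
    }
    # urlencode percent-encodes the values, including the leading "+"
    # of an international phone number.
    return FIND_PLACE_URL + "?" + urlencode(params)


url = build_find_place_url(
    "First Baptist Church North 7th Street", "textquery", "YOUR_API_KEY"
)
print(url)
```

The resulting URL can be passed to requests.get, after which the JSON response lists the candidate places. Each call sent this way is billed, so it is worth building and inspecting a few URLs before running against a full table.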
- The GET requests made to the Places API cost money. Each initial request for a place (based on CSV input and the Processed field in RelationshipTable) costs $0.017, and each update request (based on place Id and the DateUpdated field in DetailsTable) costs $0.003. Keep this in mind to avoid large bills from unnecessary requests. Overall pricing information can be found here, initial request pricing here, and detail update request pricing here.
- Place details are returned as a JSON object and dumped unceremoniously into a single field. SQL magic is required for any formatted extraction of data from that field.
- It is important that the quotation marks are maintained in the location_data_cleanup_config.json file and in the CSV file specified in the config file. If they are not, the API calls will not work.
- Commas and other separating characters are not necessary in the CSV query data if textquery is specified. All non-alphanumeric data will be stripped out for GET request compatibility.
- The "Record Id" column in the CSV file is not used in any way by the script. It is simply transferred to the RelationshipTable for trackability if pulling data from another database or table. If not needed, simply set it to an empty string.
- "Type" refers to the type of establishment being searched for in the Places API. The list of supported types can be found here.
- Bad requests (PlaceIds that have become obsolete because the business closed, moved, etc., or queries for which no place matches the given description) are not updated. Instead, these records are placed in a Failures Table for error tracking.
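To make the pricing note above concrete, here is a rough cost estimate using the two per-request prices quoted in this README. The function name is hypothetical, and the prices should be checked against Google's current pricing before relying on the result.

```python
from decimal import Decimal

# Per-request prices as quoted in this README; verify against current pricing.
INITIAL_REQUEST_COST = Decimal("0.017")  # one per unprocessed RelationshipTable row
UPDATE_REQUEST_COST = Decimal("0.003")   # one per DetailsTable record that needs a refresh


def estimate_cost(unprocessed_rows, detail_updates):
    """Estimate the cost in USD of one full run (hypothetical helper)."""
    return (unprocessed_rows * INITIAL_REQUEST_COST
            + detail_updates * UPDATE_REQUEST_COST)


# 1000 new rows and 400 detail refreshes: 17.000 + 1.200
print(estimate_cost(1000, 400))  # prints 18.200
```

Decimal is used instead of float so the dollar amounts do not pick up binary rounding error.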
- MySQLdb (specifically mysqlclient)
- requests
- os
- path
- csv
- json
A properly formatted CSV file looks like this, with the first row as headers and the following rows as the corresponding values:
"Record Id";"Query Input";"Type"
"2345";"First Baptist Church North 7th Street";"Church"
"3344";"Lily's Daycare 123 West Sycamore New York, NY";"Daycare"
"2234";"Harry's Pet Store 332 South St. Huston, TX";"Pet store"