Commit
Added method to clean input CSV of NBSP Unicode chars
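The CSV file gets copied and pasted into and out of Slack. We would not recommend this, but it happened, and the copies picked up invisible non-breaking space (NBSP, U+00A0) characters.
This addition replaces those characters with regular spaces and writes the result to a separate, clean CSV file.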
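To see why a stray NBSP matters, here is a minimal sketch. The column names are hypothetical, not taken from the real sites_with_clients.csv. A header whose space was pasted as U+00A0 prints identically to a normal one, yet it produces a dict key that no longer matches the plain-space name the code expects.

```python
import csv
import io

# Hypothetical two-column CSV; the space in the header is U+00A0 (NBSP).
RAW = "site\xa0name,client\nHQ,Acme\n"


def first_row(text: str) -> dict:
    """Parse CSV text and return its first data row as a dict."""
    return next(csv.DictReader(io.StringIO(text)))


dirty = first_row(RAW)
clean = first_row(RAW.replace("\xa0", " "))

print("site name" in dirty)  # False: the key is actually "site\xa0name"
print("site name" in clean)  # True once NBSP is replaced with a space
```

The `str.replace("\xa0", " ")` call is the same substitution the commit applies line by line.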
Stephen James committed Dec 29, 2023
1 parent cf59efc commit 5ee976a
Showing 1 changed file with 12 additions and 1 deletion.
src/main.py: 13 changes (12 additions, 1 deletion)
@@ -5,6 +5,13 @@
 import csv
 from geocode import geocode, find_timezone, find_country_code
 
+# Strip non-breaking-space invisible characters
+def replace_non_breaking_spaces(unsanitised, clean):
+    with open(unsanitised, "r") as input:
+        with open(clean, "w") as output:
+            for line in input:
+                line = line.replace("\xa0", " ")
+                output.write(line)
 
 # Convert CSV file to JSON object.
 def convert_csv_to_json(file_path):
@@ -43,7 +50,11 @@ def add_geocoding_to_json(data):
 
 if __name__ == '__main__':
 
-    csv_file_path = os.getcwd() + '/../data_src/sites_with_clients.csv'
+    unsanitised_csv_file_path = os.getcwd() + '/../data_src/sites_with_clients.csv'
+    csv_file_path = os.getcwd() + '/../data_src/sites_with_clients.clean.csv'
+
+    # Strip non-breaking-space invisible characters
+    replace_non_breaking_spaces(unsanitised_csv_file_path, csv_file_path)
 
     # Convert CSV to valid JSON
     json_data_without_geocoding = convert_csv_to_json(csv_file_path)
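A minimal sketch, not the committed code, of how both hunks might be hardened. Assumptions: the CSV is UTF-8, and the repository layout is `src/main.py` beside `data_src/`. The names `strip_nbsp_stream`, `clean_csv_path` and `default_data_dir`, and the replacement count that the wrapper returns, are hypothetical. The file handles are opened with an explicit encoding and `newline=""`, and they avoid shadowing the built-in `input`.

```python
from pathlib import Path
from typing import TextIO

NBSP = "\xa0"  # U+00A0 NO-BREAK SPACE


def strip_nbsp_stream(src: TextIO, dst: TextIO) -> int:
    """Copy src to dst line by line, replacing each NBSP with a plain space.

    Returns how many NBSP characters were replaced (a hypothetical extra,
    handy for logging whether the input was actually dirty).
    """
    replaced = 0
    for line in src:
        replaced += line.count(NBSP)
        dst.write(line.replace(NBSP, " "))
    return replaced


def replace_non_breaking_spaces(unsanitised: Path, clean: Path) -> int:
    """File-to-file wrapper taking the same two arguments as the committed function."""
    # Assumption: the CSV is UTF-8. newline="" leaves \r\n endings untouched,
    # as the csv module docs recommend for files that csv will later parse.
    with open(unsanitised, "r", encoding="utf-8", newline="") as src:
        with open(clean, "w", encoding="utf-8", newline="") as dst:
            return strip_nbsp_stream(src, dst)


def clean_csv_path(raw_csv: Path) -> Path:
    """sites_with_clients.csv -> sites_with_clients.clean.csv in the same folder."""
    return raw_csv.with_name(raw_csv.stem + ".clean.csv")


def default_data_dir(script: Path) -> Path:
    """Hypothetical helper; assumes <repo>/src/main.py sits next to <repo>/data_src/."""
    return script.resolve().parent.parent / "data_src"
```

In `main.py`, the `__main__` block could then use `raw = default_data_dir(Path(__file__)) / "sites_with_clients.csv"` followed by `replace_non_breaking_spaces(raw, clean_csv_path(raw))`. Unlike `os.getcwd() + '/../data_src/...'`, this does not depend on the directory the script is launched from. Keeping the stream logic separate from the file wrapper also lets it be tested with `io.StringIO`.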