Substantial mismatches in 2020 and 2021 SA3 #47

Open
MRWhitehead opened this issue Aug 12, 2024 · 1 comment · May be fixed by #51

Comments

@MRWhitehead

Re: australian_postcodes.csv and .xlsx

There are swathes of postcodes where the sa3 code (2020) and SA3_CODE_2021 | SA3_NAME_2021 columns are mismatched.

I haven't quantified the extent of the issue. I just noticed that the spreadsheet could not be used for any 2021 SA3 work.

image
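One way to put a number on the mismatch is to count rows where the two SA3 columns disagree. The following is only a minimal sketch using the standard library: the column names `sa3` and `SA3_CODE_2021` are assumed from the description above, the helper names are hypothetical, and the sample rows are made up to stand in for australian_postcodes.csv. Some differences may also be legitimate changes between ASGS editions, so the count is an upper bound on actual errors.

```python
import csv
import io


def normalise(code):
    """Strip whitespace and a trailing '.0' left by float-typed exports."""
    text = str(code or "").strip()
    return text[:-2] if text.endswith(".0") else text


def count_sa3_mismatches(rows, old_col="sa3", new_col="SA3_CODE_2021"):
    """Return (compared, mismatched) over rows where both codes are present."""
    compared = mismatched = 0
    for row in rows:
        old = normalise(row.get(old_col))
        new = normalise(row.get(new_col))
        if not old or not new:
            continue  # skip rows missing either code
        compared += 1
        if old != new:
            mismatched += 1
    return compared, mismatched


# Made-up rows for illustration only; column names are assumptions
sample = io.StringIO(
    "postcode,sa3,SA3_CODE_2021\n"
    "2580,10105,10105\n"
    "2000,11703,11703.0\n"
    "3000,20604,20605\n"
    "0800,,70101\n"
)
compared, mismatched = count_sa3_mismatches(csv.DictReader(sample))
print(f"{mismatched} of {compared} comparable rows mismatch")
```

To run it against the real file, pass `csv.DictReader(open("australian_postcodes.csv", newline=""))` instead of the sample, adjusting the column names if they differ.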

@pnappa

pnappa commented Nov 20, 2024

In addition to this, I notice some values are completely off & entire SA3s are missing, leading to maps having large holes in them (the Goulburn SA3 - 10105 - is missing).

I worked around this by using the ABS-provided shapefiles for postcodes and SA3s and deriving the mapping myself, assigning each postcode to the SA3 with the largest overlapping area.

Here's the code (caveat: it's LLM-generated). I just run it in a Jupyter notebook:

# Import necessary libraries
import geopandas as gpd
import pandas as pd

# Download these from here:
# https://www.abs.gov.au/statistics/standards/australian-statistical-geography-standard-asgs-edition-3/jul2021-jun2026/access-and-downloads/digital-boundary-files
sa3_shapefile_path = 'SA3_2021_AUST_SHP_GDA2020/SA3_2021_AUST_GDA2020.shp'
postcode_shapefile_path = 'POA_2021_AUST_GDA2020_SHP/POA_2021_AUST_GDA2020.shp'

# Load the SA3 and Postcode shapefiles
sa3_gdf = gpd.read_file(sa3_shapefile_path)
postcode_gdf = gpd.read_file(postcode_shapefile_path)

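# Reproject both layers to an equal-area CRS (EPSG:3577, Australian Albers).
# The ABS files use geographic GDA2020 coordinates, so without this step the
# areas below would be in square degrees and distorted by latitude.
equal_area_crs = "EPSG:3577"
sa3_gdf = sa3_gdf.to_crs(equal_area_crs)
postcode_gdf = postcode_gdf.to_crs(equal_area_crs)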

# Calculate the intersection between each Postcode and SA3
intersections = gpd.overlay(postcode_gdf, sa3_gdf, how='intersection')

# Calculate the area of each intersection
intersections["area"] = intersections.geometry.area

# For each Postcode, keep the row for the SA3 with the largest intersecting area
largest_overlap = intersections.loc[
    intersections.groupby("POA_CODE21")["area"].idxmax()
]

# Reset index and select relevant columns
largest_overlap = largest_overlap.reset_index(drop=True)
postcode_to_sa3_mapping = largest_overlap[["POA_CODE21", "SA3_CODE21"]]

# Save the mapping to a CSV file (optional)
postcode_to_sa3_mapping.to_csv("postcode_to_sa3_mapping.csv", index=False)

# Display the mapping
postcode_to_sa3_mapping.head()
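Since the original problem included entire SA3s going missing, it may be worth checking which SA3 codes end up with no postcode in the derived mapping. This is a rough sketch: `missing_sa3_codes` is a hypothetical helper, and the code lists are illustrative rather than real output. Note that a largest-overlap assignment can itself leave an SA3 unmapped if it is never the largest overlap for any postcode.

```python
def missing_sa3_codes(all_sa3_codes, mapped_sa3_codes):
    """Return the SA3 codes that no postcode was assigned to, sorted."""
    all_codes = {str(code) for code in all_sa3_codes}
    mapped = {str(code) for code in mapped_sa3_codes}
    return sorted(all_codes - mapped)


# Illustrative values only; with the script above you would pass
# sa3_gdf["SA3_CODE21"] and postcode_to_sa3_mapping["SA3_CODE21"]
all_codes = ["10102", "10105", "10106"]
mapped_codes = ["10102", "10106", "10106"]
print(missing_sa3_codes(all_codes, mapped_codes))
```

Any codes it returns are candidates for holes in a map drawn from the mapping.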

@CRBagnall CRBagnall linked a pull request Jan 8, 2025 that will close this issue