update anzsic2006 to download from abs source. #103

Merged
merged 5 commits into from
Aug 17, 2023
3 changes: 2 additions & 1 deletion DESCRIPTION
Original file line number Diff line number Diff line change
@@ -11,7 +11,8 @@ Authors@R: c(person("Will", "Mackey", email = "[email protected]", role = c("au
person("Benjamin", "Wee", role = c("aut")),
person("Carlos", "Yanez", role = "ctb"),
person("Bas", "Latcham", role = "ctb"),
person("Rex", "Parsons", role = "ctb", comment = c(ORCID = "0000-0002-6053-8174"))
person("Rex", "Parsons", role = "ctb", comment = c(ORCID = "0000-0002-6053-8174")),
person("Pete", "Owen", role = "ctb")
)
Maintainer: Will Mackey <[email protected]>
License: GPL-3
1 change: 1 addition & 0 deletions NEWS.md
@@ -1,6 +1,7 @@
# strayr (development version)
* create `read_correspondence_tbl()`, which reads correspondence tables from
`absmapsdata` similarly to `read_absmap()`
* updated `anzsic2006` to include leading zeros in codes (see ). This is a backwards-incompatible change that may cause issues, though not enough to warrant a major version bump

# strayr 0.2.2
* `anzsco2022` updated to reflect changes made by the ABS
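The leading-zero change noted in NEWS.md means that codes previously stored as integers (e.g. `111`) are now fixed-width character strings (e.g. `"0111"`), so joins against data keyed on the old integer codes would no longer match. Below is a minimal base R sketch of one way to convert old-style codes. `pad_anzsic_code()` is a hypothetical helper, not part of `strayr`, and the sample codes are illustrative.

```r
# Hypothetical helper (not part of strayr): left-pad integer-like ANZSIC codes
# with zeros so they match the fixed-width character codes.
pad_anzsic_code <- function(code, width) {
  # Build a format string such as "%04d", then apply it element-wise.
  sprintf(paste0("%0", width, "d"), as.integer(code))
}

# Illustrative old-style integer class codes (4 characters once padded).
old_class_codes <- c(111L, 112L, 1910L)
new_class_codes <- pad_anzsic_code(old_class_codes, width = 4)

print(new_class_codes)
```

Going by the updated documentation, subdivision, group and class codes would use widths of 2, 3 and 4 respectively.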
29 changes: 14 additions & 15 deletions R/data_descriptions.R
@@ -21,10 +21,10 @@
"anzsco2009"


#' ANZSCO 2019
#' ANZSCO 2013
#'
#' Wide table containing all levels of the Australian and New Zealand Standard
#' Classification of Occupations (ANZSCO), Version 1.3, 2019
#' Classification of Occupations (ANZSCO), Version 1.2, 2013
#'
#' @format A \code{tibble} with 11 variables:
#' \describe{
@@ -41,14 +41,13 @@
#' \item{\code{skill_level}}{Skill level required for occupation, determined by the ABS (1 is highest, 5 is lowest).
#' See \url{https://www.abs.gov.au/ausstats/abs@.nsf/Previousproducts/C4BECE1704987586CA257089001A9181 } for details.}
#' }
"anzsco2019"

"anzsco2013"


#' ANZSCO 2013
#' ANZSCO 2019
#'
#' Wide table containing all levels of the Australian and New Zealand Standard
#' Classification of Occupations (ANZSCO), Version 1.2, 2013
#' Classification of Occupations (ANZSCO), Version 1.3, 2019
#'
#' @format A \code{tibble} with 11 variables:
#' \describe{
@@ -65,7 +64,8 @@
#' \item{\code{skill_level}}{Skill level required for occupation, determined by the ABS (1 is highest, 5 is lowest).
#' See \url{https://www.abs.gov.au/ausstats/abs@.nsf/Previousproducts/C4BECE1704987586CA257089001A9181 } for details.}
#' }
"anzsco2013"
"anzsco2019"


#' ANZSCO 2021
#'
@@ -89,6 +89,7 @@
#' }
"anzsco2021"


#' ANZSCO 2022
#'
#' Wide table containing all levels of the Australian and New Zealand Standard
@@ -113,28 +114,26 @@
"anzsco2022"


#' ANZSIC
#' ANZSIC 2006
#'
#' Wide table containing all levels of the Australian and New Zealand Standard
#' Industrial Classification (ANZSIC), 2006 (Revision 1.0). Cat. 1292.0.
#' Industrial Classification (ANZSIC), 2006 (Revision 2.0). Cat. 1292.0.
#'
#' @format A \code{tibble} with 8 variables:
#' \describe{
#' \item{\code{anzsic_division_code}}{ANZSIC division codes character, e.g. "A", "B"}
#' \item{\code{anzsic_division}}{ANZSIC division title, e.g. "Agriculture, Forestry and Fishing"}
#' \item{\code{anzsic_subdivision_code}}{ANZSIC subdivision codes integer, e.g. 1, 2}
#' \item{\code{anzsic_subdivision_code}}{ANZSIC subdivision codes 2-digit character, e.g. 01, 02}
#' \item{\code{anzsic_subdivision}}{ANZSIC subdivision title, e.g. "Agriculture"}
#' \item{\code{anzsic_group_code}}{ANZSIC group codes integer, e.g. 11, 12}
#' \item{\code{anzsic_group_code}}{ANZSIC group codes 3-digit character, e.g. 011, 012}
#' \item{\code{anzsic_group}}{ANZSIC group title, e.g. "Mushroom and Vegetable Growing"}
#' \item{\code{anzsic_class_code}}{ANZSIC class codes integer, e.g. 111, 112}
#' \item{\code{anzsic_class_code}}{ANZSIC class codes 4-digit character, e.g. 0111, 0112}
#' \item{\code{anzsic_class}}{ANZSIC class title, e.g. "Vegetable Growing (Under Cover)"}
#' }
#' @source \url{https://www.abs.gov.au/statistics/classifications/australian-and-new-zealand-standard-industrial-classification-anzsic/2006-revision-2-0/numbering-system-and-titles/division-subdivision-group-and-class-codes-and-titles}
"anzsic2006"





#' ASCED Field of Education
#'
#' Wide table containing all levels of fields of education in the Australian
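In the examples documented for `anzsic2006` above (`01`, `011`, `0111`), each parent code is a prefix of its child codes. Below is a small base R sketch of reading parent codes off a class code. It assumes the documented widths of 2, 3 and 4 characters, and the variable names are illustrative.

```r
# Example class codes taken from the documented format (4-character strings).
class_code <- c("0111", "0112")

# Under the assumed fixed widths, parent codes are the leading characters.
group_code       <- substr(class_code, 1, 3)
subdivision_code <- substr(class_code, 1, 2)

print(data.frame(subdivision_code, group_code, class_code))
```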
2 changes: 1 addition & 1 deletion README.Rmd
@@ -69,7 +69,7 @@ Current structures stored in `strayr` are:
- `anzsco2013`: occupation levels of ANZSCO, [2013, Version 1.2](https://www.abs.gov.au/AUSSTATS/abs@.nsf/allprimarymainfeatures/4AF138F6DB4FFD4BCA2571E200096BAD?opendocument).
- `anzsco2009`: occupation levels ANZSCO, [First Edition, Revision 1, 2009](https://www.abs.gov.au/AUSSTATS/abs@.nsf/DetailsPage/1220.0First%20Edition,%20Revision%201?OpenDocument).
- Australian and New Zealand Standard Industrial Classification (**ANZSIC**), Cat. 1292.0:
- `anzsic2006`: industry levels of ANZSIC, [2006 (Revision 1.0)](https://www.abs.gov.au/ausstats/abs@.nsf/0/20C5B5A4F46DF95BCA25711F00146D75?opendocument).
- `anzsic2006`: industry levels of ANZSIC, [2006 (Revision 2.0)](https://www.abs.gov.au/statistics/classifications/australian-and-new-zealand-standard-industrial-classification-anzsic/2006-revision-2-0).
- Australian Standard Classification of Education (**ASCED**), Cat. 1272.0:
- `asced_foe2001`: field of education levels of ASCED, [2001](https://www.abs.gov.au/ausstats/abs@.nsf/mf/1272.0).
- `asced_qual2001`: qualification levels of ASCED, [2001](https://www.abs.gov.au/ausstats/abs@.nsf/mf/1272.0).
2 changes: 1 addition & 1 deletion README.md
@@ -60,7 +60,7 @@ Current structures stored in `strayr` are:
- Australian and New Zealand Standard Industrial Classification
(**ANZSIC**), Cat. 1292.0:
- `anzsic2006`: industry levels of ANZSIC, [2006 (Revision
1.0)](https://www.abs.gov.au/ausstats/abs@.nsf/0/20C5B5A4F46DF95BCA25711F00146D75?opendocument).
2.0)](https://www.abs.gov.au/statistics/classifications/australian-and-new-zealand-standard-industrial-classification-anzsic/2006-revision-2-0).
- Australian Standard Classification of Education (**ASCED**), Cat.
1272.0:
- `asced_foe2001`: field of education levels of ASCED,
101 changes: 66 additions & 35 deletions data-raw/create_anzsic2006.R
@@ -2,47 +2,78 @@
# Reading and cleaning ANZSIC correspondence

library(tidyverse)
library(glue)
library(rvest)

# include factor variants or nah?
include_factor_variants <- FALSE

# ty asiripanich
anzsic_url <- "https://raw.githubusercontent.com/asiripanich/anzsic/master/anzsic_2006.csv"

# Read
anzsic_raw <- read_csv(anzsic_url) %>%
rename_all(~ glue("anzsic_{.}")) %>%
mutate_if(is.double, as.integer) %>%
as_tibble()

# Add layers of nfd
class_nfd <- anzsic_raw %>%
distinct(anzsic_division_title, anzsic_division_code,
anzsic_subdivision_title, anzsic_subdivision_code,
anzsic_group_title, anzsic_group_code) %>%
mutate(anzsic_class_code = anzsic_group_code * 10,
anzsic_class_title = glue("{anzsic_group_title}, nfd"))

group_nfd <- anzsic_raw %>%
distinct(anzsic_division_title, anzsic_division_code,
anzsic_subdivision_title, anzsic_subdivision_code) %>%
mutate(anzsic_group_title = glue("{anzsic_subdivision_title}, nfd"),
anzsic_group_code = anzsic_subdivision_code * 10,
anzsic_class_title = anzsic_group_title,
anzsic_class_code = anzsic_group_code * 10)

subdivision_nfd <- anzsic_raw %>%
group_by(anzsic_division_code, anzsic_division_title) %>%
summarise(anzsic_subdivision_code = min(anzsic_subdivision_code)) %>%
mutate(anzsic_subdivision_title = glue("{anzsic_division_title}, nfd"),
anzsic_group_title = anzsic_subdivision_title,
anzsic_group_code = anzsic_subdivision_code * 10,
anzsic_class_title = anzsic_group_title,
anzsic_class_code = anzsic_group_code * 10)
# fetch from abs website
url <- "https://www.abs.gov.au/statistics/classifications/australian-and-new-zealand-standard-industrial-classification-anzsic/2006-revision-2-0/numbering-system-and-titles/division-subdivision-group-and-class-codes-and-titles"

df <- url %>%
rvest::read_html() %>%
rvest::html_table()

# bind tables together
anzsic_2006_temp <-
purrr::list_rbind(df)

# fix columns names and bind together
colnames(anzsic_2006_temp) <- c("anzsic_division_code", "anzsic_subdivision_code", "anzsic_group_code", "anzsic_class_code", "title")

first_row <-
as.data.frame(t(colnames(df[[1]])))

colnames(first_row) <- c("anzsic_division_code", "anzsic_subdivision_code", "anzsic_group_code", "anzsic_class_code", "title")

anzsic_2006_total <-
dplyr::bind_rows(first_row, anzsic_2006_temp)

# replace blanks with NAs
anzsic_2006_total[anzsic_2006_total == ""] <- NA

# fill NAs down from above
anzsic_2006_fill <-
anzsic_2006_total %>%
tidyr::fill(colnames(anzsic_2006_total), .direction = c("down"))

# get each grouping type individually
anzsic_2006_class <-
anzsic_2006_total %>%
dplyr::filter(stringr::str_detect(anzsic_class_code, "^[:digit:]+$")) %>%
dplyr::select(anzsic_class_code, anzsic_class_title = title)

anzsic_2006_group <-
anzsic_2006_total %>%
dplyr::filter(stringr::str_detect(anzsic_group_code, "^[:digit:]+$")) %>%
dplyr::select(anzsic_group_code, anzsic_group_title = title)

anzsic_2006_subdivision <-
anzsic_2006_total %>%
dplyr::filter(stringr::str_detect(anzsic_subdivision_code, "^[:digit:]+$")) %>%
dplyr::select(anzsic_subdivision_code, anzsic_subdivision_title = title)

anzsic_2006_division <-
anzsic_2006_total %>%
dplyr::filter(stringr::str_detect(anzsic_division_code, "^[:alpha:]+$")) %>%
dplyr::select(anzsic_division_code, anzsic_division_title = title)

# combine grouping types into final table
anzsic_2006_final <-
anzsic_2006_fill %>%
dplyr::left_join(anzsic_2006_division) %>%
dplyr::left_join(anzsic_2006_subdivision) %>%
dplyr::left_join(anzsic_2006_group) %>%
dplyr::left_join(anzsic_2006_class) %>%
dplyr::filter(!is.na(anzsic_class_title)) %>%
dplyr::select(
anzsic_division_code, anzsic_division_title, anzsic_subdivision_code, anzsic_subdivision_title,
anzsic_group_code, anzsic_group_title, anzsic_class_code, anzsic_class_title
) %>%
dplyr::as_tibble()

# Finalise data frame; noting that we are avoiding the nfd complication for now
anzsic2006 <- anzsic_raw %>%
anzsic2006 <- anzsic_2006_final %>%
arrange(anzsic_division_code,
anzsic_subdivision_code,
anzsic_group_code,
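The script above takes a table in which each row carries a code for only one level of the hierarchy and turns it into a wide table with one row per class. Below is a minimal sketch of that fill-down-and-join idea on toy data. It is a simplified variant rather than the script's exact code: only the parent code columns are filled, and `by` is given explicitly in each join. The toy rows, codes and object names are assumptions for illustration.

```r
library(dplyr)
library(tidyr)

# Toy stand-in for the scraped table: each row has a code in exactly one
# level column, plus the title for that level (illustrative values only).
toy <- data.frame(
  division_code = c("A", NA, NA, NA),
  group_code    = c(NA, "012", NA, NA),
  class_code    = c(NA, NA, "0122", "0123"),
  title         = c("Agriculture, Forestry and Fishing",
                    "Mushroom and Vegetable Growing",
                    "Vegetable Growing (Under Cover)",
                    "Vegetable Growing (Outdoors)"),
  stringsAsFactors = FALSE
)

# Build a code -> title lookup for each parent level before any filling.
division_titles <- toy %>%
  filter(!is.na(division_code)) %>%
  select(division_code, division_title = title)

group_titles <- toy %>%
  filter(!is.na(group_code)) %>%
  select(group_code, group_title = title)

# Fill parent codes downward, keep class rows, then attach parent titles.
flat <- toy %>%
  fill(division_code, group_code, .direction = "down") %>%
  filter(!is.na(class_code)) %>%
  rename(class_title = title) %>%
  left_join(division_titles, by = "division_code") %>%
  left_join(group_titles, by = "group_code") %>%
  select(division_code, division_title,
         group_code, group_title,
         class_code, class_title)

print(flat)
```

Restricting `fill()` to the parent columns keeps a class code from being carried down onto a later group or subdivision row, which is one reason the sketch departs from filling every column.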
Binary file modified data/anzsic2006.rda
Binary file not shown.
13 changes: 8 additions & 5 deletions man/anzsic2006.Rd

Some generated files are not rendered by default.

2 changes: 2 additions & 0 deletions pkgdown/_pkgdown.yml
@@ -27,13 +27,15 @@ reference:
- starts_with("asced")
- starts_with("asc")
- auholidays
- school_terms
- title: "Importing ABS Data"
desc: >
Functions for retrieving ABS data
contents:
- read_absmap
- get_seifa
- get_seifa_index_sheet
- read_correspondence_tbl
- title: "Helper functions"
desc: >
Functions for cleaning data and working with datasets