Inconsistent ABS URLs for previous releases #264

Open
Henry-DJPR opened this issue Jan 15, 2025 · 5 comments
Comments

@Henry-DJPR
Contributor

Hey Matt. release_date in read_abs doesn't work for some time series catalogues. I was trying to get a historical edition of Victorian GSP which has a financial year release. Here's the code that errors out:

```r
read_abs(series_id = "A2478275V", release_date = "2023-11-21")
```

The error occurs because lines 274 to 279 of R/read_abs.R replace latest-release in the URL with a month code instead of a financial year code:

  • Original: .../australian-national-accounts-state-accounts/latest-release/5220003_annual_vic.xlsx
  • Converted: .../australian-national-accounts-state-accounts/nov-2023/5220003_annual_vic.xlsx
  • Correct: .../australian-national-accounts-state-accounts/2022-23-financial-year/5220003_annual_vic.xlsx
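
To make the mismatch concrete, here is a minimal sketch of a month-code substitution of the kind described above. It is not the code in R/read_abs.R; `month_slug()` and `swap_latest_release()` are hypothetical names used only for illustration.

```r
# Hypothetical helpers, not the readabs source.

# Turn a release date into a "mmm-yyyy" URL component, e.g. "nov-2023".
month_slug <- function(release_date) {
  d <- as.Date(release_date)
  # month.abb is locale-independent, unlike format(d, "%b")
  mon <- tolower(month.abb[as.integer(format(d, "%m"))])
  paste0(mon, "-", format(d, "%Y"))
}

# Replace the literal "latest-release" path component with the month code.
swap_latest_release <- function(url, release_date) {
  sub("latest-release", month_slug(release_date), url, fixed = TRUE)
}

url <- ".../australian-national-accounts-state-accounts/latest-release/5220003_annual_vic.xlsx"
swap_latest_release(url, "2023-11-21")
```

For a financial year catalogue this produces the nov-2023 path, which is the "Converted" URL above rather than the "Correct" one.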

I started on a pull request that replaced 'latest-release' with a value based on the time series directory frequency and series end (to distinguish calendar year from financial year, and the different versions of quarterly), but quickly realised that the ABS uses inconsistent naming schemes for previous releases. For example, State Accounts financial years are formatted as 2022-23-financial-year, whereas the financial year supplementary trade release uses 2021-22.
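
A rough sketch of why a single rule does not work: even once the financial year is derived from the series end, there are at least two URL forms to choose between. `fy_label()` and `candidate_fy_slugs()` are hypothetical names, and the two formats are only the ones mentioned above; the ABS may use others.

```r
# Hypothetical helpers for illustration only.

# Label the Australian financial year (July to June) containing `series_end`,
# e.g. an observation in June 2023 falls in "2022-23".
fy_label <- function(series_end) {
  d <- as.Date(series_end)
  year <- as.integer(format(d, "%Y"))
  month <- as.integer(format(d, "%m"))
  end_year <- if (month >= 7) year + 1L else year
  sprintf("%d-%02d", end_year - 1L, end_year %% 100L)
}

# The two naming schemes seen so far for financial year releases.
candidate_fy_slugs <- function(series_end) {
  fy <- fy_label(series_end)
  c(paste0(fy, "-financial-year"), fy)
}

candidate_fy_slugs("2023-06-01")
```

Nothing in the series metadata says which of the candidates a given catalogue uses, so the choice would have to be hard-coded per catalogue or discovered by trial.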

At the moment, I cannot conceive of a good fix to this problem short of indexing all previous releases somewhere. I think the best way forward would be to add error handling and a tweak to the documentation to note that release_date only works for monthly releases. As far as I can tell, none of the new ABS APIs support historical releases and the time series directory only points to the latest version.

@MattCowgill
Owner

Thanks @Henry-DJPR. This is frustrating - there are many inconsistencies like this.

I agree that the option you've identified is the least bad.

In theory I could index all previous releases but I worry that would be brittle - the ABS could (for example) change their naming convention for future releases. I'll add this.

@MattCowgill
Owner

@Henry-DJPR FYI I've added notes in the documentation, but no additional error handling. Grateful for suggestions as to how you think this should work.

@MattCowgill
Owner

Hi @Henry-DJPR do you have a suggestion about how this should function?
I've updated the docs. The inconsistency here is on the ABS side, rather than the readabs package itself.

In hindsight I think I should have introduced the release_date functionality in a new function that was labelled experimental, as it is less reliable than the core read_abs() function.

@Henry-DJPR
Contributor Author

Hey Matt, sorry I didn't get to this - I've just been on leave. My suggestion would be checking the URL for a 404 if release_date is not null. This will add some execution time when using release_date but prevent ambiguous failure. Another option could be to let users enter their own release URL component, e.g. release_date = "2022-23-financial-year". This would only be used if release_date is a character and not coercible to a date.
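
Both suggestions could look roughly like the sketch below. It uses base R only, and none of these function names exist in readabs; the monthly "mmm-yyyy" format and the error message are assumptions for illustration.

```r
# Hypothetical sketch; these functions are not part of readabs.

# If release_date parses as a date, build a monthly "mmm-yyyy" component.
# Otherwise pass the string through unchanged as a user-supplied component.
release_component <- function(release_date) {
  d <- tryCatch(as.Date(release_date), error = function(e) as.Date(NA))
  if (is.na(d)) return(release_date)
  paste0(tolower(month.abb[as.integer(format(d, "%m"))]), "-", format(d, "%Y"))
}

# Fetch the HTTP status for a URL using base R's curlGetHeaders().
url_status <- function(url) attr(curlGetHeaders(url), "status")

# Fail early with a clear message if the constructed URL returns a 404.
# status_fn is injectable so the check can be exercised without network access.
check_release_url <- function(url, status_fn = url_status) {
  status <- status_fn(url)
  if (identical(as.integer(status), 404L)) {
    stop("No file found at ", url,
         ". `release_date` may not match the ABS URL for this release.",
         call. = FALSE)
  }
  invisible(url)
}

release_component("2023-11-21")
release_component("2022-23-financial-year")
```

The extra request only happens when release_date is supplied, so the default read_abs() path would be unaffected.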

@MattCowgill
Owner

Cheers @Henry-DJPR
