read_abs trips over for occasionally quarterly data #167

Open
mcooganj opened this issue Aug 29, 2021 · 6 comments

@mcooganj

There are a number of publications that include quarterly data from time to time. For example, the retail sales publication has real tables four times per year.

There may be a way to set the vintage of the release, but I couldn't find it. I had thought that perhaps looking up by series_id would work. I would imagine that there's a look-up table at some point that turns the series_id into a release-table pair.

Perhaps in the case that it's an occasionally-quarterly release, it could re-direct to the most recent quarterly publication?

R> read_abs(series_id="A3349269F")
Finding URLs for tables corresponding to ABS series ID
Attempting to download files from series ID , Retail Trade, Australia
Downloading https://www.abs.gov.au/statistics/industry/retail-and-wholesale-trade/retail-trade-australia/latest-release/850107.xls
trying URL 'https://www.abs.gov.au/statistics/industry/retail-and-wholesale-trade/retail-trade-australia/latest-release/850107.xls'
Error in utils::download.file(url = url, destfile = destfile, mode = "wb", :
cannot open URL 'https://www.abs.gov.au/statistics/industry/retail-and-wholesale-trade/retail-trade-australia/latest-release/850107.xls'
In addition: Warning message:
In utils::download.file(url = url, destfile = destfile, mode = "wb", :
cannot open URL 'https://www.abs.gov.au/statistics/industry/retail-and-wholesale-trade/retail-trade-australia/latest-release/850107.xls': HTTP status was '404 Not Found'

@MattCowgill
Owner

Thanks @mcooganj, I'll look into this.

@MattCowgill
Owner

PS: there's no lookup table from series ID to release table within the package; if you request a series ID, the package queries an ABS API (the Time Series Directory) to find the corresponding release table.

@Henry-DJPR
Contributor

I'm having a similar problem with the detailed labour force quarterly ANZSIC tables. They appear to be valid time series sheets, but they aren't in the Time Series Directory. Here's an example of a missing table. It doesn't work when I look for it specifically, download everything, or look for one of its component series.
read_abs("6291.0.55.001", 4)
read_abs("6291.0.55.001") %>% count(table_title)
read_abs(series_id = "A84090257V")
The problem appears to be that they straight up aren't in the directory:
shell.exec("https://abs.gov.au/servlet/TSSearchServlet?sid=A84090257V")
I wanted to check that I haven't missed something simple before I contact the ABS.
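
The directory check above relies on shell.exec(), which is only available on Windows. A minimal sketch of a portable version follows, assuming the search URL pattern shown in that line; the helper name tsd_search_url is hypothetical and is not part of readabs.

```r
# Hypothetical helper (not part of readabs): build the Time Series Directory
# search URL for one series ID, following the URL pattern used above.
tsd_search_url <- function(series_id) {
  stopifnot(
    is.character(series_id),
    length(series_id) == 1,
    nzchar(series_id)
  )
  paste0(
    "https://abs.gov.au/servlet/TSSearchServlet?sid=",
    utils::URLencode(series_id, reserved = TRUE)
  )
}

url <- tsd_search_url("A84090257V")
print(url)

# Open the result in the default browser; only attempted in an
# interactive session, and works on any platform.
if (interactive()) utils::browseURL(url)
```

If the page that opens lists no matching series, the series ID is missing from the directory itself rather than mishandled by the package.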

@MattCowgill
Owner

MattCowgill commented Apr 28, 2022

Thanks @Henry-DJPR. You're right, this is a problem on the ABS side: the series have disappeared from the Time Series Directory. I will contact them now; I'm in regular contact with the people who maintain the TSD.

I realise this isn't ideal, but a workaround is to do:

library(readabs)
library(magrittr) # provides %>%

download_abs_data_cube("labour-force-australia-detailed",
                       "6291004") %>%
  read_abs_local(filenames = .)
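
For scripts that need to keep running while the directory is incomplete, the workaround above could be wrapped as an automatic fallback. This is only a sketch: with_fallback is a hypothetical helper, not a readabs function, and the readabs calls in the usage section simply mirror the ones already shown in this thread.

```r
# Hypothetical helper (not part of readabs): call `primary()`; if it throws
# an error, report the error message and return `fallback()` instead.
with_fallback <- function(primary, fallback) {
  tryCatch(
    primary(),
    error = function(e) {
      message("Primary request failed: ", conditionMessage(e))
      fallback()
    }
  )
}

# Usage sketch, not run here: it needs the readabs package and network access.
if (FALSE) {
  emp_by_industry <- with_fallback(
    primary = function() readabs::read_abs("6291.0.55.001", tables = 4),
    fallback = function() {
      # Same two steps as the workaround above, without the pipe
      path <- readabs::download_abs_data_cube(
        "labour-force-australia-detailed", "6291004"
      )
      readabs::read_abs_local(filenames = path)
    }
  )
}
```

The helper only catches errors, so if read_abs() fails with a warning or returns an empty result instead, that case would need its own check.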

@MattCowgill
Owner

@Henry-DJPR, the problem with the ABS Time Series Directory appears to have been resolved. This now works:

readabs::read_abs("6291.0.55.001", "4")
#> Finding URLs for tables corresponding to ABS catalogue 6291.0.55.001
#> Attempting to download files from catalogue 6291.0.55.001, Labour Force, Australia, Detailed
#> Downloading https://www.abs.gov.au/statistics/labour/employment-and-unemployment/labour-force-australia-detailed/latest-release/6291004.xlsx
#> Extracting data from downloaded spreadsheets
#> Tidying data from imported ABS spreadsheets
#> # A tibble: 9,000 × 12
#>    table_no sheet_no table_title  date       series  value series_type data_type
#>    <chr>    <chr>    <chr>        <date>     <chr>   <dbl> <chr>       <chr>    
#>  1 6291004  Data1    Table 04. E… 1984-11-01 Agric…   NA   Trend       STOCK    
#>  2 6291004  Data1    Table 04. E… 1984-11-01 Agric…  403.  Seasonally… STOCK    
#>  3 6291004  Data1    Table 04. E… 1984-11-01 Agric…  411.  Original    STOCK    
#>  4 6291004  Data1    Table 04. E… 1984-11-01 Minin…   NA   Trend       STOCK    
#>  5 6291004  Data1    Table 04. E… 1984-11-01 Minin…   94.8 Seasonally… STOCK    
#>  6 6291004  Data1    Table 04. E… 1984-11-01 Minin…   94.1 Original    STOCK    
#>  7 6291004  Data1    Table 04. E… 1984-11-01 Manuf…   NA   Trend       STOCK    
#>  8 6291004  Data1    Table 04. E… 1984-11-01 Manuf… 1096.  Seasonally… STOCK    
#>  9 6291004  Data1    Table 04. E… 1984-11-01 Manuf… 1099.  Original    STOCK    
#> 10 6291004  Data1    Table 04. E… 1984-11-01 Elect…   NA   Trend       STOCK    
#> # … with 8,990 more rows, and 4 more variables: collection_month <chr>,
#> #   frequency <chr>, series_id <chr>, unit <chr>

Created on 2022-05-02 by the reprex package (v2.0.1)

@Henry-DJPR
Contributor

Brilliant! Thanks!
