Error in download file #12

Open
ABSYKS opened this issue Feb 10, 2019 · 8 comments
Comments

@ABSYKS

ABSYKS commented Feb 10, 2019

Hi, I'm having an issue extracting values from Starbucks' latest 10-K.

library(XBRL)
xbrl_sbux <- "https://www.sec.gov/Archives/edgar/data/829224/000082922418000052/sbux-20180930.xml"
old_o <- options(stringsAsFactors = FALSE)
xbrl_data <- xbrlDoAll(xbrl_sbux)
options(old_o)

Trace:

Error in fileFromCache(file) : 
  Error in download.file(file, cached.file, quiet = !verbose) : 
  cannot open URL 'https://www.sec.gov/Archives/edgar/data/829224/000082922418000052/https://xbrl.sec.gov/dei/2018/dei-2018-01-31.xsd'

In addition: Warning message:
In download.file(file, cached.file, quiet = !verbose) :
  cannot open URL 'https://www.sec.gov/Archives/edgar/data/829224/000082922418000052/https://xbrl.sec.gov/dei/2018/dei-2018-01-31.xsd': HTTP status was '403 Forbidden'

It seems like two URLs are being incorrectly concatenated, but I'm not sure why. Any help would be great, thanks!

@mattrdini

I'm having the exact same problem on most of my attempts, and I can't figure out whether the problem is with finstr or with the SEC. The error happens mostly for 2018 filings (I haven't seen it with any other year yet) and always with the same 'dei' file. Thanks for your time.

@enFinExplorer

I was working through the old bergant/XBRLFiles workflow for a 2019 company to try to figure out the issue. It seems that I am getting tons of duplicates, so the spread statements do not work. I was able to work around it manually by using dplyr::distinct, so I figure someone can go through and update the finstr package with the fix (I'm still fairly new, so I have no clue how to do that and don't want to break anything).

```r
library(dplyr)
library(tidyr)
library(pander)

# Keep non-dimensional numeric facts, then drop exact duplicates
# so that spread() sees one value per element and end date
pres_df_num <- pres_df_num %>%
  filter(is.na(dimension1)) %>%
  filter(!is.na(endDate)) %>%
  select(elOrder, contains("level"), elementId, fact, decimals, endDate) %>%
  mutate(fact = as.numeric(fact) * 10^as.numeric(decimals)) %>%
  filter(!is.na(fact)) %>%
  distinct()

# One column per end date
pres_df_num <- pres_df_num %>%
  spread(endDate, fact) %>%
  arrange(elOrder) %>%
  distinct()

pres_df_num %>%
  select(elementId, contains("2019"), contains("2018")) %>%
  distinct() %>%
  pandoc.table(
    style = "rmarkdown",
    split.table = 200,
    justify = c("left", "right", "right")
  )
```

@eoziolor

eoziolor commented May 7, 2020

Hey @enFinExplorer, would you mind explaining a bit how to implement this (manually for now) to fix the issue with the wrong URL generation?

@enFinExplorer

One thing that I figured out is that you have to manually download the .xsd schema files into your xbrl cache. Otherwise, I'll have to look through some of my older code to see if I still have it somewhere.
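
For anyone trying the manual cache route, here is a minimal sketch of what that could look like. It assumes the XBRL package looks up cached schemas by base file name inside the `cache.dir` folder (default `"xbrl.Cache"`), which I have not verified against every version. The helper name `seed_xbrl_cache` and its `fetch` argument are hypothetical and only there for illustration.

```r
# Hypothetical helper: place a schema file in the XBRL cache directory
# ahead of time so the package finds it there instead of downloading it.
# Assumption: cached files are stored as <cache_dir>/<basename of URL>.
seed_xbrl_cache <- function(schema_url, cache_dir = "xbrl.Cache",
                            fetch = utils::download.file) {
  dir.create(cache_dir, showWarnings = FALSE, recursive = TRUE)
  cached_file <- file.path(cache_dir, basename(schema_url))
  # Only fetch when the file is not already in the cache
  if (!file.exists(cached_file)) {
    fetch(schema_url, cached_file, mode = "wb")
  }
  invisible(cached_file)
}
```

Usage would be something like `seed_xbrl_cache("https://xbrl.sec.gov/dei/2018/dei-2018-01-31.xsd")` before calling `xbrlDoAll(xbrl_sbux, cache.dir = "xbrl.Cache")`. Other schemas referenced by the filing may need the same treatment.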

@eoziolor

eoziolor commented May 7, 2020

> One thing that I figured out is that you have to manually download the .xsd schema files into your xbrl cache. Otherwise, I'll have to look through some of my older code to see if I still have it somewhere.

Oh no worries, it's just a curiosity for me. I saw another response to the linked issue that may lead to the correct xbrl download link. I'll give it a shot first and see if it might be an automated solution. I'm trying to write a quick app to pull and do some basic financial stats on a variety of companies, but as I see everyone has figured, it's not that easy to do this across time :D

Thanks for the pointer!

@jwozny

jwozny commented Feb 8, 2021

This is an issue with the XBRL library when the Schema URL is HTTPS.
Specifically this part of the XBRL/R/XBRL.R file in the library (sourced from https://cran.r-project.org/web/packages/XBRL/index.html):

  fixFileName <- function(dname, file.name) {
    if (substr(file.name, 1, 5) != "http:") {
      if (substr(file.name, 1, 5) == "../..") { ## A better solution is preferred, but it works for now
        file.name <- paste0(dirname(dirname(dname)), "/",  substr(file.name, 7, nchar(file.name)))
      } else if (substr(file.name, 1, 2) == "..") {
        file.name <- paste0(dirname(dname), "/", substr(file.name, 4, nchar(file.name)))
      } else {
        file.name <- paste0(dname,"/", file.name)
      }
    }
    file.name
  }

If the file name doesn't start with "http:", the function treats it as a relative path and prepends the directory of the original request. A URL starting with "https:" fails that check, so it gets the directory prepended as well:

dname = 'https://www.sec.gov/Archives/edgar/data/1341439/000156459019023119/'
file.name = 'https://xbrl.sec.gov/dei/2018/dei-2018-01-31.xsd'
fixFileName returns 'https://www.sec.gov/Archives/edgar/data/1341439/000156459019023119/https://xbrl.sec.gov/dei/2018/dei-2018-01-31.xsd'

I'm new to R, but I'm looking into how to rebuild the library and force it to use a fixed version of the file.
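
As a minimal sketch of what a fix could look like (this is not the package's code, and the name `fix_file_name` is hypothetical), the check can accept both `http://` and `https://` as absolute URLs and leave them untouched. The relative-path branches are copied unchanged from the function above.

```r
# Sketch of a patched version: absolute http(s) URLs pass through unchanged,
# relative names are resolved against the directory of the instance file.
fix_file_name <- function(dname, file.name) {
  if (!grepl("^https?://", file.name)) {
    if (substr(file.name, 1, 5) == "../..") {
      # Two levels up from the instance directory
      file.name <- paste0(dirname(dirname(dname)), "/",
                          substr(file.name, 7, nchar(file.name)))
    } else if (substr(file.name, 1, 2) == "..") {
      # One level up from the instance directory
      file.name <- paste0(dirname(dname), "/",
                          substr(file.name, 4, nchar(file.name)))
    } else {
      # Plain relative name in the same directory
      file.name <- paste0(dname, "/", file.name)
    }
  }
  file.name
}
```

With this version, the `dei-2018-01-31.xsd` URL from the example comes back as-is, while a plain name like `sbux-20180930.xsd` still gets the filing directory prepended. It would still need to be swapped into a locally rebuilt copy of the package to take effect.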

@xbrl-data

Just a note: is anyone else getting 403 errors when using xbrlDoAll these days, or is the SEC just mad at me for using this so much? My quick workaround was to install the recount package in R and use download_retry instead of download.file within the actual XBRL package functions.
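
If you would rather not pull in another package just for this, the retry idea can be sketched in base R. This is a rough stand-in under my own assumptions, not the recount implementation. The function name `download_with_retry` and the `fetch` argument are hypothetical, and extra arguments are simply passed through to `download.file`.

```r
# Hypothetical base-R retry wrapper around download.file.
# Retries on errors, warnings, or a non-zero status, waiting longer each time.
download_with_retry <- function(url, destfile, tries = 3, wait = 1,
                                fetch = utils::download.file, ...) {
  for (attempt in seq_len(tries)) {
    status <- tryCatch(fetch(url, destfile, ...),
                       error = function(e) e,
                       warning = function(w) w)
    # download.file returns 0 on success
    if (!inherits(status, "condition") && identical(as.integer(status), 0L)) {
      return(invisible(TRUE))
    }
    if (attempt < tries) Sys.sleep(wait * attempt)
  }
  stop("Download failed after ", tries, " attempts: ", url)
}
```

One thing that may be worth trying alongside the retries, on the assumption that the 403s come from the SEC rejecting requests without a declared user agent: pass `headers = c("User-Agent" = "Your Name your.email@example.com")` through to `download.file` (the `headers` argument needs R 3.6.0 or later).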

@IEORTools

A potential fix for the 403 errors from the XBRL URL issue that @jwozny described:
https://stackoverflow.com/questions/53651481/schema-file-does-not-exist-in-xbrl-parse-file
