Cannot Open URL #17

Open
georgeaj opened this issue Jun 11, 2019 · 25 comments

@georgeaj

No finreportr functions work when year = 2019. Have tested on multiple companies and multiple years, problem is not company specific and only exists when year = 2019.

GetBalanceSheet('GOOG', 2019)

Error in fileFromCache(file) : Error in download.file(file, cached.file, quiet = !verbose) : cannot open URL 'https://www.sec.gov/Archives/edgar/data/1652044/000165204419000004/https://xbrl.sec.gov/dei/2018/dei-2018-01-31.xsd'

Session Info:

R version 3.6.0 (2019-04-26)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 17763)

Matrix products: default

Random number generation:
 RNG:     Mersenne-Twister 
 Normal:  Inversion 
 Sample:  Rounding 
 
locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252    LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C                           LC_TIME=English_United States.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] forcats_0.4.0    stringr_1.4.0    dplyr_0.8.1      purrr_0.3.2      readr_1.3.1      tidyr_0.8.3      tibble_2.1.3     ggplot2_3.1.1   
 [9] tidyverse_1.2.1  finreportr_1.0.1 lubridate_1.7.4  rvest_0.3.4      xml2_1.2.0       edgar_2.0.1     

loaded via a namespace (and not attached):
 [1] tidyselect_0.2.5  xfun_0.7          slam_0.1-45       NLP_0.2-0         haven_2.1.0       lattice_0.20-38   colorspace_1.4-1 
 [8] generics_0.0.2    yaml_2.2.0        XML_3.98-1.20     rlang_0.3.4       R.oo_1.22.0       pillar_1.4.1      withr_2.1.2      
[15] glue_1.3.1        R.utils_2.8.0     selectr_0.4-1     readxl_1.3.1      modelr_0.1.4      plyr_1.8.4        cellranger_1.1.0 
[22] munsell_0.5.0     gtable_0.3.0      R.methodsS3_1.7.1 XBRL_0.99.18      qdapRegex_0.7.2   knitr_1.23        tm_0.7-6         
[29] parallel_3.6.0    curl_3.3          broom_0.5.2       Rcpp_1.0.1        backports_1.1.4   scales_1.0.0      jsonlite_1.6     
[36] hms_0.4.2         stringi_1.4.3     grid_3.6.0        cli_1.1.0         tools_3.6.0       magrittr_1.5      lazyeval_0.2.2   
[43] crayon_1.3.4      pkgconfig_2.0.2   assertthat_0.2.1  httr_1.4.0        rstudioapi_0.10   R6_2.4.0          nlme_3.1-139     
[50] compiler_3.6.0   
@dchen728

Hi author and georgeaj, is this issue resolved? I am having the same issue and trying to figure out why.
This is a great package and really helpful to pull annual data. Thanks.

@georgeaj
Author

georgeaj commented Jun 26, 2019 via email

I haven’t tried it in a few days. I was able to get one company’s 2019 data one time. I suspect that the SEC database can’t handle the number of requests it gets for current data every day and so returns nothing. If this is the case then there may not be a solution. After having this problem I wrote my own function to pull the data from the SEC’s Excel files that are posted with every filing. I may make it into a package if I get the rest of the kinks out. What method does finreportr use to get the data?

@dchen728

My understanding is that finreportr pulls the data in XML format from the SEC and then parses and converts it into a data frame in R. It would be great if you could make your function into a package, since there are currently very few ways to pull SEC data into R.

@sewardlee337
Owner

sewardlee337 commented Jun 27, 2019

Hello @georgeaj @dchen728 ,

Thank you very much for reporting this issue. My apologies for the late reply; I have been very busy lately.

I will take a look this week, and will be in touch if I need help testing patches for this bug.

@sewardlee337
Owner

sewardlee337 commented Jun 27, 2019

A brief update:

From what I'm seeing, the underlying issue appears to be due to something about the way the XBRL package interfaces with EDGAR. When finreportr pulls and parses XBRL-format data from the U.S. Securities and Exchange Commission, it calls the XBRL package function xbrlDoAll().

For example, if you try to run:

## ORCL's 2019 financials
url <- "https://www.sec.gov/Archives/edgar/data/1341439/000156459019023119/orcl-20190531.xml"

## Call xbrlDoAll(), in verbose mode
XBRL::xbrlDoAll(url, cache.dir='XBRLcache',prefix.out="out",verbose=TRUE)

The printout you receive is:

Downloading to cache dir...trying URL 'https://www.sec.gov/Archives/edgar/data/1341439/000156459019023119/orcl-20190531.xml'
downloaded 6.2 MB

Schema:  https://www.sec.gov/Archives/edgar/data/1341439/000156459019023119/orcl-20190531.xsd 
Level: 1 ==> https://www.sec.gov/Archives/edgar/data/1341439/000156459019023119/orcl-20190531.xsd 
Downloading to cache dir...trying URL 'https://www.sec.gov/Archives/edgar/data/1341439/000156459019023119/orcl-20190531.xsd'
downloaded 98 KB

Roles
Elements
XBRLcache/orcl-20190531.xsd  ==> Linkbase:  https://www.sec.gov/Archives/edgar/data/1341439/000156459019023119/orcl-20190531_cal.xml 
Linkbase:  https://www.sec.gov/Archives/edgar/data/1341439/000156459019023119/orcl-20190531_cal.xml 
Level: 2 ==> https://www.sec.gov/Archives/edgar/data/1341439/000156459019023119/orcl-20190531_cal.xml 
Downloading to cache dir...trying URL 'https://www.sec.gov/Archives/edgar/data/1341439/000156459019023119/orcl-20190531_cal.xml'
downloaded 119 KB

Calculations.
XBRLcache/orcl-20190531.xsd  ==> Linkbase:  https://www.sec.gov/Archives/edgar/data/1341439/000156459019023119/orcl-20190531_def.xml 
Linkbase:  https://www.sec.gov/Archives/edgar/data/1341439/000156459019023119/orcl-20190531_def.xml 
Level: 2 ==> https://www.sec.gov/Archives/edgar/data/1341439/000156459019023119/orcl-20190531_def.xml 
Downloading to cache dir...trying URL 'https://www.sec.gov/Archives/edgar/data/1341439/000156459019023119/orcl-20190531_def.xml'
downloaded 356 KB

Definitions.
XBRLcache/orcl-20190531.xsd  ==> Linkbase:  https://www.sec.gov/Archives/edgar/data/1341439/000156459019023119/orcl-20190531_lab.xml 
Linkbase:  https://www.sec.gov/Archives/edgar/data/1341439/000156459019023119/orcl-20190531_lab.xml 
Level: 2 ==> https://www.sec.gov/Archives/edgar/data/1341439/000156459019023119/orcl-20190531_lab.xml 
Downloading to cache dir...trying URL 'https://www.sec.gov/Archives/edgar/data/1341439/000156459019023119/orcl-20190531_lab.xml'
downloaded 879 KB

Labels.
XBRLcache/orcl-20190531.xsd  ==> Linkbase:  https://www.sec.gov/Archives/edgar/data/1341439/000156459019023119/orcl-20190531_pre.xml 
Linkbase:  https://www.sec.gov/Archives/edgar/data/1341439/000156459019023119/orcl-20190531_pre.xml 
Level: 2 ==> https://www.sec.gov/Archives/edgar/data/1341439/000156459019023119/orcl-20190531_pre.xml 
Downloading to cache dir...trying URL 'https://www.sec.gov/Archives/edgar/data/1341439/000156459019023119/orcl-20190531_pre.xml'
downloaded 643 KB

Presentations.
XBRLcache/orcl-20190531.xsd  ==> Schema:  http://www.xbrl.org/2003/xbrl-instance-2003-12-31.xsd 
Schema:  http://www.xbrl.org/2003/xbrl-instance-2003-12-31.xsd 
Level: 2 ==> http://www.xbrl.org/2003/xbrl-instance-2003-12-31.xsd 
Using file from cache dir...
Elements
XBRLcache/xbrl-instance-2003-12-31.xsd  ==> Schema:  http://www.xbrl.org/2003/xbrl-linkbase-2003-12-31.xsd 
Schema:  http://www.xbrl.org/2003/xbrl-linkbase-2003-12-31.xsd 
Level: 3 ==> http://www.xbrl.org/2003/xbrl-linkbase-2003-12-31.xsd 
Using file from cache dir...
Elements
XBRLcache/xbrl-linkbase-2003-12-31.xsd  ==> Schema:  http://www.xbrl.org/2003/xl-2003-12-31.xsd 
Schema:  http://www.xbrl.org/2003/xl-2003-12-31.xsd 
Level: 4 ==> http://www.xbrl.org/2003/xl-2003-12-31.xsd 
Using file from cache dir...
Elements
XBRLcache/xl-2003-12-31.xsd  ==> Schema:  http://www.xbrl.org/2003/xlink-2003-12-31.xsd 
Schema:  http://www.xbrl.org/2003/xlink-2003-12-31.xsd 
Level: 5 ==> http://www.xbrl.org/2003/xlink-2003-12-31.xsd 
Using file from cache dir...
Elements
XBRLcache/xbrl-linkbase-2003-12-31.xsd  ==> Schema:  http://www.xbrl.org/2003/xlink-2003-12-31.xsd 
Schema:  http://www.xbrl.org/2003/xlink-2003-12-31.xsd 
Already discovered. Skipping
XBRLcache/orcl-20190531.xsd  ==> Schema:  http://www.xbrl.org/2003/xbrl-linkbase-2003-12-31.xsd 
Schema:  http://www.xbrl.org/2003/xbrl-linkbase-2003-12-31.xsd 
Already discovered. Skipping
XBRLcache/orcl-20190531.xsd  ==> Schema:  http://www.xbrl.org/2005/xbrldt-2005.xsd 
Schema:  http://www.xbrl.org/2005/xbrldt-2005.xsd 
Level: 2 ==> http://www.xbrl.org/2005/xbrldt-2005.xsd 
Using file from cache dir...
Elements
XBRLcache/xbrldt-2005.xsd  ==> Schema:  http://www.xbrl.org/2003/xbrl-instance-2003-12-31.xsd 
Schema:  http://www.xbrl.org/2003/xbrl-instance-2003-12-31.xsd 
Already discovered. Skipping
XBRLcache/orcl-20190531.xsd  ==> Schema:  https://www.sec.gov/Archives/edgar/data/1341439/000156459019023119/https://xbrl.sec.gov/dei/2018/dei-2018-01-31.xsd 
Schema:  https://www.sec.gov/Archives/edgar/data/1341439/000156459019023119/https://xbrl.sec.gov/dei/2018/dei-2018-01-31.xsd 
Level: 2 ==> https://www.sec.gov/Archives/edgar/data/1341439/000156459019023119/https://xbrl.sec.gov/dei/2018/dei-2018-01-31.xsd 
Downloading to cache dir...trying URL 'https://www.sec.gov/Archives/edgar/data/1341439/000156459019023119/https://xbrl.sec.gov/dei/2018/dei-2018-01-31.xsd'
Error in fileFromCache(file) : 
  Error in download.file(file, cached.file, quiet = !verbose) : 
  cannot open URL 'https://www.sec.gov/Archives/edgar/data/1341439/000156459019023119/https://xbrl.sec.gov/dei/2018/dei-2018-01-31.xsd'

In addition: Warning message:
In download.file(file, cached.file, quiet = !verbose) :
  cannot open URL 'https://www.sec.gov/Archives/edgar/data/1341439/000156459019023119/https://xbrl.sec.gov/dei/2018/dei-2018-01-31.xsd': HTTP status was '403 Forbidden'

This issue appears to affect all packages and applications that use this function in the XBRL package. For example: bergant/finstr#12

I will update when I find out more. Thank you so much for your patience.

@dchen728

Hey Seward, thanks for the update. Look forward to hearing more updates. I will keep an eye on the XBRL package as well. Thanks again.

@sewardlee337
Owner

I have written to the author of the XBRL package to see if he can offer some guidance.

@dchen728

dchen728 commented Jul 1, 2019

Thanks for the update.

@enFinExplorer

I am having issues with a company that was delisted and subsequently changed its symbol (EPE to EPEG).
GetIncome('EPEG', 2019) does not work because the generated XML file name uses the new symbol, epeg, whereas the file was filed under the old symbol, epe:
'https://www.sec.gov/Archives/edgar/data/1584952/000158495219000003/epeg-20181231.xml'

@shyams80

> I have written to the author of the XBRL package to see if he can offer some guidance.

Any luck with this?

@selgamal

> > I have written to the author of the XBRL package to see if he can offer some guidance.
>
> Any luck with this?

Please see this SO question; it might help you hack together a solution if this is urgent for you.

@mfarr76

mfarr76 commented Dec 7, 2019

I was able to fix the XBRL package using the SO question above, but I ran into another problem: GetFinancial for a 2019 report year returned the following error:
Error: Result must have length 1011, not 0

After doing some digging it appears that the descriptions of cash flow statements, balance sheets, and income have changed from previous years. The 2019 report I was looking at (symbol "SM") has the following:

CONSOLIDATED BALANCE SHEETS (in thousands, except share data)
CONSOLIDATED STATEMENTS OF CASH FLOWS (in thousands)
CONSOLIDATED STATEMENTS OF OPERATIONS (in thousands, except per share data)

For example, GetIncome only looks for these column headers:

income.descriptions <- c("CONSOLIDATED STATEMENTS OF INCOME", "CONSOLIDATED STATEMENT OF INCOME", "CONSOLIDATED STATEMENTS OF OPERATIONS", "CONSOLIDATED STATEMENT OF OPERATIONS", "CONSOLIDATED STATEMENT OF EARNINGS", "CONSOLIDATED STATEMENTS OF EARNINGS", "INCOME STATEMENTS", "CONSOLIDATED RESULTS OF OPERATIONS")

I made the correction to the descriptions and was able to download the data.

Just thought I would pass along.
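
The comment above does not say exactly what correction was made. One possible approach, shown here only as a sketch, is to strip the trailing parenthetical (such as "(in thousands)") before comparing against the known titles. The helper name `matches_known_title` is hypothetical and is not part of finreportr.

```r
# Statement titles GetIncome is reported to look for (copied from the comment above)
income.descriptions <- c("CONSOLIDATED STATEMENTS OF INCOME",
                         "CONSOLIDATED STATEMENT OF INCOME",
                         "CONSOLIDATED STATEMENTS OF OPERATIONS",
                         "CONSOLIDATED STATEMENT OF OPERATIONS",
                         "CONSOLIDATED STATEMENT OF EARNINGS",
                         "CONSOLIDATED STATEMENTS OF EARNINGS",
                         "INCOME STATEMENTS",
                         "CONSOLIDATED RESULTS OF OPERATIONS")

# Hypothetical helper: upper-case the description, drop everything from the
# first "(" onward, and check the remainder against the known titles.
matches_known_title <- function(description, known.titles) {
  base.title <- sub("[[:space:]]*\\(.*$", "", toupper(trimws(description)))
  base.title %in% known.titles
}

# The two 2019 "SM" descriptions quoted above
sm.income  <- "CONSOLIDATED STATEMENTS OF OPERATIONS (in thousands, except per share data)"
sm.balance <- "CONSOLIDATED BALANCE SHEETS (in thousands, except share data)"

income.hit  <- matches_known_title(sm.income, income.descriptions)
balance.hit <- matches_known_title(sm.balance, income.descriptions)
print(c(income = income.hit, balance = balance.hit))
```

With this kind of normalization the income statement title matches while the balance sheet title still does not, so a parenthetical suffix alone would no longer cause a miss.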

@GreenGrassBlueOcean

GreenGrassBlueOcean commented Jan 14, 2020

I am also working with this package and observed the same behavior. There are however two distinct problems arising at the same time.

  1. XBRL is not generating correct URLs
  2. The naming of xml files on the sec website has changed recently.

XBRL:
The bug in the XBRL package should be fixed as indicated above (I found the same solution independently). When working on Windows it is best to rebuild the XBRL package from source with the changed file. The libxml file is not available to the compiler and is best added as described at this URL: https://stackoverflow.com/questions/39568937/how-to-create-cran-ready-r-package-that-has-external-dependency-libxml2

SEC NAMES:
when running (with the fixed XBRL package) the following code:

Income <- finreportr::GetIncome("SBUX", 2019)

results in the following error

Error in fileFromCache(file.inst) : 
  Error in download.file(file, cached.file, quiet = !verbose) : 
  cannot open URL 'https://www.sec.gov/Archives/edgar/data/829224/000082922419000051/sbux-20190929.xml'

if I then check if this file is present on the Edgar website:
https://www.sec.gov/Archives/edgar/data/829224/000082922419000051/

I notice that the file sbux-20190929.xml is not present on EDGAR.
I also checked this for other companies such as Google and Boeing, and they all show the same behavior.

When I then try to find what should be the correct name using the edgarWebR (https://github.com/mwaldstein/edgarWebR) package:

FilingsonEdgar <- edgarWebR::company_filings(x = "SBUX", type = "10-K")
DocumentsonEdgar <-  edgarWebR::filing_documents(x = test$href[1])
link <- DocumentsonEdgar[DocumentsonEdgar[5] == 'XML', 4]

I get the following URL:

https://www.sec.gov/Archives/edgar/data/12927/000001292719000077/a201909sep3010-q_htm.xml

When passing this URL to the revised XBRL package

xbrl.vars <- XBRL::xbrlDoAll(link, verbose=TRUE)

it downloads the data correctly.

CONCLUSION:
The finreportr::GetIncome function generates the wrong URL to pass to XBRL.
Stepping through finreportr::GetIncome with the debugger in RStudio, you end up in finreportr::GetFinancial, which contains an inner helper function called GetURL that builds the URL from a fixed pattern. I would propose replacing GetURL with a slight adaptation of the approach above, which retrieves the right URL using the edgarWebR package.

@Handiel

Handiel commented Apr 18, 2020

Hello, I am trying to load the J.P. Morgan income statement, but I get the following error. Could you help me find a solution? Thanks in advance.

>GetIncome("JPM", 2019)

Error in fileFromCache(file.inst) : Error in download.file(file, cached.file, quiet = !verbose) : cannot open URL 'https://www.sec.gov/Archives/edgar/data/19617/000001961719000054/jpm-20181231.xml'

@IEORTools

I believe the issue is the GetURL function inside GetFinancial. The inst.url string is built so that it ends with the report period. EDGAR now appears to use XML file names with additional endings (examples include cal, def, lab, pre) that would need to be added to the inst.url string. I don't know how many suffixes there are. It looks like the URL @GreenGrassBlueOcean found has yet another ending on the XML file.

 ##   Function to acquire Instance Document URL
 GetURL <- function(symbol, year) {
      
      lower.symbol <- tolower(symbol)
      
      accession.no.raw <- GetAccessionNo(symbol, year, foreign = FALSE)
      accession.no <- gsub("-", "" , accession.no.raw)
      
      CIK <- CompanyInfo(symbol)
      CIK <- as.numeric(CIK$CIK)
      
      report.period <- ReportPeriod(symbol, CIK, accession.no, accession.no.raw)
      report.period <- gsub("-", "" , report.period)
      
      inst.url <- paste0("https://www.sec.gov/Archives/edgar/data/", CIK, "/", 
                         accession.no, "/", lower.symbol, "-", report.period, ".xml")
      return(inst.url)
 }
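
To see what this pattern produces, the string-building step can be reproduced without any network calls. This is a minimal sketch: `build_instance_url` is a hypothetical stand-in (not a finreportr function), and the SBUX accession number and report period are assumed from the failing URL quoted earlier in this thread.

```r
# Hypothetical stand-in for the last step of GetURL: the same gsub/paste0
# logic, but the inputs are passed in instead of being looked up on EDGAR.
build_instance_url <- function(symbol, cik, accession.no.raw, report.period) {
  lower.symbol <- tolower(symbol)
  accession.no <- gsub("-", "", accession.no.raw)
  report.period <- gsub("-", "", report.period)
  paste0("https://www.sec.gov/Archives/edgar/data/", cik, "/",
         accession.no, "/", lower.symbol, "-", report.period, ".xml")
}

# Values assumed from the failing SBUX URL in this thread
sbux.url <- build_instance_url("SBUX", 829224, "0000829224-19-000051", "2019-09-29")
print(sbux.url)
```

The result always has the form ticker-date.xml. If a filer names its instance document differently, as with the a201909sep3010-q_htm.xml file found earlier, the generated URL points at a file that does not exist.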

@L-plate-coder

L-plate-coder commented Sep 15, 2020

FilingsonEdgar <- edgarWebR::company_filings(x = "SBUX", type = "10-K")
DocumentsonEdgar <-  edgarWebR::filing_documents(x = test$href[1])
link <- DocumentsonEdgar[DocumentsonEdgar[5] == 'XML', 4]

Returned:
Error in edgarWebR::filing_documents(x = test$href[1]) : object 'test' not found

But I knew where the file was so wrote the URL manually (and checked it many times).

All good until I got to the xbrl.vars <- XBRL::xbrlDoAll(link, verbose=TRUE)

Whereupon, same as mentioned before:

..trying URL 'https://www.sec.gov/Archives/edgar/data/1800/000110465920023904/https://xbrl.sec.gov/dei/2019/dei-2019-01-31.xsd'
Error in fileFromCache(file) :
Error in download.file(file, cached.file, quiet = !verbose) :
cannot open URL 'https://www.sec.gov/Archives/edgar/data/1800/000110465920023904/https://xbrl.sec.gov/dei/2019/dei-2019-01-31.xsd'

In addition: Warning message:
In download.file(file, cached.file, quiet = !verbose) :
cannot open URL 'https://www.sec.gov/Archives/edgar/data/1800/000110465920023904/https://xbrl.sec.gov/dei/2019/dei-2019-01-31.xsd': HTTP status was '404 Not Found'

Going to carry on trying to find some answers but any ideas welcome.

@enFinExplorer

enFinExplorer commented Sep 15, 2020 via email

@jwozny

jwozny commented Feb 8, 2021

This is an issue with the XBRL library when the Schema URL is HTTPS.
Specifically this part of the XBRL/R/XBRL.R file in the library (sourced from https://cran.r-project.org/web/packages/XBRL/index.html):

  fixFileName <- function(dname, file.name) {
    if (substr(file.name, 1, 5) != "http:") {
      if (substr(file.name, 1, 5) == "../..") { ## A better solution is preferred, but it works for now
        file.name <- paste0(dirname(dirname(dname)), "/",  substr(file.name, 7, nchar(file.name)))
      } else if (substr(file.name, 1, 2) == "..") {
        file.name <- paste0(dirname(dname), "/", substr(file.name, 4, nchar(file.name)))
      } else {
        file.name <- paste0(dname,"/", file.name)
      }
    }
    file.name
  }

The function treats any file name whose first five characters are not "http:" as a relative path and rewrites it. An "https://" URL fails that check (its first five characters are "https"), so the function prepends the directory of the original request to it:

dname = 'https://www.sec.gov/Archives/edgar/data/1341439/000156459019023119'
file.name = 'https://xbrl.sec.gov/dei/2018/dei-2018-01-31.xsd'
fixFileName returns 'https://www.sec.gov/Archives/edgar/data/1341439/000156459019023119/https://xbrl.sec.gov/dei/2018/dei-2018-01-31.xsd'

I'm new to R, but looking into how to recompile the library and force it to use the fixed version of the file.
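
One possible change, sketched below, is to treat both "http://" and "https://" file names as absolute and leave them untouched. `fixFileNamePatched` is a hypothetical name used for illustration; this is not the XBRL package's code and has not been tested inside the package.

```r
# Hypothetical patched variant of fixFileName (illustration only)
fixFileNamePatched <- function(dname, file.name) {
  # Absolute URLs with either scheme are returned unchanged
  if (!grepl("^https?://", file.name)) {
    if (substr(file.name, 1, 5) == "../..") {
      file.name <- paste0(dirname(dirname(dname)), "/", substr(file.name, 7, nchar(file.name)))
    } else if (substr(file.name, 1, 2) == "..") {
      file.name <- paste0(dirname(dname), "/", substr(file.name, 4, nchar(file.name)))
    } else {
      # Plain relative names are resolved against the requesting directory
      file.name <- paste0(dname, "/", file.name)
    }
  }
  file.name
}

dname <- "https://www.sec.gov/Archives/edgar/data/1341439/000156459019023119"
absolute <- fixFileNamePatched(dname, "https://xbrl.sec.gov/dei/2018/dei-2018-01-31.xsd")
relative <- fixFileNamePatched(dname, "orcl-20190531_cal.xml")
print(absolute)
print(relative)
```

With this check the dei schema URL is passed through as-is, while relative linkbase names are still resolved against the filing directory.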

@mmbostwick

mmbostwick commented Feb 23, 2021

Is it possible this is the same issue? It looks like I'm getting a "doubled" URL, and the HTTP status was '404 Not Found'.

Thanks for your time!

> GetIncome('GOOG', 2019)
Error in fileFromCache(file) : 
  Error in download.file(file, cached.file, quiet = !verbose) : 
  cannot open URL 'https://www.sec.gov/Archives/edgar/data/1652044/000165204419000004/https://xbrl.sec.gov/dei/2018/dei-2018-01-31.xsd'

In addition: Warning message:
In download.file(file, cached.file, quiet = !verbose) :
  cannot open URL 'https://www.sec.gov/Archives/edgar/data/1652044/000165204419000004/https://xbrl.sec.gov/dei/2018/dei-2018-01-31.xsd': HTTP status was '404 Not Found'

Versioning:

# R --version
R version 4.0.4 (2021-02-15) -- "Lost Library Book"
Copyright (C) 2021 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under the terms of the
GNU General Public License versions 2 or 3.
For more information about these matters see
https://www.gnu.org/licenses/.

# uname -a
Linux t470p 5.4.0-65-generic #73~18.04.1-Ubuntu SMP Tue Jan 19 09:02:24 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux

@DataScienceProjectsJapan

Seems like the package doesn't work anymore.
Most of the tutorial here doesn't work at all
https://rpubs.com/rwalkerWU/Usingfinreportr

for example
JPM.IS <- GetIncome("JPM", 2015)
JPM.BS <- GetBalanceSheet("JPM", 2015)
JPM.SCF <- GetCashFlow("JPM", 2015)
AnnualReports("JPM") #sometimes works

Trying any number of commands gives
Error in open.connection(x, "rb") : HTTP error 403.
or
Error in fileFromCache(file.inst) :
Error in download.file(file, cached.file, quiet = !verbose) :
cannot open URL 'https://www.sec.gov/Archives/edgar/data/1326801/000132680116000043/fb-20151231.xml'

In addition: Warning messages:
1: closing unused connection 4 (http://www.sec.gov/cgi-bin/browse-edgar?action=getcompany&CIK=JPM&type=10-k&dateb=&owner=exclude&count=100)
2: closing unused connection 3 (http://www.sec.gov/cgi-bin/browse-edgar?action=getcompany&CIK=JPM&type=10-k&dateb=&owner=exclude&count=100)
3: In download.file(file, cached.file, quiet = !verbose) :
cannot open URL 'https://www.sec.gov/Archives/edgar/data/1326801/000132680116000043/fb-20151231.xml': HTTP status was '403 Forbidden'

@IEORTools

Most of the '403 Forbidden' cannot open URL errors occur because SEC EDGAR requires requests to carry a User-Agent string that identifies the requester. I was able to fix this for the XBRL package with the following. Insert your own name and email in the string.

options(HTTPUserAgent = "yourname youremail@example.com")
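
Setting the option this way changes it for the rest of the R session. If that is not wanted, the setting can be scoped to a single call. The following is only a sketch: `with_user_agent` is a hypothetical helper, and the name and email in the string are placeholders to replace with your own.

```r
# Hypothetical helper: set the HTTPUserAgent option only while `expr` runs,
# then restore whatever value was there before.
with_user_agent <- function(agent, expr) {
  old <- options(HTTPUserAgent = agent)
  on.exit(options(old), add = TRUE)
  expr  # lazily evaluated here, i.e. after the option has been set
}

before <- getOption("HTTPUserAgent")
seen <- with_user_agent("yourname youremail@example.com", getOption("HTTPUserAgent"))
after <- getOption("HTTPUserAgent")

print(seen)
print(identical(before, after))
```

In principle a download call, such as the xbrlDoAll example earlier in this thread, could be passed as expr; that use is untested here.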

@coleburdette

This worked for me! Thanks @IEORTools

@IEORTools

Here is a possible solution by editing the XBRL source code to fix the URL issue

https://stackoverflow.com/questions/53651481/schema-file-does-not-exist-in-xbrl-parse-file

@L-plate-coder

L-plate-coder commented Nov 13, 2022 via email
