Downloads failing, but work with browser user agent #118

Closed
rushgeo opened this issue May 7, 2021 · 16 comments

Comments

@rushgeo

rushgeo commented May 7, 2021

I'm having intermittent problems downloading through tigris. Sometimes all three download attempts fail, and other times they succeed. When they fail, the output file in the cache directory will either be zero bytes, or a very short HTML error:

<HTML><HEAD><TITLE>Error</TITLE></HEAD><BODY>
An error occurred while processing your request.<p>
Reference&#32;&#35;97&#46;3f4a0760&#46;1620408093&#46;140630ae
</BODY></HTML>
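
Both failure modes described above (a zero-byte file, or a short HTML error page saved in place of the archive) can be detected before trying to unzip. The following is a minimal sketch, assuming the cached download is expected to be an ordinary zip archive; `looks_like_zip()` is a hypothetical helper for illustration and is not part of tigris.

```r
# Hypothetical helper (not part of tigris): returns TRUE if `path` starts with
# the zip local-file-header signature "PK\003\004".
looks_like_zip <- function(path) {
  if (!file.exists(path)) return(FALSE)

  # A zero-byte (or otherwise tiny) file cannot be a usable archive.
  size <- file.size(path)
  if (is.na(size) || size < 4) return(FALSE)

  # Read the first four bytes and compare them with the zip signature.
  magic <- readBin(path, what = "raw", n = 4)
  identical(magic, as.raw(c(0x50, 0x4b, 0x03, 0x04)))
}
```

A file that fails this check could be deleted from the cache directory and downloaded again, rather than being passed to `unzip()`.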

Inspired by the discussion here, I added a browser user agent to the downloads. Specifically, I added:
user_agent("Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:88.0) Gecko/20100101 Firefox/88.0")
in every GET() call in tigris:::load_tiger

This seems to work every time, but I suppose I can't be 100% certain the user agent is doing the trick when there is still intermittent success without the patch.

Still, I wonder if it's worth either:

  1. following up with someone at the Census Bureau to ask whether their CDN or policies could be affecting downloads, or
  2. sending a browser user agent from tigris, either all of the time or only after the first failed download attempt.
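
Option 2 could look roughly like the sketch below. It is not the actual tigris download code: `download_with_ua_fallback()` and `fetch_with_httr()` are hypothetical names, and the only httr calls used are `GET()`, `write_disk()`, `user_agent()`, and `http_error()`. The first attempt sends httr's default user agent and the retries send the browser string quoted above.

```r
# Browser user agent string quoted earlier in this issue.
browser_ua <- "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:88.0) Gecko/20100101 Firefox/88.0"

# Hypothetical fetcher: downloads `url` to `dest` with httr, optionally
# overriding the user agent. Returns TRUE when the HTTP status is not an error.
fetch_with_httr <- function(url, dest, ua = NULL) {
  args <- list(url, httr::write_disk(dest, overwrite = TRUE))
  if (!is.null(ua)) args <- c(args, list(httr::user_agent(ua)))
  resp <- do.call(httr::GET, args)
  !httr::http_error(resp)
}

# Hypothetical retry wrapper: attempt 1 uses the default user agent,
# later attempts fall back to the browser user agent.
download_with_ua_fallback <- function(url, dest, attempts = 3,
                                      fetch = fetch_with_httr) {
  for (i in seq_len(attempts)) {
    ua <- if (i == 1) NULL else browser_ua
    # Treat any R-level error (timeout, DNS failure, ...) as a failed attempt.
    ok <- tryCatch(fetch(url, dest, ua), error = function(e) FALSE)
    if (isTRUE(ok)) return(TRUE)
  }
  FALSE
}
```

The `fetch` argument exists only so the retry logic can be exercised without network access. A status check alone may not catch an HTML error page that was saved to disk, so the downloaded file may still need to be inspected.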
@walkerke
Owner

walkerke commented May 7, 2021

We've gotten a number of error reports about this in the past week; my best guess is that the Census website is undergoing some maintenance or is having some issues. @loganpowell - do you have any thoughts on @rushgeo's suggestion?

@loganpowell

Hi friends. If you're making a lot of calls to any Census address, there's a default policy that will block your IP. If your requests were succeeding, then suddenly started returning errors, and nothing has succeeded since the first error, this is probably what is happening to you. I sometimes have to do heavy pulls using wget. When I do, I usually run them from a "throw-away" IP address (via VPN) and do everything in one sitting. Our Akamai caching layer will institute the block after some unknown amount of time (within hours).

@rushgeo
Author

rushgeo commented May 7, 2021

This doesn't sound like the scenario I'm experiencing. I'm having this happen from my first attempt on a new machine, and I'm also having intermittent success after previously having errors on another machine.

@loganpowell

In that case, it's unrelated to the issue referenced. What are the addresses tigris accesses?

@rushgeo
Author

rushgeo commented May 7, 2021

I've mostly been downloading tracts, which for 2010 come from https://www2.census.gov/geo/tiger/TIGER2010/TRACT/2010/ if the cartographic boundary files aren't requested instead.
The code that builds the URL is here.

@loganpowell

Sorry for the delayed response. Are you still experiencing this issue?

@profLuna

Not sure if this is the same problem, but I have recently had trouble downloading county subdivisions. The following fails:

ma_towns_sf <- county_subdivisions(state = "MA", cb = TRUE)

I get the following message:

Using FIPS code '25' for state 'MA'
error 1 in extracting from zip fileCannot open layer cb_2019_25_cousub_500k
Error in CPL_read_ogr(dsn, layer, query, as.character(options), quiet, :
Opening layer failed.

No problem accessing states. Just county subdivisions and smaller geographies, and sometimes it works. Using tigris version 1.4

@walkerke
Owner

@profLuna I just tested - it is working for me on my local version of R. I've also tested on my server version of R which took a little while to connect to the Census website but is working too. Are you running a server version of R? Downloads seem to fail more frequently there. I'd also always recommend using options(tigris_use_cache = TRUE) to build a local cache rather than relying on data downloads.

@profLuna

@walkerke Thanks for the quick response. I am running a local version of R. I tried both with and without a VPN, but got the same response. I will definitely set the local cache option to TRUE, although I'm stuck at the moment. It's still weird, because states and tracts work without a problem. It just seems to be county_subdivisions.

@ricobert1

Hi, I can confirm this same behavior and the issue is ongoing.

Specifically, the link specified, for instance, by block_groups is valid for downloading when pasted into a browser. However, from the R environment it fails to download.

@walkerke
Owner

This one's a little tricky to test as I can't reproduce the error; however I'm wondering if heavy use of tigris temporarily clogs certain datasets on the Census website. For example, if I run:

> httr:::default_ua()
[1] "libcurl/7.58.0 r-curl/4.3.1 httr/1.4.2"

It's possible, then, that many R users are sending the same user agent to the Census website and that the website is intermittently blocking it, given that this user agent will be identical across tigris users with those versions. I'll do some more research on this.

@loganpowell

loganpowell commented May 30, 2021 via email

@jzadra

jzadra commented Jun 17, 2021

I am having the same issue. I can get states and block groups, but zctas fail:

zctas()
Previous download failed.  Re-download attempt 1 of 3...
Previous download failed.  Re-download attempt 2 of 3...
Previous download failed.  Re-download attempt 3 of 3...
Error: Download failed; check your internet connection or the status of the Census Bureau website
                 at http://www2.census.gov/geo/tiger/.

It's been several months since I used tigris. At first I got the following:

ZCTAs can take several minutes to download.  To cache the data and avoid re-downloading in future R sessions, set `options(tigris_use_cache = TRUE)`
Error: Cannot open "/private/var/folders/5_/l71sk6kn29z17n011g8kld5m0000gp/T/Rtmp8guaZD"; The source could be corrupt or not supported. See `st_drivers()` for a list of supported formats.
In addition: Warning message:
In unzip(file_loc, exdir = tmp) : error 1 in extracting from zip file

I then removed tigris and reinstalled from github, and now get the download error.

EDIT:

I tried to get zctas again just a minute after posting this, and it worked.

@pdeshlab

pdeshlab commented Aug 3, 2021

I am dealing with the same zctas error mentioned above:

zctas <- tigris::zctas()  

# error: Download failed; check your internet connection or the status of the Census Bureau website
Previous download failed.  Re-download attempt 1 of 3...
Previous download failed.  Re-download attempt 2 of 3...
Previous download failed.  Re-download attempt 3 of 3...
Error: Download failed; check your internet connection or the status of the Census Bureau website
                 at http://www2.census.gov/geo/tiger/.

I've been experiencing it for about 24 hours, but am not sure if it takes more time for someone to be unblocked if they've made multiple requests. Like jzadra pointed out, zctas seems to be the only geometry affected by this error, but again, I'm not sure if that's because it is the geometry I've been querying most frequently.

@walkerke
Owner

walkerke commented Aug 3, 2021

I just ran zctas() successfully. I would strongly recommend using shapefile caching with options(tigris_use_cache = TRUE) if you are frequently requesting ZCTAs. This will store the shapefile on your computer and use the local cache instead of downloading from the Census website each time and risking this issue.

@pdeshlab

pdeshlab commented Aug 3, 2021

I'm definitely going to use options(tigris_use_cache = TRUE) in the future, but unfortunately, I didn't use that option when I was first scripting. Do you happen to know how long it usually takes for the issue to resolve itself?


7 participants