
urllib2.HTTPError: HTTP Error 404: Not Found #16

Open
carcinocron opened this issue May 10, 2014 · 3 comments

Comments

@carcinocron

Fair warning: my version of the script is modified, and those modifications were my first attempt ever at Python, but this stack trace is similar enough to #10 that it probably affects the original code, too.

I keep getting this error:

Traceback (most recent call last):
  File "redditdownload.py", line 212, in <module>
    URLS = extract_urls(ITEM['url'])
  File "redditdownload.py", line 137, in extract_urls
    urls = process_imgur_url(url)
  File "redditdownload.py", line 111, in process_imgur_url
    return extract_imgur_album_urls(url)
  File "redditdownload.py", line 29, in extract_imgur_album_urls
    response = urlopen(album_url)
  File "/usr/lib/python2.7/urllib2.py", line 127, in urlopen
    return _opener.open(url, data, timeout)
  File "/usr/lib/python2.7/urllib2.py", line 410, in open
    response = meth(req, response)
  File "/usr/lib/python2.7/urllib2.py", line 523, in http_response
    'http', request, response, code, msg, hdrs)
  File "/usr/lib/python2.7/urllib2.py", line 448, in error
    return self._call_chain(*args)
  File "/usr/lib/python2.7/urllib2.py", line 382, in _call_chain
    result = func(*args)
  File "/usr/lib/python2.7/urllib2.py", line 531, in http_error_default
    raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
urllib2.HTTPError: HTTP Error 404: Not Found

I think this happens when running the script without the --update flag and it reaches the absolute last entry in the sub's list of posts. I think it is specifically 404'ing on the URL of the "next page".
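If that is the cause, one fix would be to treat a 404 on the next-page URL as the end of the listing instead of a fatal error. This is a minimal sketch of the idea, not the script's actual code: `fetch_next_page` and its `opener` parameter are names I made up for illustration.

```python
try:
    from urllib2 import urlopen, HTTPError        # Python 2
except ImportError:                               # Python 3
    from urllib.request import urlopen
    from urllib.error import HTTPError

def fetch_next_page(next_url, opener=urlopen):
    """Fetch the next listing page, returning None when it 404s.

    A 404 here is taken to mean "no more pages", so the caller can
    stop paginating instead of dying with an uncaught HTTPError.
    """
    try:
        return opener(next_url).read()
    except HTTPError as err:
        if err.code == 404:
            return None   # end of the listing, not a real failure
        raise             # any other HTTP error still propagates
```

The `opener` parameter exists only so the fetch step can be swapped out; the caller would loop until `fetch_next_page` returns None.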

Other than that, all the images I would reasonably expect to download successfully seem to be downloading.

@carcinocron
Author

I could be totally wrong, though. My logs don't show anything. While writing this, I realized:

                    # Download the image
                    download_from_url(URL, FILEPATH)

                    # Image downloaded successfully!
                    print '    Downloaded URL [%s] as [%s].' % (URL.encode('utf-8'), FILENAME.encode('utf-8'))
                    DOWNLOADED += 1
                    FILECOUNT += 1

Which means a failed download wouldn't get logged anyway, because the URL is only logged after the download succeeds. I just made the following change:

                    print '    Attempting to Download URL [%s] as [%s].' % (URL.encode('utf-8'), FILENAME.encode('utf-8'))

                    # Download the image
                    download_from_url(URL, FILEPATH)

                    # Image downloaded successfully!
                    print '    Downloaded URL [%s] as [%s].' % (URL.encode('utf-8'), FILENAME.encode('utf-8'))
                    DOWNLOADED += 1
                    FILECOUNT += 1

Now my logs should state the URL before the download attempt fails (and all the evidence is lost), so hopefully I can follow up with better information.
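Going one step further than logging before the attempt, the failure itself could be caught and reported. This is a sketch of that idea rather than the script's code: `try_download` is my own name, and the `download_from_url` below is a stub standing in for the script's real downloader (here it always raises, purely for illustration).

```python
try:
    from urllib2 import HTTPError                 # Python 2
except ImportError:
    from urllib.error import HTTPError            # Python 3

def download_from_url(url, filepath):
    # Stub standing in for the script's real downloader;
    # it always fails so the error path below is exercised.
    raise HTTPError(url, 404, 'Not Found', None, None)

def try_download(url, filepath):
    """Attempt one download, logging the URL on failure instead of
    letting the HTTPError escape (and take the evidence with it).
    Returns True on success, False on an HTTP error."""
    try:
        download_from_url(url, filepath)
    except HTTPError as err:
        print('    Failed to download [%s]: HTTP %d %s'
              % (url, err.code, err.msg))
        return False
    print('    Downloaded URL [%s] as [%s].' % (url, filepath))
    return True
```

With this shape, the DOWNLOADED and FILECOUNT counters would only be incremented when `try_download` returns True.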

@ghost ghost self-assigned this Oct 7, 2014
@ghost

ghost commented Oct 12, 2014

That seems like a reasonable change; I've added it to the script now. Did you get any further tracking down your 404 issue?

@emacsomancer

I haven't modified the code in any way, and I get this too.

Traceback (most recent call last):
  File "/home/username/Apps/RedditImageGrab/redditdownload.py", line 268, in <module>
    URLS = extract_urls(ITEM['url'])
  File "/home/usernameApps/RedditImageGrab/redditdownload.py", line 197, in extract_urls
    urls = process_deviant_url(url)
  File "/home/username/Apps/RedditImageGrab/redditdownload.py", line 167, in process_deviant_url
    response = urlopen(url)
  File "/usr/lib/python2.7/urllib2.py", line 154, in urlopen
    return opener.open(url, data, timeout)
  File "/usr/lib/python2.7/urllib2.py", line 437, in open
    response = meth(req, response)
  File "/usr/lib/python2.7/urllib2.py", line 550, in http_response
    'http', request, response, code, msg, hdrs)
  File "/usr/lib/python2.7/urllib2.py", line 475, in error
    return self._call_chain(*args)
  File "/usr/lib/python2.7/urllib2.py", line 409, in _call_chain
    result = func(*args)
  File "/usr/lib/python2.7/urllib2.py", line 558, in http_error_default
    raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
urllib2.HTTPError: HTTP Error 403: Forbidden

I got it from running:

python2 redditdownload.py -sfw FractalPorn /home/username/WALLPAPER -score 50 
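A 403 Forbidden from a host like deviantART is often the server rejecting the default `Python-urllib` User-Agent rather than a genuinely blocked page. A possible workaround, sketched here with my own hypothetical `build_request` helper and an illustrative header string (neither is from the script), is to send a browser-like User-Agent:

```python
try:
    from urllib2 import Request, urlopen          # Python 2
except ImportError:
    from urllib.request import Request, urlopen   # Python 3

def build_request(url):
    """Build a request carrying a browser-like User-Agent, since some
    hosts answer the default Python-urllib agent with 403 Forbidden."""
    return Request(url, headers={'User-Agent': 'Mozilla/5.0'})

# Usage: response = urlopen(build_request(url))
```

Whether this cures the 403 depends on the host; it does nothing for the 404-on-last-page case discussed above.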

rachmadaniHaryono referenced this issue in rachmadaniHaryono/RedditImageGrab Aug 6, 2015