scprep.io.download.download_google_drive failing due to google api changes #132

Open
atong01 opened this issue Jun 27, 2022 · 0 comments
atong01 commented Jun 27, 2022

A report by Erica in the Help Slack shows that the scprep Google Drive downloads are failing in the workshop notebooks. Example:

# download the data from Google Drive
scprep.io.download.download_google_drive("1QGkqL_FF7iveR1TLZ8HJKBANOmugBxlm",
                                         "retinal_bipolar.zip")
scprep.io.download.unzip("retinal_bipolar.zip")

This fails with:

---------------------------------------------------------------------------
BadZipFile                                Traceback (most recent call last)
<ipython-input-3-f00e6d2b86fe> in <module>()
      2 scprep.io.download.download_google_drive("1QGkqL_FF7iveR1TLZ8HJKBANOmugBxlm",
      3                                          "retinal_bipolar.zip")
----> 4 scprep.io.download.unzip("retinal_bipolar.zip")

2 frames
/usr/lib/python3.7/zipfile.py in _RealGetContents(self)
   1323             raise BadZipFile("File is not a zip file")
   1324         if not endrec:
-> 1325             raise BadZipFile("File is not a zip file")
   1326         if self.debug > 1:
   1327             print(endrec)

BadZipFile: File is not a zip file

Looking more closely, the downloaded "zip file" is not a zip file at all, but an HTML virus scan warning page.

!cat retinal_bipolar.zip

<!DOCTYPE html><html><head><title>Google Drive - Virus scan warning</title><meta http-equiv="content-type" content="text/html; charset=utf-8"/><style nonce="xG6PSW5r7g0D-MRjK19yow">/* Copyright 2022 Google Inc. All Rights Reserved. */
.goog-inline-block{position:relative;display:-moz-inline-box;display:inline-block}* html .goog-inline-block{display:inline}*:first-child+html .goog-inline-block{display:inline}.goog-link-button{position:relative;color:#15c;text-decoration:underline;cursor:pointer}.goog-link-button-disabled{color:#ccc;text-decoration:none;cursor:default}body{color:#222;font:normal 13px/1.4 arial,sans-serif;margin:0}.grecaptcha-badge{visibility:hidden}.uc-main{padding-top:50px;text-align:center}#uc-dl-icon{display:inline-block;margin-top:16px;padding-right:1em;vertical-align:top}#uc-text{display:inline-block;max-width:68ex;text-align:left}.uc-error-caption,.uc-warning-caption{color:#222;font-size:16px}#uc-download-link{text-decoration:none}.uc-name-size a{color:#15c;text-decoration:none}.uc-name-size a:visited{color:#61c;text-decoration:none}.uc-name-size a:active{color:#d14836;text-decoration:none}.uc-footer{color:#777;font-size:11px;padding-bottom:5ex;padding-top:5ex;text-align:center}.uc-footer a{color:#15c}.uc-footer a:visited{color:#61c}.uc-footer a:active{color:#d14836}.uc-footer-divider{color:#ccc;width:100%}</style><link rel="icon" href="null"/></head><body><div class="uc-main"><div id="uc-dl-icon" class="image-container"><div class="drive-sprite-aux-download-file"></div></div><div id="uc-text"><p class="uc-warning-caption">Google Drive can't scan this file for viruses.</p><p class="uc-warning-subcaption"><span class="uc-name-size"><a href="/open?id=1QGkqL_FF7iveR1TLZ8HJKBANOmugBxlm">retinal_bipolar.zip</a> (92M)</span> is too large for Google to scan for viruses. Would you still like to download this file?</p><form id="downloadForm" action="https://docs.google.com/uc?export=download&amp;id=1QGkqL_FF7iveR1TLZ8HJKBANOmugBxlm&amp;confirm=t" method="post"><input type="submit" id="uc-download-link" class="goog-inline-block jfk-button jfk-button-action" value="Download anyway"/></form></div></div><div class="uc-footer"><hr class="uc-footer-divider"></div></body></html>
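A check like the following could catch this before unzipping. It is only a sketch and not part of scprep: `describe_download` is a hypothetical helper that uses the standard-library `zipfile.is_zipfile` plus a look at the first bytes of the file to tell a real archive from an HTML page.

```python
import zipfile


def describe_download(path):
    """Classify a downloaded file as 'zip', 'html', or 'unknown'.

    Hypothetical helper for illustration; not part of scprep.
    """
    if zipfile.is_zipfile(path):
        return "zip"
    with open(path, "rb") as handle:
        head = handle.read(512).lstrip().lower()
    # The warning page shown above starts with an HTML doctype.
    if head.startswith(b"<!doctype html") or head.startswith(b"<html"):
        return "html"
    return "unknown"
```

Calling `describe_download("retinal_bipolar.zip")` right after the download would give a clearer signal than the `BadZipFile` raised later by `unzip`.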

A quick Stack Overflow search suggests that something may have changed in the Google Drive API.

seems like something has changed behind the scenes and the token stuff does not quite work anymore. However, simply always including confirm=1 as parameter seems to be a workaround. – 
[Mr Tsjolder](https://stackoverflow.com/users/4375377/mr-tsjolder)
[Apr 8 at 14:11](https://stackoverflow.com/questions/38511444/python-download-files-from-google-drive-using-url#comment126879887_39225272)

On closer examination, the first response we get no longer has any cookies, and hence we end up with confirm=None. I suspect this passes the current tests because the test file is small enough for Google to scan for viruses, so no warning page is returned.
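To illustrate why missing cookies lead to confirm=None, here is a minimal sketch of a cookie-based token lookup. The `download_warning` cookie prefix is an assumption taken from the common Stack Overflow recipe and has not been checked against scprep's source; the function name and the cookie values are made up for the example.

```python
def get_confirm_token(cookies):
    """Return the value of the first 'download_warning*' cookie, or None.

    `cookies` is any mapping of cookie names to values, for example
    `response.cookies` from the requests library.
    """
    for key, value in cookies.items():
        if key.startswith("download_warning"):
            return value
    return None


# Earlier behaviour (assumed): the first response carried a warning cookie.
print(get_confirm_token({"download_warning_123": "AbCd"}))  # AbCd

# Behaviour described in this issue: the response has no cookies at all.
print(get_confirm_token({}))  # None
```

With an empty cookie jar the lookup falls through to `None`, which would then be sent as the `confirm` parameter.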

Changing the above to the following works. It skips the initial request and always sends "confirm=1". However, this may have unintended consequences, since we no longer check whether confirmation is needed. Perhaps there is a better solution.

# download the data from Google Drive
import requests
import scprep

_GOOGLE_DRIVE_URL = "https://docs.google.com/uc?export=download"
_CHUNK_SIZE = 32768


def _GET_google_drive(id):
    """Request the file directly, always sending confirm=1."""
    with requests.Session() as session:
        params = {"id": id, "confirm": 1}
        response = session.get(_GOOGLE_DRIVE_URL, params=params, stream=True)
    return response


def _save_response_content(response, destination):
    """Stream the response body to a path or an open binary file handle."""
    if isinstance(destination, str):
        with open(destination, "wb") as handle:
            _save_response_content(response, handle)
    else:
        for chunk in response.iter_content(_CHUNK_SIZE):
            if chunk:  # filter out keep-alive new chunks
                destination.write(chunk)


def download_google_drive(id, destination):
    """Download a Google Drive file by id, skipping the confirmation step."""
    response = _GET_google_drive(id)
    _save_response_content(response, destination)


download_google_drive("1QGkqL_FF7iveR1TLZ8HJKBANOmugBxlm", "retinal_bipolar.zip")
scprep.io.download.unzip("retinal_bipolar.zip")
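One possible alternative to hard-coding confirm=1 would be to read the "Download anyway" URL out of the warning page itself. The sketch below is untested against Google Drive and relies on the page layout shown earlier (a form with id="downloadForm"), which Google may change at any time. The names `_DownloadFormParser` and `find_confirm_url` are hypothetical. Note also that the form in that page uses method="post", so whether a plain GET to the extracted URL is enough is an open question.

```python
from html.parser import HTMLParser


class _DownloadFormParser(HTMLParser):
    """Collect the action URL of the form with id="downloadForm"."""

    def __init__(self):
        super().__init__()
        self.action = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form" and attrs.get("id") == "downloadForm":
            self.action = attrs.get("action")


def find_confirm_url(html_text):
    """Return the 'Download anyway' form action URL, or None if absent."""
    parser = _DownloadFormParser()
    parser.feed(html_text)
    return parser.action
```

If the first response turns out to be HTML, `find_confirm_url(response.text)` would return the confirmation URL when the form is present and `None` otherwise, so the caller could raise a clear error instead of writing HTML into a .zip file.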
@atong01 atong01 added the bug label Jun 27, 2022