Using pydrive with user credentials for authenticated download #3

Open: wants to merge 2 commits into master


jeremyfix

Unfortunately, your code performs an anonymous download. I tried on several consecutive days and always got a quota-exceeded error, so I was unable to download the dataset.

This pull request, which uses code adapted from the FFHQ-Aging repo, downloads the dataset using user credentials.

The only requirement is to follow the PyDrive quickstart to obtain a client_secrets.json file and place it in the same directory as download_ffhq.py. You can then enable PyDrive Google authentication by appending the --pydrive command-line option.

For example, to download the 1024x1024 images, simply run:

python3 download_ffhq.py -i --pydrive
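The --pydrive option could be wired up roughly as follows. This is a minimal sketch, not the PR's actual code: the helper names make_parser, authenticate_drive and download_file_pydrive, and the argument definitions, are hypothetical; only the -i, --pydrive and --cmd_auth flag names come from this thread, and mapping --cmd_auth to PyDrive's CommandLineAuth is an assumption. The PyDrive calls themselves (GoogleAuth, LocalWebserverAuth, CommandLineAuth, GoogleDrive, CreateFile, GetContentFile) are part of the PyDrive library.

```python
import argparse


def make_parser():
    """Hypothetical subset of the download_ffhq.py command line."""
    parser = argparse.ArgumentParser(description='Download FFHQ (sketch).')
    parser.add_argument('-i', '--images', action='store_true',
                        help='download the 1024x1024 images')
    parser.add_argument('--pydrive', action='store_true',
                        help='download through PyDrive with your Google credentials')
    # Assumption: --cmd_auth selects PyDrive's terminal-based flow.
    parser.add_argument('--cmd_auth', action='store_true',
                        help='authenticate in the terminal instead of a local browser redirect')
    return parser


def authenticate_drive(cmd_auth=False):
    """Return an authenticated GoogleDrive client (hypothetical helper)."""
    # Imported lazily so the anonymous download path works without PyDrive installed.
    from pydrive.auth import GoogleAuth
    from pydrive.drive import GoogleDrive

    # With default settings, GoogleAuth reads client_secrets.json relative to the
    # current working directory, so run the script from the folder that holds it.
    gauth = GoogleAuth()
    if cmd_auth:
        gauth.CommandLineAuth()     # prints a URL, then prompts for the verification code
    else:
        gauth.LocalWebserverAuth()  # opens a browser and runs a local redirect server
    return GoogleDrive(gauth)


def download_file_pydrive(drive, file_id, dst_path):
    """Download one Drive file by ID using the user's credentials (hypothetical helper)."""
    drive_file = drive.CreateFile({'id': file_id})
    drive_file.GetContentFile(dst_path)
```

With this layout, `make_parser().parse_args(['-i', '--pydrive'])` sets both `images` and `pydrive`, and the script would call authenticate_drive once and reuse the returned client for every file.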

The code makes several attempts to download each file. Without this retry logic (inspired by yours), I got some httplib2.error.ServerNotFoundError: Unable to find the server at www.googleapis.com errors. Apparently, when the download is retried, the exception is not raised again.
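The retry idea can be sketched as a small wrapper that re-calls a download function when a transient network error is raised. This is an illustration only: the helper name with_retries and its parameters are hypothetical, not the PR's code.

```python
import time


def with_retries(fn, num_attempts=5, retry_on=(Exception,), delay=1.0):
    """Call fn() up to num_attempts times, retrying only on the given exception types.

    Exceptions of other types, or a failure on the last attempt, are re-raised.
    """
    if num_attempts < 1:
        raise ValueError('num_attempts must be at least 1')
    for attempt in range(1, num_attempts + 1):
        try:
            return fn()
        except retry_on as exc:
            if attempt == num_attempts:
                raise  # out of attempts: surface the last error
            print('Attempt {} of {} failed ({!r}); retrying...'.format(attempt, num_attempts, exc))
            time.sleep(delay)
```

In the setting described above, one might call `with_retries(lambda: drive_file.GetContentFile(path), retry_on=(httplib2.error.ServerNotFoundError,))`, where drive_file is a PyDrive file object. Listing specific exception types keeps unrelated errors from being retried silently.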

I only tested downloading the images (the command above), but since the other downloads also go through the download_files function, I expect them to work as well.

jeremyfix (author) commented Apr 22, 2021

Note that, for some reason, after a while (several hours) the script may try to re-authenticate and fail; relaunching the script resumes the download.

I successfully downloaded the 90 GB of the 1024x1024 images this way.
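Relaunching presumably resumes because files already on disk are skipped. A minimal sketch of that pattern, assuming each file's expected size and MD5 checksum are known: write each download to a temporary file, rename it only once it is complete, and skip final files that already pass the check. The helpers file_is_complete and save_atomically are hypothetical names, not functions from the PR.

```python
import hashlib
import os


def file_is_complete(path, expected_size=None, expected_md5=None):
    """Return True if path exists and matches the expected size and MD5 (when given)."""
    if not os.path.isfile(path):
        return False
    if expected_size is not None and os.path.getsize(path) != expected_size:
        return False
    if expected_md5 is not None:
        md5 = hashlib.md5()
        with open(path, 'rb') as f:
            # Hash in 1 MiB chunks so large files are never loaded into memory at once.
            for chunk in iter(lambda: f.read(1 << 20), b''):
                md5.update(chunk)
        if md5.hexdigest() != expected_md5:
            return False
    return True


def save_atomically(chunks, path):
    """Write byte chunks to path via a temporary file, so an interrupted run
    never leaves a partial file under the final name."""
    tmp_path = path + '.tmp'
    with open(tmp_path, 'wb') as f:
        for chunk in chunks:
            f.write(chunk)
    os.replace(tmp_path, path)  # atomic rename when both paths are on the same filesystem
```

A download loop built on these helpers would skip any file for which file_is_complete(...) returns True, so a relaunched run only fetches what is still missing.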

mmazeika commented Apr 12, 2022

This was very helpful for me. I was able to download the 89 GB of 1024x1024 images, with one restart after a few hours. As an additional step, I had to replace

```python
# Google Drive virus checker nag.
links = [html.unescape(link) for link in data_str.split('"') if 'export=download' in link]
if len(links) == 1:
    if attempts_left:
        file_url = requests.compat.urljoin(file_url, links[0])
        continue
```

with

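```python
# Google Drive virus checker nag.
# Requires `import re` at the top of download_ffhq.py if it is not already imported.
file_id = re.findall(r'uc\?id=(.*?)&amp', data_str)  # raw string; non-greedy so only the ID is captured
if len(file_id) == 1:
    file_id = file_id[0]
    if attempts_left:
        # Replace API_KEY with your own Google Cloud API key.
        file_url = 'https://www.googleapis.com/drive/v3/files/{}/?key=API_KEY&alt=media'.format(file_id)
        continue
```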

This is needed because the virus-checker page has changed, so the old code for handling it no longer works. To make this work, I followed the instructions in the PyDrive quickstart linked above (i.e., I used this PR and obtained a client_secrets.json from the Drive API). The new virus-checker workaround uses an API key (substituted for API_KEY in the URL), which you can create in a GCP project, similar to how you obtain the client_secrets.json file. You can also use the OAuth key.
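The replacement logic can be isolated into a standalone function that is easy to test. This is a sketch, not mmazeika's exact code: the function name virus_check_api_url is hypothetical, the regex is the one from the snippet (as a raw string with a non-greedy group), and the URL follows the documented Drive API v3 files.get form with alt=media, without the trailing slash used in the snippet. api_key stands for the API key created in your GCP project.

```python
import re
from urllib.parse import urlencode


def virus_check_api_url(page_html, api_key):
    """Return a Drive API v3 media-download URL for the file linked from a
    virus-checker page, or None unless the page contains exactly one uc?id= link."""
    file_ids = re.findall(r'uc\?id=(.*?)&amp', page_html)
    if len(file_ids) != 1:
        return None
    query = urlencode({'key': api_key, 'alt': 'media'})
    return 'https://www.googleapis.com/drive/v3/files/{}?{}'.format(file_ids[0], query)
```

For example, with a made-up page fragment `'<a href="/uc?id=ABC123&amp;export=download">'` and key `'KEY'`, the function returns `'https://www.googleapis.com/drive/v3/files/ABC123?key=KEY&alt=media'`.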

I also had to run the download script with the --cmd_auth flag and select "Desktop" instead of "Web application" as the application type in the Drive API credentials to make it work. Here is a screenshot of my Drive API page:
[Screenshot of the Drive API page, 2022-04-12 18-39-19]
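Because the Desktop versus Web application choice is easy to get wrong, a quick check of the downloaded client_secrets.json may help. This sketch assumes Google's OAuth client files store Desktop clients under a top-level "installed" key and Web clients under a "web" key; the helper name oauth_client_type is hypothetical.

```python
import json


def oauth_client_type(path='client_secrets.json'):
    """Report which OAuth application type a client_secrets.json file was created for."""
    with open(path) as f:
        config = json.load(f)
    if 'installed' in config:
        return 'desktop'  # "Desktop app" clients
    if 'web' in config:
        return 'web'      # "Web application" clients
    raise ValueError('{}: expected a top-level "installed" or "web" key'.format(path))
```

If this returns 'web', one would recreate the OAuth client with the Desktop application type and download a fresh client_secrets.json.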
