This repository has been archived by the owner on Oct 4, 2024. It is now read-only.

google colab request #388

Closed
mikebilly opened this issue Oct 1, 2022 · 26 comments

Comments

@mikebilly

Hi! Thanks for your amazing repo. I'd like to suggest creating a Colab notebook so that I can easily copy photos and videos from Google Photos and save them in a folder in my Google Drive.

@gilesknap
Owner

gilesknap commented Oct 1, 2022

Hi, thanks for the suggestion. I'm not that keen on this because gphotos-sync was born out of Google deprecating its own photos-drive sync, so we moved on from this a long time ago.

What is your use case? If you already have photos in Google, why keep them in two places in Google (and use up your Google disk quota)?

If I were to look at this I'd prefer to do it as a desktop app that talks to the Photos and Drive APIs. That is because notebooks don't make a great development environment due to lack of version control (at least last time I looked; this is my only effort in Colab notebooks: https://colab.research.google.com/drive/1zBHuFfGpqkv8I96epPfo45xxyKg2_FTh?usp=sharing).

@mikebilly
Author

Thanks @gilesknap for your quick reply. The reason is that I have a shared drive with an unusually large amount of space, while my own account has a 15 GB limit. I want to copy my photos and videos from Google Photos to that shared drive so I can use it to store media. I want to do this on Google Colab because the download and upload speeds are high. I hope you can help me with this.

@gilesknap
Owner

I would not recommend this for 2 reasons:

  • You lose all of the photo management features that make Google Photos a compelling product (so much so that I still put up with it despite a somewhat crippled API).
  • You risk losing your precious photos to coding bugs. (gphotos-sync is very clear that it's a read-only tool: it makes no changes to the Photos Library and only creates an additional backup of what is on Google's servers.)

If you still want this and are interested in doing it yourself, I'm happy to help with ideas and some code pointers.

@gilesknap
Owner

Oh, and by the way, one of the API limitations is that you can't delete any photos unless they were uploaded by the same app. Therefore you would need to delete them manually to get your space back after the tool copied them to Drive. This is obviously error-prone and a risk to your photos.

The delete issue is one of many that have been outstanding for years and Google is simply not going to fix. See:

@mikebilly
Author

mikebilly commented Oct 1, 2022

Thanks @gilesknap for your reply. I think I need an API to get the list of filenames in Google Photos and make sure that every filename in that list is present in my shared drive. If a filename is not present, I would copy that file to my shared drive. Can I do that? And after that, can I safely delete everything in my Google Photos?
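The comparison described here boils down to a set difference. A minimal sketch (names are illustrative; it also assumes every filename is unique, which the replies below question):

```python
def missing_from_drive(photos_names, drive_names):
    """Return filenames listed in Google Photos but absent from the
    Drive folder.  Only safe if every filename is unique."""
    return sorted(set(photos_names) - set(drive_names))
```

For example, `missing_from_drive(["a.jpg", "b.jpg"], ["a.jpg"])` returns `["b.jpg"]`, the file still to be copied.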

@gilesknap
Owner

This is one of the big issues I faced with gphotos-sync. The mapping between allowed names in the library and allowed filenames is a little fiddly; for example, the same filename may appear multiple times in the same album. Since Drive is your target, it would also allow the same filename multiple times. HOWEVER, you are left to work out which is which: if you have two PHOTO1 in Photos but only one in Drive, then which one do you copy over?

To solve this, gphotos-sync uses a database and keeps additional metadata on each photo, including what it got called on the local filesystem.
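A sketch of the duplicate-number idea (illustrative names only, not gphotos-sync's real schema): remember which media ids have been seen under each (folder, filename) pair, so each item keeps a stable local name even when the library holds several files with the same name.

```python
class DuplicateIndex:
    def __init__(self):
        # maps (folder, filename) -> list of media ids, in first-seen order
        self._seen = {}

    def duplicate_no(self, folder, filename, media_id):
        ids = self._seen.setdefault((folder, filename), [])
        if media_id not in ids:
            ids.append(media_id)
        # 0 for the first item with this name, 1 for the second, ...
        return ids.index(media_id)

    def local_name(self, folder, filename, media_id):
        num = self.duplicate_no(folder, filename, media_id)
        if num == 0:
            return filename
        stem, dot, ext = filename.rpartition(".")
        return f"{stem}({num}).{ext}" if dot else f"{filename}({num})"
```

Querying the same media id always returns the same local name; that is the property a plain filename comparison cannot give you.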

To see how photos in the library are listed, you could look at:

    def index_photos_media(self) -> int:
        log.warning("Indexing Google Photos Files ...")
        total_listed = 0

        if self.start_date:
            start_date = self.start_date
        elif self.rescan:
            start_date = None
        else:
            start_date = self._db.get_scan_date()

        items_json = self.search_media(
            start_date=start_date,
            end_date=self.end_date,
            do_video=self.include_video,
            favourites=self.favourites,
        )

        while items_json:
            media_json = items_json.get("mediaItems", [])
            items_count = 0
            for media_item_json in media_json:
                items_count += 1
                total_listed += 1
                media_item = GooglePhotosMedia(
                    media_item_json, to_lower=self.case_insensitive_fs
                )
                media_item.set_path_by_date(self._media_folder, self._use_flat_path)
                (num, row) = self._db.file_duplicate_no(
                    str(media_item.filename),
                    str(media_item.relative_folder),
                    media_item.id,
                )
                # we just learned if there were any duplicates in the db
                media_item.duplicate_number = num

                if self.settings.progress and total_listed % 10 == 0:
                    log.warning(f"Listed {total_listed} items ...\033[F")

                if not row:
                    self.files_indexed += 1
                    log.info(
                        "Indexed %d %s", self.files_indexed, media_item.relative_path
                    )
                    self.write_media_index(media_item, False)
                    if self.files_indexed % 2000 == 0:
                        self._db.store()
                elif media_item.modify_date > row.modify_date:
                    self.files_indexed += 1
                    # todo at present there is no modify date in the API
                    # so updates cannot be monitored - this won't get called
                    log.info(
                        "Updated Index %d %s",
                        self.files_indexed,
                        media_item.relative_path,
                    )
                    self.write_media_index(media_item, True)
                else:
                    self.files_index_skipped += 1
                    log.debug(
                        "Skipped Index (already indexed) %d %s",
                        self.files_index_skipped,
                        media_item.relative_path,
                    )
                self.latest_download = max(
                    self.latest_download, media_item.create_date
                )

            log.debug(
                "search_media parsed %d media_items with %d PAGE_SIZE",
                items_count,
                GooglePhotosIndex.PAGE_SIZE,
            )

            next_page = items_json.get("nextPageToken")
            if next_page:
                items_json = self.search_media(
                    page_token=next_page,
                    start_date=start_date,
                    end_date=self.end_date,
                    do_video=self.include_video,
                    favourites=self.favourites,
                )
            else:
                break

        # scan (in reverse date order) completed so the next incremental scan
        # can start from the most recent file in this scan
        if not self.start_date:
            self._db.set_scan_date(last_date=self.latest_download)

        log.warning(f"indexed {self.files_indexed} items")
        return self.files_indexed

The truth is this is non-trivial and it is why I made the project read-only.

@gilesknap
Owner

I should be fair and say that modern cameras are pretty good at creating unique file names for their images. My collection goes back to 1996, and in those days cameras reset the image name counter every time the memory card was wiped. I have hundreds of Image001.jpg in my collection!

So you could try ignoring this issue and assume that filenames are unique. But you would lose photos if this turned out not to be true.

@mikebilly
Author

Yeah, I think so; I probably don't have two files with the same name but different metadata. Google Photos automatically backs up the photos and videos on my phone, which is why I'm using it. The photos and videos on my phone are numbered distinctly, so there's no issue unless I reset my phone and it starts numbering from 0 again. I guess I might need a feature where, when it sees a file with the same filename, it checks the two files' metadata: if they're the same it skips, otherwise it renames one of the two files to something like xxx(2).png and then proceeds to copy.

@gilesknap
Owner

Exactly. But you need a consistent way to get from metadata to files named (2) or (3) etc. You can't just do it in date order because you are going to be deleting files from the Photos source. This is why I keep a DB of what I named a file. I also try to be consistent if someone wants to flush the DB and start again, but I have not been successful in coping with all corner cases. Rebuilding the DB after some files have been deleted from the source may not result in the same filenames the second time and cause the sync to overwrite already-copied files.
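One way to get that consistency (a sketch of the idea, not what gphotos-sync actually does) is to derive the disambiguating suffix from the immutable media-item id instead of from scan order, so rebuilding the index always produces the same names:

```python
import hashlib

def stable_name(filename, media_id):
    """Append a short tag derived from the immutable media id, so the
    same item always maps to the same local filename regardless of scan
    order or of which duplicates have since been deleted at the source.
    Illustrative only -- not gphotos-sync's actual naming scheme."""
    tag = hashlib.sha1(media_id.encode()).hexdigest()[:8]
    stem, dot, ext = filename.rpartition(".")
    return f"{stem}-{tag}.{ext}" if dot else f"{filename}-{tag}"
```

The trade-off is that every file gets a suffix; suffixing only on detected clashes would make names depend on what else existed at scan time, which is exactly the order dependence described above.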

@gilesknap
Owner

I think this would be a better starting point for you instead of reading my code.

And once you have that working the rest of the REST API is documented here:

Drive has a similar REST API. I've had a Google around and it looks like there are a few out-of-date Python libraries wrapping it; I'm not sure what to advise on this without more research.

@mikebilly
Author

Seems fairly complicated with filenames and duplicate checking, haha. Well, I guess as long as I use the same device and don't reset it, I won't encounter any problems with filenames and duplicates. So my plan to copy to Drive should work, right?

@gilesknap
Owner

gilesknap commented Oct 1, 2022 via email

@mikebilly
Author

Thank you for your helpful information. I'll start trying to make this work on Colab tomorrow.

@gilesknap
Owner

Good luck. Closing this as it's not a gphotos-sync issue. I'll still respond here if you continue to post.

@mikebilly
Author

10-02 09:56:14 WARNING  gphotos-sync 3.5.dev10+g2f6bcbb 2022-10-02 09:56:14.195939 
10-02 09:56:14 ERROR    Symbolic links not supported 
10-02 09:56:14 ERROR    Albums are not going to be synced - requires symlinks 
10-02 09:56:15 ERROR    
Process failed. 
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/dist-packages/gphotos_sync/Main.py", line 492, in main
    self.setup(args, db_path)
  File "/usr/local/lib/python3.7/dist-packages/gphotos_sync/Main.py", line 341, in setup
    self.auth.authorize()
  File "/usr/local/lib/python3.7/dist-packages/gphotos_sync/authorize.py", line 96, in authorize
    open_browser=False, bind_addr="0.0.0.0", port=self.port
  File "/usr/local/lib/python3.7/dist-packages/google_auth_oauthlib/flow.py", line 489, in run_local_server
    bind_addr or host, port, wsgi_app, handler_class=_WSGIRequestHandler
  File "/usr/lib/python3.7/wsgiref/simple_server.py", line 153, in make_server
    server = server_class((host, port), handler_class)
  File "/usr/lib/python3.7/socketserver.py", line 452, in __init__
    self.server_bind()
  File "/usr/lib/python3.7/wsgiref/simple_server.py", line 50, in server_bind
    HTTPServer.server_bind(self)
  File "/usr/lib/python3.7/http/server.py", line 137, in server_bind
    socketserver.TCPServer.server_bind(self)
  File "/usr/lib/python3.7/socketserver.py", line 466, in server_bind
    self.socket.bind(self.server_address)
OSError: [Errno 98] Address already in use
10-02 09:56:15 WARNING  Done. 

This is the first error that I got.
In some other project, this type of flow worked for me:

from oauth2client.file import Storage
from oauth2client.client import flow_from_clientsecrets
from oauth2client import tools
from googleapiclient.discovery import build

SCOPES = ['https://www.googleapis.com/auth/youtube.upload']
API_SERVICE_NAME = 'youtube'
API_VERSION = 'v3'

VALID_PRIVACY_STATUSES = ('public', 'private', 'unlisted')

# Authorize the request and store authorization credentials.
def get_authenticated_service(CLIENT_SECRETS_FILE):
  storage = Storage("/content/youtube-upload-credentials.json")
  credentials = storage.get()
  if credentials is None or credentials.invalid:
    flow = flow_from_clientsecrets(CLIENT_SECRETS_FILE, SCOPES)
    flags = tools.argparser.parse_args(args=['--noauth_local_webserver'])
    credentials = tools.run_flow(flow, storage, flags)

  return build(API_SERVICE_NAME, API_VERSION, credentials = credentials)

@gilesknap
Owner

This is just saying that the default auth flow port 8080 is already in use. You can choose a different port on the command line with --port (see gphotos-sync --help).

@mikebilly
Author

mikebilly commented Oct 2, 2022

Sorry if I say anything dumb, but can I make it use the flow similar to

    flow = flow_from_clientsecrets(CLIENT_SECRETS_FILE, SCOPES)
    flags = tools.argparser.parse_args(args=['--noauth_local_webserver'])
    credentials = tools.run_flow(flow, storage, flags)

which opens a browser page for me to approve the API access and gives me an authorization code, without creating a localhost server?
Edit:
[screenshot attached]

@gilesknap
Owner

I think you are using a function from the oauth2client library.

That is deprecated and you need to use https://github.com/googleapis/google-auth-library-python-oauthlib.

Now you might be able to do clever stuff because you are already logged in to Colab notebooks, but I don't have any knowledge of that.

Instead, I would hope that the authentication code from gphotos-sync should still work for you; you would need to create the token locally on your workstation first, as the authentication flow only works in a local browser. See the setup for headless gphotos-sync servers here: https://gilesknap.github.io/gphotos-sync/main/tutorials/installation.html#headless-gphotos-sync-servers

@mikebilly
Author

Yeah, that's what I'm facing. I want to do the authorization phase that produces the token and credentials using my client_secret.json, without needing my local browser or my local workstation. The reason is that I want to run this Colab notebook on my phone too.

@mikebilly
Author

As I've mentioned above, this authorization function works for my other project

SCOPES = ['https://www.googleapis.com/auth/youtube.upload']
API_SERVICE_NAME = 'youtube'
API_VERSION = 'v3'

VALID_PRIVACY_STATUSES = ('public', 'private', 'unlisted')

# Authorize the request and store authorization credentials.
def get_authenticated_service(CLIENT_SECRETS_FILE):
  storage = Storage("/content/youtube-upload-credentials.json")
  credentials = storage.get()
  if credentials is None or credentials.invalid:
    flow = flow_from_clientsecrets(CLIENT_SECRETS_FILE, SCOPES)
    flags = tools.argparser.parse_args(args=['--noauth_local_webserver'])
    credentials = tools.run_flow(flow, storage, flags)

  return build(API_SERVICE_NAME, API_VERSION, credentials = credentials)

So I've modified your authorization function from:

    def authorize(self):
        """Initiates OAuth2 authentication and authorization flow"""
        token = self.load_token()

        if token:
            self.session = OAuth2Session(
                self.client_id,
                token=token,
                auto_refresh_url=self.token_uri,
                auto_refresh_kwargs=self.extra,
                token_updater=self.save_token,
            )
        else:
            flow = InstalledAppFlow.from_client_secrets_file(
                self.secrets_file, scopes=self.scope
            )
            # localhost and bind to 0.0.0.0 always works even in a container.
            flow.run_local_server(
                open_browser=False, bind_addr="0.0.0.0", port=self.port
            )

            self.session = flow.authorized_session()

            # Mapping for backward compatibility
            oauth2_token = {
                "access_token": flow.credentials.token,
                "refresh_token": flow.credentials.refresh_token,
                "token_type": "Bearer",
                "scope": flow.credentials.scopes,
                "expires_at": flow.credentials.expiry.timestamp(),
            }

            self.save_token(oauth2_token)

        # set up the retry behaviour for the authorized session
        retries = Retry(
            total=self.max_retries,
            backoff_factor=5,
            status_forcelist=[500, 502, 503, 504, 429],
            allowed_methods=frozenset(["GET", "POST"]),
            raise_on_status=False,
            respect_retry_after_header=True,
        )
        # apply the retry behaviour to our session by replacing the default HTTPAdapter
        self.session.mount("https://", HTTPAdapter(max_retries=retries))

to

    def authorize(self):
        """Initiates OAuth2 authentication and authorization flow"""
        token = self.load_token()

        if token:
            self.session = OAuth2Session(
                self.client_id,
                token=token,
                auto_refresh_url=self.token_uri,
                auto_refresh_kwargs=self.extra,
                token_updater=self.save_token,
            )
        else:
            storage = Storage("/content/gphotos-sync-credentials.json")
            credentials = storage.get()
            if credentials is None or credentials.invalid:
                flow = flow_from_clientsecrets(self.secrets_file, self.scope)
                flags = tools.argparser.parse_args(args=['--noauth_local_webserver'])
                credentials = tools.run_flow(flow, storage, flags)

            # Mapping for backward compatibility
            oauth2_token = {
                "access_token": credentials.access_token,
                "refresh_token": credentials.refresh_token,
                "token_type": "Bearer",
                "scope": credentials.scopes,
                "expires_at": credentials.token_expiry.timestamp(),
            }

            self.save_token(oauth2_token)

        # set up the retry behaviour for the authorized session
        retries = Retry(
            total=self.max_retries,
            backoff_factor=5,
            status_forcelist=[500, 502, 503, 504, 429],
            allowed_methods=frozenset(["GET", "POST"]),
            raise_on_status=False,
            respect_retry_after_header=True,
        )
        # apply the retry behaviour to our session by replacing the default HTTPAdapter
        self.session.mount("https://", HTTPAdapter(max_retries=retries))

(from else: to self.save_token(oauth2_token))
And this is the error that I got:

10-02 14:27:21 WARNING  gphotos-sync 3.5.dev10+g2f6bcbb 2022-10-02 14:27:21.954127 
10-02 14:27:21 ERROR    Symbolic links not supported 
10-02 14:27:21 ERROR    Albums are not going to be synced - requires symlinks 
/usr/local/lib/python3.7/dist-packages/oauth2client/_helpers.py:255: UserWarning: Cannot access /content/gphotos-sync-credentials.json: No such file or directory
  warnings.warn(_MISSING_FILE_MESSAGE.format(filename))

Go to the following link in your browser:

    https://accounts.google.com/o/oauth2/auth?client_id=771314848617-a78n0bc5osd04b3rabt2hqp8qdd88b9h.apps.googleusercontent.com&redirect_uri=urn%3Aietf%3Awg%3Aoauth%3A2.0%3Aoob&scope=https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fphotoslibrary.readonly+https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fphotoslibrary.sharing&access_type=offline&response_type=code

Enter verification code: 4/1ARtbsJrkSVGbpu6z7dA-LQvcpS26V6kKDOPiI7J9YzZ309mFNSQcttSJ2vs
Authentication successful.
10-02 14:27:33 ERROR    
Process failed. 
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/dist-packages/gphotos_sync/Main.py", line 492, in main
    self.setup(args, db_path)
  File "/usr/local/lib/python3.7/dist-packages/gphotos_sync/Main.py", line 341, in setup
    self.auth.authorize()
  File "/usr/local/lib/python3.7/dist-packages/gphotos_sync/authorize.py", line 111, in authorize
    self.save_token(oauth2_token)
  File "/usr/local/lib/python3.7/dist-packages/gphotos_sync/authorize.py", line 79, in save_token
    dump(token, stream)
  File "/usr/lib/python3.7/json/__init__.py", line 179, in dump
    for chunk in iterable:
  File "/usr/lib/python3.7/json/encoder.py", line 431, in _iterencode
    yield from _iterencode_dict(o, _current_indent_level)
  File "/usr/lib/python3.7/json/encoder.py", line 405, in _iterencode_dict
    yield from chunks
  File "/usr/lib/python3.7/json/encoder.py", line 438, in _iterencode
    o = _default(o)
  File "/usr/lib/python3.7/json/encoder.py", line 179, in default
    raise TypeError(f'Object of type {o.__class__.__name__} '
TypeError: Object of type set is not JSON serializable

@mikebilly
Author


For that error, I've added this code:

import json
class SetEncoder(json.JSONEncoder):
    def default(self, obj):
        if isinstance(obj, set):
            return list(obj)
        return json.JSONEncoder.default(self, obj)

and changed

    def save_token(self, token: str):
        with self.token_file.open("w") as stream:
            dump(token, stream)
        self.token_file.chmod(0o600)

to

    def save_token(self, token: str):
        with self.token_file.open("w") as stream:
            dump(token, stream, cls=SetEncoder)
        self.token_file.chmod(0o600)
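The custom encoder works; an alternative with the same effect (a sketch, using the oauth2client credential fields from the modified authorize above) is to convert the set to a list while building the token dict, so the stock json.dump needs no changes:

```python
def to_serializable_token(credentials):
    """Build the backward-compatible token mapping using only
    JSON-serializable values (scopes: set -> sorted list)."""
    return {
        "access_token": credentials.access_token,
        "refresh_token": credentials.refresh_token,
        "token_type": "Bearer",
        "scope": sorted(credentials.scopes),
        "expires_at": credentials.token_expiry.timestamp(),
    }
```

Sorting also makes the saved token file deterministic across runs.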

But now I get this error:

10-02 15:23:43 WARNING  gphotos-sync 3.5.dev10+g2f6bcbb 2022-10-02 15:23:43.480427 
10-02 15:23:43 ERROR    Symbolic links not supported 
10-02 15:23:43 ERROR    Albums are not going to be synced - requires symlinks 
10-02 15:23:43 ERROR    
Process failed. 
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/dist-packages/gphotos_sync/Main.py", line 492, in main
    self.setup(args, db_path)
  File "/usr/local/lib/python3.7/dist-packages/gphotos_sync/Main.py", line 341, in setup
    self.auth.authorize()
  File "/usr/local/lib/python3.7/dist-packages/gphotos_sync/authorize.py", line 131, in authorize
    respect_retry_after_header=True,
TypeError: __init__() got an unexpected keyword argument 'allowed_methods'
10-02 15:23:43 WARNING  Done. 

@mikebilly
Copy link
Author

mikebilly commented Oct 2, 2022

So I added these commands:

!pip install urllib3 --upgrade 
!pip install requests --upgrade 
!pip install spotipy --upgrade

and it works.
After running this command:

!gphotos-sync --secret="/content/client secret.json" "/content/drive/Shareddrives/Family_photos/backup" --port 307

I get this message:

10-02 15:42:02 WARNING  gphotos-sync 3.5.dev10+g2f6bcbb 2022-10-02 15:42:02.428861 
10-02 15:42:02 ERROR    Symbolic links not supported 
10-02 15:42:02 ERROR    Albums are not going to be synced - requires symlinks 
10-02 15:42:02 WARNING  Indexing Google Photos Files ... 
10-02 15:42:03 WARNING  indexed 2 items 
10-02 15:42:03 WARNING  Downloading Photos ... 
10-02 15:42:15 WARNING  Downloaded 2 Items, Failed 0, Already Downloaded 0 
10-02 15:42:15 WARNING  Done. 

However, the copied photos and videos are not in original quality.
I have one photo and one video.
The original photo is 64 KB, while the copied version is 84 KB, with a slight difference in quality.
The original video is 3840×2160 and 863.3 MB (uploaded to Google Photos in original quality), while the copied video is 1920×1080 and 57.4 MB.

@gilesknap
Copy link
Owner

gilesknap commented Oct 2, 2022

Yep. Remember I said the API was crippled. Again, this is one Google has been sitting on for years. I don't mind the images too much because I can't see any visual difference at my image resolutions. But the videos are awful.
https://issuetracker.google.com/issues/112096115

ALSO: note that you lose GPS info from your images.

@gilesknap
Copy link
Owner

Good work getting the auth going though!!

@mikebilly
Copy link
Author

Thanks @gilesknap, I got your API working, but the only issue is that it doesn't download original-quality videos and it strips GPS info; for that reason I can't use the Google Photos API. I'm looking for some sort of Google Takeout API.

@gilesknap
Copy link
Owner

Yeah, sorry about that - I should have thought to mention those specific limitations for your use case.

(it's not my API, it's Google's!! :-)
