
Location data #64

Closed

gilesknap opened this issue Mar 6, 2019 · 12 comments

@gilesknap (Owner) commented Mar 6, 2019

Google has seen fit to not provide location data with the images downloaded by the Google Photos API.
I guess I could use Beautiful Soup and the productUrl to scrape this info and insert it into our index, plus into the JPG EXIF as required.
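
In requests/Beautiful Soup terms the idea would look roughly like this. A purely hypothetical sketch: the selector is a placeholder, and (as the next comment explains) the page never actually renders without a JavaScript-enabled login.

```python
# Hypothetical sketch only: fetch a media item's productUrl and hunt for a
# GPS/maps link in the markup. In practice this dead-ends at a "please
# enable JavaScript" page, as noted in the next comment.
import requests
from bs4 import BeautifulSoup

def scrape_gps(product_url: str, session: requests.Session):
    resp = session.get(product_url)
    resp.raise_for_status()
    soup = BeautifulSoup(resp.text, "html.parser")
    # Placeholder selector: look for a maps link in the photo's info panel.
    node = soup.find("a", href=lambda h: h and "maps" in h)
    return node["href"] if node else None
```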

@gilesknap (Owner Author) commented Mar 9, 2019

Nice idea, but ...
Accessing the Google Photos page requires an interactive login token, and to get that you need to interact with the login pages using a JavaScript-enabled browser. I had a try at using requests and bs4 but only got as far as a page telling me to enable JavaScript. Now I guess we could run it through a rendering engine somehow, but this all seems too messy.

@gilesknap (Owner Author) commented:

Yeah, what I said. But it looks like there may be some solutions out there, such as Selenium or PhantomJS; see the discussion at https://stackoverflow.com/questions/5793414/mechanize-and-javascript
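
For anyone wanting to experiment, a minimal Selenium sketch would be something like the following. It assumes geckodriver is installed and that the login can somehow be completed in the driven browser, which is exactly the open question here.

```python
# Hypothetical sketch: drive a real browser so the page's JavaScript runs,
# then hand the rendered HTML to a parser (unlike plain requests).
from selenium import webdriver

product_url = "https://photos.google.com/..."  # placeholder productUrl

driver = webdriver.Firefox()
try:
    driver.get(product_url)
    html = driver.page_source  # the fully rendered DOM
    # ... parse `html` with Beautiful Soup as in the earlier sketch ...
finally:
    driver.quit()
```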

gilesknap removed the wontfix label Mar 13, 2019
@gilesknap (Owner Author) commented Mar 13, 2019

Made good progress on this. I can now scrape GPS data into the DB index.
I have yet to add this to the JPG files. That is pending the EXIF library gaining write capability (see this issue), which I might look at myself. But I'm pausing this project for a little while to work on a new project, bookbot.
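
In the meantime, for anyone wanting to write scraped GPS data into JPGs themselves, a library such as piexif can already do EXIF writes. A minimal sketch (the conversion helper is illustrative, but the tags are the standard GPS IFD fields):

```python
# Minimal sketch using piexif, which supports writing EXIF today.
import piexif

def to_dms_rational(deg: float):
    """Convert decimal degrees to EXIF (deg, min, sec) rational triples."""
    d = int(deg)
    m = int((deg - d) * 60)
    s = round(((deg - d) * 60 - m) * 60 * 100)
    return ((d, 1), (m, 1), (s, 100))

def write_gps(path: str, lat: float, lon: float):
    exif_dict = piexif.load(path)
    exif_dict["GPS"] = {
        piexif.GPSIFD.GPSLatitudeRef: b"N" if lat >= 0 else b"S",
        piexif.GPSIFD.GPSLatitude: to_dms_rational(abs(lat)),
        piexif.GPSIFD.GPSLongitudeRef: b"E" if lon >= 0 else b"W",
        piexif.GPSIFD.GPSLongitude: to_dms_rational(abs(lon)),
    }
    piexif.insert(piexif.dump(exif_dict), path)
```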

@gilesknap (Owner Author) commented:

I've read that Google will be disabling logins via embedded browser frameworks. I'm pretty sure that this will totally close off the approach I'm using here. See https://www.zdnet.com/article/google-bans-logins-from-embedded-browser-frameworks-to-prevent-mitm-phishing/
So it looks like Google will be keeping our GPS data. I have a sneaking feeling that they don't want to release ALL your data via the Google Photos API, in case people end up using their backend without them getting to have any influence on what the user sees.

My web-scraping approach would still work if I could hand a normal interactive login token to Selenium. But that sounds like a security hole, so it is probably not possible.

@gilesknap (Owner Author) commented:

My location-scraping code is no longer working. Either Google has changed the login screen or it has already shut down login via embedded frameworks.

@gilesknap (Owner Author) commented:

I'm adding a wontfix label to this because Google has shut down all avenues.

I will keep this open, hoping that things might change.

@gilesknap (Owner Author) commented Jul 14, 2019

I have an idea.
Instead of web scraping, I should get gphotos-sync to scan a Google Takeout download.

I believe Google is obliged to provide a service like Takeout due to data protection rules, so this should be more sustainable.

I have confirmed that Takeout includes GPS data.

This would be quite clunky, particularly since it requires a full download of your library.

Note that we could also use this to check the modified date and update photos that were edited online.
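
The scan itself would be straightforward: each media file in a Takeout archive comes with a JSON sidecar carrying a geoData block. A rough sketch (field names are as seen in current Takeout exports and could change):

```python
# Sketch: walk an unzipped Takeout tree and yield (title, lat, lon) for
# every item whose JSON sidecar carries non-zero GPS coordinates.
import json
from pathlib import Path

def takeout_gps(takeout_root: str):
    for meta in Path(takeout_root).rglob("*.json"):
        try:
            data = json.loads(meta.read_text(encoding="utf-8"))
        except (ValueError, OSError):
            continue  # skip album metadata or unreadable files
        geo = data.get("geoData", {})
        if geo.get("latitude") or geo.get("longitude"):
            yield data.get("title"), geo["latitude"], geo["longitude"]
```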

@mholt commented Jul 15, 2019

@gilesknap Hey, I dunno if this would be helpful, but we've been trying to solve the same problem over here: mholt/timeliner#38

And I think the (unfortunate) conclusion we are also arriving at is that integrating with Takeout archives may be the best way to get the location data.

Do you think there is a way to correlate media items from the API with a Takeout archive? Some sort of ID that is consistent between the two sources?

@gilesknap (Owner Author) commented:

Hi Matt. I do have a plan for a way of doing this.

gphotos-sync has a comparison option used to check the downloaded photo library against a previous backup. It works regardless of where the previous backup comes from, so it needs to match photos and videos up even if their filenames and folders are different.

I will use this same approach to match my synchronized library against Google Takeout files.

The matching primarily uses the EXIF UID to ensure that the files it is comparing match. When this fails it falls back to using dates, file sizes and filenames. Note that the create date is extracted from the file's metadata for both videos and images and is reasonably unique on its own (videos don't have a UID, unfortunately).

The current file comparison scans over both sets of files and builds a DB table for each. It then runs the sequence of queries in the list 'match' in https://github.com/gilesknap/gphotos-sync/blob/master/gphotos/Queries.py.

I find that this algorithm uniquely matches everything in my 110,000-item library, despite the fact that it goes back to 1997 and has many duplicate filenames and images that predate EXIF, etc.
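
Reduced to pseudocode, the cascade looks something like this. A simplified illustration only: the real implementation is the sequence of SQL queries in Queries.py, and the key names here are made up for the sketch.

```python
# Simplified illustration of the matching cascade: try the strongest key
# first, then fall back to progressively weaker composite keys.
MATCH_KEYS = [
    ("uid",),                         # EXIF unique ID, when present
    ("create_date", "size", "name"),  # fallback (videos have no UID)
    ("create_date", "size"),
    ("create_date", "name"),
]

def match(local_rows, takeout_rows):
    matched = {}
    for keys in MATCH_KEYS:
        index = {}
        for r in takeout_rows:
            k = tuple(r.get(f) for f in keys)
            if None not in k:
                index[k] = r
        for row in local_rows:
            if row["id"] in matched:
                continue  # already matched by a stronger key
            k = tuple(row.get(f) for f in keys)
            if None not in k and k in index:
                matched[row["id"]] = index[k]
    return matched
```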

HTH, giles.

@gilesknap (Owner Author) commented:

UPDATE: I was hoping it would be possible to download only the 'last' zip file from Takeout for an incremental update. But no: the library contents are randomly scattered across the zip files.

It makes a bit of a mockery of a nice incremental backup system to have to download the entire library again each time you want to update your GPS info.

@gilesknap (Owner Author) commented:

Another possible approach: I had not realized you can paste JavaScript into an active page. This project did so to automate deletion of all photos: https://github.com/mrishab/google-photos-delete-tool. Perhaps that approach could be used to scrape GPS info.
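
Driven from Python, the same console-script idea might look like this. Entirely hypothetical: the injected query is a placeholder, since the real page structure is obfuscated, and it assumes you log in manually in the driven browser first.

```python
# Hypothetical sketch: inject JavaScript into an already-authenticated
# Google Photos tab via Selenium and read back whatever it finds.
from selenium import webdriver

driver = webdriver.Firefox()
driver.get("https://photos.google.com/")
input("Press Enter once you are logged in and viewing a photo...")
# Placeholder script: look for a maps link in the rendered page.
gps_href = driver.execute_script(
    "const el = document.querySelector('a[href*=\"maps\"]');"
    "return el ? el.href : null;"
)
print(gps_href)
driver.quit()
```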

@gilesknap (Owner Author) commented:

Wrapping these up into #119.
