Location data #64

Google has seen fit to not provide location data with the images downloaded via the Google Photos API. I guess I could use Beautiful Soup and the productUrl to scrape this info and insert it into our index, plus into the JPG EXIF as required.
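For illustration, a minimal sketch of that idea: fetch the productUrl page and search its markup for a coordinate pair. The cookie name and the regex target are hypothetical, and later comments in this thread show the page is JavaScript-rendered behind a login, so this naive version would not work as-is:

```python
import re

import requests
from bs4 import BeautifulSoup

def scrape_gps(product_url: str, session_cookie: str):
    """Hypothetical sketch: look for a 'lat,lng' pair in an item's page.

    Assumes an authenticated cookie and server-rendered markup; neither
    assumption holds for Google Photos in practice (see the comments below).
    """
    resp = requests.get(product_url, cookies={"SID": session_cookie})
    resp.raise_for_status()
    text = BeautifulSoup(resp.text, "html.parser").get_text()
    found = re.search(r"(-?\d{1,2}\.\d+),\s*(-?\d{1,3}\.\d+)", text)
    return (float(found.group(1)), float(found.group(2))) if found else None
```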
Comments
Nice idea, but ...
Yeah, what I said. But it looks like there may be some solutions out there, such as Selenium or PhantomJS; see the discussion here: https://stackoverflow.com/questions/5793414/mechanize-and-javascript
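As a sketch of the Selenium route mentioned above: Selenium drives a real browser, so JavaScript-rendered pages (which mechanize cannot handle) become scrapable. The URL and CSS selector here are placeholders, not anything confirmed from the real Google Photos DOM:

```python
from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Firefox()
try:
    # Placeholder URL; in practice this would be an item's productUrl.
    driver.get("https://photos.google.com/photo/EXAMPLE_PRODUCT_ID")
    # Give the page's JavaScript time to render before querying the DOM.
    driver.implicitly_wait(10)
    # Placeholder selector for whatever element carries the location text.
    element = driver.find_element(By.CSS_SELECTOR, "div.location")
    print(element.text)
finally:
    driver.quit()
```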
Made good progress on this. I can now scrape GPS data into the DB index.
I've read that Google will be disabling logins from embedded browser frameworks. I'm pretty sure that this will totally close off the approach I'm using here. See https://www.zdnet.com/article/google-bans-logins-from-embedded-browser-frameworks-to-prevent-mitm-phishing/. My web-scrape approach would still work if I could hand a normal interactive login token to Selenium, but that sounds like a security hole, so it is probably not possible.
My location-scraping code is no longer working. Either Google has changed the login screen or shut down access to login via embedded frameworks.
I'm adding a wontfix label to this because Google has shut down all avenues. I will keep this open, hoping that things might change.
I have an idea. I believe Google is obliged to provide a service like Takeout due to data-protection rules, so this should be more sustainable. I have confirmed that Takeout includes GPS. This would be quite clunky, particularly since it requires a full download of your library. Note that we could also use this to check for a modified date and update photos that were edited online.
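For reference, a short sketch of pulling GPS out of a Takeout archive, assuming a .json sidecar file alongside each image containing a geoData block; the exact field names should be treated as assumptions about the archive layout:

```python
import json
from pathlib import Path

def takeout_gps(takeout_dir: str):
    """Yield (title, lat, lon) from Google Photos Takeout JSON sidecars.

    Assumes sidecars like IMG_1234.jpg.json with a geoData block; items
    with no location typically carry 0.0 for both fields and are skipped.
    """
    for sidecar in Path(takeout_dir).rglob("*.json"):
        data = json.loads(sidecar.read_text())
        geo = data.get("geoData")
        if geo and (geo.get("latitude") or geo.get("longitude")):
            yield data.get("title", sidecar.stem), geo["latitude"], geo["longitude"]
```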
@gilesknap Hey, I dunno if this would be helpful, but we've been trying to solve the same problem over here: mholt/timeliner#38. And I think the (unfortunate) conclusion we are also arriving at is that integrating with Takeout archives may be the best way to get the location data. Do you think there is a way to correlate media items from the API with a Takeout archive? Some sort of ID that is consistent between the two sources?
Hi Matt. I do have a plan for a way of doing this. gphotos-sync has a comparison option used to check the downloaded photos library against a previous backup. It works regardless of where the previous backup comes from, so it needs to match photos and videos up even if their filenames and folders are different. I will use this same approach for matching my synchronized library to Google Takeout files.

The matching primarily uses the EXIF UID to ensure that the files it is comparing match. When this fails, it drops to using dates, file sizes, and filenames. Note that the create date is extracted from the file's metadata for both videos and images and is reasonably unique on its own (videos don't have a UID, unfortunately).

The current file comparison scans over both sets of files and builds a DB table for each. It then runs the sequence of queries in the list 'match' in https://github.com/gilesknap/gphotos-sync/blob/master/gphotos/Queries.py (a simplified sketch follows below). I find that this algorithm uniquely matches everything in my 110,000-item library, despite the fact that it goes back to 1997 and has many duplicate filenames and images that predate EXIF etc. HTH, giles.
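A much-simplified sketch of that matching cascade, in plain Python rather than the SQL queries the real Queries.py runs; the Item fields are assumptions taken from the description above:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Item:
    # Assumed fields, per the description above; the real code keeps
    # both file sets in DB tables and matches with a list of SQL queries.
    path: str
    uid: Optional[str]  # EXIF UID; absent for videos and pre-EXIF images
    create_date: str
    size: int
    filename: str

def match(library: list[Item], takeout: list[Item]) -> dict[Item, Item]:
    """Match by EXIF UID first, then fall back to progressively weaker keys."""
    matches: dict[Item, Item] = {}
    keys = (
        lambda i: i.uid,                                # strongest: EXIF UID
        lambda i: (i.create_date, i.size, i.filename),  # fallback
        lambda i: (i.create_date, i.size),              # last resort
    )
    for key in keys:
        # Index the Takeout side by this key, skipping items without it.
        index = {key(t): t for t in takeout if key(t) is not None}
        for item in library:
            if item not in matches and key(item) in index:
                matches[item] = index[key(item)]
    return matches
```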
UPDATE: I was hoping that it would be possible to download only the 'last' zip file from Takeout for an incremental update. But no: the library contents are randomly scattered across the zip files. It makes a bit of a mockery of a nice incremental backup system to have to download the entire library again each time you want to update your GPS info.
Another possible approach: I had not realized you can paste JavaScript into an active page. This project did so to automate deletion of all photos: https://github.com/mrishab/google-photos-delete-tool. Perhaps this approach could be used to scrape GPS info.
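A hedged sketch of driving that injected-JavaScript trick from Python, using Selenium's execute_script against a browser window you have logged into by hand; the DOM query is a placeholder, since where (or whether) the coordinates appear in the page was never established:

```python
from selenium import webdriver

driver = webdriver.Firefox()
# Placeholder URL; in practice this would be an item's productUrl.
driver.get("https://photos.google.com/photo/EXAMPLE_PRODUCT_ID")
input("Log in by hand in the browser window, then press Enter...")

# Run a snippet inside the live, authenticated page, in the spirit of the
# delete-tool linked above. The selector is a guess at where location text
# might live and would need to be reverse-engineered from the real DOM.
coords = driver.execute_script(
    "var el = document.querySelector('[aria-label*=\"Location\"]');"
    "return el ? el.textContent : null;"
)
print(coords)
driver.quit()
```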
Wrapping these up into #119.