A Flask app for conducting reverse image search using Google's API.
- Run `conda create --name image-lookup --file requirements.txt` to create the program environment.
- Run `source activate image-lookup` to enter the environment.
- Run `python run.py` to start the server.
- In a second terminal, start ngrok (`ngrok http 8080`) or your HTTP-tunneling tool of choice. Alternatively, run the program as a dedicated web app on a specific web server.
- Go to 0.0.0.0:8080, or your publicly exposed HTTP endpoint, in your browser of choice.
- Use the file browser to upload an image.
- Browse the gallery of results.
- There are two ways to access image search:
  - Send a link to a hosted version at the end of the query: `http://images.google.com/searchbyimage?image_url=[image url]`
  - Send a copy of the image to Google directly: `http://www.google.com/searchbyimage/upload` with an `encoded_image` field containing the raw image data and an `image_content` field containing an empty string.
- Either way, before connecting to search you must set a User-Agent header that matches a modern browser.
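
As a rough illustration of the two request styles above, here is a minimal sketch using only the Python standard library. The function names, the User-Agent string, and the choice of a multipart POST for the upload are assumptions made for the example; they are not taken from this app's code. Neither function sends anything over the network.

```python
import uuid
from urllib.parse import urlencode
from urllib.request import Request

# Assumed User-Agent string; any current desktop browser string should work.
USER_AGENT = (
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0 Safari/537.36"
)

SEARCH_BY_URL = "http://images.google.com/searchbyimage"
SEARCH_BY_UPLOAD = "http://www.google.com/searchbyimage/upload"


def build_url_request(image_url):
    """Build (but do not send) a search request for an already-hosted image."""
    query = urlencode({"image_url": image_url})
    return Request(f"{SEARCH_BY_URL}?{query}", headers={"User-Agent": USER_AGENT})


def build_upload_request(image_bytes, filename="upload.jpg"):
    """Build (but do not send) a multipart POST carrying the raw image data.

    Assumption: the upload endpoint accepts multipart/form-data.
    """
    boundary = uuid.uuid4().hex
    body = b"".join([
        # Part 1: encoded_image holds the raw image bytes.
        f"--{boundary}\r\n".encode(),
        f'Content-Disposition: form-data; name="encoded_image"; filename="{filename}"\r\n'.encode(),
        b"Content-Type: application/octet-stream\r\n\r\n",
        image_bytes,
        b"\r\n",
        # Part 2: image_content is present but empty.
        f"--{boundary}\r\n".encode(),
        b'Content-Disposition: form-data; name="image_content"\r\n\r\n',
        b"\r\n",
        f"--{boundary}--\r\n".encode(),
    ])
    headers = {
        "User-Agent": USER_AGENT,
        "Content-Type": f"multipart/form-data; boundary={boundary}",
    }
    return Request(SEARCH_BY_UPLOAD, data=body, headers=headers, method="POST")
```

Either request object can then be passed to `urllib.request.urlopen` when you are ready to contact the server.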
- Google will respond with a redirect to a results page (in a `Location` header).
- This page will include results, plus a tag linking to a much larger "visually similar images" results page: `<a href="[url]">Visually similar images</a>`. Extract this tag, and request the linked page.
- This final page will contain a number of results for us to extract, generally conforming to schema.org's Search Results Page standard:
  - `<a class="qb-b">[query text]</a>`, if Google has a search query associated with this particular image. (This isn't likely, unless we're searching on something particularly popular.)
  - All results are nested within tags of the form `<div class="rg_di rg_el ivg-i">`. Of particular note within are these three tags:
    1. `<a class="rg_l" href="/imgres?imgurl=[url of image]&imgrefurl=[website of image]&h=[height]&w=[width]&tbnid=[google internal id]&…&…" …>`
    1. `<span class="rg_ilmn">[resolution] - [image domain]</span>`
    1. `<div class="rg_meta">{JSON dict of various features}</div>`
       - `pt`: Some type of heading or subject string. I'm uncertain if this key is mandatory.
       - `s`: An additional subject string. It may or may not be the same as `pt`. I'm uncertain if this key is mandatory.
       - `id`: A repeat of the `tbnid` value mentioned above.
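
The extraction steps above (finding the "Visually similar images" link, then reading each `rg_l` link and `rg_meta` block) could be sketched roughly as follows with the standard library's `html.parser`. The class and function names are hypothetical, and the sketch assumes the markup looks exactly as described here. It skips the `qb-b` query text and the `rg_ilmn` span for brevity.

```python
import json
from html.parser import HTMLParser
from urllib.parse import parse_qs, urlparse


class ResultsParser(HTMLParser):
    """Collect the similar-images link and per-result data from a results page."""

    def __init__(self):
        super().__init__()
        self.similar_url = None  # href of the "Visually similar images" link
        self.results = []        # one dict per rg_l anchor
        self._open_href = None   # href of the <a> currently open, if any
        self._in_meta = False    # True while inside <div class="rg_meta">
        self._meta_text = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if tag == "a":
            self._open_href = attrs.get("href")
            if "rg_l" in classes and self._open_href:
                # The image details are carried in the link's query string.
                query = parse_qs(urlparse(self._open_href).query)
                self.results.append({
                    "image_url": query.get("imgurl", [None])[0],
                    "page_url": query.get("imgrefurl", [None])[0],
                    "tbnid": query.get("tbnid", [None])[0],
                    "meta": {},
                })
        elif tag == "div" and "rg_meta" in classes:
            self._in_meta = True
            self._meta_text = []

    def handle_data(self, data):
        if self._in_meta:
            self._meta_text.append(data)
        elif self._open_href and data.strip() == "Visually similar images":
            self.similar_url = self._open_href

    def handle_endtag(self, tag):
        if tag == "a":
            self._open_href = None
        elif tag == "div" and self._in_meta:
            self._in_meta = False
            if self.results:
                try:
                    # Attach the JSON dict to the most recent rg_l result.
                    self.results[-1]["meta"] = json.loads("".join(self._meta_text))
                except ValueError:
                    pass  # leave meta empty if the JSON is malformed


def parse_results_page(html):
    """Return (similar_url, results) for one page of HTML."""
    parser = ResultsParser()
    parser.feed(html)
    parser.close()
    return parser.similar_url, parser.results
```

On the first results page, `similar_url` is the value to request next. On the final page, `results` holds one entry per image, with `pt`, `s`, and `id` available under `meta` when present.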
- This is very brittle, but within the eighth `<script>` tag Google stores thumbnails of the first ten images, keyed with the `tbnid` value. If you can get to the right portion, you can extract the raw image.
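
A first step toward the thumbnail trick might look like the sketch below, which only isolates the text of the eighth `<script>` tag. How the thumbnails are encoded inside that script is not documented here, so the sketch deliberately stops short of decoding anything. The names `ScriptCollector` and `nth_script` are made up for the example.

```python
from html.parser import HTMLParser


class ScriptCollector(HTMLParser):
    """Collect the text content of every <script> tag, in document order."""

    def __init__(self):
        super().__init__()
        self.scripts = []
        self._in_script = False

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self._in_script = True
            self.scripts.append("")

    def handle_data(self, data):
        if self._in_script:
            self.scripts[-1] += data

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_script = False


def nth_script(html, n=8):
    """Return the body of the n-th <script> tag (1-based), or None if absent."""
    collector = ScriptCollector()
    collector.feed(html)
    collector.close()
    return collector.scripts[n - 1] if len(collector.scripts) >= n else None
```

From there, searching the returned text for a known `tbnid` value is one way to find the right portion. Expect the position of the script to change without notice.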