page can not be crawled due to robots.txt #4

Open
pvgenuchten opened this issue May 28, 2019 · 2 comments

@pvgenuchten
Contributor

We have set up Google crawling on demo.pygeoapi.io to research crawler behaviour on pygeoapi. The first results are available, but they puzzle me a bit.

Google generally crawls pygeoapi pages correctly. One can indeed find demo pygeoapi results at, for example, https://www.google.com/search?q=site%3Ademo.pygeoapi.io. However, there are no results yet at https://toolbox.google.com/datasetsearch/search?query=site%3Ademo.pygeoapi.io

A weird thing is that when I run a 'live test' (a feature of Google Search Console) on the URL https://demo.pygeoapi.io/master/collections/lakes, I get this error: "url not available to google, blocked by robots.txt"

However, https://demo.pygeoapi.io/master/collections/lakes?f=html runs fine in 'live test'. This makes me wonder: does the 'live test' crawler send the proper Accept header?
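
One way to narrow this down might be to request the same URL with a few different Accept headers and compare the status and Content-Type that come back. The sketch below uses only the Python standard library. The helper names, the `accept-probe/0.1` user agent and the list of Accept values are assumptions made for illustration; they are not what Googlebot actually sends.

```python
import urllib.error
import urllib.request


def build_request(url, accept):
    """Build a GET request that carries the given Accept header."""
    # The User-Agent string is a made-up placeholder, not a real crawler identity.
    return urllib.request.Request(
        url, headers={"Accept": accept, "User-Agent": "accept-probe/0.1"}
    )


def probe(url, accept_values=("text/html", "application/json", "*/*")):
    """Return a list of (accept, status, content_type), one entry per Accept value."""
    results = []
    for accept in accept_values:
        req = build_request(url, accept)
        try:
            with urllib.request.urlopen(req, timeout=10) as resp:
                results.append(
                    (accept, resp.status, resp.headers.get("Content-Type", ""))
                )
        except urllib.error.HTTPError as err:
            # 4xx/5xx responses are raised as exceptions; record them as results too.
            results.append((accept, err.code, err.headers.get("Content-Type", "")))
    return results


if __name__ == "__main__":
    for row in probe("https://demo.pygeoapi.io/master/collections/lakes"):
        print(row)
```

If the rows differ between `text/html` and `*/*`, that would hint that content negotiation plays a role. It would not prove what the 'live test' crawler itself sends.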

Another thing to improve is that https://demo.pygeoapi.io/robots.txt does not return a proper robots.txt file, but instead a custom file-not-found page (with HTTP status 200!).
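
A quick check for this could look like the sketch below: fetch `/robots.txt` and test whether the response looks like a real robots.txt rather than an HTML page served with status 200. The function names and the heuristic (status 200, `text/plain`, at least one known directive) are my own assumptions, not an official validity test.

```python
import urllib.error
import urllib.request

# Directive names commonly found in robots.txt files (not an exhaustive list).
ROBOTS_DIRECTIVES = ("user-agent", "allow", "disallow", "sitemap", "crawl-delay")


def looks_like_robots_txt(status, content_type, body):
    """Heuristic: status 200, text/plain media type, and one known directive."""
    if status != 200:
        return False
    media_type = content_type.split(";")[0].strip().lower()
    if media_type != "text/plain":
        return False
    for line in body.splitlines():
        line = line.split("#", 1)[0].strip()  # drop trailing comments
        key, sep, _ = line.partition(":")
        if sep and key.strip().lower() in ROBOTS_DIRECTIVES:
            return True
    return False


def fetch_robots(base_url):
    """Fetch <base_url>/robots.txt and return (status, content_type, body)."""
    url = base_url.rstrip("/") + "/robots.txt"
    try:
        with urllib.request.urlopen(url, timeout=10) as resp:
            body = resp.read().decode("utf-8", "replace")
            return resp.status, resp.headers.get("Content-Type", ""), body
    except urllib.error.HTTPError as err:
        return err.code, err.headers.get("Content-Type", ""), ""


if __name__ == "__main__":
    status, content_type, body = fetch_robots("https://demo.pygeoapi.io")
    print(status, content_type, looks_like_robots_txt(status, content_type, body))
```

A custom not-found page returned with status 200 would fail this check on the media type or on the missing directives.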

Let me know if you have any ideas.

@justb4
Member

justb4 commented Jun 3, 2019

This issue (and any related code changes) really belongs in the pygeoapi demo site repo:
https://github.com/geopython/demo.pygeoapi.io
The demo site is currently a Flask app, mainly for templating.

@tomkralidis tomkralidis transferred this issue from geopython/pygeoapi Sep 24, 2019
@tomkralidis
Member

@pvgenuchten is this still an issue?
