Datasets not found on Google Dataset Search #199
Comments
Did you verify that the structured data is generated in the frontend (i.e. view the page source and check for a JSON-LD block)? Maybe you have customized your frontend? Then you could check whether the schema validator reports any errors for your domain (test with the URL of a dataset).
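The "view source and check" step can also be scripted. Below is a minimal sketch, assuming the structured data is embedded as a `<script type="application/ld+json">` block and that dataset nodes carry a `@type` such as `Dataset` or `schema:Dataset`, possibly inside an `@graph` list. The function name `find_dataset_nodes` is hypothetical and not part of the extension.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collect the raw text of <script type="application/ld+json"> blocks."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        script_type = (dict(attrs).get("type") or "").lower()
        if tag == "script" and script_type == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        # Script content may arrive in several chunks, so buffer it.
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            self.blocks.append("".join(self._buffer))


# Assumption: these are the spellings of the Dataset type worth accepting.
DATASET_TYPES = {
    "Dataset",
    "schema:Dataset",
    "http://schema.org/Dataset",
    "https://schema.org/Dataset",
}


def find_dataset_nodes(html):
    """Return all JSON-LD nodes in `html` whose @type is a schema.org Dataset."""
    parser = JsonLdExtractor()
    parser.feed(html)
    parser.close()

    datasets = []
    for raw in parser.blocks:
        try:
            doc = json.loads(raw)
        except json.JSONDecodeError:
            continue  # a malformed block is itself worth investigating
        # A block may be a single node, a list of nodes, or wrap them in @graph.
        if isinstance(doc, dict):
            nodes = doc.get("@graph", [doc])
            if not isinstance(nodes, list):
                nodes = [nodes]
        elif isinstance(doc, list):
            nodes = doc
        else:
            nodes = []
        for node in nodes:
            if not isinstance(node, dict):
                continue
            types = node.get("@type", [])
            if isinstance(types, str):
                types = [types]
            if any(t in DATASET_TYPES for t in types):
                datasets.append(node)
    return datasets
```

To use it, download the HTML of one dataset page (for example with `urllib.request.urlopen`) and pass the decoded text to `find_dataset_nodes`. An empty result suggests the block is missing from the rendered page, for instance because a customized template no longer includes it.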
Hi @metaodi and thank you for your answer. The validator does not indicate any errors and it seems my URLs are correct.
We also had some issues getting datasets indexed by Google Dataset Search. Only a few datasets get indexed.
Maybe Google Dataset Search requires a standard JSON-LD structure for indexing.
@sagargg this is exactly what this extension provides. But it's hard to tell what went wrong with no further details.
Thank you @metaodi for your answer. The JSON-LD is correctly formed. As I have no prior experience with letting crawlers access a website, I was not aware that I needed to take care of a robots.txt file and a sitemap. I realized it is important to read the Google Search guidelines before using the extension. Are there CKAN-specific instructions for setting up a robots.txt and a sitemap?
No, there is nothing CKAN-specific. We use this extension on the open data catalogue of the City of Zurich, and it works for us. See the Google Dataset Search help page for specific instructions: https://datasetsearch.research.google.com/help Hope this helps.
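Since there are no CKAN-specific instructions, a sitemap has to be produced by hand or by a small script. The following is a rough sketch, assuming dataset pages live under `/dataset/<name>` (the usual CKAN URL pattern) and that the list of dataset names is already known. The function `build_sitemap` and the example base URL are hypothetical.

```python
import xml.etree.ElementTree as ET
from urllib.parse import quote

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(base_url, dataset_names):
    """Return a sitemap XML string with one <url> entry per dataset page."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for name in dataset_names:
        url_el = ET.SubElement(urlset, "url")
        loc = ET.SubElement(url_el, "loc")
        # Assumption: dataset pages are served at <base_url>/dataset/<name>.
        loc.text = f"{base_url.rstrip('/')}/dataset/{quote(name)}"
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


if __name__ == "__main__":
    print(build_sitemap("https://ckan.example.org", ["my-first-dataset"]))
```

The resulting file would typically be served from the site root and can be referenced from robots.txt with a `Sitemap:` line or submitted through Google Search Console.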
Thanks! Is a robots.txt really needed? I thought that, when none is given, Google would just crawl everything.
No, it's not necessary. But since I don't know your setup, it could be that an existing robots.txt is blocking the Google crawler. Just something to keep in mind.
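Whether an existing robots.txt blocks the crawler can be checked locally. Here is a minimal sketch using Python's standard `urllib.robotparser`. The helper name `is_allowed`, the example rules, and the example URLs are assumptions for illustration, and the standard-library parser does not replicate every detail of how Google interprets robots.txt (wildcards, for instance).

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt, url, user_agent="Googlebot"):
    """Return True if `robots_txt` permits `user_agent` to fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # Hypothetical rules that would hide every dataset page from all crawlers.
    rules = "User-agent: *\nDisallow: /dataset/\n"
    print(is_allowed(rules, "https://ckan.example.org/dataset/my-data"))
    # An empty robots.txt places no restrictions.
    print(is_allowed("", "https://ckan.example.org/dataset/my-data"))
```

To test a live site instead of a string, `RobotFileParser` also offers `set_url()` followed by `read()`, which downloads the file from the given robots.txt URL before `can_fetch()` is called.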
I see. I am not aware of any pre-existing robots.txt in my CKAN instance. Maybe if I explicitly add one, the indexing will work.
Hi,
I am running CKAN 2.9.2 on Ubuntu 20 and I installed the DCAT plugin. I followed the instructions in the README file (activating the `structured_data` and `dcat` plugins) in order to have my datasets discovered by Google Dataset Search, but this has not happened so far. What could I be missing?
Best regards