plones default robots.txt prevents google indexing if ?expand used #4898

Closed
2 tasks
djay opened this issue Jun 19, 2023 · 2 comments

Comments

@djay
Member

djay commented Jun 19, 2023

Describe the bug

Default robots.txt includes the rule

Disallow: /*?

If you use expansion to improve the performance of your Volto theme, your content URLs then look similar to

https://digitalnsw.pretagov.com.au/++api++/?expand=actions,breadcrumbs,navigation&expand.navigation.depth=2

Googlebot then can't crawl this, which results in a "soft 404" (as seen in Google Search Console), and Google won't include any of the pages of your site in its index.

The soft 404 is caused by another bug: if the content API call fails in a way the frontend doesn't understand, it defaults to rendering a 404 Not Found page, but with a 200 status code. (This problem causes other issues in itself, since what should be a 500 error doesn't appear as such in GA or server logs.)

In addition, another default rule prevents /preview images from being loaded by Googlebot, which could cause indexing issues:

Disallow: /*view$
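
To make the effect of these two rules concrete, here is a minimal Python sketch of Google-style wildcard matching (`*` matches any run of characters, a trailing `$` anchors the end of the path). It is an approximation written for this issue, not Googlebot's actual matcher. The helper names and the two example content paths are hypothetical; only the `++api++` URL comes from the report above.

```python
import re


def rule_to_regex(pattern: str) -> re.Pattern:
    """Translate a robots.txt path pattern into an anchored regex.

    '*' matches any run of characters; a trailing '$' pins the end of the path.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile("^" + body + ("$" if anchored else ""))


def is_blocked(path: str, disallow_rules: list[str]) -> bool:
    """Return True if any Disallow pattern matches the path (no Allow rules here)."""
    return any(rule_to_regex(rule).search(path) for rule in disallow_rules)


# The two default rules quoted in this issue.
DEFAULT_RULES = ["/*?", "/*view$"]

# Path and query string of the URL from the report.
api_path = "/++api++/?expand=actions,breadcrumbs,navigation&expand.navigation.depth=2"
# Hypothetical paths, for illustration only.
image_path = "/news/my-article/@@images/image/preview"
plain_path = "/news/my-article"

for path in (api_path, image_path, plain_path):
    print(path, "->", "blocked" if is_blocked(path, DEFAULT_RULES) else "allowed")
```

Under these assumptions the expansion call and the preview image are both blocked, while the plain content path is still crawlable. That fits the symptom described here: the HTML is fetched, but the API request needed to render it is not.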

To Reproduce

  1. Add expansion to your site; see https://training.plone.org/effective-volto/backend/writing-content-expansion.html
  2. Enable Search Console for your site
  3. Run URL inspection and "Test live URL" on any of your main content URLs

TODO: there is perhaps a more direct way to test this, e.g. by using something to simulate blocking /*? URLs in the browser?

Expected behavior

Google indexes the page fine.

Screenshots

image image

Proposed solution

Other solutions considered

The best way forward is not really clear.

  • remove the `Disallow: /*?` rule from the default robots.txt
    • however, is that rule useful for other reasons?
    • with the rule in place, resource URLs that have timestamps added to them are also not crawled, so Google will have problems rendering the page
  • expansion without arguments
    • e.g. /++api++/content/@@expand/actions/breadcrumbs/navigation
    • but then how would expand.navigation.depth=2 be included?


@djay changed the title from "plones default robots.txt blocks crawling if ?expand used" to "plones default robots.txt prevents google indexing if ?expand used" on Jun 19, 2023
@arsenico13

We just ran into this issue and, as a temporary workaround, we added this line to robots.txt:

Allow: /*?expand*

We are testing it right now to see if this solves this issue.
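
For anyone wanting to reason about this workaround before Search Console reports back, here is a rough Python sketch of how the added `Allow` line would interact with the default `Disallow` rules. It assumes Google's documented precedence (the longest matching pattern wins, and a tie goes to the less restrictive rule) and is not a substitute for testing with the live URL tool. The function names and the search and batching paths are hypothetical.

```python
import re


def to_regex(pattern: str) -> re.Pattern:
    """'*' matches any run of characters; a trailing '$' anchors the end."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile("^" + body + ("$" if anchored else ""))


def is_allowed(path: str, rules: list[tuple[str, str]]) -> bool:
    """Apply (directive, pattern) rules with longest-match precedence.

    Assumption: the longest matching pattern wins and ties go to 'allow'.
    A path that matches no rule at all is allowed.
    """
    best = None  # (pattern length, is_allow) of the winning rule so far
    for directive, pattern in rules:
        if to_regex(pattern).search(path):
            candidate = (len(pattern), directive == "allow")
            if best is None or candidate > best:
                best = candidate
    return True if best is None else best[1]


RULES = [
    ("disallow", "/*?"),
    ("disallow", "/*view$"),
    ("allow", "/*?expand*"),  # the workaround line from this comment
]

# URL from the original report: the query string starts with "?expand".
expand_first = "/++api++/?expand=actions,breadcrumbs,navigation&expand.navigation.depth=2"
# Hypothetical paths, for illustration only.
other_query = "/search?SearchableText=plone"
expand_later = "/++api++/news?b_start=10&expand=breadcrumbs"

for path in (expand_first, other_query, expand_later):
    print(path, "->", "allowed" if is_allowed(path, RULES) else "blocked")
```

One caveat this sketch suggests: the pattern only matches when `expand` is the first query parameter. A request such as the hypothetical `?b_start=10&expand=breadcrumbs` would still fall under `Disallow: /*?`.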

@davisagli
Member

Fixed in plone/plone.volto#183 + #5584
