disallow bug #83
Comments
I can confirm this fails. However, this is probably an issue in rep-cpp rather than reppy since it handles the actual robots.txt parsing. Also, I'm not really sure how well rep-cpp actually supports wildcards. Technically, the original robots.txt specification did not allow wildcards, but we do support several extensions. |
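For readers unfamiliar with the wildcard extensions mentioned above, here is a minimal sketch of how `*` and a trailing `$` are commonly interpreted in robots.txt rules. It is not rep-cpp's implementation; the function name `rule_matches` and the sample rules are hypothetical and chosen only for illustration, and callers are assumed to skip empty patterns before calling it.

```python
import re


def rule_matches(pattern: str, path: str) -> bool:
    """Return True if a robots.txt rule pattern matches the URL path.

    Assumed semantics (common extension, not the original spec):
    '*' matches any run of characters, and a trailing '$' anchors
    the pattern to the end of the path. Otherwise the pattern is a prefix.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape the literal pieces, then join them with '.*' for each '*'.
    regex = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    # Anchored rules must consume the whole path; others match as a prefix.
    matcher = re.fullmatch if anchored else re.match
    return matcher(regex, path) is not None


# Hypothetical rules, including one with a leading wildcard.
print(rule_matches("/*test", "/test"))                         # True
print(rule_matches("/private/*.html$", "/private/a/b.html"))   # True
print(rule_matches("/private/*.html$", "/private/a.html?x=1")) # False
```

Under these assumed semantics, a rule such as `Disallow: /*test` would be expected to block `/test`, because `*` may match an empty run of characters.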
I've figured out what the issue is. It appears that internally |
See seomoz/rep-cpp#34. |
thx =) |
Related to this: I'm noticing a parsing issue even when the wildcard is not leading in a disallow rule. Here's an example: https://www.theverge.com/robots.txt The
Coincidentally, it also fails to parse the sitemaps. A bit of exploring reveals this:
Editing their robots.txt to remove the offending line solves this problem. |
That particular site returns a
Providing a different user agent resolves the issue:
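The snippet originally posted here did not survive. As a rough illustration of the idea only, the sketch below uses Python's standard `urllib.request` rather than reppy, and sends an explicit `User-Agent` header when fetching a robots.txt file. The helper names and the agent string `my-crawler/1.0` are assumptions, not part of the original comment.

```python
import urllib.request


def build_robots_request(url: str, user_agent: str) -> urllib.request.Request:
    """Build a request for a robots.txt URL with an explicit User-Agent."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch_robots(url: str, user_agent: str, timeout: float = 10.0) -> str:
    """Fetch the robots.txt body as text, sending the given User-Agent."""
    request = build_robots_request(url, user_agent)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        # Decode leniently, since robots.txt files are not always valid UTF-8.
        return response.read().decode("utf-8", errors="replace")


# Hypothetical agent string; building the request does not touch the network.
request = build_robots_request("https://www.theverge.com/robots.txt", "my-crawler/1.0")
print(request.get_header("User-agent"))
```

Calling `fetch_robots(...)` with the same arguments performs the actual download. Whether a given site accepts a particular agent string depends on that site's configuration.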
|
@dlecocq - that works for me. Thank you for the tip! |
Hello!
http://mysite.com/test
is allowed, but it should be disallowed. Am I right?