Fix indexing issues for pages accessible at multiple URLs #582
Comments
Hi guys, Would love to get a better understanding. |
@excentrickristy I added to my queue an action item to build the site and investigate what is causing this duplication. I might not have time to get to this this week. |
I was not able to find time to look at this yet. @oliviergoulet5 can you take some time this week and investigate the reason why we have a duplicate page in our Hugo site? |
Update: There is a setting in Netlify that is enabled by default called Pretty URLs:
I've disabled it on our test site for the specification repo to see if this will fix the duplication issue. The testing website is at https://jakartaee-specifications.netlify.app/ Based on the testing that @oliviergoulet5 did, this issue only seems to occur on the Netlify server. He was unable to reproduce the error while running the site in a local dev environment. |
The change doesn't seem to have fixed the issue. At least, from my end... My next step is to download all the files that are deployed on production to confirm that the server does not have any duplicated files that could explain this duplication. However, the download link on the Netlify site is failing. I will be contacting support about this issue. |
@chrisguindon that's strange, the post processing option being on in the spec repo should have fixed the issue. I was reading this but it looks like asset optimization is being deprecated according to the article you linked, but not until the 17th. Maybe there's a conflict there? After reading that mess, I think you might be best to just reach out to Netlify support to resolve the URL issues as well! |
It's possible but I just took a look and asset optimization is currently disabled for jakartaee-specifications. I sent a support email to Netlify and shared a link to this issue to provide more context on what we are trying to solve. Hopefully, they will be able to point us in the right direction. If anything, it will help once I can download the files that we deployed to confirm if the issue is caused by the server or our custom deployment script. |
I got access to the deployed files and there are no duplicates. This tells me that the server is creating these duplicate URLs. I have pretty URLs disabled for both the spec repo and jakarta.ee. I suspect there might be a bug where the pretty URLs setting is always on. I will ask them. If this doesn't work, we might be required to migrate the website to the EF preview framework, where we would have more control over the server. However, this might not be something we have cycles to do before Q1 2024. |
I heard back from Netlify. Their response is that their CDN normalizes those URLs: the versions with .html and without .html are both served by them. It has always worked this way and there are no plans to change it. However, they did mention that we could look at using Edge Functions: Edge Functions overview | Netlify Docs to add the canonical header or even to simply redirect to a particular version. |
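For illustration, the redirect option boils down to a small path decision. This is a minimal sketch of that logic only, written in Python; a real Netlify Edge Function would be written in JavaScript or TypeScript. The function name `redirect_target`, the `/specifications/` prefix check, and the choice of the extension-less URL as the target are all assumptions, not something Netlify or the thread prescribes.

```python
from typing import Optional


def redirect_target(path: str) -> Optional[str]:
    """Return the extension-less path to redirect to, or None to serve as-is.

    Hypothetical helper: only spec pages ending in .html are redirected.
    """
    if path.startswith("/specifications/") and path.endswith(".html"):
        return path[: -len(".html")]
    return None


# The .html variant is sent to the extension-less variant; everything else is untouched.
print(redirect_target("/specifications/faces/3.0/jakarta-faces-3.0.html"))
print(redirect_target("/specifications/faces/3.0/jakarta-faces-3.0"))
```

A real implementation would also need to decide how to treat paths such as `index.html`, which this sketch does not handle specially.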
Disabling pretty URLs on jakarta.ee triggered a regression on our spec pages: |
@chrisguindon - dredging up an old issue here. Since pretty URLs do not seem to work, can we set a canonical tag on these pages? |
The challenge here is that these are webpages provided by the Jakarta spec projects. All my team does is download these specs and deploy them as-is on the webserver. I don't believe these projects are interested in modifying these files since they have been published. However, I believe we could set a canonical URL within an HTTP header, which we could probably do on the server: Netlify allows us to set headers using a _headers file. I can do a test using
If it works, I will share the format with you on how this can be configured in case you want to add more of these. |
That sounds good. Since the main website links to the version of the page without .html, that's probably the URL we should use as the canonical URL. Additionally, it would be nice to do a templated title tag to include the version number as well as the word specification. |
@ivargrimstad can we ask our spec projects to follow a template for how to set the title tags for their docs? What would be the best way for @excentrickristy to ask them that? |
That should be possible if the asciidoctor plugin supports it. The documents are all generated from AsciiDoc. Would only be for future documents. |
Thanks @ivargrimstad @excentrickristy I don't think I need to be involved in that conversation but I would suggest that you start with a new issue asking spec projects to follow a title convention for their future documents! Regarding the content duplication issue, I just deployed this change: This adds a new HTTP header for both https://jakarta.ee/specifications/faces/3.0/jakarta-faces-3.0.html and https://jakarta.ee/specifications/faces/3.0/jakarta-faces-3.0 that says:
Given the current constraints, I believe this is the best we can do. Should you require additional configurations, please submit a request by including the details in the same format as my commit within an issue against the Jakarta EE website. For further details, refer to the official documentation on custom headers from Netlify: Let me know if you have any questions, otherwise I am thinking we can now close this! |
@chrisguindon hmmm I'm not seeing those changes on live. the faces spec pages don't appear to have canonicals set? |
@chrisguindon ah ok I see the canonical in the headers but the issue is that it's using www.jakarta.ee as a root domain instead of jakarta.ee in the canonical link so both faces urls are coming up as non-indexable. |
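A host mismatch like this one can be caught before deploying with a small check. The sketch below parses the `rel="canonical"` target out of a `Link` header value and compares its host with the page's host. The helper name `canonical_host_matches` is hypothetical, and the regular expression only handles a single, simply formatted link, so treat it as a sanity check rather than a full `Link` header parser.

```python
import re
from urllib.parse import urlsplit


def canonical_host_matches(link_header: str, page_url: str) -> bool:
    """Return True if the rel="canonical" target uses the same host as the page."""
    # Extract the URL between <...> that is followed by rel="canonical".
    match = re.search(r'<([^>]+)>\s*;\s*rel="?canonical"?', link_header)
    if match is None:
        return False  # No canonical link found in the header value.
    return urlsplit(match.group(1)).hostname == urlsplit(page_url).hostname


page = "https://jakarta.ee/specifications/faces/3.0/jakarta-faces-3.0.html"
bad = '<https://www.jakarta.ee/specifications/faces/3.0/jakarta-faces-3.0>; rel="canonical"'
good = '<https://jakarta.ee/specifications/faces/3.0/jakarta-faces-3.0>; rel="canonical"'
print(canonical_host_matches(bad, page), canonical_host_matches(good, page))
```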
Good catch @excentrickristy - I made the fix with jakartaee/jakarta.ee@6fd6676. I expect it to go live in the next 15 minutes. |
@chrisguindon Thank you - it's coming up properly now! So - if I understand correctly, I will have to file an issue for each spec doc with the code from your commit but updated to reflect the proper URLs. I will tag you in my first one to make sure I am doing it right! Thank you for your help. |
@excentrickristy - We can do a bunch of them at the same time. It would definitely simplify things if you can provide the changes in the same format as this file so that we can simply copy and paste what you need: https://github.com/jakartaee/jakarta.ee/blob/src/static/_headers |
However, we can start with updating one to make sure you have the correct format. |
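To prepare a batch of these, a short script can generate the entries. This is a sketch under stated assumptions: it uses the general Netlify `_headers` layout (a path line followed by indented `Header: value` lines) and a `Link` header with `rel="canonical"`, and it picks the extension-less URL as canonical, as suggested earlier in the thread. The function name `headers_entries` is made up for illustration, and the exact entries in the repository's `_headers` file may be formatted differently.

```python
def headers_entries(spec_path: str, site: str = "https://jakarta.ee") -> str:
    """Return _headers rules for one spec page, covering both URL variants.

    Assumption: the canonical URL is the variant without the .html extension.
    """
    suffix = ".html"
    base = spec_path[: -len(suffix)] if spec_path.endswith(suffix) else spec_path
    canonical = f"{site}{base}"
    lines = []
    for path in (base, base + suffix):
        lines.append(path)
        lines.append(f'  Link: <{canonical}>; rel="canonical"')
    return "\n".join(lines) + "\n"


print(headers_entries("/specifications/faces/3.0/jakarta-faces-3.0.html"))
```

Passing either variant of a spec path produces the same pair of rules, so a list of page URLs can be turned into a block ready to paste into an issue.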
Google Search Console is having trouble indexing specification pages because they are accessible with and without the .html extension with no canonical URL set.
Example:
https://jakarta.ee/specifications/faces/3.0/jakarta-faces-3.0
https://jakarta.ee/specifications/faces/3.0/jakarta-faces-3.0.html
The jakarta.ee site is linking to the document using the .html extension, so it makes sense to use the version with the extension as the canonical version. But since the rest of the site uses a trailing-slash schema for its URLs, it might actually be easier to rewrite the .html URLs to that schema. Then, run a search and replace for the links containing the .html URLs in the specification pages to strengthen the internal linking structure.
Open to discussion on how to best deal with this one. At the very least we should be setting canonical URLs for the spec pages.