Use URL hash fragment anchor for message permalink, add id attribute of message to jump on it #238

Open
bkil opened this issue May 29, 2023 · 7 comments
Labels: A-archive-room-view (The view to look at a room day by day in the archive), T-Enhancement (New feature or request), Z-Confidence-Low (Low confidence in the enhancement or suggestion based on known factors, or as described)

Comments


bkil commented May 29, 2023

Include the Matrix event ID in the URL hash fragment, e.g.:

https://archive.matrix.org/r/securemessagingapps:matrix.org/date/2023/05/30#$5cQZRtG9bsleXZI2x-s6wEDfeZ5B1nC_jEvOwpA-VdI

To make this work, we would also need to set the id attribute of each timeline message to its event ID (instead of the current data-event-id), so that the browser jumps to it on page load. You can use the :target CSS selector to highlight the matching message in the timeline with a different background and add a mark on the side as well.
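A minimal sketch of this proposal in TypeScript. The helper names (escapeHtml, renderMessage, buildHashPermalink), the timeline-message class, and the highlight colors are hypothetical, not the archive's actual rendering code, and the sketch assumes event IDs consist of fragment-safe characters like the example ID above:

```typescript
// Hypothetical helpers for illustration only; not the project's real code.

// Escape text so it is safe inside HTML element content and quoted attributes.
function escapeHtml(text: string): string {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

// Render one timeline message. Putting the event ID in `id` (rather than only
// in `data-event-id`) lets a URL ending in `#<eventId>` scroll to it on load.
function renderMessage(eventId: string, body: string): string {
  return `<li class="timeline-message" id="${escapeHtml(eventId)}">${escapeHtml(body)}</li>`;
}

// Build a permalink with the event ID in the fragment only. Any existing
// fragment on the day URL is replaced. Assumes the event ID contains only
// URL-fragment-safe characters (true for the example ID above).
function buildHashPermalink(dayUrl: string, eventId: string): string {
  const withoutFragment = dayUrl.split("#")[0];
  return `${withoutFragment}#${eventId}`;
}

// Highlight whichever message the current fragment points at, no JS needed.
const targetHighlightCss = `
.timeline-message:target {
  background-color: #fff3c4;
  border-left: 4px solid #e0a800;
}
`;
```

Because the highlight comes from the :target pseudo-class, it follows whatever fragment is in the address bar, so no server-rendered class is needed to mark the linked message.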

If the backend also needs access to the event ID (without JavaScript) to return messages for the given date, consider adding it to both the query and the hash, since browsers never send the hash fragment to the server.
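To illustrate why the query string would still be needed, here is a hedged sketch (the helper names buildDualPermalink and requestTarget are hypothetical) that uses the standard WHATWG URL class to put the event ID in both places and shows that only the path and query reach the server:

```typescript
// Hypothetical helper: event ID in both the `at` query parameter (sent to
// the server) and the fragment (used only by the browser for scrolling).
function buildDualPermalink(dayUrl: string, eventId: string): string {
  const url = new URL(dayUrl);
  url.searchParams.set("at", eventId); // form-encoded, so `$` becomes %24
  url.hash = eventId; // the setter adds the leading '#'
  return url.toString();
}

// The request target an HTTP client sends: path plus query. The fragment
// is never included, so the server cannot see it.
function requestTarget(fullUrl: string): string {
  const url = new URL(fullUrl);
  return url.pathname + url.search;
}
```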

Earlier versions of HTML restricted the syntax of id values, but since HTML5 an id only has to be non-empty and can contain basically anything except whitespace.
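A small check for that rule, as a sketch: per the HTML Living Standard an id must contain at least one character and no ASCII whitespace (tab, LF, FF, CR, space). The spec also requires ids to be unique within the document, which this hypothetical isValidHtmlId helper does not check:

```typescript
// ASCII whitespace as defined by the WHATWG Infra standard:
// TAB, LF, FF, CR, and SPACE.
const ASCII_WHITESPACE = /[\t\n\f\r ]/;

// True if `value` satisfies the HTML5 syntax rules for an id attribute
// (non-empty, no ASCII whitespace). Uniqueness is not checked here.
function isValidHtmlId(value: string): boolean {
  return value.length > 0 && !ASCII_WHITESPACE.test(value);
}
```

Matrix event IDs such as $5cQZRtG9bsleXZI2x-s6wEDfeZ5B1nC_jEvOwpA-VdI pass this check as-is, so they could be used directly as id values.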

MadLittleMods added the T-Enhancement, A-archive-room-view, and Z-Confidence-Low labels May 30, 2023
MadLittleMods changed the title from "Use anchor for message permalink, add as ID of message to jump on it" to "Use URL hash fragment anchor for message permalink, add id attribute of message to jump on it" May 30, 2023
MadLittleMods (Contributor) commented

@bkil What benefit are you trying to achieve with this? I assume you're after the permalink event scrolling into view even when JavaScript is disabled?

Please note, we're not specifically optimizing for the JavaScript-disabled case, but simpler and more semantic markup is better for search engines, which we do care about. I don't think search engines care about scroll position though 🤔

We do need the ?at=$abc query parameter on the server backend in order to set the continuation position as you paginate backward and forward, and we have to take that query parameter into account so that the server-side rendered HTML includes the selected event metadata (URL previews), semantic attributes, styles, etc.
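As a rough sketch of that server-side dependency (the TimelineEvent shape, readAtParam, and previewMetaTags are hypothetical, and og: tags are just one plausible form of preview metadata), the renderer reads at from the request target and picks the selected event before emitting any HTML:

```typescript
// Hypothetical shape of a timeline event; not the project's actual types.
interface TimelineEvent {
  eventId: string;
  sender: string;
  body: string;
}

function escapeHtml(text: string): string {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Read ?at= from the request target the server receives (which never has a
// fragment). A base URL is required because request targets are relative.
function readAtParam(requestTarget: string): string | null {
  return new URL(requestTarget, "http://localhost").searchParams.get("at");
}

// Pick the selected event and derive URL-preview metadata for it.
// Returns no tags when `at` is absent or does not match a loaded event.
function previewMetaTags(events: TimelineEvent[], at: string | null): string[] {
  if (at === null) return [];
  const matches = events.filter((e) => e.eventId === at);
  if (matches.length === 0) return [];
  const selected = matches[0];
  return [
    `<meta property="og:title" content="${escapeHtml(selected.sender)}" />`,
    `<meta property="og:description" content="${escapeHtml(selected.body)}" />`,
  ];
}
```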

Duplicating the event ID in both the hash and the ?at=$abc query parameter seems like more hassle and noise than it's worth just to get scrolling with JavaScript disabled.

bkil (Author) commented May 30, 2023

The way it is generated at present is actually inferior from an SEO standpoint. You now generate hundreds of pages per day (differentiated by the ID in the URI query), all containing the exact same content and interlinked, where the only real differences are invisible SEO metadata and the single hand-crafted class on the highlighted message that substitutes for :target.

Search engines have heuristics to detect such link farms, and they either penalize those results or downrank the whole domain.

If keeping the continuation token is unavoidable, it may be included as long as it remains the same across links pointing towards the same wall of messages
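One way to read this suggestion, sketched under stated assumptions: every event on a given UTC day links to the same day page (https://archive.matrix.org/r/<room alias without #>/date/YYYY/MM/DD, following the URL in the first comment), and only the fragment differs. All names here are hypothetical; distinctCacheKeys counts what a cache keyed on the part before # would store:

```typescript
// Hypothetical minimal event shape for this sketch.
interface ArchivedEvent {
  eventId: string;
  originServerTs: number; // milliseconds since the Unix epoch
}

// Assumed base URL, taken from the example permalink in the first comment.
const ARCHIVE_BASE = "https://archive.matrix.org";

function twoDigits(n: number): string {
  return n < 10 ? "0" + n : String(n);
}

// One page per room per UTC day; identical for every event on that day.
function dayPageUrl(roomAlias: string, ts: number): string {
  const d = new Date(ts);
  return (
    `${ARCHIVE_BASE}/r/${roomAlias}/date/` +
    `${d.getUTCFullYear()}/${twoDigits(d.getUTCMonth() + 1)}/${twoDigits(d.getUTCDate())}`
  );
}

// Stable page URL, with the event ID only in the fragment.
function hashPermalink(roomAlias: string, ev: ArchivedEvent): string {
  return `${dayPageUrl(roomAlias, ev.originServerTs)}#${ev.eventId}`;
}

// The ?at= style: a distinct query string per event.
function queryPermalink(roomAlias: string, ev: ArchivedEvent): string {
  return `${dayPageUrl(roomAlias, ev.originServerTs)}?at=${encodeURIComponent(ev.eventId)}`;
}

// Count distinct cache entries when the cache is keyed on the URL without
// the fragment (the part a server or CDN actually receives).
function distinctCacheKeys(urls: string[]): number {
  const seen: { [key: string]: boolean } = {};
  let count = 0;
  for (const url of urls) {
    const key = url.split("#")[0];
    if (!seen[key]) {
      seen[key] = true;
      count += 1;
    }
  }
  return count;
}
```

With three events from the same day, the hash-based links collapse to a single cache key, while the ?at= links produce three distinct keys for the same wall of messages.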

bkil (Author) commented May 30, 2023

For inspiration, this is how IndieWeb generates its online archive (backed by a git repository and a Slack-IRC-Matrix bridge), with excellent accessibility both with and without JavaScript, and optimized for SEO.

MadLittleMods (Contributor) commented May 30, 2023

> You now generate hundreds of pages per day (differentiated by the ID in the URI query)

@bkil Ahh, that's a really interesting point (especially in terms of caching)! But this seemed to work out fine for Gitter with the same URL pattern for permalinks.

I don't think the Matrix Public Archive really qualifies as a link farm or spamdexing. Having a permalink for an item is pretty standard. You can even see this with Discourse or StackExchange sites.

As an interesting point of comparison, StackExchange questions/answers do duplicate the answer ID in the URL path and the hash (I assume the hash is for scrolling): https://stackoverflow.com/a/482129/796832 -> https://stackoverflow.com/questions/184618/what-is-the-best-comment-in-source-code-you-have-ever-encountered/482129#482129

> If keeping the continuation token is unavoidable, it may be included as long as it remains the same across links pointing towards the same wall of messages

I'm not sure what distinction you're trying to make here. Can you give an example?

bkil (Author) commented May 30, 2023

I also know of blog engines from the '90s that generate similar URLs, including a message ID in both the hash and the query. Although all such ranking algorithms are proprietary, I'd probably allow for including a tiny bit of context around each referenced message; however, including a whole day's worth of chat on each separate page would definitely not fly with me.

For tree-based or thread-based blog engines, this typically boils down to referring to a thread or subtree at a time, not the whole root every time.

In the search engines I've tried, results that are accessible through content-unique URLs are ranked higher. That is, answers are not at the top, as they have been downranked by The Algorithm.

Your linked StackOverflow example also includes this crucial piece:

<link rel="canonical" href="https://stackoverflow.com/questions/184618/what-is-the-best-comment-in-source-code-you-have-ever-encountered" />
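For the archive, an equivalent might look like the following sketch, which assumes (hypothetically) that the day page without ?at= and without a fragment is the canonical URL. canonicalLinkTag is an illustrative name, not an existing function in the project:

```typescript
// Hypothetical: every permalink into a day points search engines at the
// plain day page by dropping the `at` parameter and the fragment.
function canonicalLinkTag(pageUrl: string): string {
  const url = new URL(pageUrl);
  url.searchParams.delete("at");
  if (url.searchParams.toString() === "") {
    url.search = ""; // no remaining parameters: drop the '?' entirely
  }
  url.hash = ""; // fragments are not part of a canonical URL
  // Escape '&' between remaining query parameters for the HTML attribute.
  const href = url.toString().replace(/&/g, "&amp;");
  return `<link rel="canonical" href="${href}" />`;
}
```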

bkil (Author) commented May 30, 2023

Drawbacks of link differentiation via the query pointing to the same page:

Advantages:

  • Can load a new advertisement after each click
  • If message links only point to unique batches and are differentiated by anchors, search engines may discard the precise connection (they usually ignore anchors during the crawl). Including such a link along with a link to the individual message, as IndieWeb does, can mitigate this.

MadLittleMods (Contributor) commented May 30, 2023

> Your linked StackOverflow example also includes this crucial piece:
>
> <link rel="canonical" href="https://stackoverflow.com/questions/184618/what-is-the-best-comment-in-source-code-you-have-ever-encountered" />

Please create a new separate issue about adding this (with the SO example) ⏩ -> #251


> For tree-based or thread-based blog engines, this typically boils down to referring to a thread or subtree at a time, not the whole root every time.

Reddit and Twitter are good examples of this, but they are slightly different use cases since they support infinitely nested threads. Both include the permalink ID in the URL for reference.

Reddit even has a ?context=3 query parameter to specify the depth of surrounding messages to show. For a Matrix room, the context for a given event is just the surrounding messages (whether in the main timeline or a thread timeline), which is what we're already showing.
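As a sketch of what a Reddit-style context window could mean for a linear room timeline (contextWindow is a hypothetical helper; Reddit's parameter walks up a comment tree, while this simply takes a symmetric slice around the event):

```typescript
// Hypothetical: return the target event plus up to `context` events on
// either side of it, or an empty array if the event is not in the list.
function contextWindow<T extends { eventId: string }>(
  events: T[],
  eventId: string,
  context: number,
): T[] {
  let index = -1;
  for (let i = 0; i < events.length; i++) {
    if (events[i].eventId === eventId) {
      index = i;
      break;
    }
  }
  if (index === -1) return [];
  const start = Math.max(0, index - context);
  const end = Math.min(events.length, index + context + 1);
  return events.slice(start, end);
}
```

A window like this is one way to get the "tiny bit of context" mentioned earlier in the thread, while the day page stays the place to read the full wall of messages.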

It's unclear what impact our current amount of bulk surrounding messages has on SEO, but it's also something we haven't measured and not something I'm particularly worried about at this stage. Based on the experience with Gitter, I've seen plenty of relevant permalinks appear in Google. I'm leaning towards leaving things as-is.


In terms of the drawbacks you listed for using the ?at=$abc query parameter, we can't really get away with not including it in the URL, since we want URL previews to work well.

And in terms of following a reply chain without a page reload (as long as the messages are on the page), this isn't really relevant, since we can still accommodate that with the Hydrogen client-side JS.

Caching seems like the most impactful benefit we could get from changing this, but in my opinion it is not a total deal-breaker with how things currently work.
