=> Bluesky: link preview embeds sometimes fail or contain raw HTML #1615

Closed
snarfed opened this issue Dec 12, 2024 · 24 comments
Labels
bug User-facing breakage and reliability issues within Bridgy Fed. now

Comments

@snarfed
Owner

snarfed commented Dec 12, 2024

Ugh. Example: https://bsky.app/profile/pcottle.threads.net.ap.brid.gy/post/3lczzpgrtsep2

image

From #microformats:

[snarfed] has anyone ever wanted to disable mf1 backcompat in an mf2 parser?
[snarfed] I'd like that option
[snarfed] example of bad results for parsing mf1 backcompat: https://pin13.net/mf2/?url=https://www.popsci.com/science/chain-reaction-the-hopeful-history-of-uranium-book-excerpt/
[tantek] those are some really weird false positives
[tantek] should we add "h-full" to our item blocklist?
[aaronpk] The php parser has that option https://github.com/microformats/php-mf2?tab=readme-ov-file#classic-microformats-markup
[snarfed] re h-full, sure, but I expect most/many mf2 consumers at least look for specific types they're interested in, so h-full wouldn't bother them too much...?
[snarfed] the real problem is the h-entry content there, at least for me
[snarfed] it may not technically be wrong, but it's not great
[tantek] yeah it's pretty bad
[tantek] I wonder if we could tweak it with some backcompat algo tweaks, e.g. if there's only a synthetic e-content (no author or name properties) then perhaps we should drop the entire synthetic h-entry?
[tantek] [snarfed] do you think such "reduce noise" efforts would be sufficient?
[tantek] or would this end up being a continuous patching exercise?
[snarfed] maybe!
[snarfed] honestly I don't know the ecosystem well enough to say
[gRegor] Some more discussion about those Tailwind classes in microformats/microformats2-parsing#59
[gRegor] [snarfed], is the problem the lack of the mf1 entry-title?
[gRegor] The entry-content looks mostly correct, though it includes the little holiday promo at the bottom which it probably shouldn't
[snarfed] yeah I don't have a crisp clear problem statement here, I just don't like it 😎
[snarfed] afaict this ugliness - even if it's technically correct - caused the URL and HTML in the link preview in https://bsky.app/profile/ScienceDesk.flipboard.social.ap.brid.gy/post/3lcgk6msncg42 instead of human-readable text, but I don't exactly understand how yet
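
tantek's "reduce noise" idea above (drop an h-entry that has only content and no author or name) could be approximated on the consumer side with something like the sketch below. This is not how any mf2 parser implements backcompat. The function name `drop_content_only_entries` and the sample data are made up for illustration, and because parsed mf2 JSON does not, as far as I know, mark which items came from mf1 backcompat parsing, the sketch applies the rule to every h-entry rather than only to synthetic ones.

```python
def drop_content_only_entries(items):
    """Return items, minus h-entry items that have content but no name or author.

    Hypothetical helper: operates on the "items" list of parsed mf2 JSON.
    """
    kept = []
    for item in items:
        props = item.get("properties", {})
        is_entry = "h-entry" in item.get("type", [])
        content_only = (
            is_entry
            and "content" in props
            and not props.get("name")
            and not props.get("author")
        )
        if not content_only:
            kept.append(item)
    return kept


# Made-up sample data in the shape of parsed mf2 JSON.
parsed_items = [
    {"type": ["h-entry"],
     "properties": {"content": [{"html": "<p>promo</p>", "value": "promo"}]}},
    {"type": ["h-entry"],
     "properties": {"name": ["A real post"],
                    "content": [{"html": "<p>hi</p>", "value": "hi"}]}},
]
filtered = drop_content_only_entries(parsed_items)
```

Whether this is "sufficient" is exactly the open question in the chat: it removes one class of false positives but would also drop legitimate name-less, author-less entries.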

@snarfed snarfed added now and removed now labels Dec 12, 2024
@snarfed snarfed changed the title => Bluesky: link preview embeds sometime contain raw HTML => Bluesky: link preview embeds sometimes fail or contain raw HTML Dec 22, 2024
@Tamschi Tamschi added the bug User-facing breakage and reliability issues within Bridgy Fed. label Dec 25, 2024
@Steviemac1

Apologies for the question. I note that this has been labeled as a bug. I’m not sure of the significance of that distinction.

The lack of an image preview on posts bridged from Flipboard is making those posts look poor and very dull or uninteresting. Quite bot-like too. What's the outlook on overcoming this?

@snarfed
Owner Author

snarfed commented Jan 10, 2025

@Steviemac1 hmm, I'm seeing lots of link previews on Flipboard posts, eg https://bsky.app/profile/ScienceDesk.flipboard.social.ap.brid.gy . They should generally be there if the post wasn't truncated. Feel free to send an example where the post wasn't truncated and the preview was missing, I'm happy to look.

@Steviemac1

Steviemac1 commented Jan 10, 2025

This Bluesky account is fed by bridged posts from Flipboard. @Ayethatllbright.flipboard.com.ap.brid.gy
It’s only the occasional one that has an image preview appear. The posts don't appear to be truncated, it looks like a title plus link. Some with longer titles get a preview. I may be misunderstanding what gets truncated though.

@snarfed
Owner Author

snarfed commented Jan 10, 2025

Thanks! Looks like a few of the most common sites that that feed links to are hitting this bug, including audiophix.com, netflixlife.com, and singersroom.com. Sorry for the trouble!

Fwiw I do see a few successful link previews in that feed from other sites, eg abc.net.au, ultimateclassicrock.com, and loudersound.com.

@snarfed snarfed added the now label Jan 10, 2025
@Steviemac1

Steviemac1 commented Jan 10, 2025

I hadn't spotted that some sites were OK and others weren't. Having said that, if you go back 23+ days in the posts on that account, the sites failing now were much more consistently OK. AudioPhix.com, as one example, had plenty of previews at that point. I’m not sure what changed, or where, around Dec. 18 though, nor whether that's in the bridging process, at Flipboard, or on those sites.

@Steviemac1

There’s a slight change that may or may not be significant. The bridged posts from Flipboard that have failed to include the image just don't have it or a space for it.

Today there are some sites where bridged posts via Flipboard are showing a blank gap where an image should be. I reckon that's new, and in one case it’s a site where images have been appearing properly until now. See the example below.

https://bsky.app/profile/Ayethatllbright.flipboard.com.ap.brid.gy/post/3lfs2c2skouh2

@snarfed
Owner Author

snarfed commented Jan 15, 2025

Sigh, yeah, sorry, a recent Bluesky team appview change broke us. Fix is merged, hopefully it'll be deployed soon! bluesky-social/atproto#3370

@Steviemac1

It looks as though that fix above worked although issue 1715 just added may suggest otherwise.
#1715

It still leaves the basic issue of some bridged items from Flipboard to Bluesky not giving an image preview, as described above. Current examples here.
@Ayethatllbright.flipboard.com.ap.brid.gy

To manage my expectations, is that likely to progress to a fix at some point, and on what sort of general timescale, i.e. weeks or months? Or is it more likely to remain as it is for a much longer period?

@snarfed
Owner Author

snarfed commented Jan 23, 2025

Yes! The original bug here is still open. It's medium priority for us right now, I'd love to get to it, but I don't have an ETA for you, sorry.

@Steviemac1

Understood, thanks Ryan, that covers what I needed to know nicely.

@Steviemac1

There’s a new variation on the problem of some images not previewing when posts are bridged to Bluesky.

I've just noticed that several bridged posts today are showing an image, but it’s the same image for all posts, not the correct image. They all show the image from the first of my bridged posts today.

You can see examples of this in the 18 Feb posts on @Ayethatllbright.flipboard.com.ap.brid.gy.

@snarfed
Owner Author

snarfed commented Feb 18, 2025

Whoa, weird! Thanks for reporting, will look.

@snarfed
Owner Author

snarfed commented Feb 18, 2025

Ah, looks like this is due to snarfed/granary#885 . We started linking unbridged @-mentions like @music-stories-Ayethatllbright in these posts, and we're using that @-mention for the link preview instead of the article link.

cc @Daft-Freak, I wonder what we should do here. Ideally we wouldn't generate link previews for @-mentions, but when we generate link previews for Bluesky, those aren't clearly distinguished, they're all link facets:

bridgy-fed/atproto.py

Lines 912 to 926 in 4bd8ebb

# if there are any links, generate an external embed as a preview
# for the first link
if ret.get('$type') == 'app.bsky.feed.post' and not ret.get('embed'):
    for facet in ret.get('facets', []):
        if feats := facet.get('features'):
            if feats[0]['$type'] == 'app.bsky.richtext.facet#link':
                try:
                    link = web.Web.load(feats[0]['uri'], metaformats=True,
                                        authorship_fetch_mf2=False,
                                        raise_=False)
                except AssertionError as e:
                    # we probably have an Object already stored for this URL
                    # with source_protocol that's not web
                    logger.warning(e)
                    continue

We could use whether the link text starts with @ as a heuristic? Not ideal, but it'd work for now. Oh, we do have the original AS1 around at that point, I guess we could look up the link URL in the AS1's tags. Hmm.
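
For illustration, the "link text starts with @" heuristic might look roughly like the sketch below. It is not the actual Bridgy Fed change. The helper names (`facet_text`, `first_preview_uri`, `link_facet`) and the sample post are invented here. The only thing taken from the Bluesky data model is that a facet carries an `index` with UTF-8 `byteStart`/`byteEnd` offsets and a list of `features`.

```python
LINK_TYPE = "app.bsky.richtext.facet#link"


def facet_text(text, facet):
    """Return the slice of the post text a facet covers (offsets are UTF-8 bytes)."""
    index = facet["index"]
    raw = text.encode("utf-8")[index["byteStart"]:index["byteEnd"]]
    return raw.decode("utf-8", errors="replace")


def first_preview_uri(text, facets):
    """Return the URI of the first link facet whose text doesn't start with '@'."""
    for facet in facets:
        features = facet.get("features") or []
        if not features or features[0].get("$type") != LINK_TYPE:
            continue
        if facet_text(text, facet).startswith("@"):
            continue  # looks like an @-mention rendered as a link; skip it
        return features[0]["uri"]
    return None


def link_facet(text, substring, uri):
    """Test helper (hypothetical): build a link facet covering `substring`."""
    start = len(text[:text.index(substring)].encode("utf-8"))
    end = start + len(substring.encode("utf-8"))
    return {"index": {"byteStart": start, "byteEnd": end},
            "features": [{"$type": LINK_TYPE, "uri": uri}]}


text = "@music-stories-Ayethatllbright new story: example.com/story"
facets = [
    link_facet(text, "@music-stories-Ayethatllbright",
               "https://flipboard.com/@ayethatllbright/music-stories-kl8tia9pz"),
    link_facet(text, "example.com/story", "https://example.com/story"),
]
preview_uri = first_preview_uri(text, facets)
```

The obvious weakness is that an ordinary link whose visible text happens to begin with `@` would also be skipped.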

@Daft-Freak
Contributor

Ah, and tags are processed before links so it'll always be first... I guess we need to track if something was originally a link, though the only idea I have right now is sticking some extra property on the facet for a bit...

@snarfed
Owner Author

snarfed commented Feb 18, 2025

Right now I'm thinking we only consider URLs for previews if they're not in the AS1's objectType: mention tags. I can try that now.
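
A minimal sketch of that idea, assuming the AS1 object is a plain dict with a `tags` list whose entries have `objectType` and `url` fields. The function names and sample data are hypothetical, not Bridgy Fed's code:

```python
def mention_urls(as1_obj):
    """Collect the URLs of all objectType: mention tags in an AS1 object."""
    return {
        tag["url"]
        for tag in as1_obj.get("tags", [])
        if tag.get("objectType") == "mention" and tag.get("url")
    }


def preview_candidates(link_uris, as1_obj):
    """Return the link URIs, in order, that are not mention tag URLs."""
    mentions = mention_urls(as1_obj)
    return [uri for uri in link_uris if uri not in mentions]


# Made-up sample data.
as1 = {
    "objectType": "note",
    "tags": [
        {"objectType": "mention", "url": "https://example.social/@alice"},
        {"objectType": "hashtag", "displayName": "news"},
    ],
}
links = ["https://example.social/@alice", "https://example.com/story"]
candidates = preview_candidates(links, as1)
```

Note that this relies on an exact string match: it only filters a link if the tag's `url` is identical to the link facet's URI.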

@snarfed
Owner Author

snarfed commented Feb 18, 2025

Damn, looks like that fix ^ didn't work. Example: https://bsky.app/profile/FlipboardBR.flipboard.com.ap.brid.gy

@Daft-Freak
Contributor

Daft-Freak commented Feb 18, 2025

Oh, this is "fun"... An example from @Ayethatllbright.flipboard.com:

We start off with a mention tag with a url of https://flipboard.com/magazines/pVHZKTloSLm0WD5oXGnD2w:m:4047857365 and a displayName of @[email protected].

translate_ids can't translate that url, so it gets set to None

Source.postprocess_object then adds a new mention tag with a url of https://flipboard.com/@ayethatllbright/music-stories-kl8tia9pz and a displayName of @music-stories-Ayethatllbright which are... both different, but refer to the same thing.

Then the actual links are parsed from the content... last.

Edit: the only remotely helpful thing here is that the first url is a redirect to the second...

Edit 2: perhaps just to get back to how things were before, we collect the "fallback" link facets and append them after handling everything else?
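
One way to read the "Edit 2" suggestion, sketched under assumptions: mentions that can't be resolved to a Bluesky DID fall back to plain link facets, and those fallbacks are held in a separate list and appended after the links parsed from the content, so the first link facet is the article link again. The function `build_facets`, its parameters, and the tag shape are all hypothetical, and facet `index` offsets are omitted for brevity.

```python
LINK_TYPE = "app.bsky.richtext.facet#link"
MENTION_TYPE = "app.bsky.richtext.facet#mention"


def build_facets(mention_tags, content_links, resolve_did):
    """Build facets so fallback link facets for unresolved mentions come last.

    mention_tags: list of {"url": ...} dicts (hypothetical shape)
    content_links: URLs parsed from the post content, in order
    resolve_did: callable mapping a mention URL to a DID, or None if unbridged
    """
    facets = []
    fallbacks = []
    for tag in mention_tags:
        did = resolve_did(tag["url"])
        if did:
            facets.append({"features": [{"$type": MENTION_TYPE, "did": did}]})
        else:
            # unbridged mention: becomes a plain link facet, held back for now
            fallbacks.append({"features": [{"$type": LINK_TYPE, "uri": tag["url"]}]})
    for url in content_links:
        facets.append({"features": [{"$type": LINK_TYPE, "uri": url}]})
    return facets + fallbacks


tags = [{"url": "https://flipboard.com/@ayethatllbright/music-stories-kl8tia9pz"}]
facets = build_facets(tags, ["https://example.com/story"], resolve_did=lambda url: None)
first_link = next(f["features"][0]["uri"] for f in facets
                  if f["features"][0]["$type"] == LINK_TYPE)
```

This restores the old ordering without having to match mention tags against content links, at the cost of depending on facet order.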

@snarfed
Owner Author

snarfed commented Feb 18, 2025

Hah, I've been debugging too, and saw the same thing. And the redirect threw me for a minute too, whee.

Edit 2: perhaps just to get back to how things were before, we collect the "fallback" link facets and append them after handling everything else?

Hmm maybe! That would definitely work, but it feels a bit brittle. I'm actually maybe inclined to go with your earlier suggestion, seems a bit more robust...?

I guess we need to track if something was originally a link, though the only idea I have right now is sticking some extra property on the facet for a bit...

@snarfed
Owner Author

snarfed commented Feb 18, 2025

Goddamn complexity 😆 😢

@snarfed
Owner Author

snarfed commented Feb 19, 2025

...ah, and I see what you mean, even an extra property on the facet wouldn't help us because neither the name nor the URL directly match the HTML link in content. Sigh.

So then, probably either the leading '@' character heuristic, or we have to follow redirects for all mention tag URLs before comparing them here.

snarfed added a commit that referenced this issue Feb 19, 2025
…ed @-mentions

for #1615 (comment), cc @Daft-Freak. just uses leading @ character as a heuristic. definitely not ideal, but noticeably simpler than actually doing this "right." 😐
@snarfed
Owner Author

snarfed commented Feb 19, 2025

Woo, looks like the @ heuristic ^ did the trick, eg https://bsky.app/profile/FlipboardBR.flipboard.com.ap.brid.gy/post/3liiktsoykry2

@Steviemac1

Steviemac1 commented Feb 20, 2025

Thanks for correcting that repeated image on links. The underlying issue of images not appearing remains and I know is on the radar to fix at some point.

Just to reflect on that, some of the bridged posts with links used to have an image appear, others didn't. Of the latest bridged posts, I haven't seen any with an image. That could be coincidence or related to the issue/fix for repeated image.

Later: And then an hour or two after a couple of posts with images came through… 🤷‍♂️

snarfed added a commit to snarfed/granary that referenced this issue Feb 25, 2025
…ntent

convert HTML content/summary to plain text description

for snarfed/bridgy-fed#1615
@snarfed
Owner Author

snarfed commented Feb 25, 2025

Fixed the raw HTML here. Previews from those sites still aren't great, but they're at least better.

https://bsky.app/profile/fedi.test.snarfed.org/post/3liy4734pxhu2 :
Image

Realistically, to get these previews better on the long tail of sites with less common markup, I'd need to switch to a third party link preview generator. Maybe someday!
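
For readers curious what "convert HTML content/summary to plain text description" can involve, here is a rough standard-library sketch. It is not granary's implementation; the class and function names are invented, and it makes the simplifying assumption that joining text nodes with spaces and collapsing whitespace is good enough for a preview description.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects text nodes, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.parts.append(data)


def html_to_description(html):
    """Return the visible text of an HTML fragment with whitespace collapsed."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(" ".join(parser.parts).split())


description = html_to_description(
    "<p>Hello <b>world</b> &amp; friends</p><script>track()</script>"
)
```

Joining with spaces can split a word that is styled mid-word (for example `<b>H</b>ello`), which is one reason a dedicated library is usually the better choice in practice.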

@snarfed snarfed closed this as completed Feb 25, 2025