FIX: Strip detection text before truncation #196

Merged on Jan 17, 2025 (1 commit)


@nattsw (Contributor) commented on Jan 17, 2025

Besides removing images, we also need to make sure the text is truncated _after_ the images are removed; otherwise the text sent for detection can end up empty.

For example, take a cooked post that looks like this:

```
<p></p><div class=\"lightbox-wrapper\"><a class=\"lightbox\" href=\"https://asd.cloudfront.net/original/4X/c/d/d/asd.jpeg\" data-download-href=\"/uploads/short-url/asd.jpeg?dl=1\" title=\"IMG_20928\"><img src=\"https://asd.asd.net/optimized/4X/c/d/d/asd.jpeg\" alt=\"IMG_2029\" data-base62-sha1=\"asd\" width=\"666\" height=\"500\" srcset=\"https://asd.cloudfront.net/optimized/4X/c/d/d/asd.jpeg, https://asd.cloudfront.net/optimized/4X/c/d/d/asd.jpeg 1.5x, https://asd.cloudfront.net/optimized/4X/c/d/d/asd.jpeg 2x\" data-dominant-color=\"767065\"><div class=\"meta\">\n<svg class=\"fa d-icon d-icon-far-image svg-icon\" aria-hidden=\"true\"><use href=\"#far-image\"></use></svg><span class=\"filename\">IMG_2029</span><span class=\"informations\">1920×1440 742 KB</span><svg class=\"fa d-icon d-icon-discourse-expand svg-icon\" aria-hidden=\"true\"><use href=\"#discourse-expand\"></use></svg>\n</div></a></div>\n<p>L’església romànica de Santa Margarida.</p>
```

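The `div.lightbox` should be stripped so that only `<p>L’església romànica de Santa Margarida.</p>` (Catalan for "The Romanesque church of Santa Margarida.") is sent. Because of the mis-ordering, `<p></p>` is sent instead: truncation runs first, the cut lands inside the lightbox markup, and stripping the image afterwards leaves only the leading empty paragraph.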
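To illustrate why the order matters, here is a minimal Python sketch. It is not the plugin's actual code: `LightboxStripper`, `strip_lightboxes`, `truncate`, the two `detection_text_*` functions, the 60-character limit and the shortened sample post are all hypothetical stand-ins. Truncation is modeled as plain character slicing, and stripping removes every `div.lightbox-wrapper` subtree using the standard-library `html.parser`.

```python
from html.parser import HTMLParser


class LightboxStripper(HTMLParser):
    """Hypothetical helper: re-emits HTML, dropping div.lightbox-wrapper subtrees.

    Only <div> nesting is tracked, which is enough for the lightbox markup
    in the example post (the wrapper contains a nested div.meta).
    """

    def __init__(self):
        # Keep entities as-is so the output is still HTML.
        super().__init__(convert_charrefs=False)
        self.out = []
        self.skip_depth = 0  # > 0 while inside a lightbox wrapper

    def handle_starttag(self, tag, attrs):
        if self.skip_depth:
            if tag == "div":
                self.skip_depth += 1  # nested div such as div.meta
            return
        classes = (dict(attrs).get("class") or "").split()
        if tag == "div" and "lightbox-wrapper" in classes:
            self.skip_depth = 1  # start dropping this subtree
            return
        self.out.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/>: copy through unless skipping.
        if not self.skip_depth:
            self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if self.skip_depth:
            if tag == "div":
                self.skip_depth -= 1  # back to 0 once the wrapper closes
            return
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self.skip_depth:
            self.out.append(data)

    def handle_entityref(self, name):
        if not self.skip_depth:
            self.out.append(f"&{name};")

    def handle_charref(self, name):
        if not self.skip_depth:
            self.out.append(f"&#{name};")


def strip_lightboxes(html: str) -> str:
    parser = LightboxStripper()
    parser.feed(html)
    parser.close()  # flush trailing (possibly incomplete) input
    return "".join(parser.out)


def truncate(html: str, max_len: int) -> str:
    # Naive character cut standing in for the real length limit (assumption).
    return html[:max_len]


def detection_text_buggy(cooked: str, max_len: int) -> str:
    return strip_lightboxes(truncate(cooked, max_len))  # truncate, then strip


def detection_text_fixed(cooked: str, max_len: int) -> str:
    return truncate(strip_lightboxes(cooked), max_len)  # strip, then truncate


# Shortened version of the example post (URLs replaced with placeholders).
SAMPLE = (
    '<p></p><div class="lightbox-wrapper"><a class="lightbox" href="https://example.com/a.jpeg">'
    '<img src="https://example.com/a.jpeg" alt="IMG_2029"><div class="meta">\n'
    '<span class="filename">IMG_2029</span>\n</div></a></div>\n'
    "<p>L’església romànica de Santa Margarida.</p>"
)

if __name__ == "__main__":
    print(repr(detection_text_buggy(SAMPLE, 60)))  # '<p></p>'
    print(repr(detection_text_fixed(SAMPLE, 60)))  # '<p></p>\n<p>L’església romànica de Santa Margarida.</p>'
```

With truncation first, the 60-character cut ends inside the `<a class="lightbox" ...>` tag, so everything after `<p></p>` belongs to the lightbox and is discarded. Stripping first shrinks the post to 54 characters, so the Catalan sentence survives the same limit.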

@Drenmi (Contributor) left a comment:

🚀

@nattsw merged commit 97edd7d into main on Jan 17, 2025 (3 checks passed).
@nattsw deleted the strip-before-truncate branch on Jan 17, 2025 at 08:47.
@nattsw added a commit that referenced this pull request on Jan 22, 2025, reusing the PR description as its commit message.