FIX: Strip detection text before truncation (#196)
Besides removing images, we also need to truncate the text _after_ stripping them; otherwise the image markup can use up the whole character limit and the text sent for detection ends up empty.

For example, a cooked post that looks like this:

`<p></p><div class=\"lightbox-wrapper\"><a class=\"lightbox\" href=\"https://asd.cloudfront.net/original/4X/c/d/d/asd.jpeg\" data-download-href=\"/uploads/short-url/asd.jpeg?dl=1\" title=\"IMG_20928\"><img src=\"https://asd.asd.net/optimized/4X/c/d/d/asd.jpeg\" alt=\"IMG_2029\" data-base62-sha1=\"asd\" width=\"666\" height=\"500\" srcset=\"https://asd.cloudfront.net/optimized/4X/c/d/d/asd.jpeg, https://asd.cloudfront.net/optimized/4X/c/d/d/asd.jpeg 1.5x, https://asd.cloudfront.net/optimized/4X/c/d/d/asd.jpeg 2x\" data-dominant-color=\"767065\"><div class=\"meta\">\n<svg class=\"fa d-icon d-icon-far-image svg-icon\" aria-hidden=\"true\"><use href=\"#far-image\"></use></svg><span class=\"filename\">IMG_2029</span><span class=\"informations\">1920×1440 742 KB</span><svg class=\"fa d-icon d-icon-discourse-expand svg-icon\" aria-hidden=\"true\"><use href=\"#discourse-expand\"></use></svg>\n</div></a></div>\n<p>L’església romànica de Santa Margarida.</p>`

should strip the `div.lightbox` and send `<p>L’església romànica de Santa Margarida.</p>`, but it currently sends `<p></p>` because truncation runs before stripping.
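The effect of the ordering can be sketched in plain Ruby. This is only an illustration under stated assumptions: `strip_images`, `cut`, `LIMIT`, and `IMG` are hypothetical stand-ins (a naive regex instead of the plugin's `strip_tags_for_detection`, a slice instead of ActiveSupport's `String#truncate`, and a small limit instead of `DETECTION_CHAR_LIMIT`), not the plugin's actual code.

```ruby
# Hypothetical stand-ins for illustration; not the plugin's real helpers.
LIMIT = 50 # stand-in for DETECTION_CHAR_LIMIT
IMG = "<img src='http://example.com/image.png' />"

# Naive stand-in for strip_tags_for_detection: removes complete <img ...> tags only.
def strip_images(html)
  html.gsub(/<img\b[^>]*>/, "")
end

# Plain-Ruby stand-in for String#truncate(limit, omission: nil): keep the first `limit` characters.
def cut(text, limit)
  text[0, limit]
end

cooked = IMG + "a" * 60

# Old order: the image markup counts against the limit, so little text survives.
truncate_then_strip = strip_images(cut(cooked, LIMIT))

# New order: markup is removed first, so the whole limit is spent on text.
strip_then_truncate = cut(strip_images(cooked), LIMIT)

puts "truncate then strip: #{truncate_then_strip.length} characters of text"
puts "strip then truncate: #{strip_then_truncate.length} characters of text"
```

When the markup alone is longer than the limit, as in the lightbox example above, the old order leaves no text at all for detection.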
nattsw authored Jan 17, 2025
1 parent 620d774 commit 97edd7d
Showing 2 changed files with 8 additions and 2 deletions.
5 changes: 3 additions & 2 deletions app/services/discourse_translator/base.rb
@@ -79,8 +79,9 @@ def self.strip_tags_for_detection(detection_text)
     end

     def self.text_for_detection(topic_or_post)
-      strip_tags_for_detection(
-        get_text(topic_or_post).truncate(DETECTION_CHAR_LIMIT, omission: nil),
+      strip_tags_for_detection(get_text(topic_or_post)).truncate(
+        DETECTION_CHAR_LIMIT,
+        omission: nil,
       )
     end

5 changes: 5 additions & 0 deletions spec/services/base_spec.rb
@@ -82,6 +82,11 @@ class EmptyTranslator < DiscourseTranslator::Base
       post.cooked = text
       expect(DiscourseTranslator::Base.text_for_detection(post)).to eq(text)
     end
+
+    it "strips text before truncation" do
+      post.cooked = "<img src='http://example.com/image.png' />" + "a" * 1000
+      expect(DiscourseTranslator::Base.text_for_detection(post)).to eq("a" * 1000)
+    end
   end

   describe ".text_for_translation" do