Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use array reduction when cleaning paragraphs #906

Merged
merged 2 commits into from
Oct 10, 2024
Merged
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
15 changes: 9 additions & 6 deletions Readability.js
Original file line number Diff line number Diff line change
Expand Up @@ -832,15 +832,18 @@ Readability.prototype = {
);

// Remove extra paragraphs
// At this point, nasty iframes have been removed; only embedded video
// ones remain.
var contentTags = ["img", "embed", "object", "iframe"];
this._removeNodes(
this._getAllNodesWithTag(articleContent, ["p"]),
function (paragraph) {
var imgCount = paragraph.getElementsByTagName("img").length;
var embedCount = paragraph.getElementsByTagName("embed").length;
var objectCount = paragraph.getElementsByTagName("object").length;
// At this point, nasty iframes have been removed, only remain embedded video ones.
var iframeCount = paragraph.getElementsByTagName("iframe").length;
var totalCount = imgCount + embedCount + objectCount + iframeCount;
// We use an array reduction here so the counts of elements are summed
// without anyone having to make further code edits if the list of
// content tags is changed.
var totalCount = contentTags.reduce(function (total, tag) {
return total + paragraph.getElementsByTagName(tag).length;
}, 0);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I'm a little lost. Any reason not to use this._getAllNodesWithTag(paragraph, contentTags).length ? That would seem easier than hand-rolling the reduce.

The other option, for reasons of speed, would be to iterate over contentTags (without a nested function) and to return false as soon as we find any of the content tags (as the return below will be false as soon as we find any of the elements in question). It might also make sense to put the _getInnerText call first in that case.

Both of those would still have the same benefit in terms of not requiring manual copy-pasting for each kind of inner tag that we're looking for.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see what you mean: I fixed one problem of weird methodology in the original, but not the other. I'll iterate further when I find some time.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@JKingweb any chance you had a moment to look at this again? :-)


return totalCount === 0 && !this._getInnerText(paragraph, false);
}
Expand Down