Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use array reduction when cleaning paragraphs #906

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

JKingweb
Copy link
Contributor

The current code for cleaning non-content paragraphs is straightforward, but brittle against change: one must remember to modifiy both the list of element counts as well as totalCount. This change removes the need to modify totalCount (or indeed to set any more variables) by instead having a list of tag names reduce()d into totalCount in one operation. It's slightly less readable, but it seemed worth it to me.

This is more robust if the list of content tags is ever changed, since we no longer need to add terms to totalCount
Copy link
Contributor

@gijsk gijsk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a good thing to identify that this code is a little repetitive. However, I think we can juice this particular fruit a little more - I left a comment inline.

Comment on lines +841 to +846
// We use an array reduction here so the counts of elements are summed
// without anyone having to make further code edits if the list of
// content tags is changed.
var totalCount = contentTags.reduce(function (total, tag) {
return total + paragraph.getElementsByTagName(tag).length;
}, 0);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I'm a little lost. Any reason not to use this._getAllNodesWithTag(paragraph, contentTags).length ? That would seem easier than hand-rolling the reduce.

The other option, for reasons of speed, would be to iterate over contentTags (without a nested function) and to return false as soon as we find any of the content tags (as the return below will be false as soon as we find any of the elements in question). It might also make sense to put the _getInnerText call first in that case.

Both of those would still have the same benefit in terms of not requiring manual copy-pasting for each kind of inner tag that we're looking for.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see what you mean: I fixed one problem of weird methodology in the original, but not the other. I'll iterate further when I find some time.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants