parsing tags on word boundaries #319

Merged
merged 1 commit into from
Aug 16, 2023

Conversation

MatthewBaggins
Collaborator

closes #318

@@ -59,7 +59,7 @@ def parse_tag(text: str) -> Optional[str]:
         return None
     tag_val = match.group(1)
     tag_pat = _tag_pat.replace(r"\s", " ")
-    tag_idx = tag_pat.lower().find(tag_val.lower())
+    tag_idx = tag_pat.lower().find(rf"\b{tag_val.lower()}\b") + 2
Collaborator

what's the + 2 part about, please?

Collaborator Author

@MatthewBaggins MatthewBaggins Aug 15, 2023

it's to offset the index so that it points at the start of the tag inside the pattern string.
The search string starts with a literal "\b", which is 2 characters.

(inb4: I considered getting rid of the regex and just iterating over all the tags, returning the first one that occurs in the string (if any). But in that case we wouldn't be able to match on word boundaries, so the simplest reliable approach is to keep the regex and leave it as is.)
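
The `+ 2` offset described above can be sketched as follows. This is only an illustration: the `tag_pat` value is a made-up stand-in (the real `_tag_pat` is not shown in this thread), and `canonical` is a hypothetical name for whatever the code does with `tag_idx` afterwards.

```python
# Hypothetical pattern string, as it might look after the r"\s" -> " " replacement.
# The project's real _tag_pat is not shown in this thread.
tag_pat = r"(\bAI\b|\bRationality\b|\bWorld Modeling\b)"

tag_val = "rationality"  # e.g. what match.group(1) returned, in any casing

# Plain substring search for the literal text \brationality\b (no regex involved).
needle = rf"\b{tag_val.lower()}\b"
start = tag_pat.lower().find(needle)

# In a raw string, \b is two characters (a backslash and "b"), so the tag
# itself begins two characters after the position that find() reports.
tag_idx = start + 2

# Slice the original-cased pattern to recover the tag as it was written.
canonical = tag_pat[tag_idx:tag_idx + len(tag_val)]
print(start, tag_idx, canonical)
```

One thing to keep in mind with this approach: `str.find` returns `-1` when the needle is absent, and adding 2 turns that into `1`, so a not-found check has to happen before the offset is applied.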

Collaborator

huh? \b is an escape sequence that matches zero characters at a word boundary; the length of the regex input shouldn't matter, only the match, no? Or what am I missing here?
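
For reference, the zero-width behaviour mentioned in this comment can be checked with a tiny example. The sample strings are invented for illustration and have nothing to do with the project's tags.

```python
import re

# \b asserts a word boundary but consumes no characters,
# so the match covers only the three letters of "cat".
m = re.search(r"\bcat\b", "a cat sat")
print(m.span(), m.group(0))

# The same letters inside a longer word do not match,
# because there is no boundary before the "c" in "concat".
inside_word = re.search(r"\bcat\b", "concat")
print(inside_word)
```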

Collaborator

oh, it's a non-regex find inside a string that represents a regex for future match 🤦

I'm not sure why we have a list of tags in the format of a string that represents regex, but if it's not possible to use an actual list of tags like ['tag1', 'tag2'], then I guess the code is fine...
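
A rough sketch of the list-based alternative raised here, assuming the tags could be kept as a plain Python list. Everything named below (`TAGS`, `_CANONICAL`, `_TAG_RE`, `parse_tag_sketch`) is hypothetical and not taken from the repository. The regex is built from the list, so word boundaries still apply, and the canonical spelling comes from a dictionary lookup rather than from `find` on the pattern string.

```python
import re
from typing import Optional

# Hypothetical tag list; the project's real tags are not shown in this thread.
TAGS = ["AI", "Rationality", "World Modeling"]

# Lowercased lookup table used to recover each tag's canonical casing.
_CANONICAL = {tag.lower(): tag for tag in TAGS}

# Build the alternation from the list; re.escape guards against tags
# that contain regex metacharacters.
_TAG_RE = re.compile(
    r"\b(" + "|".join(re.escape(tag) for tag in TAGS) + r")\b",
    re.IGNORECASE,
)


def parse_tag_sketch(text: str) -> Optional[str]:
    """Return the canonical form of the first tag found in text, if any."""
    match = _TAG_RE.search(text)
    if match is None:
        return None
    return _CANONICAL[match.group(1).lower()]


print(parse_tag_sketch("filed under world modeling"))
print(parse_tag_sketch("site maintenance"))
```

Because the lookup is keyed on the lowercased match, no index arithmetic on the pattern string is needed. Unlike the `\s` form that the diff rewrites, this sketch only matches a single literal space inside multi-word tags.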

@MatthewBaggins MatthewBaggins merged commit 997d3df into master Aug 16, 2023
2 checks passed
Development

Successfully merging this pull request may close these issues.

Removing tags: problems with lowercase and parentheses
2 participants