SedRegex: add configuration option for regex delimiter #1610

Open
baldurmen opened this issue Dec 17, 2024 · 3 comments
Comments

@baldurmen
Contributor

Hi,

At the moment, the SedRegex plugin accepts a lot of different delimiters by default (", #, /, etc.), and ' is one of them, with no way to turn it off.

Sadly, this causes a lot of false positives in the French channels I'm in: a sentence like s'en aller en bateau, c'est intéressant et agréable triggers the plugin because of its two apostrophes. It is parsed the same way as s/en aller en bateau, c/est intéressant et agréable/, a substitution whose pattern is unlikely to match anything :P
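To make the false positive concrete, here is a minimal sketch. The regex below is a simplified, hypothetical stand-in for a sed-style matcher that accepts any non-word, non-space delimiter; it is not copied from the plugin and the name SED_LIKE is made up for illustration.

```python
import re

# Simplified, hypothetical sed-style matcher (NOT the plugin's actual regex).
# The delimiter is any single character that is neither a word character
# nor whitespace, which includes the apostrophe.
SED_LIKE = re.compile(
    r"^s(?P<delim>[^\w\s])"                          # "s" followed by the delimiter
    r"(?P<pattern>(?:(?!(?P=delim)).)*)(?P=delim)"   # pattern up to the next delimiter
    r"(?P<replacement>(?:(?!(?P=delim)).)*)"         # replacement text
    r"(?:(?P=delim)(?P<flags>[a-z]*))?$"             # optional closing delimiter and flags
)

line = "s'en aller en bateau, c'est intéressant et agréable"
m = SED_LIKE.match(line)

# The ordinary French sentence is parsed as a substitution command.
print(m.group("delim"))        # '
print(m.group("pattern"))      # en aller en bateau, c
print(m.group("replacement"))  # est intéressant et agréable
```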

Would it be possible to add a configuration option to either allowlist the wanted delimiters (which IMO makes more sense) or blocklist some of them (if you prefer that option)?

Cheers!

@anarcat

anarcat commented Dec 17, 2024

@pollo told me the logic behind this is in https://github.com/progval/Limnoria/blob/master/plugins/SedRegex/constants.py and i believe this specific instance can be fixed by adding ' to the exclusion, with [^\w\s'] instead of [^\w\s].
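A quick sketch of what that change would do, assuming a simplified sed-style regex (the helper name make_matcher and the regex layout are hypothetical, only the two character classes come from the comment above):

```python
import re

def make_matcher(delim_class: str) -> re.Pattern:
    # Hypothetical helper: builds a simplified sed-style regex around the
    # given delimiter character class. Not the plugin's actual code.
    return re.compile(
        r"^s(?P<delim>" + delim_class + r")"
        + r"(?P<pattern>(?:(?!(?P=delim)).)*)(?P=delim)"
        + r"(?P<replacement>(?:(?!(?P=delim)).)*)"
        + r"(?:(?P=delim)(?P<flags>[a-z]*))?$"
    )

current = make_matcher(r"[^\w\s]")    # class quoted in the comment as the current one
proposed = make_matcher(r"[^\w\s']")  # same class with the apostrophe excluded

french = "s'en aller en bateau, c'est intéressant et agréable"

print(current.match(french) is not None)         # True: false positive
print(proposed.match(french) is not None)        # False: apostrophe rejected
print(proposed.match("s/teh/the/") is not None)  # True: "/" still works
```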

@jlu5
Collaborator

jlu5 commented Dec 18, 2024

I would think a list of disallowed separators makes more sense. The actual sed implementation allows all characters, even letters and spaces, though those probably aren't as useful. I don't want to be overly restrictive by default, as it's much easier to pick a separator other than the usual "/" when your text includes that character.

$ sed 's t b ' <<< test
best
$ sed 'sataba' <<< test
best

@progval
Owner

progval commented Dec 19, 2024

we could make it configurable while keeping [^\w\s] as the default
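One way this could look, as a rough sketch only: the regex is built from a string of disallowed delimiters, and an empty string reproduces the [^\w\s] default. The function name build_sed_regex and its disallowed parameter are assumptions for illustration; wiring it to an actual plugin configuration value is left out.

```python
import re

def build_sed_regex(disallowed: str = "") -> re.Pattern:
    """Hypothetical helper: compile a simplified sed-style regex whose
    delimiter is any non-word, non-space character not in `disallowed`."""
    # re.escape keeps characters such as "]", "^", "-" and "\" literal
    # inside the character class.
    delim_class = r"[^\w\s" + re.escape(disallowed) + "]"
    parts = [
        r"^s(?P<delim>" + delim_class + ")",
        r"(?P<pattern>(?:(?!(?P=delim)).)*)(?P=delim)",
        r"(?P<replacement>(?:(?!(?P=delim)).)*)",
        r"(?:(?P=delim)(?P<flags>[a-z]*))?$",
    ]
    return re.compile("".join(parts))

french = "s'en aller en bateau, c'est intéressant et agréable"

default_regex = build_sed_regex()       # delimiter class stays [^\w\s]
no_apostrophe = build_sed_regex("'")    # a channel that blocklists the apostrophe

print(default_regex.match(french) is not None)  # True
print(no_apostrophe.match(french) is not None)  # False
print(no_apostrophe.match("s/foo/bar/").group("replacement"))  # bar
```

A blocklist keeps the permissive default intact for everyone else, and only the channels that hit false positives need to change anything.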
