Workaround for backreference #512
As far as I can tell, there isn't a simple workaround that covers backreferences in general, but the pattern needed here can at least be generated easily by joining one alternative per letter (limiting ourselves to lowercase ASCII characters):

a{3,}|b{3,}|c{3,}|d{3,}|e{3,}|f{3,}|g{3,}|h{3,}|i{3,}|j{3,}|k{3,}|l{3,}|m{3,}|n{3,}|o{3,}|p{3,}|q{3,}|r{3,}|s{3,}|t{3,}|u{3,}|v{3,}|w{3,}|x{3,}|y{3,}|z{3,}

I know it's probably not the answer you were hoping for and it's not pretty, but that's the only option I see for your use case. When the number of possible captures for what you need to reference is relatively small, you can usually get away with generating all the possibilities with a script like this one in JS, but that approach might not be suitable for more complex problems:

function generateRegexPattern() {
  // One alternative per lowercase ASCII letter: a{3,}, b{3,}, ..., z{3,}
  let pattern = Array.from({ length: 26 }, (_, i) => {
    let char = String.fromCodePoint(97 + i); // ASCII 97 is 'a'
    return `${char}{3,}`;
  }).join('|');
  return pattern;
}
console.log(generateRegexPattern());

Side note for disconnected matching

The fact that the backreference refers to the immediately preceding character makes the solution much easier than if the two instances of what we are matching were separated. However, for simple examples we can still manage it. For instance, if we have a small set of HTML tags that we want to match, we could generate a regex that matches an opening tag followed by its own closing tag for each of them, e.g.:

function generateRegexFromTemplate(template, values) {
  let pattern = values.map(value => template.replaceAll("{{matchingOption}}", value)).join('|');
  return `(${pattern})`;
}
// Example usage
let regexTemplate = "<{{matchingOption}}>.*?<\\/{{matchingOption}}>";
let matchingOptions = ["div", "span", "p", "a", "ul", "li", "table", "tr", "td"];
console.log(generateRegexFromTemplate(regexTemplate, matchingOptions));

Which gives:

(<div>.*?<\/div>|<span>.*?<\/span>|<p>.*?<\/p>|<a>.*?<\/a>|<ul>.*?<\/ul>|<li>.*?<\/li>|<table>.*?<\/table>|<tr>.*?<\/tr>|<td>.*?<\/td>)
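One caveat with the template approach is that each value is pasted into the pattern verbatim, so a value containing a regex metacharacter would change the meaning of the pattern. The following is a minimal sketch of the same idea with escaping added and a quick check against sample strings. The names `escapeRegex` and `buildAlternation` and the sample inputs are hypothetical, introduced only for illustration.

```javascript
// Hypothetical helper (not part of the original comment): escape regex
// metacharacters so that each value is matched literally.
function escapeRegex(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Same idea as generateRegexFromTemplate, with escaping added.
// split/join is used instead of replaceAll so that "$" sequences in the
// inserted value are never interpreted as replacement patterns.
function buildAlternation(template, values) {
  const branches = values.map((value) =>
    template.split('{{matchingOption}}').join(escapeRegex(value))
  );
  return `(${branches.join('|')})`;
}

const tagTemplate = '<{{matchingOption}}>.*?<\\/{{matchingOption}}>';
const tagPattern = new RegExp(buildAlternation(tagTemplate, ['div', 'span', 'p']));

// Matching open/close pairs should pass; mismatched pairs should not.
const sampleInputs = [
  '<div>hello</div>',
  '<span>x</span>',
  '<div>hello</span>',
  '<p>text</div>',
];
const tagResults = sampleInputs.map((input) => tagPattern.test(input));
console.log(tagResults);
```

Only the first two inputs match, because every branch requires the same tag name on both sides. As in the original comment, this stays practical only while the list of values is small.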
I'm currently trying to get strings that contain 3 or more consecutive identical characters with the following regex:
for example: "mikeeee", "dylaaan", etc., but BigQuery complains:
Cannot parse regular expression: invalid escape sequence: \1
How can I get around this limitation? I'm aware that backreferences aren't supported. Any help would be much appreciated!
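To sanity-check the alternation workaround from the reply, one option is to compare it against a backreference pattern in an engine that does support backreferences, such as JavaScript's. The sketch below does that for the names in the question. The reference pattern `([a-z])\1{2,}` is an assumption chosen for the comparison, not necessarily the pattern the question used, and `buildRepeatAlternation` and the extra sample strings are hypothetical.

```javascript
// Build the lowercase-ASCII alternation: a{N,}|b{N,}|...|z{N,}
function buildRepeatAlternation(minRepeats) {
  return Array.from({ length: 26 }, (_, i) =>
    `${String.fromCodePoint(97 + i)}{${minRepeats},}`
  ).join('|');
}

// Backreference-free pattern, i.e. the workaround.
const alternation = new RegExp(buildRepeatAlternation(3));

// Reference pattern using a backreference. This is an assumption used
// only for comparison; it runs in JavaScript but not in RE2.
const withBackreference = /([a-z])\1{2,}/;

const samples = ['mikeeee', 'dylaaan', 'mike', 'aabbcc', 'zzz'];
const comparison = samples.map((name) => ({
  name,
  alternation: alternation.test(name),
  backreference: withBackreference.test(name),
}));
console.table(comparison);
```

Both patterns agree on every sample: "mikeeee", "dylaaan" and "zzz" match, while "mike" and "aabbcc" do not. Only the string returned by `buildRepeatAlternation` would be passed to the engine that rejects `\1`.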