Datatable parsing, wrongly escape regexp "\w" #364
Comments
Not really. I expect a single backslash, as promised by the readme.
If you try to put in a data table cell: `a\b`, or `a\\b`, you never get a
single backslash back. It always returns a double backslash.
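When checking a report like this in Python, it helps to separate what `repr()` prints from the characters the string actually holds, because a list printed at the console shows every real backslash doubled. The sketch below uses two hypothetical cell values (not output from the parser) to show how to count the real backslashes. It does not explain the behaviour reported here; it is only a way to rule out `repr()` as the cause.

```python
# Hypothetical cell values, written as ordinary Python string literals.
single = "a\\b"      # 3 characters: a, one backslash, b
double = "a\\\\b"    # 4 characters: a, two backslashes, b

# repr() doubles every real backslash, so compare lengths and counts instead.
for value in (single, double):
    print(repr(value), "length:", len(value), "backslashes:", value.count("\\"))
```

If a parsed cell reports two backslashes by `count`, the doubling is real and not a display artifact.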
On Tue, Feb 4, 2025, 08:10 M.P. Korstanje wrote:
> Does #335 provide an answer?
Oh yeah. That's definitely a bug.
I was using a custom data-table parser I built, and it handles these situations much better:

    import re

    # Match every '|' that is not preceded by a '\' character.
    COLUMN_SPLIT_REGEXP = re.compile(r"(?<!\\)\|")

    def parse_datatable(input_str: str) -> list[list[str]]:
        res = []
        for line in input_str.split("\n"):
            line = line.strip()  # noqa: PLW2901
            if not line:
                continue  # skip empty lines
            if line.startswith("#"):
                continue  # skip comment lines
            # Split on unescaped pipes, then turn each escaped pipe into a literal one.
            cells = [col.strip().replace(r"\|", "|") for col in COLUMN_SPLIT_REGEXP.split(line)]
            # Nothing may appear before the first or after the last table delimiter.
            if cells[0] != "" or cells[-1] != "":
                raise ValueError("failed to parse datatable: bad syntax")
            res.append(cells[1:-1])
        return res
Spaces are not required to separate the pipes. So at a glance, that would fail against
Which should contain
We do have a test case to cover this functionality, so it's kinda surprising it passes. Feel free to look into this deeper!
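One kind of input a lookbehind-based split cannot tell apart is a cell that ends in an escaped backslash (`\\`) directly followed by a delimiter, since the pipe is preceded by a backslash either way. Whether that is the exact case meant above is an assumption. The sketch below scans the row character by character instead. The function name `split_row` is hypothetical, and the escape rules (`\|`, `\\`, `\n`) follow the ones described in the Gherkin reference; keeping unknown escapes such as `\w` untouched is a choice made here to match the output the issue expects, not a confirmed description of the official parser.

```python
def split_row(line: str) -> list[str]:
    r"""Split one table row into cells, resolving \|, \\ and \n escapes."""
    line = line.strip()
    if not line.startswith("|"):
        raise ValueError("row must start with '|'")
    cells: list[str] = []
    current: list[str] = []
    i = 1  # skip the opening delimiter
    while i < len(line):
        ch = line[i]
        if ch == "\\" and i + 1 < len(line):
            nxt = line[i + 1]
            if nxt == "n":
                current.append("\n")
            elif nxt in ("|", "\\"):
                current.append(nxt)
            else:
                # Unknown escape: keep both characters, so \w stays \w.
                current.append(ch + nxt)
            i += 2
            continue
        if ch == "|":
            # Unescaped pipe: close the current cell (trim spaces and tabs only).
            cells.append("".join(current).strip(" \t"))
            current = []
        else:
            current.append(ch)
        i += 1
    if "".join(current).strip(" \t"):
        raise ValueError("row must end with '|'")
    return cells


# Assumed inputs for illustration; the original table from the issue is not shown.
print(split_row(r"| var2 | [matches regexp] \w+\|cde |"))
print(split_row(r"|a\\|b|"))
```

Because the scanner consumes an escape pair as a unit, the pipe after `\\` is seen as a delimiter, and no spaces around the pipes are needed.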
My implementation indeed fails with your example, but the python gherkin-parser also fails to return the correct content:
returns:
👓 What did you see?
I use some regexp on data-tables to perform some assertions.
Parsing the datatable below:
I get `['var2', '[matches regexp] \\\\w+|cde']` and `['var3', '[matches regexp] \\\\w+|cde']`, which breaks my matcher.
✅ What did you expect to see?
I would expect the second row parsed to be:
['var2', '[matches regexp] \w+|cde']
but instead I get:
['var2', '[matches regexp] \\\\w+|cde']
which breaks my matcher.
I would expect the third row parsed to be:
['var3', '[matches regexp] \w+|cde']
but instead I also get:
['var3', '[matches regexp] \\\\w+|cde']
📦 Which tool/library version are you using?
python 3.10
pytest-bdd 8.1.0
gherkin-official 29.0.0
🔬 How could we reproduce it?
`\w+` or other regex pattern.
📚 Any additional context?
No response