Datatable parsing, wrongly escape regexp "\w" #364

neskk · 2025-02-04T00:24:55Z

👓 What did you see?

I use some regexp on data-tables to perform some assertions.
Parsing the datatable below:

And the response body contains:
  | var1      | [matches regexp] abc\|cde     |
  | var2      | [matches regexp] \w+\|cde     |
  | var3      | [matches regexp] \\w+\|cde    |

I get ['var2', '[matches regexp] \\\\w+|cde'] and ['var3', '[matches regexp] \\\\w+|cde'], which breaks my matcher.

✅ What did you expect to see?

I would expect the second row parsed to be:

['var2', '[matches regexp] \w+|cde']
but instead I get:
['var2', '[matches regexp] \\\\w+|cde']
which breaks my matcher.

I would expect the third row parsed to be:

['var3', '[matches regexp] \w+|cde']
but instead I also get:
['var3', '[matches regexp] \\\\w+|cde']
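The expectation above can be written down as a minimal sketch. It assumes the cell-escaping rules described in the Gherkin reference (`\|` becomes `|`, `\\` becomes `\`, `\n` becomes a newline) and further assumes that any other backslash sequence, such as `\w`, is passed through unchanged. The name `unescape_cell` is hypothetical and is not part of pytest-bdd or gherkin-official.

```python
def unescape_cell(raw: str) -> str:
    """Unescape one raw table cell, scanning left to right.

    Hypothetical helper illustrating the expected behaviour only.
    """
    out = []
    i = 0
    while i < len(raw):
        ch = raw[i]
        if ch == "\\" and i + 1 < len(raw):
            nxt = raw[i + 1]
            if nxt == "n":
                out.append("\n")        # \n -> newline
            elif nxt in ("|", "\\"):
                out.append(nxt)         # \| -> |   and   \\ -> \
            else:
                out.append("\\" + nxt)  # assumed: unknown escapes stay as written
            i += 2
        else:
            out.append(ch)              # ordinary character or trailing backslash
            i += 1
    return "".join(out)


# Raw strings keep the backslashes exactly as they appear in the feature file.
print(unescape_cell(r"[matches regexp] \w+\|cde"))
print(unescape_cell(r"[matches regexp] \\w+\|cde"))
```

Under these assumptions both calls print `[matches regexp] \w+|cde`, which is the value expected for the second and third rows.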

📦 Which tool/library version are you using?

python 3.10
pytest-bdd 8.1.0
gherkin-official 29.0.0

🔬 How could we reproduce it?

Create a step definition that expects a datatable.
Create a feature file that submits the datatable with \w+ or another regex pattern.
Log/print the received datatable.

📚 Any additional context?

No response


neskk · 2025-02-04T12:18:58Z

Not really. I expect a single backslash, as promised by the readme. If you try to put in a data table cell: `a\b`, or `a\\b`, you never get a single backslash back. It always returns a double backslash.

On Tue, Feb 4, 2025, 08:10 M.P. Korstanje wrote:
Does #335 provide an answer?

mpkorstanje · 2025-02-04T13:03:45Z

Oh yeah. That's definitely a bug.

neskk · 2025-02-04T14:30:38Z

I was using a custom data-table parser I built and it handles these situations much better:

import re

# match every '|' character that is not preceded by a '\' character
COLUMN_SPLIT_REGEXP = re.compile(r"(?<!\\)\|")

def parse_datatable(input_str: str) -> list[list[str]]:
    res = []
    for line in input_str.split("\n"):
        line = line.strip()  # noqa: PLW2901
        if not line:
            continue  # skip empty lines
        if line.startswith("#"):
            continue  # skip comment lines

        cells = [col.strip().replace(r"\|", "|") for col in COLUMN_SPLIT_REGEXP.split(line)]

        # discard content before and after the table delimiter
        if cells[0] != "" or cells[-1] != "":
            raise ValueError("failed to parse datatable: bad syntax")
        res.append(cells[1:-1])

    return res

mpkorstanje · 2025-02-04T14:43:39Z

Spaces are not required to separate the pipes. So at a glance, that would fail against

|hello|world|
|\\|\|\||

Which should contain hello, world, \ and ||.

We do have a test case to cover this functionality, so it's kind of surprising that it passes. Feel free to look into this deeper!
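One way to handle rows like the ones above is to scan the line character by character instead of splitting on a lookbehind regex, so that an escaped backslash directly before a pipe is consumed as a pair first. The following is only a sketch under assumptions: `split_row` and `_unescape` are hypothetical names, the escape rules are the ones assumed from the Gherkin reference (`\|`, `\\`, `\n`), and unknown sequences such as `\w` are assumed to pass through unchanged. It is not the gherkin-official implementation.

```python
import re

# Assumed escape table: \| -> |, \\ -> \, \n -> newline.
_ESCAPES = {"|": "|", "\\": "\\", "n": "\n"}


def _unescape(raw: str) -> str:
    # Replace known escape pairs; leave unknown ones (e.g. \w) untouched.
    return re.sub(r"\\(.)", lambda m: _ESCAPES.get(m.group(1), m.group(0)), raw)


def split_row(line: str) -> list[str]:
    """Split one table row into unescaped cells (hypothetical helper)."""
    cells, current, i = [], [], 0
    line = line.strip()
    while i < len(line):
        ch = line[i]
        if ch == "\\" and i + 1 < len(line):
            current.append(line[i:i + 2])  # keep the escape pair raw for now
            i += 2
        elif ch == "|":
            cells.append("".join(current))  # an unescaped pipe ends the cell
            current = []
            i += 1
        else:
            current.append(ch)
            i += 1
    # A row must start and end with a pipe; nothing may sit outside them.
    if len(cells) < 2 or cells[0].strip() or "".join(current).strip():
        raise ValueError("failed to parse datatable row: bad syntax")
    # Trim whitespace on the raw text first, then unescape.
    return [_unescape(cell.strip()) for cell in cells[1:]]


print(split_row(r"|hello|world|"))
print(split_row(r"|\\|\|\||"))
```

Trimming before unescaping is a deliberate choice here, so that an escaped `\n` at the edge of a cell is not stripped away as whitespace.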

neskk · 2025-02-04T17:34:00Z

My implementation indeed fails with your example, but the Python Gherkin parser also fails to return the correct content:

|hello|world|
|\\|\|\||

returns:

[['hello', 'world'], ['\\\\', '||']]
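When reading output like this, it helps to remember that Python's `repr` doubles every backslash. Assuming the list above was produced by printing a Python list (which uses `repr` for each element), `'\\\\'` stands for two real backslash characters, whereas a correctly parsed cell containing one backslash would be displayed as `'\\'`. A small illustration using only the standard library:

```python
single = "\\"    # one backslash character
double = "\\\\"  # two backslash characters

# repr() escapes each backslash, so the displayed count is doubled.
print(repr(single), len(single))
print(repr(double), len(double))

# Printing the string itself shows the real characters.
print(double)
```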

mpkorstanje added the 🐛 bug Defect / Bug label Feb 4, 2025
