Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Protect against quadratic generated table size explosion #146

Merged
merged 1 commit into from
Feb 3, 2024

Conversation

notriddle
Copy link
Contributor

@notriddle notriddle commented Feb 1, 2024

Fixes #145

This commit adds a limit to the number of auto-completed cells around 200,000. The result is, in these original samples:

python -c 'N=100; print("x|" * N + "\n" + "-|" * N + "\n" + "x|\n" * N)' | commonmark --extension=pipe_tables | wc -c
102362

This is unchanged.

$ python -c 'N=1000; print("x|" * N + "\n" + "-|" * N + "\n" + "x|\n" * N)' | commonmark --extension=pipe_tables | wc -c
2025878

This, however, is much lower. The parser is refusing to parse the row where the limit is hit, and switches to parsing it as text instead.

$ python -c 'N=10000; print("x|" * N + "\n" + "-|" * N + "\n" + "x|\n" * N)' | commonmark --extension=pipe_tables | wc -c
2240258

It's not quadratic any more (without this commit, it was 1000230062).

At the limit, 200,000 autocompleted rows is approximately "</td><td>".len * 200000 generated data, or 1.8Mbyte of data. The above examples show about 2Mbytes, thanks to the row tags and text overhead, and when the parser breaks out of the table it's still including the remaining "rows" as text.

This commit adds a limit to the number of auto-completed cells
around 200,000. The result is, in these original samples:

    python -c 'N=100; print("x|" * N + "\n" + "-|" * N + "\n" + "x|\n" * N)' | commonmark --extension=pipe_tables | wc -c
    102362

This is unchanged.

    $ python -c 'N=1000; print("x|" * N + "\n" + "-|" * N + "\n" + "x|\n" * N)' | commonmark --extension=pipe_tables | wc -c
    2025878

This, however, is much lower. The parser is refusing to parse the row where the limit is hit,
and switches to parsing it as text instead.

    $ python -c 'N=10000; print("x|" * N + "\n" + "-|" * N + "\n" + "x|\n" * N)' | commonmark --extension=pipe_tables | wc -c
    2240258

It's not quadratic any more (without this commit, it was 1000230062).

At the limit, 200,000 autocompleted rows is approximately "</td><td>".len * 200,000
generated data, or 1.8Mbyte of data. The above examples show about 2Mbytes,
thanks to the row tags and text overhead, and when the parser breaks out of the table
it's still including the remaining "rows" as text.
@jgm jgm merged commit 6c07e48 into jgm:master Feb 3, 2024
6 checks passed
@notriddle notriddle deleted the notriddle/table-size-limit branch February 3, 2024 18:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Quadratic output size explosion with tables extension
2 participants