Some characters break grid tables #9145

Open
lggruspe opened this issue Oct 14, 2023 · 6 comments

@lggruspe
Contributor

Explain the problem.

pandoc example.md -f markdown+grid_tables -s

The first table works just fine. The second table breaks, even though the only difference is that some entries carry an additional diacritic (U+031E COMBINING DOWN TACK BELOW).

+-----------+-----------+-----------+-----------+-------+
| vowels    | front                 | central   | back  |
|           +-----------+-----------+           |       |
|           | unrounded | rounded   |           |       |
+===========+===========+===========+===========+=======+
| close     | i         | y         |           | u     |
+-----------+-----------+-----------+-----------+-------+
| mid       | e         | ø         |           | o     |
+-----------+-----------+-----------+-----------+-------+
| open      |           |           | ä         |       |
+-----------+-----------+-----------+-----------+-------+

+-----------+-----------+-----------+-----------+-------+
| vowels    | front                 | central   | back  |
|           +-----------+-----------+           |       |
|           | unrounded | rounded   |           |       |
+===========+===========+===========+===========+=======+
| close     | i         | y         |           | u     |
+-----------+-----------+-----------+-----------+-------+
| mid       | e̞         | ø̞         |           | o̞     |
+-----------+-----------+-----------+-----------+-------+
| open      |           |           | ä         |       |
+-----------+-----------+-----------+-----------+-------+

Pandoc version?
pandoc 3.1.8, Fedora 38
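To make the mismatch concrete, here is a small Python sketch (not pandoc code; the helper `display_width` is hypothetical) that counts code points and display columns for the decorated vowels in the second table's `mid` row. It assumes every non-combining character occupies one column and relies on `unicodedata.combining`, which reports a non-zero canonical combining class for U+031E.

```python
import unicodedata

def display_width(s: str) -> int:
    # Hypothetical helper: combining marks (non-zero combining class) add no
    # column; every other character is assumed to be one column wide.
    return sum(0 if unicodedata.combining(ch) else 1 for ch in s)

# The decorated vowels from the "mid" row: base letter + U+031E.
cells = ["e\u031e", "\u00f8\u031e", "o\u031e"]

for cell in cells:
    # Each cell is 2 code points but only 1 display column.
    print(cell, len(cell), display_width(cell))
```

So anything that measures these cells by plain character count sees them as one column wider than they look, and a `|` that lines up visually with the border's `+` sits one code point further along the line.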

lggruspe added the bug label Oct 14, 2023
@jgm
Owner

jgm commented Oct 17, 2023

jgm/doclayout has code to calculate widths for East Asian characters (which can be 0, 1, or 2 columns wide). But this character is not in the lookup table doclayout uses, presumably because it's a Western combining accent. We should modify doclayout so that it knows about all combining characters, not just those in the East Asian set.

Moving this issue to doclayout, where it will need to be addressed.
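The kind of width lookup described above can be sketched in Python as follows. This is not doclayout's implementation (which is Haskell and uses its own tables); it is an approximation built on the standard `unicodedata` module, and the names `char_width` and `text_width` are hypothetical. Assumptions: combining marks get width 0, East Asian Wide (W) and Fullwidth (F) characters get width 2, and everything else, including the Ambiguous (A) class, gets width 1.

```python
import unicodedata

def char_width(ch: str) -> int:
    # Combining marks (non-zero canonical combining class) take no column.
    if unicodedata.combining(ch):
        return 0
    # East Asian Wide and Fullwidth characters take two columns.
    if unicodedata.east_asian_width(ch) in ("W", "F"):
        return 2
    # Everything else, including East Asian Ambiguous, is treated as one column.
    return 1

def text_width(s: str) -> int:
    # Display width of a string: sum of per-character widths.
    return sum(char_width(ch) for ch in s)
```

A plain character count would report 2 for `ø̞`, while a width function along these lines reports 1.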

jgm transferred this issue from jgm/pandoc Oct 17, 2023
jgm reopened this Oct 17, 2023
@jgm
Owner

jgm commented Oct 17, 2023

Sorry: there is in fact code to deal with these combining characters, so I'm not yet sure what is going on here.

@jgm
Owner

jgm commented Oct 17, 2023

OK, transferring back to pandoc.
I think the problem lies there. Note that if you remove the fancy multi-row header, the second table works fine:

+-----------+-----------+-----------+-----------+-------+
| close     | i         | y         |           | u     |
+-----------+-----------+-----------+-----------+-------+
| mid       | e̞         | ø̞         |           | o̞     |
+-----------+-----------+-----------+-----------+-------+
| open      |           |           | ä         |       |
+-----------+-----------+-----------+-----------+-------+

So my guess is that somewhere in the code that was added to handle row and column spans, someone used T.length (which counts code points) instead of realLength (which measures display width).
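The effect of that kind of mix-up can be illustrated with a hypothetical Python sketch (not pandoc or gridtables code): cut the `mid` row into cells at the offsets of the `+` corners in the border line, once by code-point index and once by display column. Display columns are computed by attaching each combining mark to the preceding base character, assuming no wide characters are present.

```python
import unicodedata

# Border and "mid" row from the header-less table. The cell padding is written
# out explicitly, because str.ljust would pad by code points (the very bug).
border = "+" + "+".join("-" * w for w in (11, 11, 11, 11, 7)) + "+"
cells = [" mid" + " " * 7, " e\u031e" + " " * 9, " \u00f8\u031e" + " " * 9,
         " " * 11, " o\u031e" + " " * 5]
row = "|" + "|".join(cells) + "|"

# Offsets of the '+' corners; the border is ASCII, so index == display column.
bounds = [i for i, ch in enumerate(border) if ch == "+"]

def slice_by_code_points(line, bounds):
    # Length-by-code-point assumption: one code point == one column.
    return [line[a + 1:b] for a, b in zip(bounds, bounds[1:])]

def to_columns(line):
    # Attach each combining mark to the preceding base character, so that
    # cols[k] is the text shown in display column k (no wide characters assumed).
    cols = []
    for ch in line:
        if unicodedata.combining(ch) and cols:
            cols[-1] += ch
        else:
            cols.append(ch)
    return cols

def slice_by_columns(line, bounds):
    cols = to_columns(line)
    return ["".join(cols[a + 1:b]) for a, b in zip(bounds, bounds[1:])]

print(slice_by_code_points(row, bounds))  # later cells pick up a stray '|'
print(slice_by_columns(row, bounds))      # matches the cells as drawn
```

Since the header-less table actually renders correctly, this only shows the general failure mode; it does not identify which code path in the span handling is affected.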

jgm transferred this issue from jgm/doclayout Oct 17, 2023
@jgm
Owner

jgm commented Oct 17, 2023

A more minimal test case:

+-----------+-------+
| a         | back  |
+===========+=======+
| ø̞         | o̞     |
+-----------+-------+

This is parsed as having rowspan=2. Replacing the characters in the bottom row with plain ASCII makes the problem go away.

@jgm
Owner

jgm commented Oct 17, 2023

The problem seems to lie in the gridtables package, which parses this table as:

ArrayTable {arrayTableCells = array ((1,1),(2,2)) [((1,1),ContentCell 2 1 [" a         ","==========="," \248\798         "]),((1,2),ContentCell 2 1 [" back  ","======="," o\798     "]),((2,1),ContinuationCell (1,1)),((2,2),ContinuationCell (1,2))], arrayTableHead = Nothing, arrayTableFoot = Nothing, arrayTableColSpecs = array (1,2) [(1,(AlignDefault,11)),(2,(AlignDefault,7))]
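Haskell's `show` writes non-ASCII characters as decimal escapes, so `\248` is U+00F8 (ø) and `\798` is U+031E. The short Python sketch below (the names `col_widths` and `cell_lines` are illustrative) rebuilds the cell lines from the ArrayTable dump above and compares their code-point lengths with the widths recorded in `arrayTableColSpecs`.

```python
# Column widths as recorded in arrayTableColSpecs.
col_widths = {1: 11, 2: 7}

# Cell lines from the dump; chr(248) is U+00F8 and chr(798) is U+031E.
cell_lines = {
    1: [" a" + " " * 9, "=" * 11, " " + chr(248) + chr(798) + " " * 9],
    2: [" back  ", "=" * 7, " o" + chr(798) + " " * 5],
}

for col, lines in cell_lines.items():
    for line in lines:
        # Code-point length of each line vs. the width recorded for its column.
        print(col, repr(line), len(line), col_widths[col])
```

The header lines match the recorded widths exactly, while each body line is one code point longer, which is consistent with the parser measuring those lines by code points somewhere along the way.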

@tarleb can you have a look at this?

@jgm
Owner

jgm commented Oct 17, 2023

Submitted issue quarto-dev/gridtables#10

tarleb self-assigned this Jan 19, 2024