Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

If some cells are undefined, the filter can generate an invalid AST #2

Open
wlupton opened this issue Sep 22, 2023 · 11 comments
Open

Comments

@wlupton
Copy link
Member

wlupton commented Sep 22, 2023

If some cells are undefined, the filter can generate an invalid AST. For example, with this markdown:

::: list-table
* - A
  - B

* - 1
:::

...we get the following (edited to show just the offending rows; rep.lua reports the AST using the logging module; see below):

% pandoc-3.1.8 -L list-table.lua list-table-bug.md -L rep.lua -t html -o /dev/null
(#) blocks Blocks[1] {
  [1] Table {
    ...
    bodies: List[1] {
      [1] {
        ...
        body: List[1] {
          [1] pandoc.Row {
            ...
            cells: List[1] {
              [1] pandoc.Cell {
                alignment: "AlignDefault"
                attr: Attr {
                  attributes: AttributeList {}
                  classes: List {}
                  identifier: ""
                }
                col_span: 1
                contents: Blocks[1] {
                  [1] Plain {
                    content: Inlines[1] {
                      [1] Str "1"
                    }
                  }
                }
                row_span: 1
              }
            }
          }
        }
        ...

The row only contains a single cell, so this is an invalid AST and isn't handled correctly by all writers (see jgm/pandoc#9096 for further discussion).

The filter should be updated to add empty cells for any missing ones.

Here's rep.lua

local logging = require 'logging'
function Pandoc(pandoc)
    if logging.loglevel > 0 then
        logging.temp('meta', pandoc.meta)
    end
    logging.temp('blocks', pandoc.blocks)
end
@wlupton
Copy link
Member Author

wlupton commented Oct 25, 2024

Looking at this again, I'm not sure that the filter should be responsible for adding any missing cells.

Updating the filter to do this would add considerable complexity, because it currently just copies list items to table cells without having to worry about whether any are missing and, in particular, without having to do rowspan and colspan processing.

So I propose that this should be the responsibility of the writer. If we agree on this then this issue can be closed, but #3 would still be relevant.

Any thoughts on this? @jgm? @tarleb?

@tarleb
Copy link
Member

tarleb commented Oct 25, 2024

In general it seems okay for a writer not to render malformed tables, but it should only affect that table, not the whole document. At the same time, I think it would be appropriate for pandoc to provide a function that turns a malformed table into a well-formed one, and that such a function should be exported to Lua.

@wlupton
Copy link
Member Author

wlupton commented Nov 4, 2024

Thanks @tarleb. Should I create a jgm/pandoc issue for this, or does such an issue already exist?

I propose making this issue depend on the above jgm/pandoc issue.

@jgm
Copy link

jgm commented Nov 4, 2024

I thought the existing table builder in Text.Pandoc.Builder already did something like this?

@wlupton
Copy link
Member Author

wlupton commented Nov 4, 2024

In this case the list-table lua filter created the Table object directly. Does this bypass Text.Pandoc.Builder?

@tarleb
Copy link
Member

tarleb commented Nov 4, 2024

I thought the existing table builder in Text.Pandoc.Builder already did something like this?

I forgot about that. The filters bypass the builder functions, but that would offer a good way to normalize the AST. I'll look into it.

@tarleb
Copy link
Member

tarleb commented Nov 4, 2024

Please do open a new issue

@wlupton
Copy link
Member Author

wlupton commented Nov 5, 2024

I've created jgm/pandoc#10356.

@bpj
Copy link

bpj commented Nov 5, 2024

FWIW my (no longer working) list-table filter did go out of its way to make sure that all rows were the same length simply by padding short rows with empty cells towards the end. It contained code like

-- NB at this stage rows[1] is the header row
local col_count = 0
-- First loop: get column count
for _, row in ipairs(rows) do
  if #row > col_count then
    col_count = #row
  end
end
-- Second loop: adjust row lengths
for _, row in ipairs(rows) do
  while #row < col_count do
    row[#row + 1] = { }
  end
end
-- Adjust widths list
while #widths < col_count do
  widths[#widths + 1] = 0.0
end
-- Adjust the alignments list
-- by simply copying the last explicit alignment!
if 0 == #aligns then
  aligns[1] = 'AlignDefault'
end
while #aligns < col_count do
  aligns[#aligns + 1] = aligns[#aligns]
end

(This wasn’t the actual code since my filter was written in MoonScript, but it compiled to Lua code equivalent to the above.)

My filter generated a SimpleTable, so things like colspan weren’t any concern.

It did require two loops over the rows list but neither loop is very complicated.

Instead of setting missing alignments to AlignDefault — unless there were no alignments — I copied the last alignment actually present. The rationale for that is that often all colums have the same aligment or the stub (leftmost) column is left aligned and all the rest are right or center aligned. The div around the list (which of course contained the class telling the filter that this was a list-table, but also some other formatting attributes) took an attribute aligns. In the common case where all cols had the same alignment you could just say aligns=c and in the second commonest case you could say aligns="lc" rather than aligns="ccccc" or aligns="lcccc". Of course there was a mapping with fields like d = 'AlignDefault', somewhere.

Nowadays I would use an re pattern {| ( %s* [dlcr] -> align_for %s* )* !. |} where align_for is the above-mentioned mapping, to parse the attribute value and create the list variable in one go, and similarly for the widths {| ( %s* %d+ -> pcnt2dec %s* )* !. |} where

pcnt2dec = function(p)
  return tonumber(p / 100)
end

(since I prefer using percentages for the widths!)

@wlupton
Copy link
Member Author

wlupton commented Nov 6, 2024

@tarleb, is jgm/pandoc#10356 close enough to what you were envisaging?

@tarleb
Copy link
Member

tarleb commented Nov 6, 2024

It is, thanks. I probably won't do a full integration (for performance reasons), but want to provide a function that runs the AST through the builder to get a normalized document.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants