Preserving <i>, <b> and <cite> in non-HTML formats #6515

Closed
snan opened this issue Jul 9, 2020 · 11 comments

Comments

@snan

snan commented Jul 9, 2020

So the position seems to be that em and strong are OK for * and **. But then how about this idea, as discussed on CommonMark's issue tracker: we write <i>, <cite> and <b> and have pandoc understand them (as an exception to most other HTML tags, which are just passed through or dropped) and turn them into italics and bold in other formats, like ConTeXt and LaTeX?

@snan snan changed the title Preserving <i>, <b> and <cite> in non-HTML formats Preserving <i>, <b> and <cite> in non-HTML formats Jul 9, 2020
@mb21
Collaborator

mb21 commented Jul 10, 2020

For HTML input this already works...

echo '<i>foo</i>' | pandoc -f html -t native
[Plain [Emph [Str "foo"]]]

But you're talking about markdown input, or..?

echo '<i>foo</i>' | pandoc -f markdown -t native
[Para [RawInline (Format "html") "<i>",Str "foo",RawInline (Format "html") "</i>"]]

true, there it's converted to raw HTML, which will only work for HTML output and be dropped otherwise... but why would you want to use HTML <i> inside markdown?

@snan
Author

snan commented Jul 10, 2020

Yes, using a markdown source document to generate html and other output formats, including some custom lua ones.

@snan
Author

snan commented Jul 10, 2020

I'm way OK with another solution to create b and i and cite too. We've previously argued for having * and ** always be i and b. I'm not invested in one particular solution, I just want to find some way to not generate In <em>Hamlet</em>, Shakespeare says…

@mb21
Collaborator

mb21 commented Jul 10, 2020

I just want to find some way to not generate In <em>Hamlet</em>

ah, you can write a Lua filter to change the native Emph [Str "foo"] into [RawInline (Format "html") "<i>",Str "foo",RawInline (Format "html") "</i>"]... or post-process the HTML...
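
The Lua filter suggested here could look roughly like the sketch below. It is not from the thread: it assumes that HTML output should get literal <i> and <b> tags in place of <em> and <strong>, and the helper name wrap_raw is made up for illustration.

```lua
-- Sketch only: swap <em>/<strong> for <i>/<b> in HTML output.
-- `wrap_raw` is a hypothetical helper name, not part of pandoc.

-- Surround a list of inlines with raw HTML open/close tags.
local function wrap_raw(tag, content)
  local out = { pandoc.RawInline('html', '<' .. tag .. '>') }
  for _, inln in ipairs(content) do
    out[#out + 1] = inln
  end
  out[#out + 1] = pandoc.RawInline('html', '</' .. tag .. '>')
  return out
end

function Emph (el)
  -- Returning nil leaves the element unchanged for non-HTML output.
  if not FORMAT:match 'html' then return nil end
  return wrap_raw('i', el.content)
end

function Strong (el)
  if not FORMAT:match 'html' then return nil end
  return wrap_raw('b', el.content)
end
```

Saved under a name such as italics.lua (a placeholder), it would be passed to pandoc with --lua-filter=italics.lua. Other output formats still receive ordinary Emph and Strong elements.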

@snan
Author

snan commented Jul 10, 2020 via email

@mb21
Collaborator

mb21 commented Jul 10, 2020

haha, I don't get it. If you want to use portable markdown, just use * and **...? some markdown processors will convert to <em>, and a few weird ones might convert to <i>... so yes, you can post-process the HTML then I guess..?

@snan
Author

snan commented Jul 10, 2020 via email

@mb21
Collaborator

mb21 commented Jul 10, 2020

Ah, so this is a duplicate of #4297 ?

@snan
Author

snan commented Jul 10, 2020 via email

@tarleb
Collaborator

tarleb commented Jul 10, 2020

If I understand correctly, then you want to write <i>Saccharomyces cerevisiae</i> and have it appear verbatim in HTML, but as "normal" emphasized text in other formats? This would be doable with a Lua filter. Another filter-based solution would be to write [Escherichia coli]{.species}. The corresponding filter would be

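-- Render spans marked with the class "species" as italics.
function Span (span)
  if not span.classes:includes 'species' then
    return nil  -- leave all other spans untouched
  elseif FORMAT:match 'html' then
    -- HTML output: literal <i> tags instead of <em>
    return {pandoc.RawInline('html', '<i>')} .. span.content .. {pandoc.RawInline('html', '</i>')}
  else
    -- all other formats: ordinary emphasis
    return pandoc.Emph(span.content)
  end
end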

(Untested)

@tarleb
Collaborator

tarleb commented Jul 10, 2020

For completeness, here's a filter which allows using <i> as an alternative to * or _.

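-- HTML output keeps the raw <i> tags as they are, so the filter does nothing there.
if FORMAT:match 'html' then
  return {}
end

local List = require 'pandoc.List'

-- True if the inline is a raw HTML opening <i> tag.
local function is_italics_start(inln)
  return inln.t == 'RawInline'
    and inln.format:match 'html'
    and inln.text:match '<i>'
end

-- True if the inline is a raw HTML closing </i> tag.
local function is_italics_end(inln)
  return inln.t == 'RawInline'
    and inln.format:match 'html'
    and inln.text:match '</i>'
end

function Inlines (inlns)
  local result = List()
  local italics = nil  -- collects the inlines between <i> and </i>
  local opener = nil   -- the raw <i> inline, kept in case no </i> follows
  for _, inln in ipairs(inlns) do
    if not italics and is_italics_start(inln) then
      italics, opener = List(), inln
    elseif italics and is_italics_end(inln) then
      result:insert(pandoc.Emph(italics))
      italics, opener = nil, nil
    elseif italics then
      italics:insert(inln)
    else
      -- also covers a stray </i> without a preceding <i>
      result:insert(inln)
    end
  end
  -- An unclosed <i> would otherwise swallow the remaining inlines;
  -- put the tag and the collected content back unchanged.
  if italics then
    result:insert(opener)
    result:extend(italics)
  end
  return result
end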

Any additional discussions should happen on the pandoc-discuss mailing list.

@tarleb tarleb closed this as completed Jul 10, 2020