markup: add --citeproc to pandoc converter
Adds the citeproc filter to the pandoc converter if pandoc >= 2.11 is
available.

There are several PRs for this feature already. However, I think
simply adding `--citeproc` is the cleanest way to enable this feature,
with the option to flesh it out later, e.g., in #7529.

Some PRs and issues attempt adding more config options to Hugo which
indirectly configure pandoc, but I think simply configuring Pandoc via
Pandoc itself is simpler, as it is already possible with two YAML
blocks -- one for Hugo, and one for Pandoc:

    ---
    title: This is the Hugo YAML block
    ---
    ---
    bibliography: assets/pandoc-yaml-block-bibliography.bib
    ...
    Document content with @citation!
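
The two-block layout above can be sketched in Go. This is only an illustration under stated assumptions: `splitFrontMatter` is a hypothetical helper, not Hugo's actual front matter parser, and it handles only a leading `---` delimited block. It shows which part Hugo would keep for itself and which part, starting with the second block, would reach Pandoc.

```go
package main

import (
	"fmt"
	"strings"
)

// splitFrontMatter is a hypothetical helper (not Hugo's actual parser). It
// separates a leading "---" delimited YAML block from the rest of the document.
// If no complete leading block is found, the whole input is returned as rest.
func splitFrontMatter(doc string) (frontMatter, rest string) {
	const delim = "---\n"
	if !strings.HasPrefix(doc, delim) {
		return "", doc
	}
	body := doc[len(delim):]
	// An empty block closes immediately; otherwise look for the closing line.
	if strings.HasPrefix(body, delim) {
		return "", body[len(delim):]
	}
	end := strings.Index(body, "\n"+delim)
	if end < 0 {
		return "", doc
	}
	return body[:end+1], body[end+1+len(delim):]
}

func main() {
	doc := "---\ntitle: This is the Hugo YAML block\n---\n" +
		"---\nbibliography: assets/pandoc-yaml-block-bibliography.bib\n...\n" +
		"Document content with @citation!\n"

	hugoMeta, forPandoc := splitFrontMatter(doc)
	fmt.Printf("Hugo keeps:\n%s\n", hugoMeta)
	fmt.Printf("Pandoc receives:\n%s", forPandoc)
}
```

The remainder still begins with a `---` ... `...` block, which is the metadata block Pandoc reads its `bibliography` setting from.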

There are other useful options, e.g., #4800 attempts to use `nocite`,
which works out of the box with this PR:

    ---
    title: This is the Hugo YAML block
    ---
    ---
    bibliography: assets/pandoc-yaml-block-bibliography.bib
    nocite: |
      @*
    ...
    Document content with no citations but a full bibliography:

    ## Bibliography

Other useful options are `csl: ...` and `link-citations: true`, which
set the path to a custom CSL file and turn citations into HTML links to
their bibliography entries, respectively.

The following issues and PRs are related:

- Add support for parsing citations and Jupyter notebooks via Pandoc and/or Goldmark extension #6101
  Bundles multiple requests, this PR tackles citation parsing.

- WIP: Bibliography with Pandoc #4800
  Passes the frontmatter to Pandoc and still uses
  `--filter pandoc-citeproc` instead of `--citeproc`.
- Allow configuring Pandoc #7529
  That PR is much more extensive and might eventually supersede this PR,
  but I think `--bibliography` and `--citeproc` should be independent
  options (`--bibliography` should be optional and `--citeproc` can always be
  specified).
- Pandoc - allow citeproc extension to be invoked, with bibliography. #8610
  Similar to #7529, #8610 adds a new config option to Hugo.
  I think passing --citeproc and letting the users decide on the
  metadata they want to pass to pandoc is better, albeit uglier.
shoeffner committed Jun 4, 2022
1 parent 69f7c73 commit c71d9f8
Showing 4 changed files with 168 additions and 11 deletions.
10 changes: 6 additions & 4 deletions docs/content/en/content-management/formats.md
@@ -48,7 +48,7 @@ Hugo passes reasonable default arguments to these external helpers by default:

- `asciidoctor`: `--no-header-footer -`
- `rst2html`: `--leave-comments --initial-header-level=2`
- `pandoc`: `--mathjax --citeproc`
- `pandoc`: `--mathjax` and, for pandoc >= 2.11, `--citeproc`

{{% warning "Performance of External Helpers" %}}
Because additional formats are external commands, generation performance will rely heavily on the performance of the external tool you are using. As this feature is still in its infancy, feedback is welcome.
@@ -137,7 +137,8 @@ This will render in your HTML as:
```
You will have to [add MathJax](https://www.mathjax.org/#gettingstarted) to your template to properly render the math.

Additionally, Pandoc enables [citations](https://pandoc.org/MANUAL.html#extension-citations) using, e.g., [BibTeX files](https://en.wikibooks.org/wiki/LaTeX/Bibliography_Management#BibTeX):
For **Pandoc >= 2.11**, you can use [citations](https://pandoc.org/MANUAL.html#extension-citations).
For example, you can cite entries from a [BibTeX file](https://en.wikibooks.org/wiki/LaTeX/Bibliography_Management#BibTeX):

```
---
@@ -149,9 +150,10 @@ bibliography: assets/bibliography.bib
This is a citation: @Doe2022
```

Note that Hugo will **not** pass its metadata YAML block to Pandoc; however, it will pass the **second** meta data block, denoted with `---` and `...` to Pandoc. Thus, all pandoc settings should go there.
Note that Hugo will **not** pass its metadata YAML block to Pandoc; however, it will pass the **second** metadata block, delimited by `---` and `...`, to Pandoc.
Thus, all Pandoc settings should go there.

You can also add all elements from a bibliography file (without citing them first) using:
You can also add all elements from a bibliography file (without citing them explicitly) using:

```
---
59 changes: 56 additions & 3 deletions markup/pandoc/convert.go
@@ -15,12 +15,17 @@
package pandoc

import (
"bytes"
"fmt"
"strings"
"sync"

"github.com/gohugoio/hugo/common/collections"
"github.com/gohugoio/hugo/common/hexec"
"github.com/gohugoio/hugo/htesting"
"github.com/gohugoio/hugo/identity"
"github.com/gohugoio/hugo/markup/internal"

"github.com/gohugoio/hugo/markup/converter"
"github.com/gohugoio/hugo/markup/internal"
)

// Provider is the package entry point.
@@ -64,7 +69,10 @@ func (c *pandocConverter) getPandocContent(src []byte, ctx converter.DocumentCon
" Leaving pandoc content unrendered.")
return src, nil
}
args := []string{"--mathjax", "--citeproc"}
args := []string{"--mathjax"}
if supportsCitations(c.cfg) {
args = append(args, "--citeproc")
}
return internal.ExternallyRenderContent(c.cfg, ctx, src, binaryName, args)
}

@@ -77,6 +85,51 @@ func getPandocBinaryName() string {
return ""
}

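var (
versionOnce sync.Once
// Cached at package level so that calls after the first one still see the result.
cachedPandocVersion string
cachedPandocVersionErr error
)

// getPandocVersion runs `pandoc --version` once and returns the cached version string.
func getPandocVersion(cfg converter.ProviderConfig) (string, error) {
versionOnce.Do(func() {
argsv := collections.StringSliceToInterfaceSlice([]string{"--version"})

var out bytes.Buffer
argsv = append(argsv, hexec.WithStdout(&out))

cmd, err := cfg.Exec.New(pandocBinary, argsv...)
if err != nil {
cachedPandocVersionErr = err
return
}

if err := cmd.Run(); err != nil {
cfg.Logger.Errorf("%s --version: %v", pandocBinary, err)
cachedPandocVersionErr = err
return
}

// The first output line is expected to look like "pandoc 2.11.4".
firstLine := strings.SplitN(strings.ReplaceAll(out.String(), "\r", ""), "\n", 2)[0]
fields := strings.Fields(firstLine)
if len(fields) < 2 {
cachedPandocVersionErr = fmt.Errorf("unexpected %s --version output: %q", pandocBinary, firstLine)
return
}
cachedPandocVersion = fields[1]
})

return cachedPandocVersion, cachedPandocVersionErr
}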
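
A `sync.Once` only caches a result if the values it computes are stored somewhere that outlives the call; variables local to the calling function are reset on every call, so later calls would see zero values. The following is a minimal, self-contained sketch of that pattern. The `versionProbe` type and the `lookup` callback are hypothetical stand-ins for running `pandoc --version`, not part of Hugo.

```go
package main

import (
	"fmt"
	"sync"
)

// versionProbe caches the result of an expensive lookup. The fields live
// outside the closure passed to Do, so every call sees the cached values.
type versionProbe struct {
	once    sync.Once
	version string
	err     error
	calls   int // number of times the lookup actually ran (for illustration)
}

// get runs lookup at most once and returns the cached result afterwards.
// lookup is a stand-in for running `pandoc --version`.
func (p *versionProbe) get(lookup func() (string, error)) (string, error) {
	p.once.Do(func() {
		p.calls++
		p.version, p.err = lookup()
	})
	return p.version, p.err
}

func main() {
	var probe versionProbe
	lookup := func() (string, error) { return "2.11.4", nil } // assumed output

	for i := 0; i < 3; i++ {
		v, err := probe.get(lookup)
		fmt.Println(v, err)
	}
	fmt.Println("lookups run:", probe.calls)
}
```

Keeping the cached values next to the `sync.Once` (in a struct or at package level) makes it hard to return stale zero values by accident.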

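// supportsCitations reports whether the installed pandoc is >= 2.11, the first
// version that includes citeproc.
func supportsCitations(cfg converter.ProviderConfig) bool {
pandocVersion, err := getPandocVersion(cfg)
// Compare numerically: as strings, "2.9" would sort above "2.11".
var major, minor int
if err == nil {
_, err = fmt.Sscanf(pandocVersion, "%d.%d", &major, &minor)
}
supportsCitations := err == nil && (major > 2 || (major == 2 && minor >= 11))
if htesting.SupportsAll() {
if !supportsCitations {
panic(fmt.Sprintf("pandoc %s does not support citations", pandocVersion))
}
return true
}
return supportsCitations
}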
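
Go compares strings lexicographically, so `"2.9" >= "2.11"` is true even though 2.9 is the older release, and `"10.0" >= "2.11"` is false. A numeric, component-wise comparison avoids this. The sketch below is one possible approach; `versionAtLeast` is a hypothetical helper and assumes purely numeric, dot-separated versions.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// versionAtLeast is a hypothetical helper: it reports whether the dotted
// version string v is >= minimum, comparing components numerically.
// Missing components count as 0; a non-numeric component yields an error.
func versionAtLeast(v, minimum string) (bool, error) {
	vs, ms := strings.Split(v, "."), strings.Split(minimum, ".")
	for i := 0; i < len(vs) || i < len(ms); i++ {
		var a, b int
		var err error
		if i < len(vs) {
			if a, err = strconv.Atoi(vs[i]); err != nil {
				return false, fmt.Errorf("invalid version %q: %w", v, err)
			}
		}
		if i < len(ms) {
			if b, err = strconv.Atoi(ms[i]); err != nil {
				return false, fmt.Errorf("invalid version %q: %w", minimum, err)
			}
		}
		if a != b {
			return a > b, nil
		}
	}
	return true, nil
}

func main() {
	// Print the string comparison next to the numeric one for a few versions.
	for _, v := range []string{"2.9", "2.11", "2.11.4", "3.1", "10.0"} {
		ok, err := versionAtLeast(v, "2.11")
		fmt.Printf("%-7s string: %-5v numeric: %v (err: %v)\n", v, v >= "2.11", ok, err)
	}
}
```

Returning an error for unparsable input lets the caller treat an unknown version as "citations unsupported" instead of guessing.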

// Supports returns whether Pandoc is installed on this computer.
func Supports() bool {
hasBin := getPandocBinaryName() != ""
104 changes: 100 additions & 4 deletions markup/pandoc/convert_test.go
@@ -25,18 +25,114 @@ import (
qt "github.com/frankban/quicktest"
)

func TestConvert(t *testing.T) {
func setupTestConverter(t *testing.T) (*qt.C, converter.Converter, converter.ProviderConfig) {
if !Supports() {
t.Skip("pandoc not installed")
}
c := qt.New(t)
sc := security.DefaultConfig
sc.Exec.Allow = security.NewWhitelist("pandoc")
p, err := Provider.New(converter.ProviderConfig{Exec: hexec.New(sc), Logger: loggers.NewErrorLogger()})
cfg := converter.ProviderConfig{Exec: hexec.New(sc), Logger: loggers.NewErrorLogger()}
p, err := Provider.New(cfg)
c.Assert(err, qt.IsNil)
conv, err := p.New(converter.DocumentContext{})
c.Assert(err, qt.IsNil)
b, err := conv.Convert(converter.RenderContext{Src: []byte("testContent")})
return c, conv, cfg
}

func TestConvert(t *testing.T) {
c, conv, _ := setupTestConverter(t)
output, err := conv.Convert(converter.RenderContext{Src: []byte("testContent")})
c.Assert(err, qt.IsNil)
c.Assert(string(b.Bytes()), qt.Equals, "<p>testContent</p>\n")
c.Assert(string(output.Bytes()), qt.Equals, "<p>testContent</p>\n")
}

func runCiteprocTest(t *testing.T, content string, expected string) {
c, conv, cfg := setupTestConverter(t)
if !supportsCitations(cfg) {
t.Skip("pandoc does not support citations")
}
output, err := conv.Convert(converter.RenderContext{Src: []byte(content)})
c.Assert(err, qt.IsNil)
c.Assert(string(output.Bytes()), qt.Equals, expected)
}

func TestCiteprocWithHugoMeta(t *testing.T) {
content := `
---
title: Test
published: 2022-05-30
---
testContent
`
expected := "<p>testContent</p>\n"
runCiteprocTest(t, content, expected)
}

func TestCiteprocWithPandocMeta(t *testing.T) {
content := `
---
---
---
...
testContent
`
expected := "<p>testContent</p>\n"
runCiteprocTest(t, content, expected)
}

func TestCiteprocWithBibliography(t *testing.T) {
content := `
---
---
---
bibliography: testdata/bibliography.bib
...
testContent
`
expected := "<p>testContent</p>\n"
runCiteprocTest(t, content, expected)
}

func TestCiteprocWithExplicitCitation(t *testing.T) {
content := `
---
---
---
bibliography: testdata/bibliography.bib
...
@Doe2022
`
expected := `<p><span class="citation" data-cites="Doe2022">Doe and Mustermann
(2022)</span></p>
<div id="refs" class="references csl-bib-body hanging-indent"
role="doc-bibliography">
<div id="ref-Doe2022" class="csl-entry" role="doc-biblioentry">
Doe, Jane, and Max Mustermann. 2022. <span>“A Treatise on Hugo
Tests.”</span> <em>Hugo Websites</em>.
</div>
</div>
`
runCiteprocTest(t, content, expected)
}

func TestCiteprocWithNocite(t *testing.T) {
content := `
---
---
---
bibliography: testdata/bibliography.bib
nocite: |
  @*
...
`
expected := `<div id="refs" class="references csl-bib-body hanging-indent"
role="doc-bibliography">
<div id="ref-Doe2022" class="csl-entry" role="doc-biblioentry">
Doe, Jane, and Max Mustermann. 2022. <span>“A Treatise on Hugo
Tests.”</span> <em>Hugo Websites</em>.
</div>
</div>
`
runCiteprocTest(t, content, expected)
}
6 changes: 6 additions & 0 deletions markup/pandoc/testdata/bibliography.bib
@@ -0,0 +1,6 @@
@article{Doe2022,
author = "Jane Doe and Max Mustermann",
title = "A Treatise on Hugo Tests",
journal = "Hugo Websites",
year = "2022",
}
