epub: fix fatal errors while parsing EPUB files
After generating the EPUB file for the Elixir docs with this version,
and reviewing the result with `epubcheck`, I got the following summary:

```console
$ epubcheck doc/elixir/Elixir.epub --json elixir_docs.json

Check finished with errors
Messages: 0 fatals / 141 errors / 0 warnings / 0 infos
```

If you compare the previous result with what we had in #1851:

```
Messages: 9 fatals / 425 errors / 0 warnings / 0 infos
```

you can see that there are no longer any messages with `fatal` severity,
and the number of errors has dropped considerably, from 425 to 141 =)

I manually checked the generated EPUB on Apple Books and the previously
truncated sections are fixed. I no longer see the banner _Below is a
rendering of the page up to the first error_, and the links to the
different anchors seem to work.
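
A minimal standalone sketch of the escaping, assuming a hypothetical module
name `AnchorSketch` (the two regexes are the ones used by `fix_anchors/1` in
this commit):

```elixir
# Hypothetical module for illustration only; not part of ExDoc.
defmodule AnchorSketch do
  # Escapes raw `&` characters in `id` values and `href` fragments that
  # look like operator anchors such as `&/1`, `&&/2`, or `&&&/2-examples`.
  def fix_anchors(content) do
    content
    |> String.replace(~r{id="&+/\d+[^"]*}, &String.replace(&1, "&", "&amp;"))
    |> String.replace(~r{href="[^#"]*#&+/\d+[^"]*}, &String.replace(&1, "&", "&amp;"))
  end
end

IO.puts(AnchorSketch.fix_anchors(~S|<a href="Kernel.xhtml#&&/2">and</a>|))
# prints: <a href="Kernel.xhtml#&amp;&amp;/2">and</a>
```

Anchors that are already escaped are left alone, because both patterns
require the run of `&` characters to be followed directly by `/`.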

Fixes: #1851
milmazz committed Jan 26, 2024
1 parent 67e03ea commit 1d8ee00
Showing 3 changed files with 49 additions and 2 deletions.
29 changes: 27 additions & 2 deletions lib/ex_doc/formatter/epub.ex
@@ -50,6 +50,23 @@ defmodule ExDoc.Formatter.EPUB do
     Path.relative_to_cwd(epub)
   end

+  @doc """
+  Helper that replaces anchor names and links that could potentially cause problems on EPUB documents
+
+  This helper replaces all the `&` with `&amp;` found in anchors like
+  `Kernel.xhtml#&&/2` or `<h2 id="&&&/2-examples" class="section-heading">...</h2>`
+
+  These anchor names cause a fatal error while EPUB readers parse the files,
+  resulting in truncated content.
+
+  For more details, see: https://github.com/elixir-lang/ex_doc/issues/1851
+  """
+  def fix_anchors(content) do
+    content
+    |> String.replace(~r{id="&+/\d+[^"]*}, &String.replace(&1, "&", "&amp;"))
+    |> String.replace(~r{href="[^#"]*#&+/\d+[^"]*}, &String.replace(&1, "&", "&amp;"))
+  end
+
   defp normalize_config(config) do
     output =
       config.output
@@ -63,7 +80,11 @@ defmodule ExDoc.Formatter.EPUB do
     for {_title, extras} <- config.extras do
       Enum.each(extras, fn %{id: id, title: title, title_content: title_content, content: content} ->
         output = "#{config.output}/OEBPS/#{id}.xhtml"
-        html = Templates.extra_template(config, title, title_content, content)
+
+        html =
+          config
+          |> Templates.extra_template(title, title_content, content)
+          |> fix_anchors()

         if File.regular?(output) do
           ExDoc.Utils.warn("file #{Path.relative_to_cwd(output)} already exists", [])
@@ -157,7 +178,11 @@ defmodule ExDoc.Formatter.EPUB do
   end

   defp generate_module_page(module_node, config) do
-    content = Templates.module_page(config, module_node)
+    content =
+      config
+      |> Templates.module_page(module_node)
+      |> fix_anchors()
+
     File.write("#{config.output}/OEBPS/#{module_node.id}.xhtml", content)
   end

18 changes: 18 additions & 0 deletions test/ex_doc/formatter/epub_test.exs
@@ -151,6 +151,9 @@ defmodule ExDoc.Formatter.EPUBTest do
     assert content =~
              ~r{<a href="TypesAndSpecs.Sub.xhtml"><code(\sclass="inline")?>TypesAndSpecs.Sub</code></a>}

+    assert content =~
+             ~r{<a href="https://hexdocs.pm/elixir/Kernel.html#&amp;&amp;/2"><code(\sclass="inline")?>&amp;&amp;/2</code></a>}
+
     content = File.read!(tmp_dir <> "/epub/OEBPS/nav.xhtml")
     assert content =~ ~r{<li><a href="readme.xhtml">README</a></li>}
   end
@@ -248,4 +251,19 @@ defmodule ExDoc.Formatter.EPUBTest do
   after
     File.rm_rf!("test/tmp/epub_assets")
   end
+
+  describe "fix_anchors/1" do
+    test "adapts anchor names to avoid parsing errors from EPUB readers" do
+      for {source, expected} <- [
+            {~S|<a href="Kernel.SpecialForms.xhtml#&/1">its documentation</a>|,
+             ~S|<a href="Kernel.SpecialForms.xhtml#&amp;/1">its documentation</a>|},
+            {~S|<a href="Kernel.xhtml#&&/2"><code class="inline">&amp;&amp;/2</code></a>|,
+             ~S|<a href="Kernel.xhtml#&amp;&amp;/2"><code class="inline">&amp;&amp;/2</code></a>|},
+            {~S|<h2 id="&&&/2-examples" class="section-heading">title</h2>|,
+             ~S|<h2 id="&amp;&amp;&amp;/2-examples" class="section-heading">title</h2>|}
+          ] do
+        assert ExDoc.Formatter.EPUB.fix_anchors(source) == expected
+      end
+    end
+  end
 end
4 changes: 4 additions & 0 deletions test/fixtures/README.md
@@ -15,3 +15,7 @@ hello
 ## more > than

 <p><strong>raw content</strong></p>
+
+The following text includes a reference to an anchor that causes problems in EPUB documents.
+
+To remove this anti-pattern, we can replace `&&/2`, `||/2`, and `!/1` by `and/2`, `or/2`, and `not/1` respectively.
