dump metadata along with document fragment #2019
Comments
This could be done with a filter, e.g.

import Text.Pandoc.JSON
import Text.Pandoc
import Data.Aeson.Encode
import Data.Aeson.Types
import Data.ByteString.Lazy.UTF8
import Data.List
import qualified Data.Map as M

main :: IO ()
main = toJSONFilter inputMeta

inputMeta :: Pandoc -> Pandoc
inputMeta (Pandoc m b) = Pandoc m (mb:b)
  where
    mb = RawBlock (Format "html") $
           "<!-- metadata:\n" ++ toString (encode $ metaToJSON m) ++ "\n-->"

metaToJSON :: Meta -> Value
metaToJSON (Meta m) = toJSON $ M.map metaValueToJSON m

metaValueToJSON :: MetaValue -> Value
metaValueToJSON (MetaMap m) = toJSON $ M.map metaValueToJSON m
metaValueToJSON (MetaList xs) = toJSON $ map metaValueToJSON xs
metaValueToJSON (MetaString t) = toJSON t
metaValueToJSON (MetaBool b) = toJSON b
metaValueToJSON (MetaInlines ils) = toJSON $ toHtml ils
metaValueToJSON (MetaBlocks bs) = toJSON $ toHtml' bs

toHtml :: [Inline] -> String
toHtml ils = html
  where
    html = writeHtmlString options $ Pandoc nullMeta [Plain ils]

toHtml' :: [Block] -> String
toHtml' bs = writeHtmlString options $ Pandoc nullMeta bs

options :: WriterOptions
options = def{writerHTMLMathMethod=MathML Nothing}

I don't think this makes a ton of sense as a core functionality. I would, however, appreciate a built-in |
Note that whatever you want this for, you are probably better off just writing a filter for it. You can choose between Haskell, Python, or in fact any language that can handle JSON input and output (e.g. Node.js). Haskell and Python are the supported ones, though. You might want to look at http://johnmacfarlane.net/pandoc/scripting.html |
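As a rough illustration of the "any language that can handle JSON" point, here is a minimal Python sketch of a filter core that copies the metadata to a side channel and passes the document through unchanged. The two JSON layouts it accepts are assumptions about what pandoc emits (they differ between pandoc versions), and `split_document`, `run_filter`, and `sidecar` are hypothetical names, not part of any pandoc API.

```python
import io
import json


def split_document(doc):
    """Return (meta, blocks) from a decoded pandoc JSON document."""
    if isinstance(doc, dict):
        # Assumed newer layout: {"meta": {...}, "blocks": [...]}
        return doc["meta"], doc["blocks"]
    # Assumed older layout: [{"unMeta": {...}}, [blocks]]
    meta_wrapper, blocks = doc
    return meta_wrapper["unMeta"], blocks


def run_filter(raw_json, sidecar):
    """Write the metadata to `sidecar` and return the document unchanged."""
    doc = json.loads(raw_json)
    meta, _blocks = split_document(doc)
    json.dump(meta, sidecar)
    return json.dumps(doc)


# Tiny hand-written document in the assumed older layout.
sample = json.dumps([
    {"unMeta": {"title": {"t": "MetaString", "c": "Hello"}}},
    [{"t": "Para", "c": [{"t": "Str", "c": "Body"}]}],
])
sidecar = io.StringIO()
passed_through = run_filter(sample, sidecar)
print(sidecar.getvalue())
```

To use this as an actual filter, one would feed it `sys.stdin.read()`, write the return value to `sys.stdout`, and point `sidecar` at a file. Note that what lands in the side channel is still the raw AST form of the metadata, not rendered output.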
I have experimented with this approach, and I think I can make it work, but it is suboptimal. For context, I am trying to improve the metadata handling in liob/pandoc_reader, which uses Pandoc as the front end for a static site generator, Pelican; Pelican is written in Python. In this context, I am reluctant to require the Haskell compiler or the Pandoc libraries; the current code uses only the command-line tool. Now, if I'm writing a filter in other-than-Haskell, I can't get at
becomes
which I would then split at the |
Thinking out loud, a potential fix is (1) a new AST node type that means "render what's under this node in output format X and then quote it as a string literal for the surrounding context", (2) some way of generating a custom JSON tree (rather than a literal serialization of the AST). (1) might also be useful for, like, embedding examples of the rendered output in format X in a document of format Y. The thing I originally asked for seems simpler overall, and easier to implement, though. |
Can you not write a filter which just dumps the JSON to a file? If you then really want them in the same file you can then just cat the metadata dump and the output pandoc produces together. |
@mpickering The JSON structure passed to the filter is
The JSON structure I want is
The only way to get to B from A is to pass back through Pandoc's HTML generator. |
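To make the gap between the two structures concrete, the following Python sketch flattens an AST-style metadata tree into plain values. It is deliberately naive: the inline renderer only understands `Str` and `Space`, and raises on anything else (math, emphasis, links), which is exactly the part that would need pandoc's own writer. The node shapes (`"t"`/`"c"` keys) are an assumption about the JSON a filter receives, and `flatten` and `render_inlines_naively` are hypothetical helper names.

```python
def render_inlines_naively(inlines):
    """Render a list of inline nodes, supporting only Str and Space."""
    parts = []
    for node in inlines:
        if node["t"] == "Str":
            parts.append(node["c"])
        elif node["t"] == "Space":
            parts.append(" ")
        else:
            # Math, Emph, Link, ... need a real writer, i.e. pandoc itself.
            raise NotImplementedError(node["t"])
    return "".join(parts)


def flatten(value):
    """Turn a tagged metadata value into plain dicts, lists, and scalars."""
    tag, content = value["t"], value.get("c")
    if tag == "MetaMap":
        return {key: flatten(child) for key, child in content.items()}
    if tag == "MetaList":
        return [flatten(child) for child in content]
    if tag in ("MetaString", "MetaBool"):
        return content
    if tag == "MetaInlines":
        return render_inlines_naively(content)
    # MetaBlocks would need block-level rendering, which is not attempted here.
    raise NotImplementedError(tag)


# Hand-written sample in the assumed node layout.
meta = {"t": "MetaMap", "c": {
    "title": {"t": "MetaInlines", "c": [
        {"t": "Str", "c": "Hello"}, {"t": "Space", "c": []}, {"t": "Str", "c": "world"},
    ]},
    "draft": {"t": "MetaBool", "c": True},
    "tags": {"t": "MetaList", "c": [{"t": "MetaString", "c": "a"}]},
}}
flat = flatten(meta)
print(flat)
```

The structural part of the conversion is trivial; the `NotImplementedError` branches mark where a non-Haskell filter runs out of road without calling back into pandoc.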
I haven't followed this in detail, but if you know the format of the structure ahead of time, could you just write a custom template? Note that template variable values will be interpreted as Markdown. Templates provide a mechanism for iterating across arrays and for object/property structures. |
I don't know the structure ahead of time; it appears that there is no way to iterate over all available variables, nor discriminate variables by origin, nor to recursively walk an unknown tree structure. Also, it appears that there is no way to request any sort of syntactic quotation. |
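For comparison, once metadata is available as plain JSON, walking a tree of unknown shape takes only a few lines in a general-purpose language. This Python sketch is illustrative only; `walk` is a hypothetical helper and the sample metadata is made up.

```python
def walk(value, path=()):
    """Yield (path, leaf) pairs for every leaf in a nested JSON value."""
    if isinstance(value, dict):
        for key in sorted(value):
            yield from walk(value[key], path + (key,))
    elif isinstance(value, list):
        for index, item in enumerate(value):
            yield from walk(item, path + (index,))
    else:
        yield path, value


# Made-up metadata whose structure the consumer does not know in advance.
metadata = {"title": "Hello", "author": [{"name": "A"}, {"name": "B"}], "draft": False}
leaves = list(walk(metadata))
for path, leaf in leaves:
    print(path, leaf)
```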
Look, in the absolute majority of cases, if pandoc is installed, so is the Haskell runtime. It means that, at the very least, you can run Haskell filters through pandoc itself. It is suboptimal in terms of speed, but since it's not used for dynamic content generation of some sort, it shouldn't be a big concern. |
+++ Nikolay Yakimov [Mar 24 15 07:28 ]:
Many people install pandoc with the binary packages, which don't install |
The following change to the HTML writer would add a
diff --git a/src/Text/Pandoc/Writers/HTML.hs b/src/Text/Pandoc/Writers/HTML.hs
index 53dc931..93834c1 100644
--- a/src/Text/Pandoc/Writers/HTML.hs
+++ b/src/Text/Pandoc/Writers/HTML.hs
@@ -43,6 +43,8 @@ import Text.Pandoc.XML (fromEntities, escapeStringForXML)
import Network.URI ( parseURIReference, URI(..), unEscapeString )
import Network.HTTP ( urlEncode )
import Numeric ( showHex )
+import qualified Data.Aeson as Aeson
+import Text.Pandoc.UTF8 (toStringLazy)
import Data.Char ( ord, toLower )
import Data.List ( isPrefixOf, intersperse )
import Data.String ( fromString )
@@ -194,6 +196,7 @@ pandocToHtml opts (Pandoc meta blocks) = do
defField "revealjs-url" ("reveal.js" :: String) $
defField "s5-url" ("s5/default" :: String) $
defField "html5" (writerHtml5 opts) $
+ defField "meta-json" (toStringLazy $ Aeson.encode metadata) $
metadata
return (thebody, context)
This could be used with a custom template like
to get what @zackw is looking for. So, one possible change to pandoc would be to define a |
A better, more general patch, affecting all writers:
diff --git a/src/Text/Pandoc/Writers/Shared.hs b/src/Text/Pandoc/Writers/Shared.hs
index 800e741..cc9e59d 100644
--- a/src/Text/Pandoc/Writers/Shared.hs
+++ b/src/Text/Pandoc/Writers/Shared.hs
@@ -45,7 +45,8 @@ import Text.Pandoc.Options (WriterOptions(..))
import qualified Data.HashMap.Strict as H
import qualified Data.Map as M
import qualified Data.Text as T
-import Data.Aeson (FromJSON(..), fromJSON, ToJSON (..), Value(Object), Result(..))
+import Data.Aeson (FromJSON(..), fromJSON, ToJSON (..), Value(Object), Result(..), encode)
+import Text.Pandoc.UTF8 (toStringLazy)
import qualified Data.Traversable as Traversable
import Data.List ( groupBy )
@@ -67,7 +68,8 @@ metaToJSON opts blockWriter inlineWriter (Meta metamap)
renderedMap <- Traversable.mapM
(metaValueToJSON blockWriter inlineWriter)
metamap
- return $ M.foldWithKey defField baseContext renderedMap
+ let metadata = M.foldWithKey defField baseContext renderedMap
+ return $ defField "meta-json" (toStringLazy $ encode metadata) metadata
| otherwise = return (Object H.empty)
metaValueToJSON :: Monad m |
I like this as long as it does the Right Thing with complicated quoting cases like
this being only what I could think of off the top of my head, I'm sure there are nastier constructs. |
+++ Zack Weinberg [Mar 28 15 15:22 ]:
It should, because we're using a robust and well tested json library to generate the json. |
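The same property can be checked on the consuming side. This Python sketch round-trips a few awkward strings through the standard `json` module; the sample strings are invented for illustration and are not the cases from the comment above.

```python
import json

# Made-up awkward values: quotes, backslashes, control characters, markup.
nasty_values = [
    'He said "hi" and left',
    "back\\slash and\ttab",
    "line one\nline two",
    "<!-- looks like a comment -->",
    "snowman \u2603 and </script>",
]

encoded = json.dumps({"samples": nasty_values})
decoded = json.loads(encoded)

round_trips = decoded["samples"] == nasty_values
single_line = "\n" not in encoded  # newlines inside strings are escaped
print(round_trips, single_line)
```

One caveat: JSON quoting only protects the JSON itself. Python's `json.dumps` leaves sequences such as `-->` and `</script>` untouched, so embedding the result in an HTML comment or script element would need separate care. Whether pandoc's encoder behaves the same way is not checked here.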
Adding a 👍 here, because I have very similar needs to those outlined by @zackw, and he and I ended up independently working around the issue elsewhere (see liob/pandoc_reader#3, liob/pandoc_reader#4, and liob/pandoc_reader#5). I also note that integration with other possible static site generators could be a big win, since pandoc is in my experience meaningfully faster than many other implementations. (E.g. it moves at least twice as fast as the standard Python Markdown implementation—I just compared the two on a ~16k-line test file with as close to the same settings for parsing as possible, and it runs in half the time. For a file 10⨉ that size… well, Python Markdown just falls down; it never finished. 😛) |
Note that you don't need a filter to dump the metadata to a file. All you
then invoke pandoc with
and then decode 'metadata.yaml' with your nearest YAML parser! The
|
@bpj That's true in a general sense, but it doesn't get at the issue here, and it certainly doesn't give you the data back in a format (e.g. JSON) readily transformed or handed around within another application, which is the context which drove @zackw's request (and is my interest as well): both of us are using pandoc to drive Pelican, and are doing a bit of a dance to handle YAML metadata in that context. |
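For a Python host application, consuming a `meta-json` template variable like the one in the patches above could look roughly like this. The sketch assumes a hypothetical custom template that emits the `meta-json` value on the first line and the document body after it, and that the encoded JSON contains no literal newline. `split_output` and `run_pandoc` are illustrative names, and `run_pandoc` is shown but not exercised.

```python
import json
import subprocess


def split_output(text):
    """Split '<json line>' + newline + '<body>' into (metadata dict, body string)."""
    first_line, _, body = text.partition("\n")
    return json.loads(first_line), body


def run_pandoc(source_path, template_path):
    """Run the pandoc command-line tool with a custom template; return stdout."""
    result = subprocess.run(
        ["pandoc", "--template=" + template_path, "-t", "html5", "--mathml", source_path],
        check=True,
        stdout=subprocess.PIPE,
        universal_newlines=True,
    )
    return result.stdout


# Hand-written stand-in for what the assumed template would produce.
example = '{"title":"Hello <em>world</em>"}\n<p>Body text.</p>\n'
metadata, body = split_output(example)
print(metadata["title"])
```

This keeps the dependency on pandoc limited to the command-line tool, which is the constraint described earlier in the thread.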
I've added |
I am looking for a way to get pandoc to dump out metadata along with a document fragment. This should behave as follows:
An example would probably help: given
pandoc -t html5+metadata --mathml
should produce something like
It's quite possible that there's already a way to do something like this and I just can't find it, in which case I would appreciate a pointer.
A way to dump only the metadata, but still applying a rendering, would also be useful.