Posted on 2023-06-20 · 12 min read · haskell
Contents
When writing a blog post that feels academic—or pretentious—enough to invoke the need for citations,
having them automatically generated feels like a mandatory requirement.
I can only shudder to imagine the alternatives.
The good news is that LaTeX has solved this problem long ago;
we now have BibTeX as a file format,
and any number of programs,
like
The rest of the post will be a step by step explanation of how I arrived at this result,
hopefully in a generic enough way such that the ideas presented here
may translate to other problems that can be solved with pandoc filters.
If you just want the code, however, the relevant commits are
here,
here,
and here.
Integrating this into a basic Hakyll setup is not much more complicated,
as some kind soul has given us
Hakyll.Web.Pandoc.Biblio—a library specifically written to make use of pandoc’s bibliography handling.
In fact, Jasper Van der Jeugt himself created a tutorial for this.
Slightly abbreviated, it goes a bit like the following.
First, we compile the csl and BibTeX files in our main function:
Next, we create a compiler
with the help of the
Since this kind of setup is so common,
it is already packaged up in
With all these pieces in place,
this compiler can now be used in place of the default
one would write
And that’s all there is to it!
Citations should now work out of the box using the
While the basic setup really is this easy,
incorporating citations into a real-world Hakyll code base
proves to be slightly more difficult—not to mention addressing all of my neuroses.
Let’s get straight into it.
which represents a full document plus some metadata.
The Block and Inline types then
contain more fine-grained stylistic information,
like the presence of lists, tables, bold text, and so on.
Writing a filter is further simplified by the Walkable
type class,
with which it becomes trivial to promote an
A basic filter materialises:
Notice that the type of this function transforms an
The easiest way I found to incorporate a
So
We can now easily incorporate the
And that’s it!
Citations can now be added using the
When the toc is saved in the
The last
This can now be incorporated into
All we need to do now is to actually create that header.
Again inspecting the html, one can spy a line along the lines of
when the references start.
Inserting a heading above that sounds like a plan.
This can again be done using filters,
this time inside our
The citations now look like this:
Much better:
Next, notice that there is no link from the label in the text
to the actual citation at the end.
This seems pretty inconvenient,
as at least I often jump to citations whose label is
unfamiliar—just to get an idea what kind of article it is.
Pandoc does accept a
One could also introduce a Hakyll metadata field
if this is to be done conditionally,
but I see no reason to not link citations,
so I didn’t.
Everything works as expected:
Lastly,
and this is perhaps the most important modification,
I think that not having a table-like
look—aligning labels and not letting the citation information run under its label—looks
a bit awkward.
As so many times before, pandoc filters come to the rescue here.
There is a
This may look kind of scary,
but for our simple use-case there is the aptly named
Basically, any particular cell is composed of a number of
Just signaling an
All that’s left is to nicely align everything:
the right side of the table will,
in general,
span multiple lines—in stark contrast to the label.
This is again just a few lines of css:
We get the expected result already showcased in the introduction.
For completeness, here it is again:
Neat.
biblatex
or natbib
,
to generate good-looking citations from that.
Further, everyone’s favourite document format converter—pandoc—has excellent support for leveraging this functionality.
All that’s left is to integrate this into Hakyll;
and to play around with it, of course!
Introduction§
Pandoc—using the citeproc library—can make use of csl, which is an xml-based formatting specification,If you want some more information,
the format describes itself as
in order to decide how the generated citations will be displayed.
There are a lot of styles to choose from;
the Zotero Style Repository alone sports around 10’000 different ones!
I settled on one that closely resembles an xml-based format to describe the formatting of citations, notes and bibliographies, offering:
- An open format
- Compact and robust styles
- Extensive support for style requirements
- Automatic style localization
- Infrastructure for style distribution and updating
- Thousands of freely available styles (Creative Commons by-sa licensed)
biblatex
’s “alphabetic” style.
To not keep anyone in suspense, the final result looks like this:
This is a line citing [Béna67] and [DaPaSt07].
References
[Béna67]
|
J. Bénabou, “Introduction to bicategories.” Bénabou, Jean et al., Reports of the Midwest Category Seminar. Lect. Notes Math. 47, 1–77, 1967.
|
|
[DaPaSt07]
|
B. Day, E. Panchadcharam, and R. Street, “Lax braidings and the lax centre,” in Hopf algebras and generalizations. AMS special session on hopf algebras at the crossroads of algebra, category theory, and topology, evanston, IL, USA, october 23–24, 2004., Providence, RI: American Mathematical Society (AMS), 2007, pp. 1–17.
|
While this style is good enough for now,
it’s still not quite perfect;
suggestions for other styles would be most welcome!
Simple setup§
On the command line, the incantation one needs to write is quite simple:$ pandoc --from=markdown --to=html \
--citeproc --biblatex \
--csl=bib/style.csl --bibliography=bib/bibliography.bib \
FILE.md
main :: IO ()
main = hakyll $ do
-- …
match "bib/style.csl" $ compile cslCompiler
match "bib/bibliography.bib" $ compile biblioCompiler
-- …
readPandocBiblio
function:
myPandocBiblioCompiler :: Compiler (Item String)
myPandocBiblioCompiler = do
csl <- load "bib/style.csl"
bib <- load "bib/bibliography.bib"
getResourceBody
>>= readPandocBiblio defaultHakyllReaderOptions csl bib
>>= pure . writePandoc
pandocBiblioCompiler
:
myPandocBiblioCompiler :: Compiler (Item String)
myPandocBiblioCompiler =
pandocBiblioCompiler "bib/style.csl" "bib/bibliography.bib"
pandocCompiler
;
for example, instead of
-- somewhere in main
compile $ pandocCompiler
>>= loadAndApplyTemplate "default.html" defaultContext
-- somewhere in main
compile $ myPandocBiblioCompiler
>>= loadAndApplyTemplate "default.html" defaultContext
[@citation-name]
syntax.
They appear like this:
This is a line citing [Béna67], and [DaPaSt07].
[Béna67]
J. Bénabou, “Introduction to bicategories.” Bénabou, Jean et al., Reports of the Midwest Category Seminar. Lect. Notes Math. 47, 1–77, 1967.
[DaPaSt07]
B. Day, E. Panchadcharam, and R. Street, “Lax braidings and the lax centre,” in Hopf algebras and generalizations. AMS special session on hopf algebras at the crossroads of algebra, category theory, and topology, evanston, IL, USA, october 23–24, 2004., Providence, RI: American Mathematical Society (AMS), 2007, pp. 1–17.
Integration into my Hakyll setup§
The most important bit is that, instead of a separate compiler, I would really rather have a pandoc filter for this feature. Briefly, filters are transformations of pandoc’s internal representation of a document’s structure.As one can imagine, the use-cases for filters are manifold.
From changing Hakyll’s default syntax highlighting,
to swapping out footnotes and producing this very sidenote,
almost anything one can imagine is possible.
This internal representation is encapsulated in the Pandoc
type
data Pandoc = Pandoc Meta [Block]
Inline -> Inline
function
to a full Pandoc -> Pandoc
transformation.
Luckily, Hakyll.Web.Pandoc.Biblio
exposes a function for this kind of use-case:
processPandocBiblio
:: Item CSL -> Item Biblio -- Formatting boiler plate
-> Item Pandoc -> Compiler (Item Pandoc) -- The actual transformation
processBib :: Item Pandoc -> Compiler (Item Pandoc)
processBib pandoc = do
csl <- load "bib/style.csl"
bib <- load "bib/bibliography.bib"
processPandocBiblio csl bib pandoc
Item Pandoc
!
An Item is a type internal to Hakyll,
which associates a unique identifier to some contents.
In the best case,
one would like to treat this as an implementation detail and not think about it at all.
However, most of Hakyll’s other pandoc functions that let you do ast
transformations—like pandocCompilerWithTransformM
—have a bit of a different api,
accepting only a Pandoc -> Compiler Pandoc
argument.
This means that some care is needed to get everything to type check.
To start, my personal pandoc compiler looks a little bit like this:
myPandocCompiler :: Compiler (Item String)
myPandocCompiler =
pandocCompilerWithTransformM
myReader
myWriter
myTransformations -- Pandoc -> Compiler Pandoc
processPandocBiblio
-like transformation into this was to
write a function that’s like pandocCompilerWithTransformM
,
but accepts a wider input range.
Looking at its definition
already gives some idea as to what needs to be done:
pandocCompilerWithTransformM
:: ReaderOptions -> WriterOptions
-> (Pandoc -> Compiler Pandoc)
-> Compiler (Item String)
pandocCompilerWithTransformM ropt wopt f =
getResourceBody >>= renderPandocWithTransformM ropt wopt f
renderPandocWithTransformM
:: ReaderOptions -> WriterOptions
-> (Pandoc -> Compiler Pandoc)
-> Item String
-> Compiler (Item String)
renderPandocWithTransformM ropt wopt f i =
writePandocWith wopt <$> (traverse f =<< readPandocWith ropt i)
-- readPandocWith :: ReaderOptions -> Item String -> Compiler (Item Pandoc)
pandocCompilerWithTransformM
is defined in terms of renderPandocWithTransformM
,
which in turn has quite a simple implementation.
Notice in particular the traverse f =<< readPandocWith ropt i
bit;
readPandocWith
returns a Compiler (Item Pandoc)
,
so the traverse
above exactly transform our f
into a function that works at the Item
level.
Omitting this yields the desired functions:
myRenderPandocWithTransformM
:: ReaderOptions -> WriterOptions
-> (Item Pandoc -> Compiler (Item Pandoc)) -- this changed!
-> Item String
-> Compiler (Item String)
myRenderPandocWithTransformM ropt wopt f i =
writePandocWith wopt <$> (f =<< readPandocWith ropt i)
myPandocCompilerWithTransformM
:: ReaderOptions -> WriterOptions
-> (Item Pandoc -> Compiler (Item Pandoc)) -- this changed!
-> Compiler (Item String)
myPandocCompilerWithTransformM ropt wopt f =
getResourceBody >>= myRenderPandocWithTransformM ropt wopt f
processBib
function defined above into our existing framework,
adding a traverse
where the old code was:
myPandocCompiler :: Compiler (Item String)
myPandocCompiler =
myPandocCompilerWithTransformM
myReader
myWriter
( traverse myTransformations -- Item Pandoc -> Compiler (Item Pandoc)
<=< processBib -- composed with the new stuff
)
[@citation-name]
syntax mentioned above,
and they look exactly the same as in the simple example.
Now, that particular formatting looks fine,
but I’m sure you can already discern a few not-so-nice bits.
Let’s address the most glaring ones,
at least in my opinion.
Changing the look§
Adding a header for the references§
As it stands now, the references are just dumped at the bottom of the page, without any additional heading. This looks subjectively ugly, so automatically adding one whenever at least one citation is present would be nice. There is another small complication because of my idiosyncratic Hakyll setup: in order to easily control the style of the headings in the table of contents, I pre-generate the toc before the actual compilation of the site. This means that in addition to theprocessBib
function,
we need to change the code in one other place.
The generation looks a little bit like this:
getTocCtx :: Context a -> Compiler (Context a)
getTocCtx ctx = do
writerOpts <- mkTocWriter
toc <- renderPandocWith
defaultHakyllReaderOptions
writerOpts
=<< getResourceBody
pure $ mconcat [ ctx
, constField "toc" $ doStuffWithTheToc toc
]
toc
variable,
it’s already rendered into a string,
which means that it’s time for some good old string manipulations.
The (simplified) html for a typical table of contents looks like this:
<div>
<p>Contents</p>
<ul>
<li>
High level structure
<ul>
<li>Topics</li>
<li>Files</li>
</ul>
</li>
</ul>
</div>
</ul>
block seems to be an appropriate target to attack.
The function to add a References
heading in its place is swiftly written:
addBibHeading :: String -> String
addBibHeading s = T.unpack . mconcat $
[ T.dropEnd 5 before
, "<li><a href=\"#references\">References</a></li></ul>"
, after
]
where
(before, after) = T.breakOnEnd "</ul>" (T.pack s)
getTocCtx
in a straightforward manner:
In the actual code,
I also check for a
bib
boolean field,
in order to decide whether this transformation should actually be applied.
If you are interested in that, see the relevant commit.getTocCtx :: Context a -> Compiler (Context a)
getTocCtx ctx = do
writerOpts <- mkTocWriter
toc <- renderPandocWith
defaultHakyllReaderOptions
writerOpts
=<< getResourceBody
pure $ mconcat [ ctx
, constField "toc" $ addBibHeading (doStuffWithTheToc toc)
-- ^^^^^^^^^^^^^
]
<div id="refs" class="references csl-bib-body" role="doc-bibliography">
processBib
function:
This example also nicely showcases the power of the
instance,
I seamlessly walk over all
Walkable
type class.
Via the
Walkable Block Pandoc
Block
s in the ast
and pick out the ones I’d like to change.
Pretty neat if you ask me.processBib :: Item Pandoc -> Compiler (Item Pandoc)
processBib pandoc = do
csl <- load "bib/style.csl"
bib <- load "bib/bibliography.bib"
fmap insertRefHeading <$> processPandocBiblio csl bib pandoc
-- ^^^^^^^^^^^^^^^^
where
-- Insert a heading for the citations.
insertRefHeading :: Pandoc -> Pandoc
insertRefHeading = walk $ concatMap \case
d@(Div ("refs", _, _) _) ->
[Header 1 ("references", [], []) [Str "References"], d]
block -> [block]
This is a line citing [Béna67], and [DaPaSt07].
References
[Béna67]
J. Bénabou, “Introduction to bicategories.” Bénabou, Jean et al., Reports of the Midwest Category Seminar. Lect. Notes Math. 47, 1–77, 1967.
[DaPaSt07]
B. Day, E. Panchadcharam, and R. Street, “Lax braidings and the lax centre,” in Hopf algebras and generalizations. AMS special session on hopf algebras at the crossroads of algebra, category theory, and topology, evanston, IL, USA, october 23–24, 2004., Providence, RI: American Mathematical Society (AMS), 2007, pp. 1–17.
Prettifying the generated references§
What immediately irks me in the above output is that a single citation is broken up into two lines. Thankfully, this is easily fixed by a tiny bit of css./* Don't split up a citation over multiple lines. */
div.csl-left-margin {
display: inline;
}
div.csl-right-inline {
display: inline;
}
This is a line citing [Béna67] and [DaPaSt07].
References
[Béna67]
J. Bénabou, “Introduction to bicategories.” Bénabou, Jean et al., Reports of the Midwest Category Seminar. Lect. Notes Math. 47, 1–77, 1967.
[DaPaSt07]
B. Day, E. Panchadcharam, and R. Street, “Lax braidings and the lax centre,” in Hopf algebras and generalizations. AMS special session on hopf algebras at the crossroads of algebra, category theory, and topology, evanston, IL, USA, october 23–24, 2004., Providence, RI: American Mathematical Society (AMS), 2007, pp. 1–17.
link-citations
option that controls this behaviour,
which works fine for my purposes.
Setting this can be done directly
by modifying the Meta field of the Pandoc
type:
processBib :: Item Pandoc -> Compiler (Item Pandoc)
processBib pandoc = do
csl <- load "bib/style.csl"
bib <- load "bib/bibliography.bib"
-- We do want to link citations.
p <- withItemBody
(\(Pandoc (Meta meta) bs) -> pure $
Pandoc (Meta $ Map.insert "link-citations" (MetaBool True) meta)
bs)
pandoc
fmap insertRefHeading <$> processPandocBiblio csl bib pandoc
This is a line citing [Béna67] and [DaPaSt07].
References
[Béna67]
J. Bénabou, “Introduction to bicategories.” Bénabou, Jean et al., Reports of the Midwest Category Seminar. Lect. Notes Math. 47, 1–77, 1967.
[DaPaSt07]
B. Day, E. Panchadcharam, and R. Street, “Lax braidings and the lax centre,” in Hopf algebras and generalizations. AMS special session on hopf algebras at the crossroads of algebra, category theory, and topology, evanston, IL, USA, october 23–24, 2004., Providence, RI: American Mathematical Society (AMS), 2007, pp. 1–17.
Table
constructor of the Block
type which we will use:
Table :: Attr -> Caption -> [ColSpec] -> TableHead -> [TableBody] -> TableFoot
-> Block
simpleTable
function:
simpleTable :: [Blocks] -- ^ Headers
-> [[Blocks]] -- ^ Rows
-> Blocks
-- where
type Blocks = Many Block
newtype Many a = Many (Seq a)
Blocks
,
a single row is a bunch (list) of those,
and all rows taken together then form a [[Blocks]]
.
Equipped with this knowledge,
we can just search for an instance of a citation,
which will be two Div
s inside of one Para
inside of one Div
,
and replace accordingly:
-- | Align all citations in a table.
tableiseBib :: Pandoc -> Pandoc
tableiseBib = walk \case
-- Citations start with a <div id="refs" …>
Div a@("refs", _, _) body ->
-- No header needed, we just want to fill in the body contents.
Div a (Many.toList (simpleTable [] (map citToRow body)))
body -> body
where
citToRow :: Block -> [Many Block]
citToRow = map Many.singleton . \case
Div attr [Para [s1, s2]] ->
[Div attr [Plain [s1]], Plain [Space], Plain [s2]]
_ -> error "citToRow: unexpected citation format."
error
here
in case of an unexpected format
was nice for debugging the code—I missed the Para
at first—and at this point I see no reason to change it.
Perhaps it is better to fail fast in theses kinds of situations,
instead of trying to desperately produce something based off garbage input.
The tableiseBib
function can
be incorporated into processBib
in a straightforward fashion:
processBib :: Item Pandoc -> Compiler (Item Pandoc)
processBib pandoc = do
-- …
fmap (tableiseBib . insertRefHeading) <$> processPandocBiblio csl bib p
/* Align citations to the top. */
div#refs td {
vertical-align: top;
}
This is a line citing [Béna67] and [DaPaSt07].
References
[Béna67]
|
J. Bénabou, “Introduction to bicategories.” Bénabou, Jean et al., Reports of the Midwest Category Seminar. Lect. Notes Math. 47, 1–77, 1967.
|
|
[DaPaSt07]
|
B. Day, E. Panchadcharam, and R. Street, “Lax braidings and the lax centre,” in Hopf algebras and generalizations. AMS special session on hopf algebras at the crossroads of algebra, category theory, and topology, evanston, IL, USA, october 23–24, 2004., Providence, RI: American Mathematical Society (AMS), 2007, pp. 1–17.
|
Conclusion§
Since smart people had already written all the hard parts, this was surprisingly easy to add for such a useful feature! Plus, playing around with pandoc filters is always fun. Especially themy{renderPandoc,PandocCompiler}WithTransformM
functions
could—with different names, of course—perhaps be contributed to upstream Hakyll.
A variant of any of the *PandocBiblio
functions
that explicitly accepts a list of additional arguments
to give to citeproc
might also be useful;
there are quite a few metadata fields one can specify,
after all.
Finally, I think a format along the lines of tableiseBib
would be quite nice to have with label-style citations.
However, the current implementation is much too specific to justify living anywhere but a personal configuration.
Some day, maybe.