fanf

https://dotat.at/@/2024-05-05-frontmatter.html

As is typical for static site generators, each page on my web site is generated from a file containing markdown with YAML frontmatter.

Neither markdown nor YAML is good. Markdown is very much the worse-is-better of markup languages; YAML, on the other hand, is more like better-is-worse. YAML has too many ways of expressing the same things, and the lack of redundancy in its syntax makes it difficult to detect mistakes before it is too late. YAML's specification is incomprehensible.

But they are both very convenient and popular, so I went with the flow.

multiple documents

A YAML stream may contain several independent YAML documents delimited by --- start and ... end markers, for example:

    ---
    document: 1
    ...
    ---
    document: 2
    ...

string documents

The top-level value in a YAML document does not have to be an array or object: you can use its wild zoo of string syntax too, so for example,

    --- |
    here is a preformatted
    multiline string

frontmatter and markdown

Putting these two features together, the right way to do YAML frontmatter for markdown files is clearly,

    ---
    frontmatter: goes here
    ...
    --- |
    markdown goes here

The page processor can simply:

  • feed the contents of the file to the YAML parser
  • use the first document for metadata
  • feed the second document to the markdown processor
  • check that's the end of the file
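Those four steps can be sketched in Rust, the language of the dotat.at generator. This is a toy under stated assumptions: parse_page and Page are hypothetical names, and instead of calling a real YAML parser it only recognises the exact layout above, treating the second document as a top-level literal block scalar with YAML's default "clip" chomping. A real page processor would hand the whole file to a conforming YAML library.

```rust
/// The two halves of a page source file (hypothetical type).
#[derive(Debug, PartialEq)]
struct Page {
    frontmatter: String, // raw YAML text of document 1, for a real YAML parser
    markdown: String,    // value of document 2, the "--- |" literal block scalar
}

/// Toy stand-in for a YAML stream parser: it only understands the
/// "---, frontmatter, ..., --- |, markdown" layout and is NOT a YAML parser.
fn parse_page(src: &str) -> Result<Page, String> {
    let mut lines = src.lines();

    // Document 1: "---", the frontmatter lines, then the "..." end marker.
    if lines.next() != Some("---") {
        return Err("expected --- at the start of the file".to_string());
    }
    let mut frontmatter = String::new();
    loop {
        match lines.next() {
            Some("...") => break,
            Some(line) => {
                frontmatter.push_str(line);
                frontmatter.push('\n');
            }
            None => return Err("frontmatter has no ... end marker".to_string()),
        }
    }

    // Document 2: "--- |" starts a top-level literal block scalar whose
    // content (the markdown) is not indented.
    if lines.next() != Some("--- |") {
        return Err("expected --- | before the markdown".to_string());
    }
    let mut body: Vec<&str> = Vec::new();
    for line in lines.by_ref() {
        if line == "..." {
            break; // optional end marker for the second document
        }
        if line == "---" || line.starts_with("--- ") {
            return Err("unexpected third document".to_string());
        }
        body.push(line);
    }

    // Check that's the end of the file.
    if lines.next().is_some() {
        return Err("unexpected content after the markdown document".to_string());
    }

    // Default "clip" chomping: drop trailing empty lines and end the value
    // with one newline (this toy assumes the file ends with a newline).
    while body.last() == Some(&"") {
        body.pop();
    }
    let mut markdown = body.join("\n");
    if !markdown.is_empty() {
        markdown.push('\n');
    }
    Ok(Page { frontmatter, markdown })
}
```

Because the markdown lines are taken verbatim, a heading such as "# Title" at column 0 reaches the markdown processor unchanged, which is exactly the property the unindented --- | form is meant to preserve.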

No need for any ad-hoc hacks to separate the two parts of the file: the YAML acts as a lightweight wrapper for the markdown.

markdown inside YAML

The crucial thing that makes this work is that the markdown after the --- | delimiter does not need to be indented.

Markdown is very sensitive to indentation, so all the tooling (most importantly my editor) gets righteously confused if markdown is placed in a container that introduces extra indentation.

YAML in Perl

The static site generator for www.dns.cam.ac.uk uses --- | to mark the start of the markdown in its source files. This worked really nicely.

The web site was written in Perl, because most of the existing DNS infrastructure was Perl and I didn't want to change programming languages. YAML was designed by Perl hackers, and the Perl YAML modules are where it all went wrong, er, started.

YAML in other languages

The static site generator for https://dotat.at is written in Rust, using serde-yaml.

I soon discovered that, unlike the original YAML implementations, serde-yaml requires top-level strings following --- | to be indented. This bug seems to be common in YAML implementations for languages other than Perl.

As a result I had to add a bodge to the page processor:

  • split the file using a regex
  • feed the first part to the YAML parser
  • feed the second part to the markdown processor
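A minimal sketch of that split in Rust, using only the standard library. It matches delimiter lines directly rather than using a regex, and it assumes Pandoc-style delimiters (an opening --- line closed by a --- or ... line); split_frontmatter is a hypothetical name, not the site's actual code.

```rust
/// Hypothetical helper: split a page into (frontmatter, markdown).
///
/// Assumes Pandoc-style delimiters: the first line is exactly "---" and the
/// frontmatter ends at the next line that is exactly "---" or "...".
/// Returns None when the file does not start with a complete frontmatter
/// block, so the caller can decide whether to treat it all as markdown.
fn split_frontmatter(src: &str) -> Option<(&str, &str)> {
    let mut lines = src.split_inclusive('\n');
    let first = lines.next()?;
    if strip_eol(first) != "---" {
        return None;
    }
    let yaml_start = first.len();
    let mut offset = yaml_start; // byte offset of the current line
    for line in lines {
        if matches!(strip_eol(line), "---" | "...") {
            let frontmatter = &src[yaml_start..offset];
            let markdown = &src[offset + line.len()..];
            return Some((frontmatter, markdown));
        }
        offset += line.len();
    }
    None // no closing delimiter
}

/// Strip the line ending (any trailing '\n' and '\r' characters).
fn strip_eol(line: &str) -> &str {
    line.trim_end_matches(|c: char| c == '\n' || c == '\r')
}
```

Returning borrowed slices keeps the markdown byte-for-byte identical to the source, and the YAML half can then be passed to whatever parser the generator uses.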

mainstream frontmatter

To make my bodge somewhat more tolerable, I made sure that it was compatible with other similar bodges.

For instance, Pandoc supports YAML metadata, and Emacs markdown mode supports Pandoc-style YAML metadata, so the road to hell is at least reasonably well paved.

grump

It works, but it doesn't make me happy. I suppose I deserve the consequences of choosing technology with known deficiencies. But it requires minimal effort, and is by and large good enough.

Date: 2024-05-06 13:08 (UTC)
From: emperor
Might the serde-yaml authors be prepared to relax the erroneous indentation requirement?

Powered by Dreamwidth Studios