Another blog update

2025-10-24

In my last post, I mentioned that I was thinking of rewriting my web server from Go to Rust and essentially creating my own static site generator. Fortunately, in the meantime, I realized that there was a much simpler solution right under my nose.

The main issue I wanted to solve was that it was a bit annoying to convert the Markdown posts I’ve been writing into HTML to be served on my website. As I described in an earlier post, I’ve been using pandoc and m4 to perform this conversion, along with a couple of scripts that I’ve been running manually. This was better than the even more manual process I was following before, but it still wasn’t good enough. At the time, I was excited to cobble something together with m4 and some shell scripts, showing off snippets like:

syscmd({{sed -n '/def load_blogs/,/^$/p' scripts/publish.py | sed '$d'}})dnl

as a triumph. Again, this was somehow better than my earlier workflow with org-mode or writing HTML by hand, but the mental overhead of remembering how m4 worked was still keeping me from writing posts. You might reasonably be thinking that I wouldn’t really have to think about m4 in every post because it should only pop up when I want to use funny commands like this. Unfortunately, that is not the case. The default quoting syntax in m4 uses a backtick to open a quoted string and an apostrophe to close it, which means I always had to remember to include:

changequote(`{{', `}}')dnl

at the top of my posts, or every code snippet throughout the file would be mangled. And there was also a weird spacing quirk with newlines around this command that I still don’t fully understand. I don’t want to disparage m4 in general, but it really didn’t feel like the right tool for this job.

So what have I replaced it with? This Python script:

import subprocess
from pathlib import Path

from watchfiles import Change, watch

from publish import load_blogs, update_blogs

BLOG_FILE = "json/blogs.json"
BLOGS = load_blogs(BLOG_FILE)


def expand_templates(contents: str) -> str:
    """
    This is where I'd put my jinja templates, if I had any
    """
    return contents


def run_pandoc(contents: str) -> str:
    return subprocess.run(
        ["pandoc", "-f", "markdown", "-t", "html"],
        check=True,
        capture_output=True,
        text=True,
        input=contents,
    ).stdout


def update(path):
    path = Path(path)
    contents = path.read_text()
    html = run_pandoc(expand_templates(contents))
    out_path = (Path("blogs") / path.name).with_suffix(".html")
    out_path.write_text(html)

    update_blogs(BLOG_FILE, BLOGS, str(out_path))


if __name__ == "__main__":
    subprocess.Popen(["make", "run"])
    for changes in watch("drafts/"):
        for change, path in changes:
            match change:
                case Change.modified:
                    update(path)
                case Change.added if (Change.deleted, path) in changes:
                    update(path)

The most exciting part here is the watchfiles package, which watches my drafts/ directory and fires off the commands to update the corresponding HTML file whenever a draft changes. For now, these commands just shell out to pandoc, but as the docstring notes, I’ve added an expand_templates function where I can fill in jinja templates at some point. I plan to use this for cases like the sed command above where I want to include a whole code file or a section of a file. This should be a lot nicer than the m4 commands I tried to write because I can write all of the logic in Python and have something like:

{{include("/path/to/file.py")}}

in the body of my post. I can also keep reusable functions like this within the watch.py script above instead of defining them within each post (or including another m4 file or whatever I was doing before).

Now if I want to write a post, all I have to do is open the repo, kick off this watch.py script, and start editing a file in the drafts/ directory. Saving it for the first time will automatically handle everything else.
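To make the trigger conditions in the watch loop concrete, the match block can be restated as a small pure function. This sketch uses a stand-in Change enum with the same members as watchfiles.Change so it runs without the package, and paths_to_update is a hypothetical name. The reading that the second case exists for editors that save by replacing the file (a delete plus an add in the same batch) is an assumption, not something stated above.

```python
from enum import IntEnum


class Change(IntEnum):
    """Stand-in mirroring the members of watchfiles.Change."""

    added = 1
    modified = 2
    deleted = 3


def paths_to_update(changes: set[tuple[Change, str]]) -> list[str]:
    """Return the paths in one batch of changes that should be rebuilt."""
    paths = []
    for change, path in sorted(changes):
        if change is Change.modified:
            # An in-place edit always triggers a rebuild.
            paths.append(path)
        elif change is Change.added and (Change.deleted, path) in changes:
            # Deleted and re-added in the same batch: treated as a save.
            paths.append(path)
    return paths
```

A lone deleted event is ignored in this logic, so removing a draft leaves the generated HTML and the entry in blogs.json untouched.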

The very last pain point is not too visible here, but the old publish.py file, which I can mostly delete now, looks like this with the imports and __main__ block hidden for brevity:

@dataclass
class Blog:
    Title: str
    Filename: str
    Date: str


def load_blogs(filename):
    with open(filename) as f:
        return [Blog(**d) for d in json.load(f)]


def update_blogs(filename, blogs, name):
    if not any((b.Filename == name for b in blogs)):
        date = datetime.today().strftime("%Y-%m-%d")
        blogs.insert(0, Blog(name, name, date))
    with open(filename, "w") as out:
        json.dump(blogs, out, default=lambda o: asdict(o), indent=4)

I’m just using the path to the HTML file as both the Title and Filename. If I could just extract a real title from the Markdown metadata at the start of the input, this setup would essentially be perfect. It looks like pandoc already handles this metadata with my basic command above, so I just need to update my Python scripts to get the title from a line like this at the start of the file:

% My title here
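A minimal sketch of that extraction, assuming the title is always a single line starting with % at the very top of the draft. The extract_title name and its fallback parameter are hypothetical, and titles that continue onto a second line are not handled.

```python
def extract_title(contents: str, fallback: str) -> str:
    """Return the title from a leading `% Title` line, else `fallback`."""
    first_line = contents.split("\n", 1)[0]
    if first_line.startswith("%"):
        # Drop the marker and any surrounding whitespace.
        title = first_line[1:].strip()
        if title:
            return title
    return fallback
```

The update function could then pass extract_title(contents, str(out_path)) along to update_blogs, which would need an extra title argument in place of using name for both fields of Blog.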

Now I might start writing these more often.