Entropic Thoughts

Article previews in RSS

Article previews in RSS

article-previews-in-rss.jpg

Since about three years past time immemorial, the rss feed for this site has been very anaemic. It had article titles and dates, and that was it. Many readers have requested that I include the full article in the feed, or at least a preview, but I’ve always put it off because it has sounded difficult to accomplish technically.


The way the rss feed for this site is generated is in two steps:

  1. First a little loop in Emacs Lisp runs through the first few items of a sorted and filtered list of files belonging to this project. This loop constructs an org-element syntax tree for the rss feed, and renders it out to a temporary file as an Org mode file.
  2. Then the regular Org exporting framework takes over and exports that file as an rss file using the ox-rss backend.

Article contents are not involved anywhere in this process! The feed is built entirely from file metadata. Any time I’ve pondered adding contents, I’ve balked at the complexity of the problem:

It’s not that I couldn’t solve all these problems, but I don’t want to take the time for it, because it would be a lot of time.


Yesterday I decided to simplify the problem: no internal links, no images, no code evaluation, etc. Just the first four basic elements of the article, like paragraphs, tables, and lists. Getting those from an article at a file path is possible with something like this.

(defun tw-get-teaser-contents (source-path)
  "Get org-element data of early parts of SOURCE-PATH."
  (with-temp-buffer
    ;; Load file contents into Org mode.
    (insert-file-contents source-path)
    (org-mode)
    ;; Take the first four elements ...
    (seq-take
     ;; ... from a subset of the full parse sequence.
     (org-element-map (org-element-parse-buffer)
         '(paragraph src-block quote-block plain-list)
       #'identity
       nil nil '(quote-block plain-list))
     4)))

Then we have to strip some difficulties (internal links), performance problems (code block evaluation), and non-essentials (sidenotes) from this. There are good ways to do it, but I discovered a roundabout way to do it instead. First I created an export backend for a limited subset of html.1 The code I’m actually running for this is a smidge more complicated to be able to handle my excessive use of small caps, etc. But the principle is there..

;; Custom backend for lo-fi RSS-embedded HTML.
(org-export-define-derived-backend 'teaser-html 'html
  :translate-alist
  '((link . (lambda (_ contents _) (or contents ""))
    (footnote-reference . (lambda (_ _ _) "")))))

This can be used to produce an rss friendly rendering of the teaser contents.

(defun tw-get-teaser-html (path)
  "Get HTML representing teaser for Org file at PATH."
  ;; Don't syntax highlight code in RSS.
  (let ((org-html-htmlize-output-type nil))
    ;; Practically unbind function to prevent code block
    ;; evaluation without disrupting other babel processing.
    ;;
    ;; This uses cl-letf rather than advice in order to
    ;; revert the change when control leaves this function.
    (cl-letf (((symbol-function 'org-babel-execute-src-block)
               (lambda (&rest _) "")))
      (org-export-string-as
       ;; Render data back as Org source markup.
       (mapconcat #'org-element-interpret-data
                  (tw-get-teaser-contents path)
                  "\n\n")
       ;; Then export that as limited HTML.
       'teaser-html t '(:with-footnotes nil)))))

Now we have lo-fi html, but the source for the rss file is an org-element syntax tree, so we have to embed the article preview as an html export block in the syntax tree.

(defun tw-feed-summary (path)
  (org-element-create 'export-block
    (list :type "HTML"
          :value (tw-get-teaser-html path))))

This produces an org-element node that can then be inserted into the Org document that makes up the rss feed.

There is still one problem remaining after this. The ox-rss export backend installs an output filter that prettifies the xml of the rss file by, among other things, running indent-region on the entire buffer.2 A clever trick for figuring out where a function is called is to run (debug-on-entry 'indent-region) and then look at the stack trace. Press c to continue in the debugger. Then cancel-debug-on-entry for the same function stops it from entering the debugger all the time. This does not respect indentation in the cdata blocks – including code blocks where indentation really matters! This does not appear configurable, but we can override the function that does this to do nothing instead.

(advice-add 'org-rss-final-function :override
            (lambda (contents _ _) contents))

There! That’s it, I think. It was far from trivial (it took a few hours to get right, after all), but also easier than I would have thought.


But also, this is terrible, terrible code. I’m sure there is a neat way to do it, but what I’ve done is not it. Think about it: this code parses an Org document, keeps some parts of it, renders it back to Org, exports that to html, embeds that in another Org file, then exports that once again to a html-derived format. Back and forth, back and forth.

When people ask me to share the code I have written to patch up Org to make this site the way I want, I tell them they would be better off reading the Emacs manual and the code for Org, because of shit like this. I don’t know this stuff very well, so I just wing it. It doesn’t turn out nice. I don’t want anyone to think they are learning anything from it!