No-one really needs three different word processing programs.
Yet that’s the situation I’m currently in. My six-year-old MacBook Pro is on its last legs and I’m desperately trying to eke out as much free storage space and processing power as possible, meaning a bit of spring cleaning is in order. Unfortunately for me, I’ve amassed text-based documents in (among others) .docx, .pages and .odt formats. Office for Mac has only recently added support for Open Document formats and I’m reluctant to get rid of LibreOffice, the originating program.
From a digital preservation perspective, my Documents folder is a mess. Converting all of these into more sustainable formats will, I’ve decided, be Future Alissa’s problem. But that doesn’t mean I need to keep living an unhealthy document lifestyle.
Instead, I’ve decided to try out one of the more intriguing lessons on The Programming Historian: ‘Sustainable Authorship in Plain Text using Pandoc and Markdown’. Any document that I would normally write in Pages will instead be written in a plaintext editor using Markdown and typeset in .docx or .pdf using Pandoc. I’ve been using Markdown for a while to write these blog posts, but Pandoc is a new experience.
Briefly, Markdown is a text markup language that is intended to be human-readable in a way HTML isn’t. Pandoc is a command-line program that converts one markup format into another, such as HTML to .docx (which at heart is an XML format). The primary benefit is that the manuscript, a plaintext .md file, will never need specialised word processing software to read and will remain intelligible to human eyes. Additional information that would otherwise be incorporated into a .docx or .pages file, such as bibliographic data and footnote stylesheets, is saved separately. These files are also plaintext and easily human-readable.
There are plenty of reasons to kick the word processor habit (neatly summarised in this blog post by W. Caleb McDaniel). Personally, I spend way too much time mucking around with formatting before I even begin to type. A plain-text typing environment has no such distractions, allowing me to concentrate on content. If I need to bold or italicise something, for example, I can do that in Markdown without interrupting my sentence flow.
You’d be forgiven for asking, ‘Why bother with all this, when there are easier options?’ Certainly it’s a challenge for those unfamiliar with the command line. There’s also a lot this method won’t handle: complex tables, mail merge, interactive elements, et cetera. And yes, there are plenty of other distraction-free apps out there. In the long run, however, I’m looking forward to three things:
1) a more fruitful and painless typing experience
2) not wasting hours of my life converting documents from one format to another (yes, this has been known to take me hours) and
3) improving my command-line and markup skills.
What I did, briefly
After installing Pandoc and following the Programming Historian’s instructions (though I chose to forgo LaTeX, and hence .pdf conversion, for want of disk space), I created a little test .md file, incorporating images, links and footnotes, in a nice desktop plaintext editor called Atom.
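For anyone wondering what such a test file might contain, here is a minimal sketch in Python that writes one to disk. Everything in it is a placeholder rather than the actual file described above: the title, author, link, image path and the file name test.md are all assumptions, and the YAML-style header is just one of the ways Pandoc accepts document metadata.

```python
from pathlib import Path

# Hypothetical test manuscript. The title, author, URL and image path
# are placeholders, not the contents of the file described in the post.
MANUSCRIPT = """\
---
title: Test document
author: A. Writer
---

# A heading

Some *italic* and **bold** text, a [link](https://example.org),
and a footnote.[^1]

![An image caption](image.png)

[^1]: The footnote text sits at the bottom of the file.
"""


def write_manuscript(path="test.md"):
    """Write the sample manuscript as UTF-8 plaintext and return its path."""
    target = Path(path)
    target.write_text(MANUSCRIPT, encoding="utf-8")
    return target


if __name__ == "__main__":
    print(write_manuscript())
```

The header block and the [^1] footnote syntax are Pandoc extensions rather than standard Markdown, so other Markdown tools may display them as literal text.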
I then ran a Pandoc command in Terminal to convert the .md file to a .docx file. Disappointingly, the program did not return anything to suggest it had been successful. A quick $ ls, however, revealed the new file.
I also converted the .md manuscript into .odt and .html, just to see what might happen and if there were any differences.
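The conversions can also be scripted, which gets around Pandoc’s silence on success. The following is a rough sketch rather than the exact commands I typed: the file name test.md and the helper names pandoc_command and convert are assumptions made for illustration. It relies on Pandoc working out the target format from the extension of the file given to -o.

```python
import shutil
import subprocess
from pathlib import Path


def pandoc_command(source, extension):
    """Build the argument list for converting `source` to `extension`."""
    output = Path(source).with_suffix("." + extension)
    # Pandoc infers the output format from the extension of the -o file.
    return ["pandoc", str(source), "-o", str(output)]


def convert(source, extensions=("docx", "odt", "html")):
    """Run one conversion per extension; return the files that now exist."""
    if shutil.which("pandoc") is None:
        raise RuntimeError("pandoc was not found on the PATH")
    created = []
    for extension in extensions:
        command = pandoc_command(source, extension)
        # check=True raises CalledProcessError if Pandoc exits with an error.
        # Pandoc prints nothing on success, so look for the file instead.
        subprocess.run(command, check=True)
        output = Path(command[-1])
        if output.exists():
            created.append(output)
    return created


if __name__ == "__main__":
    if shutil.which("pandoc") and Path("test.md").exists():
        for path in convert("test.md"):
            print("created", path)
    else:
        print("pandoc or test.md not found; nothing converted")
```

Checking that each output file exists does the same job as the quick ls: it confirms that something was written, though not that the contents are complete.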
How it ended up
As it turned out, the .docx and .odt conversions were missing the footnotes and .html was missing the header (which is not standard Markdown, but rather a Pandoc extension), meaning that none of the target formats included 100% of the Markdown content. Considering I had done absolutely no styling, the .docx was surprisingly eye-catching.
I don’t know why parts were missing from each target file, but I plan to find out before using Pandoc more extensively for research work. Despite not quite getting all the output I was promised, I wasn’t dissuaded from using Markdown and Pandoc for my long-form writing. The tutorial goes into some depth on footnotes and bibliographies, which I didn’t have time to test and which might well solve my problem.
Ironically, a copy of Matthew Kirschenbaum’s Tracked Changes, a history of word processors and their effect on the art of writing, arrived at the post office while I was compiling this article. In a way, adopting Markdown and Pandoc is an effort to get back to those, uh, halcyon days of formatting-free word processing. Hopefully when I re-examine my Documents folder in a few years’ time, it will be full of plaintext files!