Pandoc Converts All Your (Text) Documents

[Image: Pandoc conversion network]

For the past few months we ProfHackers have been running an occasional series about using the command line. I got us started with a couple of posts explaining why you might want to use the command line and how to get started using it. Konrad followed with posts about the uniq command and the sort command for working with text and data files. Amy added a post about how the command line let her hack the NOOK Color, and I wrote about using pdftk to manipulate PDFs.

Taking up the command line is easier if you have a specific problem you’re trying to solve. For me, the problem was that I wanted to do all of my writing in a plain text format, like Markdown or LaTeX. But I need to be able to share my writing in a variety of formats: HTML for the web, PDF for printed documents or academic writing, and occasionally RTF or Microsoft Word or OpenOffice.

The best way I’ve found to move between these formats is Pandoc. Pandoc is a command line tool written by a philosophy professor, John MacFarlane. Its general use is to take a document in one format and convert it to another. You can get an idea of the wide variety of formats Pandoc can translate by looking at an enlargement of the header diagram.

Here’s an example of how this works. Suppose that you have a Markdown document like the one we created for the post on Markdown. (View pandoc-example.markdown on GitHub.) You can convert this to a number of text formats with a simple terminal command:

Markdown to HTML (HTML output on GitHub):

pandoc pandoc-example.markdown -o pandoc-example.html

Markdown to LaTeX (LaTeX output on GitHub):

pandoc pandoc-example.markdown -o pandoc-example.tex

Markdown to DOCX:

pandoc pandoc-example.markdown -o pandoc-example.docx

Markdown to PDF (download PDF):

pandoc pandoc-example.markdown -o pandoc-example.pdf

That command calls pandoc, tells it which file to convert (pandoc-example.markdown), and tells it which file to export (e.g., pandoc-example.html). Pandoc figures out what types of files these are from their extensions, or you can pass it additional arguments that name the formats. For some of the formats, you can convert the other way. For example, you could convert LaTeX to Markdown or to a Word DOCX, or HTML to Markdown or LaTeX. To convert to PDF, though, you’ll need to have LaTeX installed on your system.
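If you would rather not rely on file extensions, the formats can be named explicitly. Here is a minimal sketch, assuming pandoc is on your PATH; the file names (sample.markdown and the two output files) are made up for illustration and are not part of the example repository. It uses pandoc's -f (from) and -t (to) options and then converts the HTML back to Markdown.

```shell
# Sketch only: sample.markdown and the output names are hypothetical.
if command -v pandoc >/dev/null 2>&1; then
  # Create a tiny Markdown file to experiment with.
  printf '# Title\n\nSome *emphasized* text.\n' > sample.markdown

  # Name the input (-f) and output (-t) formats instead of relying on extensions.
  pandoc -f markdown -t html sample.markdown -o sample-explicit.html

  # Go the other way: read HTML, write Markdown.
  pandoc -f html -t markdown sample-explicit.html -o sample-roundtrip.markdown
else
  echo "pandoc is not installed"
fi
```

Naming the formats is mostly useful when a file has an unusual extension or none at all.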

Another useful thing that Pandoc can do is take a URL and convert the webpage to another format. For example, this command turns a page on my website into Markdown.

pandoc -s -r html -o test.markdown
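For a version you can paste and run, here is a sketch of the same kind of command with a stand-in address. The URL https://example.com/ and the output name example-page.markdown are placeholders chosen for illustration, not the page from my site, and the command assumes pandoc is installed and the network is reachable.

```shell
# https://example.com/ is a stand-in URL, not the page from the post.
if command -v pandoc >/dev/null 2>&1; then
  # -s produces a standalone document; -r html reads the input as HTML.
  pandoc -s -r html https://example.com/ -o example-page.markdown \
    || echo "conversion failed (is the network reachable?)"
else
  echo "pandoc is not installed"
fi
```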

You can see many more uses for Pandoc on its example page, and you can try some conversions with its online demo.

There are several pros to using Pandoc. It’s easy to install if you use the binaries for Windows or Mac. (It was a bit of a pain for me to compile from source, but there’s no reason you’d need to do that.) The tool is under active development, so bugs are being fixed and new formats are occasionally added. And there are quite a few advanced things that you can do, like creating EPUB e-books and automatically generating citations and bibliographies using citeproc-hs with a bibliography database such as BibTeX (which you can export from Zotero).

There are also some conversions that it would be nice if Pandoc could do, but that it can’t. For example, as of this writing Pandoc can turn Markdown into a Word DOCX, but it can’t turn a DOCX into Markdown, HTML, etc., because of the limitations of the DOCX format.
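As a taste of the more advanced uses, the EPUB export works like the other conversions. This is a rough sketch, assuming pandoc is installed; book.markdown, book.epub, and the title "Sample Book" are invented for the example. The -M option sets a metadata value, here the book's title.

```shell
# Sketch only: book.markdown and book.epub are hypothetical file names.
if command -v pandoc >/dev/null 2>&1; then
  # Create a one-chapter Markdown "book".
  printf '# Chapter One\n\nA short paragraph.\n' > book.markdown

  # The .epub extension selects EPUB output; -M sets the title metadata.
  pandoc book.markdown -M title="Sample Book" -o book.epub
else
  echo "pandoc is not installed"
fi
```

The resulting file should open in any e-book reader that understands EPUB.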

If you do your writing in plain text or a markup format like LaTeX, Pandoc is an essential, everyday tool for moving between formats. And if you occasionally need to turn HTML into other formats, it’s handy to have Pandoc in your toolkit.

Have you tried Pandoc? What uses have you found for it?
