'Digital scholarship workflows'

This page covers digital tools and workflows that I use in my scholarly work, covering activities ranging from digitisation, annotation, referencing, plaintext authorship, storage and backup, to presentation and web presence. It includes workflows based on ScanTailor, OCR-ing, Zotero, Diigo, Hypothesis, Markdown, Atom, Pandoc, Git, Reveal.js and Nikola. My approach builds on practices of shadow librarianship, plain text authorship and autonomy from platforms. While these workflows are particularly useful for scholars, they can be practical for anyone doing a lot of reading and writing.

When I started my PhD in early 2018, I decided to re-organise how I digitise, read, annotate and reference other people’s texts - and how I write, revise and publish my own. Three reasons motivated me to do so. First, given that I was now living between two countries and that I have a disability limiting my capacity to move around heavy loads of books, I shifted to reading exclusively on screen. Second, I decided to systematise my reading and writing workflows around simple standards that promise to be technologically maintainable over years and decades. Third, I wanted a workflow that would allow me to easily re-format my texts for publication in a variety of academic venues and on my own website.

The tools and workflows presented here cover digitisation of text, reference management, annotated bibliography, writing in plaintext, formatting for publication, text revisions, storage and backup, and slide presentations and static websites. The workflows here build first and foremost on the impressive PhD Starter Kit by Achintya Rao. His detailed guide to how to set up work around a PhD has helped me immensely - and makes much of this document redundant. This document, however, covers a different scope. I also draw from the Guide to Plain Text Social Science by Kieran Healy and the work of Dennis Yi Tenen and Grant Wythoff on Sustainable Authorship in Plain Text. Dennis has also written a monograph on Plain Text (Tenen 2017). Bits and bobs gleaned from various corners of the Internet are linked to in the text. While none of the tools and workflows I discuss requires much technological skill, they do require some very basic knowledge of the command line (or console, terminal, prompt). Ubuntu provides a short beginner’s guide. For discovery, experimentation and problem-solving or anything more sophisticated, I rely on the technical prowess of my friend and collaborator Marcell Mars.

Digitising

All books and texts I read, I prefer to read in digital form. Digital text is easily storable, portable, searchable, annotatable and re-usable. It also has shortcomings compared to print, such as potential data rot or the lack of physical navigation, but to me these matter less than the advantages it offers.

Digital text comes in many formats, but in my scholarly work I mostly operate with books, journals and articles in dedicated e-book formats such as .mobi or .epub or more commonly as .pdf. A PDF originates either from a digital layout or from a scan of a book. The latter is of interest here. You can scan a book or a journal with a book scanner, a copier or a simple camera. A scanned PDF can be pushed through the Optical Character Recognition (OCR) process, making it amenable to operations such as search, highlighting, annotation and citation that are central to scholarly reading and writing.

Scanning and post-processing

A detailed description of the workflow that I use to scan books and create PDFs can be found in my Book Digitisation tutorial. This tutorial was perfected and documented together with Dubravka Sekulić and Ann Mertens for the Memory of the World shadow library. The workflow is mostly based on free software and should work with minor adaptations equally on Linux, OS X and Windows. In brief, to digitise a book, I use a hacklab scanner or a regular office flat-bed scanner to create full-colour, high-resolution images of the book. In ScanTailor I crop out from the images everything but the content of the pages, correct distortions and reduce the full-colour of text sections to black and white. This results in much smaller, mostly b/w image files. On these files I run optical character recognition and output the images and OCR-ed text into a PDF using either the Tesseract frontend Gscan2pdf or Abbyy Finereader.

OCR-ing a non-OCR-ed .pdf file

To OCR a .pdf that hasn’t been OCR-ed previously, you either have to use proprietary software such as Abbyy Finereader or Adobe Acrobat Pro, or unstitch the PDF into image files, OCR them with a tool such as Tesseract, and stitch them together into a searchable PDF. To unstitch a multipage PDF into image files, I find that the command-line converter pdftoppm produces the best results, allowing me to set the output resolution and format (.png, .jpg or .tiff) while retaining the quality of images:


# the last argument is a file-name prefix: pages are written as page-<number>.png
pdftoppm -r 300 -png /path/to/<sourcefile>.pdf /path/to/destination/directory/page

From there you can use the digitising workflow described in the previous section to OCR the output images and stitch them back into a .pdf file. Useful to remember: this process can also help you improve scans that consist of two-page spreads or scans that have a lot of dark shadings and specks.
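The free-software route (unstitch, OCR, stitch) can also be scripted from start to finish. What follows is a minimal sketch, not the workflow from the tutorial: it assumes that pdftoppm and pdfunite (both from poppler-utils) and tesseract are installed, and the function name ocr_pdf and the default language code eng are placeholders of my choosing.

```shell
# Sketch: OCR a scanned PDF by splitting it into page images, running
# Tesseract on every page, and merging the per-page PDFs again.
ocr_pdf() {
  src=$1                      # scanned, non-searchable PDF
  out=$2                      # searchable PDF to create
  lang=${3:-eng}              # Tesseract language code (assumed default)
  work=$(mktemp -d) || return 1

  # 1. Unstitch: one 300 dpi PNG per page, written as page-<number>.png
  pdftoppm -r 300 -png "$src" "$work/page" || return 1

  # 2. OCR: "tesseract IMAGE OUTBASE -l LANG pdf" writes OUTBASE.pdf
  #    containing the page image plus a text layer
  for img in "$work"/page-*.png; do
    tesseract "$img" "${img%.png}" -l "$lang" pdf || return 1
  done

  # 3. Stitch: pdftoppm zero-pads the page numbers, so the glob
  #    expands in page order
  pdfunite "$work"/page-*.pdf "$out" || return 1
  rm -rf "$work"
}
```

Called as `ocr_pdf scan.pdf scan_ocr.pdf`, it leaves the source file untouched. Cleaning up the page images in ScanTailor between steps 1 and 2 would still have to be done by hand.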

Reference management and annotation

Central to scholarly work today is keeping abreast of a large amount of published research, scholarly writing and daily reporting relevant to one’s work. Not only does one have to read and keep an overview over a vast body of literature, but also be able to excerpt and summarise texts, easily retrieve segments for citation and manage bibliographic references. Therefore, there are many software tools out there that can assist you in managing your literature, annotations and references such as EndNote, Mendeley or RefWorks. I prefer Zotero because it is free software developed in a non-commercial ecosystem and supported by universities.

While I use Zotero to organise my reading workflow, the reading I do mostly on an old trusty Android tablet using the fantastic Moon+ Reader Pro. For reading and annotation of online texts, I use the social bookmarking tool Diigo and collective annotator Hypothes.is.

Zotero

How to use Zotero as a reference management and research tool is well explained on the project website. In brief, after installing Zotero and the connector for your browser of choice, you can add a variety of bibliographic items either manually or more commonly by scraping bibliographic metadata from Worldcat, OpenLibrary, Google Books, Amazon.com, Google Scholar, academic repositories or any online source that has well organised metadata. You can add books, academic articles, book chapters, reports, documents, newspaper articles, blog entries, videos and many other formats.

The browser connector retrieves bibliographic metadata - author, title, journal, volume, publisher, pagination, ISBN, ISSN, DOI… - and adds them as an item in your local Zotero library. It is always advisable to manually double-check the scraped metadata, as the location of publisher, original year of publishing and similar details are frequently incomplete and inconsistent across those metadata repositories. To each item in the library, Zotero will attempt to add a PDF if available. Failing that, you can attach a digital file manually. Zotero also creates a snapshot of any webpages you add to the library.

The items added to the library can then be organised into collections and subcollections and tagged with keywords. This allows you to search and sort your references quickly, assemble the reading for your research projects, add more items as you read and write, and later return to your existing collections to find items to drag and drop into your future research project collections.

You can back up your Zotero library by syncing it to the account you can create at Zotero.org. With a Zotero.org account, you can create group libraries, which you can share when doing teaching or writing with others.

Zotero has LibreOffice, MS Word and Google Docs plugins that integrate search and citation in your word processors. The Zotero repository offers almost 10,000 citation styles that you can install in your Zotero. Zotero then automates the creation of a bibliography from the citations you have used.

I use Zotero as an organiser and sandbox for my research. Typically I organise future research projects in thematic collections and subcollections, meticulously adding the .pdf files of books or articles to their respective bibliographic items. As much as Zotero is central to my reading and writing workflow, the add-on ZotFile is central to my use of Zotero. ZotFile I use to send a .pdf file (right-click on the item -> Manage Attachments -> Send to Tablet) to a cloud directory that I sync with my tablet for reading. I then read, highlight and annotate the file on my tablet, using the Moon+ Reader Pro (not free software, but it works excellently with a large number of digital formats, including .pdf, .mobi, .epub and .md, and is more than worth its US$5). Once I have saved the edited file in the synced cloud folder, I use ZotFile to retrieve it back into Zotero (right-click on the item -> Manage Attachments -> Get from Tablet). ZotFile automatically extracts all the annotations, which it stores in a note, and retains the highlights you have made in the attached .pdf. This is a superb feature that allows me to quickly parse the highlights and quote the text at any point later. To set up the ZotFile add-on, in ZotFile preferences you need to define the local path on your computer to the folder you sync with your cloud and tablet.

Another important add-on for my Zotero workflow is Better BibTeX. Better BibTeX automatically generates .bib file(s) that contain reference-lists of items from the entire library or any (sub-)collection and assigns to each item a unique citation key. The citation keys I can use to reference works when entering citations in the texts I write in Markdown (see the next section on writing). When converting a Markdown text into a formatted text, I only need to specify the citation style, and that citation key will be transformed into a properly formatted reference for that citation style. To set up Better BibTeX, you need to define the format of the citation key in the Better BibTeX tab of Zotero preferences. In my case, the pattern is the following:


[auth:fold:lower:condense=_]_[Title:nopunct:skipwords:select,1,1:lower]_[year]

For Dennis Yi Tenen’s monograph on Plain Text this translates into tenen_plain_2017. Also, while you’re in the Better BibTeX preferences tab, it is advisable to instruct Better BibTeX to do automatic export when idle, as indexing large libraries whenever you make an edit in your library can slow down your Zotero.
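To get a feel for what the key pattern does, here is a rough shell approximation of it. This is only an illustration, not Better BibTeX's own logic: the function name citekey is made up, the skip-word list is a short hypothetical subset, and the folding of accented characters that the pattern performs is left out.

```shell
# Approximate the pattern: lower-cased surname, first title word that is
# not a skip word, and the year, joined by underscores.
citekey() {
  author=$1
  title=$2
  year=$3
  # Lower-case the surname and turn spaces into single underscores
  auth=$(printf '%s' "$author" | tr '[:upper:]' '[:lower:]' | tr -s ' ' '_')
  # Lower-case the title, drop punctuation, put one word per line,
  # discard skip words and keep the first remaining word
  word=$(printf '%s' "$title" \
    | tr '[:upper:]' '[:lower:]' \
    | tr -cd '[:alnum:] ' \
    | tr -s ' ' '\n' \
    | grep -v -x -E 'a|an|the|of|on|in|and|for|to' \
    | head -n 1)
  printf '%s_%s_%s\n' "$auth" "$word" "$year"
}
```

For instance, `citekey "Tenen" "Plain Text: The Poetics of Computation" 2017` prints tenen_plain_2017, the key used in the example above.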

Finally, some journals require that you include DOI links in the references. Zotero DOI Manager add-on can verify if those exist and add them to your library items.

Diigo and Hypothesis

Zotero was created primarily with journal articles and books in mind. However, you can’t easily annotate webpages in Zotero. While I still use Zotero to reference webpages, to annotate them I primarily use the social bookmarking service Diigo. Diigo annotations I add manually as a note to their respective Zotero references. I use Diigo also to bookmark websites from my phone for later reading. Similar functionality is offered by Pocket and Instapaper.

Hypothesis is a tool that adds a layer of collective annotation and public debate on top of any webpage, as well as on any online and local PDF you desire. To see it in action, go to the retractable shelf in the upper left corner of this webpage. It will reveal annotations and notes related to this webpage made on Hypothesis. If you are logged in on Hypothesis, selecting a section of a webpage will bring up a tool menu allowing you to either annotate or highlight the section. You can also enter a URL of any webpage after https://hypothes.is to open a Hypothesis thread on that webpage. The tool is particularly useful for reading groups and collective learning processes. However, I find it less convenient than Diigo for extracting annotations for storing in Zotero. You can use both to read and annotate online texts on your computer and on your mobile devices.

Annotated bibliography

In order to have an overview of key arguments in a vast amount of texts you use in your research and their particular relevance for your research, it is advisable to maintain an annotated bibliography. An annotated bibliography typically consists of a list of sources with short description and evaluation of each source. Some universities may require students to keep an annotated bibliography and submit it for progress review panels, but it is generally a useful building block in creating a literature review. University of New South Wales provides a good overview of the purpose of an annotated bibliography and structure of annotations.

I follow Emory University’s helpful guide that advises creating annotated bibliographies in Zotero by using either “abstract” or “extra” fields and annotated bibliography citation styles from Zotero repository.

Writing

When I started my PhD, I faced the challenge of how to organise writing, formatting and revising of my texts. Chapter drafts have to be revised over and over creating versions upon versions with ever more complicated file names. Formatting, citations styles and front matter have to be adapted for each publication a text is submitted to. A text written for a publication should eventually appear on my own website. All these complications led me to adopt and adapt a writing workflow based on Markdown plain text and Pandoc document converter. Using Pandoc, a Markdown plain text can, with little or no tweaking, be output with different citation styles into different file formats, and be easily re-used for publication submissions, slide presentations or website pages. Markdown files are small, and I can easily store them and keep track of revisions on software development repositories using Git version control system. Elements of this workflow are covered in the rest of this document.

Markdown

Markdown is a markup language with a syntax extending the old email formatting conventions. Single *asterisks spanning words* indicate italics, double **asterisks** bold. A single # hash indicates heading 1, a double ## hash heading 2. Principally, syntax indicates semantic structure - italics is a single emphasis, bold is a double emphasis - that can be later rendered into a formatted text with any selection of style. Markdown was initially developed by John Gruber in collaboration with Aaron Swartz with the intent of making the marked-up text readable to regular mortals who have little clue about computer languages. Here is a helpful tutorial to get you started, here a handy cheat sheet with Markdown syntax.

Atom

While written in a markup syntax and with an .md extension, Markdown files are plain text files and can be created using the simplest of text editors such as nano on *nix systems, TextEdit on OS X, or NotePad on Windows. However, advanced text editors such as Emacs or Vim can add a lot of functionality. I use Atom, a free software text editor, developed by GitHub, and available for all major operating systems.

Atom with its packages provides a lot of support for Markdown writing and integrates well with Zotero.

To improve the usability of Atom as a writing tool, I use:

To improve Markdown writing experience, I use:

  • Markdown Writer for syntax assistance
  • Markdown Preview Enhanced for previewing the document rendered in Pandoc (make sure you have ticked ‘Use Pandoc Parser’ and set the path to Pandoc executable and command-line arguments such as .bib file location and citation style location in package’s settings).

To integrate Atom with Zotero using a .bib file generated by the Better BibTeX add-on in Zotero, I use:

  • Autocomplete-bibtex. To set up Autocomplete-bibtex, enter in its settings the path to the .bib file of your Zotero library. Pandoc’s Markdown comes with a syntax for citation in the form [@id]. If you type in @ followed by the first letters of an author’s name or title, Autocomplete-bibtex will bring up a drop-down menu offering you to select items from your Zotero library matching those letters. By selecting an item, a citation key for that item will be inserted into the text, resulting in an entry such as @tenen_plain_2017. A similar way to integrate Atom and Zotero is provided by Zotero-picker that emulates the search bar in the LibreOffice and MS Word plugins.

Spell-checking, grammar and style

As a native speaker of a minor language, my writing in English often has rough edges. To weed out spelling errors when writing in Atom, I use the Linter-spell package, which depends on hunspell or aspell for spell-checking. Linter-spell allows parallel spell-checking in multiple languages, which you need to specify in the settings in the Default languages field. In my case, those are: en-GB, hr-HR. Make sure you have those hunspell or aspell language files installed on your system.

For issues of English grammar and style, several online services provide assistance well beyond what built-in spelling-checkers in word processors such as LibreOffice, MS Word or Linter-spell in Atom can offer. My assistant of choice is Grammarly. Grammarly both integrates with browsers as an add-on and has a dedicated page to paste in a text for correction. Already as a free service, it offers many suggestions related to grammar and style, and much more as a subscription service. The jury is still out as to whether the subscription is worth its money.

Formatting and publishing

A text written in Markdown plain text can be converted into a variety of formats using Pandoc. I use Markdown and Pandoc to write my PhD chapters, a book project, submissions for academic journals, texts for media, slide presentations and posts for this website. All these have different requirements on how the texts need to be formatted for publication: different citation styles, different text layouts, different front matter, different file formats. Writing a semantically structured text in Markdown syntax and rendering it into formatted text with Pandoc gives me maximum flexibility and reusability of text.

Pandoc

Pandoc, free software developed by the philosopher of science John MacFarlane, is a universal document converter that can translate between a large number of formats, including from Markdown plain text to fully formatted .html, .docx or .pdf documents - and back to Markdown. Pandoc can read syntax for front matter metadata, links, footnotes, tables, ordered lists, mathematical formulas and many other structural elements, and render them into beautifully styled documents. It also includes the pandoc-citeproc library, which can automatically generate citations and a bibliography in Pandoc-rendered documents using the Citation Style Language (CSL) files.

Pandoc manual provides a detailed guide on how to use Pandoc. My baseline Pandoc command for conversion from .md to .docx has the following structure:


pandoc <document>.md --metadata-file=metadata.yml --bibliography=/home/<user>/.pandoc/bibliography/zotero_library.bib --csl=/home/<user>/.pandoc/csl/harvard.csl --reference-doc=/home/<user>/.pandoc/custom-reference.docx -s -o <document>.docx

This will convert the document.md file, using my metadata YAML file, my Zotero library .bib, Harvard referencing style file and custom-reference.docx template, into a document.docx file (make sure to replace placeholder paths to files and file names with your own). Flag -s (or --standalone) produces an output with an appropriate header and footer, but the header and footer are already automatically included for some formats (including .pdf, .docx and .odt) and not for others (including HTML, LaTeX or .rtf). Flag -o indicates the output file.
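Since the only thing that usually changes between submissions is the citation style, the command can be wrapped in a small shell function. This is a sketch under assumptions: the name md2docx and the PANDOC_HOME variable are hypothetical, and the file layout under ~/.pandoc simply mirrors the paths in the command above. Note also that on Pandoc 2.11 and later citation processing has to be requested explicitly with --citeproc, so on such versions that flag would need to be added.

```shell
# Directory holding the bibliography, CSL styles and .docx template
PANDOC_HOME="${PANDOC_HOME:-$HOME/.pandoc}"

# Convert a Markdown file to .docx with a chosen citation style.
# Usage: md2docx chapter.md [style]   (style defaults to harvard)
md2docx() {
  src=$1
  style=${2:-harvard}
  out="${src%.md}.docx"       # chapter.md -> chapter.docx
  pandoc "$src" \
    --bibliography="$PANDOC_HOME/bibliography/zotero_library.bib" \
    --csl="$PANDOC_HOME/csl/$style.csl" \
    --reference-doc="$PANDOC_HOME/custom-reference.docx" \
    -s -o "$out"
}
```

With this in place, `md2docx chapter.md` renders with harvard.csl, while `md2docx chapter.md chicago-author-date` would pick up a chicago-author-date.csl file, provided one has been saved in the csl directory.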

A separate YAML metadata file is a relatively recent feature and might not work in older Pandoc versions. Instead, you can write a YAML metadata block at the top of your text Markdown file. In fact, this is what I tend to do. In a YAML block, you can define many items, such as the font family, .csl style file and .bib file that will be used with the document. I tend to include only the front matter and retain flexibility with format elements by specifying them only when outputting in Pandoc. A typical YAML block of my article has the following structure:

---
title: 'The Postdigital Condition and the Accelerated Technocapitalism'
subtitle: 'Accelerated Instrumental Rationality and the Three Responses of the Far-Right'

author:
  - Tomislav Medak
  - Coventry University
  - ORCID 0000-0003-3844-0434
  - medakt@coventry.ac.uk

date: June 26th, 2019

abstract: |
  **Abstract**: The article argues that the postdigital condition can be understood from the changing global economic and political context that was conducive to digital network technologies becoming ubiquitous...
...

The Pandoc manual has a more detailed explanation of metadata blocks, including the YAML metadata block.

The custom-reference.docx template file generated by Pandoc has a number of predefined styles to render your metadata, text structure and references. I have combined examples and tweaks I found online to create a .docx template that suits my needs. You can download it from here. If you want to add a style that is not already defined by Pandoc and the template file, you need to add a piece of text formatted in that custom style to your .docx template file and then use the HTML-style span tag around the relevant segment of text in your .md file to match that style. For example, for a custom style for keywords, I had to span the keywords in my Markdown text in the following way:


<span custom-style="Keywords">**Keywords:** postdigital, digital economy, technocapitalism, liberal capitalist hegemony, the far-right</span>

Pandoc generates PDF files using LaTeX and depends on a PDF engine installed on your system. The default LaTeX template produces beautiful PDFs. Plain HTML output, however, will be bare-bones and will require CSS or HTML5 templates to define the layout of the webpage. For more complex documents and uses, a good starting point is a repository with user-contributed Pandoc templates.

Citation style language files

Markdown syntax for citation is [@citationkey]. Pandoc-citeproc will use the specified .bib file and citation format file to output citations and a bibliography in the desired form. Almost 10,000 .csl files for various referencing standards and publication-specific styles can be downloaded from the Zotero Style Repository. If your editor doesn’t have a .csl file, you can try to detect what style they use at the Citation Style Language project. If it only approximates a style, you can edit an existing style to build a .csl file that matches their style.
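A citation key that is mistyped or missing from the .bib file will not resolve to a proper reference, so it can help to check a draft before converting it. The following is a minimal sketch with a made-up function name, missing_citekeys. It assumes keys made only of letters, digits and underscores (as produced by the key pattern shown earlier) and .bib entries that open with a line of the form @type{key,.

```shell
# Print every citation key used in a Markdown file that has no entry
# in the given .bib file.
# Usage: missing_citekeys draft.md zotero_library.bib
missing_citekeys() {
  md=$1
  bib=$2
  # Keys defined in the .bib file: the text between "{" and the first ","
  defined=$(sed -n 's/^@[A-Za-z]*{\([^,]*\),.*/\1/p' "$bib")
  # Keys cited in the Markdown file: "@key" not preceded by a letter or
  # digit, which skips most e-mail addresses
  grep -o -E '(^|[^A-Za-z0-9])@[A-Za-z0-9_]+' "$md" \
    | sed 's/.*@//' \
    | sort -u \
    | while IFS= read -r key; do
        printf '%s\n' "$defined" | grep -q -x -F -- "$key" \
          || printf '%s\n' "$key"
      done
}
```

An empty output means every cited key was found in the .bib file. Any other use of @ in the text, such as a social media handle, would be reported as a missing key.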

Revising

The downside of Markdown plain text and Pandoc workflow is that tracking changes and comments among versions cannot be practically done in a document as in .docx or .odt files. If you receive suggestions and comments in a .docx format, you can use Pandoc to convert a .docx version of your document back into .md, using the following command:


pandoc -s <commented-document>.docx --wrap=none --track-changes=all --atx-headers -o <commented-document>.md

Option --track-changes can have three values: accept, reject and all. If we use ‘all’ to include all deletions, insertions and comments, thus replicating the ‘show changes’ function of MS Word or LibreOffice, the results are underwhelming - there is so much noise from added inline code that it renders the Markdown document unreadable while also breaking a part of the syntax such as citation keys.

One course of action is to go through the suggested changes in the received .docx file and manually enter them into the original Markdown document. Another course of action is to run Pandoc conversion with the ‘accept’ option and then compare differences, merge them or edit the original with a tool such as Meld.
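The second course of action can be sketched as a small shell function. The name review_changes and the _accepted.md suffix are placeholders, and plain diff stands in for Meld so that the result can be read in the terminal.

```shell
# Convert a commented .docx with all tracked changes accepted and show
# how the result differs from the original Markdown file.
# Usage: review_changes original.md commented.docx
review_changes() {
  original=$1
  commented=$2
  accepted="${commented%.docx}_accepted.md"
  pandoc -s "$commented" --wrap=none --track-changes=accept \
    -o "$accepted" || return 1
  # diff exits with status 1 when the two files differ
  diff -u "$original" "$accepted"
}
```

To merge the changes interactively instead, the last line can be swapped for a call to Meld with the same two file names as arguments.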

If your supervisors or editors want to see the changes you have made between two drafts, you can always output two versions of your .md file to a .docx or .odt and create a track-changes file with LibreOffice’s ‘compare document’ function. The same can be done for versions of PDFs - Timothée Poisot has an explanation of how to do that on his blog.

Naming, storing and backing-up files

To organise my scholarship files, I divide my projects between the ‘PhD’ directory that includes not only the thesis but everything related to my PhD study, ‘writing’ directory that includes not only my writing but also my collaborative writing projects, ‘web’ directory that includes my static website, ‘presentations’ directory that includes my conference texts and slides, and ‘technocapitalism’ book project directory. Following useful advice of Achintya Rao’s PhD Starter Kit, I have adopted a heterogeneous yet consistent nomenclature for my directories and files. Directories and sub-directories are named differently, depending on what works best for my memory, search, sorting and command-line manipulation: short memorable titles, priority sorting using numeral prefixes 01_, 02_, 03_,…, or chronology sorting using ISO Date Format prefix YYYYMM_.
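Both prefix conventions are easy to generate on the command line rather than type by hand. A minimal sketch, with hypothetical function names:

```shell
# Chronological prefix: YYYYMM_ followed by a short title,
# using the current year and month
dated_name() {
  printf '%s_%s\n' "$(date +%Y%m)" "$1"
}

# Priority prefix: a two-digit number followed by a short title.
# Pass the number without a leading zero (3, not 03).
numbered_name() {
  printf '%02d_%s\n' "$1" "$2"
}
```

For example, `mkdir "$(numbered_name 3 literature)"` creates a directory called 03_literature, and `mkdir "$(dated_name conference-talk)"` one prefixed with the current year and month.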

Naming versions of a text as it undergoes revisions can be a cause for confusion and frustration. There are always small edits that happen between major drafts, and there are always small edits that happen after the final submission. Adding suffixes ‘_v.1, _v.2, _v.3…’ or ‘_draft_1, _draft_2, _draft_3… final’ can easily get out of hand. For this reason, I use Git to version, store and back up my scholarship.

Git, Gitlab and SparkleShare

Git, a free software framework developed initially by Linus Torvalds, is a version control system for collaborative software development projects. Each of my five scholarship directories is a git directory that I keep synced with my online repository on GitLab (GitHub or BitBucket should serve you equally). Extensive Git documentation is available from the Git project, a handy quick overview of commands from GitHub.

To initiate an existing project directory into a git directory, you have to execute the following sequence of commands:

cd <localdir>
git init
git add .
git commit -m 'message'
git remote add origin <url>
git push -u origin master

Before you do git commit, you should probably exclude all large and sensitive directories or files by creating a ‘.gitignore’ file in the same directory, specifying in the file the paths to the documents you want to omit. The <url> you will obtain by creating a new project on GitLab (or GitHub if you use that).
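As an illustration, a ‘.gitignore’ file can be written from the command line before the first git add. The patterns below are hypothetical examples rather than my own list, and the function name write_gitignore is made up for this sketch.

```shell
# Write a starter .gitignore into the given directory (default: current)
write_gitignore() {
  dir=${1:-.}
  cat > "$dir/.gitignore" <<'EOF'
# Large binary files that do not version well
*.zip
*.pdf
# Sensitive material that should stay off the remote repository
private/
# Editor and operating system clutter
*~
.DS_Store
EOF
}
```

Each line is a pattern: `*.pdf` matches PDF files in any sub-directory, while a trailing slash as in `private/` matches a whole directory. Files that were already committed stay tracked, so the file is best created before the first commit.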

Once you have created a remote repository, you will be syncing the changes you have made locally by executing the following sequence of commands:

git add --all; git commit -m 'message for this revision'; git push

Changes in the remote repository made by your co-authors or collaborators you will be syncing to your local directory by doing:

git pull

The directory synced and backed up on your remote repository can always be copied to another computer locally by doing:

mkdir <localdirectory>
cd <localdirectory>
git clone https://gitlab.com/USERNAME/PROJECT-NAME.git

You can automate this process by using SparkleShare, a git-based file-sharing application, that acts in a similar way to cloud storage applications like Dropbox or Google Drive. After you have set up SparkleShare, it runs as a daemon in your system tray keeping your git directories synced locally and remotely.

Backing-up the Zotero library

Zotero is central to my reading workflow and the Zotero library directory contains annotated PDFs of almost everything I read. Its directory is both large and precious to me. A Zotero.org account will back up your references, but given the limited amount of storage offered by Zotero.org (300 MB for free, up to 6 GB under an inexpensive payment plan), I keep a back-up of my Zotero library in the cloud using rclone, a command-line tool that lets you sync your local files and directories with a remote cloud destination. Although rclone has excellent documentation and assists in the setup through command line dialogues, there’s also the RcloneBrowser (a recent Ubuntu build can be found here) that provides a graphical frontend for rclone.
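Once a remote has been configured through rclone's setup dialogue, the back-up itself comes down to one command. A minimal sketch, in which the remote name mycloud, the destination path and the function name backup_zotero are all placeholders, and ~/Zotero is assumed to be the location of the Zotero data directory:

```shell
# Local Zotero data directory and remote destination (both assumptions)
ZOTERO_DIR="${ZOTERO_DIR:-$HOME/Zotero}"
ZOTERO_REMOTE="${ZOTERO_REMOTE:-mycloud:backup/zotero}"

# Make the remote copy identical to the local library.
# Extra arguments are passed on to rclone, e.g. --dry-run.
backup_zotero() {
  rclone sync "$ZOTERO_DIR" "$ZOTERO_REMOTE" "$@"
}
```

Because `rclone sync` also deletes files at the destination that no longer exist locally, it is worth running `backup_zotero --dry-run` first to preview what would change.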

Presentations and website

To create slide presentations with Markdown, I use Reveal.js - a highly capable framework for creating presentations in HTML, developed initially by Hakim El Hattab, and offering slick features such as nested slides, fragments, speaker notes and PDF export. To create my website with Markdown, I use the Nikola static website generator, developed by Roberto Alsina.

Reveal.js

You can download and install Reveal.js from its GitHub repository. Reveal.js slides can be written directly into the index.html file, both in HTML markup and in Markdown. However, Reveal.js can also load external Markdown, a feature I find much more convenient for my workflow. It allows me to quickly transform the text of my talk that I have already written in Markdown into a slide show or conversely write my text as slide notes.

To create a presentation, first in the index.html file I enter metadata such as author and title, paths to stylesheets that will be used to render the presentation (Reveal.js comes with a number of beautiful themes), and most importantly the name of the Markdown file and the separators that I will use for horizontal and vertical slides:


<div class="slides">
  <!-- Use external markdown resource; the separators for horizontal slides, vertical slides and notes are set in the attributes below -->
  <section data-markdown="technologies_and_ecological_transition.md"
           data-separator="!---!"
           data-separator-vertical="!--!"
           data-separator-notes="Note:">
  </section>
</div>

Now I can write my presentation in the technologies_and_ecological_transition.md file. A typical slide segment has the following structure:

!---!

# modeling the human needs within planetary boundaries

<img src="./raworth_embedding.png" height="400" />

<font size="4"> Kate Raworth, *Doughnut Economics: Seven Ways to Think Like a 21st-Century Economist*, 2017 </font>

Note: all economic processes are drawing on living matter, materials and energy from nature, transforming them from a more ordered state into a less ordered state, from a more usable condition to a less usable condition.

!--!

In this example, !---! is a separator for a horizontal slide and !--! for a vertical slide, as set in the index.html file. # defines the heading, Note: my slide note. Slide segments typically combine Markdown and HTML syntax, at least to link to images, which I place into the presentation directory.

I also tend to tweak the existing behaviour and themes to suit my preferences, for instance by assigning in the index.html file an absolute position on the slide for Heading 1 content.

Nikola

Nikola (named after Nikola Tesla) is a static website generator comparable to Jekyll or Hugo. Most of the online publishing frameworks such as WordPress or MediaWiki use databases and web programming to generate the webpage for each visitor dynamically. Yet most of the content we publish on our websites is, in fact, meant to be static - every visitor is supposed to see the same content. Static website generators emerged in response to this mismatch and its consequences. With static websites, you generate the website locally and then upload it to the server. What goes in is Markdown, HTML and CSS, what goes on the server is just HTML files and the linked images and documents. This has substantial advantages: software updates by your online hosting service will not make your website obsolete if you stop updating it, websites are safer from attacks, they are far less resource hungry, they don’t lock you in with a vendor. You can always move your HTML files to another server, no matter how big or small, no matter its setup. Static websites provide you with autonomy from platforms and lock-ins, while offering everything you might expect from a modern website: themes, blogs, tags, comments, RSS/Atom feeds or social media integration.

Nikola is easy to install across all major operating systems. Once you’ve installed Nikola, you can initialise your website locally. A Nikola website can behave primarily as a blog or as a site. Nikola has an extensive handbook covering the ins and outs of installing, configuring and using the framework.

I host my website on GitHub Pages under my own domain, so the only cost is the domain registration. This allows me to use Git to publish changes to the website. This is done with a single line of code run from my local Nikola website directory:

nikola build && git add --all && git commit -m "$(date)" && git push

However, should GitHub ever terminate the service, I can always transfer the website elsewhere and use another protocol, as basic as FTP, to maintain my website.
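If you prefer a script to a one-liner, the same build-and-publish sequence can be wrapped in a few lines of Python that stop at the first failing step. This is a sketch under assumptions: `deploy_steps` and `deploy` are hypothetical helper names, and the script expects `nikola` and `git` to be on your PATH and the site directory to be a Git repository with a configured remote.

```python
import subprocess
from datetime import datetime


def deploy_steps(message: str) -> list[list[str]]:
    """Return the commands that build the site and publish it with Git."""
    return [
        ["nikola", "build"],
        ["git", "add", "--all"],
        ["git", "commit", "-m", message],
        ["git", "push"],
    ]


def deploy(site_dir: str, run=subprocess.run) -> None:
    """Run each step inside site_dir, using the current timestamp as commit message."""
    message = datetime.now().strftime("%Y-%m-%d %H:%M:%S")
    for step in deploy_steps(message):
        # check=True raises CalledProcessError on a non-zero exit code,
        # so a failed build is never committed or pushed.
        run(step, cwd=site_dir, check=True)
```

Calling `deploy("/path/to/site")` runs the four steps in order. Note that `git commit` exits with an error when there is nothing to commit, in which case the script stops before pushing.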

Essential for my purposes, Nikola can use Pandoc to compile HTML. For this, you need to uncomment [a line in the list of compilers](https://getnikola.com/listings/conf.py.html#listingsconfpy-305) and set PANDOC_OPTIONS in the conf.py file located in the local Nikola website directory. The options in my case read:


PANDOC_OPTIONS = ['--toc', '--template=/home/<user>/.pandoc/templates/html_nikola.template', '-t', 'html5', '--filter', 'pandoc-citeproc', '--bibliography=/home/<user>/.pandoc/bibliography/zotero_library.bib', '--csl=/home/<user>/.pandoc/csl/harvard-coventry-university.csl']

Pandoc here uses a custom template created by my collaborator Marcell Mars to deal with some of my additional front matter. Nikola already offers a number of metadata entries. By default, Nikola reads metadata from reST comments, but you can easily switch that setting to YAML in the conf.py file. My template file adds author affiliation, article abstract and article lead to the existing metadata. The file is available here.
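Since conf.py is an ordinary Python file, the option list can also be assembled from the home directory instead of hard-coded absolute paths, which helps when the same site is built on more than one machine. The following is a sketch, not part of my actual configuration: `PANDOC_HOME` and `pandoc_options` are hypothetical names, and the directory layout under `~/.pandoc` is assumed to match the paths used in the options above.

```python
from pathlib import Path

# Assumed location of templates, bibliography and CSL styles.
PANDOC_HOME = Path.home() / ".pandoc"


def pandoc_options(pandoc_home: Path = PANDOC_HOME) -> list[str]:
    """Build the Pandoc option list with paths relative to pandoc_home."""
    return [
        "--toc",
        f"--template={pandoc_home / 'templates' / 'html_nikola.template'}",
        "-t", "html5",
        "--filter", "pandoc-citeproc",
        f"--bibliography={pandoc_home / 'bibliography' / 'zotero_library.bib'}",
        f"--csl={pandoc_home / 'csl' / 'harvard-coventry-university.csl'}",
    ]


PANDOC_OPTIONS = pandoc_options()
```

If you run Pandoc 2.11 or later, the separate pandoc-citeproc filter has been replaced by the built-in `--citeproc` flag, so the `--filter` pair may need adjusting to match your installation.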

On some of the pages on my website, including this one, I have Hypothesis installed. To add the Hypothesis script to a page, add the following code after your metadata entry:


<script type="application/json" class="js-hypothesis-config">

{"showHighlights": false}

</script>

<script src="https://hypothes.is/embed.js" async></script>

To improve how pages and posts are shared on Facebook or Twitter, it is necessary to include a preview image. This is done by placing an image of optimal size (for Facebook, it’s 1200x630 pixels) in your images directory and using the ‘previewimage’ metadata element with the relative path to the image file (e.g. previewimage: /images/<filename>.jpg). Before you share, it’s best to go to Facebook’s Object Debugger to test if the preview image works as expected.

To improve the discoverability and ranking of your pages and posts on search engines, add to the ‘description’ metadata element of each Nikola page or blog post a description of its content that best matches potential searches. Finally, to add Google Analytics to your Nikola website, you have to uncomment the BODY_END line in your conf.py file and enter the following code:


BODY_END = """
<!-- Global site tag (gtag.js) - Google Analytics -->
<script async src="https://www.googletagmanager.com/gtag/js?id=UA-131829443-1"></script>
<script>
  window.dataLayer = window.dataLayer || [];
  function gtag(){dataLayer.push(arguments);}
  gtag('js', new Date());

  gtag('config', 'UA-131829443-1');
</script>
"""
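Because the tracking ID appears twice in that snippet, it can be convenient to keep it in a single variable in conf.py. A minimal sketch, assuming a placeholder ID and hypothetical names (`GA_TRACKING_ID`, `_GA_SNIPPET`); plain string replacement is used instead of an f-string so that the curly braces in the JavaScript need no escaping:

```python
# Placeholder only; substitute the ID of your own Google Analytics property.
GA_TRACKING_ID = "UA-XXXXXXXXX-X"

_GA_SNIPPET = """
<!-- Global site tag (gtag.js) - Google Analytics -->
<script async src="https://www.googletagmanager.com/gtag/js?id=__GA_ID__"></script>
<script>
  window.dataLayer = window.dataLayer || [];
  function gtag(){dataLayer.push(arguments);}
  gtag('js', new Date());

  gtag('config', '__GA_ID__');
</script>
"""

# Fill the ID into both places where the snippet expects it.
BODY_END = _GA_SNIPPET.replace("__GA_ID__", GA_TRACKING_ID)
```

Changing the property later then means editing one line rather than hunting for every occurrence.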

For more customisations and tweaks, the Nikola documentation offers plenty of resources, and you can also find examples of Nikola-powered website customisations on the web, such as this highly resourceful blog post by Louis Tiao.

Further reading:

Tenen, D. (2017) Plain Text: The Poetics of Computation. Stanford University Press