Lesson Index

Our lessons are organized by typical phases of the research process, as well as general topics. If you can’t find a skill, technology, or tool you’re looking for, please let us know!

  • Matthew J. Lavin

    Analyzing Documents with TF-IDF

    This lesson focuses on a foundational natural language processing and information retrieval method called Term Frequency - Inverse Document Frequency (tf-idf). This lesson explores the foundations of tf-idf, and will also introduce you to some of the questions and concepts of computationally oriented text analysis.

    analyzing distant-reading 2019-05-13 2
  • Kellen Kurschinski

    Applied Archival Downloading with Wget

    Now that you have learned how Wget can be used to mirror or download specific files from websites via the command line, it’s time to expand your web-scraping skills through a few more lessons that focus on other uses for Wget’s recursive retrieval function.

    acquiring web-scraping 2013-09-13 2
  • Ian Milligan

    Automated Downloading with Wget

    Wget is a useful program, run through your computer’s command line, for retrieving online material.

    acquiring web-scraping 2012-06-27 1
  • Taylor Arnold and Lauren Tilton

    Basic Text Processing in R

    Learn how to use R to analyze high-level patterns in texts, apply stylometric methods over time and across authors, and use summary methods to describe items in a corpus.

    analyzing distant-reading 2017-03-27 2
  • Brad Rittenhouse, Ximin Mi, and Courtney Allen

    Beginner's Guide to Twitter Data

    Learn how to acquire Twitter data and process them to make them usable for further analysis.

    acquiring data-manipulation api 2019-10-16 1
  • Amanda Visconti

    Building a static website with Jekyll and GitHub Pages

    This lesson will help you create an entirely free, easy-to-maintain, preservation-friendly, secure website over which you have full control, such as a scholarly blog, project website, or online portfolio.

    presenting website data-management 2016-04-18 1
  • Seth van Hooland, Ruben Verborgh, and Max De Wilde

    Cleaning Data with OpenRefine

    This tutorial focuses on how scholars can diagnose and act upon the accuracy of data.

    transforming data-manipulation 2013-08-05 2
  • Laura Turner O'Hara

    Cleaning OCR’d text with Regular Expressions

    Optical Character Recognition (OCR)—the conversion of scanned images to machine-encoded text—has proven a godsend for historical research. This lesson will help you clean up OCR’d text to make it more usable.

    transforming data-manipulation 2013-05-22 2
  • William J. Turkel and Adam Crymble

    Code Reuse and Modularity in Python

    Computer programs can become long, unwieldy and confusing without special mechanisms for managing complexity. This lesson will show you how to reuse parts of your code by writing functions and break your programs into modules, in order to keep everything concise and easier to debug.

    transforming python 2012-07-17 2
  • Heather Froehlich

    Corpus Analysis with Antconc

    Corpus analysis is a form of text analysis which allows you to make comparisons between textual objects at a large scale (so-called ‘distant reading’).

    analyzing distant-reading 2015-06-19 1
  • Ryan Deschamps

    Correspondence Analysis for Historical Research with R

    This tutorial explains how to carry out and interpret a correspondence analysis, which can be used to identify relationships within categorical data.

    analyzing data-manipulation network-analysis 2017-09-13 3
  • William J. Turkel and Adam Crymble

    Counting Word Frequencies with Python

    Counting the frequency of specific words in a list can provide illustrative data. This lesson will teach you Python’s easy way to count such frequencies.

    analyzing python 2012-07-17 2
  • Miriam Posner and Megan R. Brett

    Creating an Omeka Exhibit

    Now that you’ve added items to your Omeka site and grouped them into collections, you’re ready for the next step: taking your users on a guided tour through the items you’ve collected.

    presenting website 2016-02-24 1
  • William J. Turkel and Adam Crymble

    Creating and Viewing HTML Files with Python

    Here you will learn how to create HTML files with Python scripts, and how to use Python to automatically open an HTML file in Firefox.

    presenting python website 2012-07-17 2
  • Patrick Smyth

    Creating Web APIs with Python and Flask

    Learn how to set up a basic Application Programming Interface (API) to make your data more accessible to users. This lesson also discusses principles of API design and the benefits of APIs for digital projects.

    presenting api data-management 2018-04-02 2
  • Jacob W. Greene

    Creating Mobile Augmented Reality Experiences in Unity

    This lesson serves as an introduction to creating mobile augmented reality applications. Augmented reality (AR) can be defined as the overlaying of digital content (images, video, text, sound, etc.) onto physical objects or locations, and it is typically experienced by looking through the camera lens of an electronic device such as a smartphone, tablet, or optical head-mounted display.

    presenting website mapping 2018-08-10 2
  • Marten Düring

    From Hermeneutics to Data to Networks: Data Extraction and Network Visualization of Historical Sources

    Network visualizations can help humanities scholars reveal hidden and complex patterns and structures in textual sources. This tutorial explains how to extract network data (people, institutions, places, etc) from historical sources through the use of non-technical methods developed in Qualitative Data Analysis (QDA) and Social Network Analysis (SNA), and how to visualize this data with the platform-independent and particularly easy-to-use Palladio.

    transforming network-analysis 2015-02-18 2
  • Caleb McDaniel

    Data Mining the Internet Archive Collection

    The collections of the Internet Archive include many digitized historical sources. Many contain rich bibliographic data in a format called MARC. In this lesson, you’ll learn how to use Python to automate the downloading of large numbers of MARC files from the Internet Archive and the parsing of MARC records for specific information such as authors, places of publication, and dates. The lesson can be applied more generally to other Internet Archive files and to MARC records found elsewhere.

    acquiring web-scraping 2014-03-03 2
  • Nabeel Siddiqui

    Data Wrangling and Management in R

    This tutorial explores how scholars can organize ‘tidy’ data, understand R packages to manipulate data, and conduct basic data analysis.

    transforming data-manipulation data-management distant-reading 2017-07-31 2
  • Jon MacKay

    Dealing with Big Data and Network Analysis Using Neo4j

    In this lesson we will learn how to use a graph database to store and analyze complex networked information. This tutorial will focus on the Neo4j graph database, and the Cypher query language that comes with it.

    analyzing network-analysis 2018-02-20 3
  • Adam Crymble

    Downloading Multiple Records Using Query Strings

    Downloading a single record from a website is easy, but downloading many records at a time – an increasingly frequent need for a historian – is much more efficient using a programming language such as Python. In this lesson, we will write a program that will download a series of records from the Old Bailey Online using custom search criteria, and save them to a directory on our computer.

    acquiring web-scraping 2012-11-11 2
  • Brandon Walsh

    Editing Audio with Audacity

    In this lesson you will learn how to use Audacity to load, record, edit, mix, and export audio files.

    transforming data-manipulation 2016-08-05 1
  • John Ladd, Jessica Otis, Christopher N. Warren, and Scott Weingart

    Exploring and Analyzing Network Data with Python

    This lesson introduces network metrics and how to draw conclusions from them when working with humanities data. You will learn how to use the NetworkX Python package to produce and work with these network statistics.

    analyzing network-analysis 2017-08-23 2
  • Stephen Krewson

    Extracting Illustrated Pages from Digital Libraries with Python

    Machine learning and API extensions by HathiTrust and Internet Archive are making it easier to extract page regions of visual interest from digitized volumes. This lesson shows how to efficiently extract those regions and, in doing so, prompt new, visual research questions.

    acquiring api 2019-01-14 2
  • Adam Crymble

    Using Gazetteers to Extract Sets of Keywords from Free-Flowing Texts

    This lesson will teach you how to use Python to extract a set of keywords very quickly and systematically from a set of texts.

    acquiring data-manipulation 2015-12-01 2
  • Evan Peter Williamson

    Fetching and Parsing Data from the Web with OpenRefine

    OpenRefine is a powerful tool for exploring, cleaning, and transforming data. In this lesson you will learn how to use Refine to fetch URLs and parse web content.

    acquiring data-manipulation web-scraping api 2017-08-12 2
  • William J. Turkel and Adam Crymble

    From HTML to List of Words (part 1)

    In this two-part lesson, we will build on what you’ve learned about Downloading Web Pages with Python, learning how to remove the HTML markup from the webpage of Benjamin Bowsey’s 1780 criminal trial transcript. We will achieve this by using a variety of string operators, string methods, and close reading skills. We introduce looping and branching so that programs can repeat tasks and test for certain conditions, making it possible to separate the content from the HTML tags. Finally, we convert content from a long string to a list of words that can later be sorted, indexed, and counted.

    transforming python 2012-07-17 2
  • William J. Turkel and Adam Crymble

    From HTML to List of Words (part 2)

    In this lesson, you will learn the Python commands needed to implement the second part of the algorithm begun in the lesson ‘From HTML to List of Words (part 1)’.

    transforming python 2012-07-17 2
  • Jon Crump

    Generating an Ordered Data Set from an OCR Text File

    This tutorial illustrates strategies for taking raw OCR output from a scanned text, parsing it to isolate and correct essential elements of metadata, and generating an ordered data set (a Python dictionary) from it.

    transforming data-manipulation 2014-11-25 3
  • Justin Colson

    Geocoding Historical Data using QGIS

    Learn how to use QGIS to convert lists of place names into geographic coordinates, allowing you to map them.

    transforming mapping 2017-01-27 2
  • Beatrice Alex

    Geoparsing English-Language Text with the Edinburgh Geoparser

    This tutorial teaches users how to use the Edinburgh Geoparser to process a piece of English-language text, extract and resolve the locations contained within it, and plot them as a web map.

    presenting mapping 2017-10-31 3
  • Jim Clifford, Josh MacFadyen, and Daniel Macfarlane

    Georeferencing in QGIS 2.0

    In this lesson, you will learn how to georeference historical maps so that they may be added to a GIS as a raster layer.

    transforming mapping 2013-12-13 2
  • Eric Weinberg

    Using Geospatial Data to Inform Historical Research in R

    In this lesson, you will use the R language to analyze and map geospatial data.

    analyzing mapping 2018-08-20 2
  • Sarah Simpkin

    Getting Started with Markdown

    In this lesson, you will be introduced to Markdown, a plain text-based syntax for formatting documents. You will find out why it is used, how to format Markdown files, and how to preview Markdown-formatted documents on the web.

    presenting data-management 2015-11-13 1
  • Jeff Blackadar

    Introduction to MySQL with R

    This lesson will help you store large amounts of historical data in a structured manner, search and filter that data, and visualize some of the data as a graph.

    transforming data-manipulation distant-reading 2018-05-03 2
  • Jim Clifford, Josh MacFadyen, and Daniel Macfarlane

    Intro to Google Maps and Google Earth

    Google My Maps and Google Earth provide an easy way to start creating digital maps. With a Google Account you can create and edit personal maps by clicking on My Places.

    presenting mapping 2013-12-13 1
  • Adam Crymble

    Introduction to Gravity Models of Migration & Trade

    This lesson introduces gravity models as a means for determining the probable distribution of entities across space in historical datasets. It does so through a case study of historical migration patterns.

    analyzing data-manipulation 2019-03-18 3
  • Jonathan Reeve

    Installing Omeka

    This lesson will teach you how to install your own copy of Omeka.

    presenting website 2016-07-24 2
  • Fred Gibbs

    Installing Python Modules with pip

    There are many ways to install external Python libraries; this tutorial explains one of the most common methods using pip.

    acquiring get-ready python 2013-05-06 1
  • Ian Milligan and James Baker

    Introduction to the Bash Command Line

    This lesson will teach you how to enter commands using a command-line interface, rather than through a graphical interface. Command-line interfaces have advantages for computer users who need more precision in their work, such as digital historians. They allow for more detail when running some programs, as you can add modifiers to specify exactly how you want your program to run. Furthermore, they can be easily automated through scripts, which are essentially recipes of text-based commands.

    transforming data-manipulation get-ready 2014-09-20 1
  • Jeri Wieringa

    Intro to Beautiful Soup

    Beautiful Soup is a Python library for getting data out of HTML, XML, and other markup languages.

    transforming web-scraping 2012-12-30 2
  • Jonathan Blaney

    Introduction to the Principles of Linked Open Data

    Introduces core concepts of Linked Open Data, including URIs, ontologies, RDF formats, and a gentle intro to the graph query language SPARQL.

    acquiring lod 2017-05-07 1
  • Ted Dawson

    Introduction to the Windows Command Line with PowerShell

    This tutorial will introduce you to the basics of Windows PowerShell, the standard command-line interface for Windows computers.

    transforming data-manipulation get-ready 2016-07-21 1
  • Shawn Graham

    An Introduction to Twitterbots with Tracery

    This lesson explains how to create simple twitterbots using Tracery and the Cheap Bots Done Quick service. Tracery exists in multiple languages and can be integrated into websites, games, and bots.

    presenting api 2017-08-29 2
  • William J. Turkel and Adam Crymble

    Python Introduction and Installation

    This first lesson in our section on dealing with Online Sources is designed to get you and your computer set up to start programming. We will focus on installing the relevant software – all free and reputable – and finally we will help you to get your toes wet with some simple programming that provides immediate results.

    transforming python get-ready 2012-07-17 1
  • Dave Rodriguez

    Introduction to Audiovisual Transcoding, Editing, and Color Analysis with FFmpeg

    This lesson introduces the basic functions of FFmpeg, a free command-line tool used for manipulating and analyzing audiovisual materials.

    analyzing data-manipulation 2018-12-20 2
  • Go Sugimoto

    Introduction to Populating a Website with API Data

    This lesson introduces a way to populate a website with data obtained from another website via an Application Programming Interface (API). Using some simple programming, it provides strategies for customizing the presentation of that data, providing flexible and generalizable skills.

    acquiring api 2019-05-22 2
  • François Dominic Laramée

    Introduction to stylometry with Python

    In this lesson you will learn to conduct ‘stylometric analysis’ on texts and determine authorship of disputed texts. The lesson covers three methods: Mendenhall’s Characteristic Curves of Composition, Kilgariff’s Chi-Squared Method, and John Burrows’ Delta Method.

    analyzing distant-reading 2018-04-21 2
  • Matthew Lincoln

    Reshaping JSON with jq

    Working with data from an art museum API and from the Twitter API, this lesson teaches how to use the command-line utility jq to filter and parse complex JSON files into flat CSV files.

    transforming data-manipulation 2016-05-24 2
  • Quinn Dombrowski, Tassie Gniady, and David Kloster

    Introduction to Jupyter Notebooks

    Jupyter notebooks provide an environment where you can freely combine human-readable narrative with computer-readable code. This lesson describes how to install the Jupyter Notebook software, how to run and create Jupyter notebook files, and contexts where Jupyter notebooks can be particularly helpful.

    presenting python website 2019-12-08 1
  • William J. Turkel and Adam Crymble

    Keywords in Context (Using n-grams) with Python

    This lesson takes the frequency pairs collected in “Counting Frequencies” and outputs them in HTML.

    presenting python 2012-07-17 2
  • William J. Turkel and Adam Crymble

    Setting up an Integrated Development Environment for Python (Linux)

    This lesson will help you set up an integrated development environment for Python on a computer running the Linux operating system.

    transforming get-ready python 2012-07-17 1
  • William J. Turkel and Adam Crymble

    Setting Up an Integrated Development Environment for Python (Mac)

    This lesson will help you set up an integrated development environment for Python on a computer running a Mac operating system.

    transforming get-ready python 2012-07-17 1
  • William J. Turkel and Adam Crymble

    Manipulating Strings in Python

    This lesson is a brief introduction to string manipulation techniques in Python.

    transforming python 2012-07-17 2
  • Kim Pham

    Web Mapping with Python and Leaflet

    This tutorial teaches users how to create a web map based on tabular data.

    presenting mapping 2017-08-29 2
  • Vilja Hulden

    Supervised Classification: The Naive Bayesian Returns to the Old Bailey

    This lesson shows how to use machine learning to extract interesting documents out of a digital archive.

    analyzing distant-reading 2014-12-17 3
  • William J. Turkel and Adam Crymble

    Normalizing Textual Data with Python

    In this lesson, we will make the list we created in the ‘From HTML to List of Words’ lesson easier to analyze by normalizing this data.

    transforming python 2012-07-17 2
  • William J. Turkel and Adam Crymble

    Output Data as an HTML File with Python

    This lesson takes the frequency pairs created in the ‘Counting Frequencies’ lesson and outputs them to an HTML file.

    transforming python website 2012-07-17 2
  • William J. Turkel and Adam Crymble

    Output Keywords in Context in an HTML File with Python

    This lesson builds on ‘Keywords in Context (Using N-grams)’, where n-grams were extracted from a text. Here, you will learn how to output all of the n-grams of a given keyword in a document downloaded from the Internet, and display them clearly in your browser window.

    presenting python 2012-07-17 2
  • James Baker

    Preserving Your Research Data

    This lesson will suggest ways in which historians can document and structure their research data so as to ensure it remains useful in the future.

    sustaining data-management 2014-04-30 1
  • Jim Clifford, Josh MacFadyen, and Daniel Macfarlane

    Installing QGIS 2.0 and Adding Layers

    In this lesson you will install QGIS software, download geospatial files like shapefiles and GeoTIFFs, and create a map out of a number of vector and raster layers.

    presenting mapping 2013-12-13 1
  • Taryn Dewar

    R Basics with Tabular Data

    This lesson teaches a way to quickly analyze large volumes of tabular data, making research faster and more effective.

    transforming data-manipulation 2016-09-05 1
  • James Baker and Ian Milligan

    Counting and mining research data with Unix

    This lesson will look at how research data, when organised in a clear and predictable manner, can be counted and mined using the Unix shell.

    transforming data-manipulation 2014-09-20 2
  • Zoë Wilkinson Saldaña

    Sentiment Analysis for Exploratory Data Analysis

    In this lesson you will learn to conduct ‘sentiment analysis’ on texts and to interpret the results. This is a form of exploratory data analysis based on natural language processing. You will learn to install all appropriate software and to build a reusable program that can be applied to your own texts.

    analyzing distant-reading 2018-01-15 2
  • Shawn Graham

    The Sound of Data (a gentle introduction to sonification for historians)

    There are any number of guides that will help you visualize the past, but this lesson will help you hear the past.

    transforming distant-reading 2016-06-07 2
  • Dennis Tenen and Grant Wythoff

    Sustainable Authorship in Plain Text using Pandoc and Markdown

    In this tutorial, you will first learn the basics of Markdown—an easy to read and write markup syntax for plain text—as well as Pandoc, a command line tool that converts plain text into a number of beautifully formatted file types: PDF, .docx, HTML, LaTeX, slide decks, and more.

    sustaining website data-management 2014-03-19 2
  • Alex Brey

    Temporal Network Analysis with R

    Learn how to use R to analyze networks that change over time.

    analyzing network-analysis 2018-11-04 3
  • Peter Organisciak and Boris Capitanu

    Text Mining in Python through the HTRC Feature Reader

    Explains how to use Python to summarize and visualize data on millions of texts from the HathiTrust Research Center’s Extracted Features dataset.

    analyzing distant-reading 2016-11-22 3
  • Shawn Graham, Scott Weingart, and Ian Milligan

    Getting Started with Topic Modeling and MALLET

    In this lesson you will first learn what topic modeling is and why you might want to employ it in your research. You will then learn how to install and work with the MALLET natural language processing toolkit to do so.

    analyzing distant-reading 2012-09-02 2
  • M. H. Beals

    Transforming Data for Reuse and Re-publication with XML and XSL

    This tutorial will provide you with the ability to convert or transform historical data from an XML database (whether a single file or several linked documents) into a variety of different presentations—condensed tables, exhaustive lists or paragraphed narratives—and file formats.

    transforming data-manipulation 2016-07-07 1
  • Seth Bernstein

    Transliterating non-ASCII characters with Python

    This lesson shows how to use Python to automatically transliterate a list of words from a language with a non-Latin alphabet to a standardized format using the American Standard Code for Information Interchange (ASCII) characters.

    transforming data-manipulation 2013-10-04 2
  • Doug Knox

    Understanding Regular Expressions

    In this lesson, we will use advanced find-and-replace capabilities in a word processing application in order to make use of structure in a brief historical document that is essentially a table in the form of prose.

    transforming data-manipulation 2013-06-22 2
  • Miriam Posner

    Up and Running with Omeka.net

    Omeka.net makes it easy to create websites that show off collections of items.

    presenting website 2016-02-17 1
  • Stephanie J. Richmond and Tommy Tavenner

    Using JavaScript to Create Maps of Correspondence

    Demonstrates how to use the JavaScript library “Leaflet” to produce an interactive map that can be hosted online or viewed locally, and demonstrates how to customize many of its features.

    presenting mapping 2017-04-24 2
  • Jim Clifford, Josh MacFadyen, and Daniel Macfarlane

    Creating New Vector Layers in QGIS 2.0

    In this lesson you will learn how to create vector layers based on scanned historical maps.

    presenting mapping 2013-12-13 2
  • William J. Turkel and Adam Crymble

    Understanding Web Pages and HTML

    This lesson introduces you to HTML and the web pages it structures.

    presenting python 2012-07-17 2
  • Charlie Harper

    Visualizing Data with Bokeh and Pandas

    In this lesson you will learn how to visually explore and present data in Python by using the Bokeh and Pandas libraries.

    analyzing python data-manipulation mapping 2018-07-27 2
  • William J. Turkel and Adam Crymble

    Setting Up an Integrated Development Environment for Python (Windows)

    This lesson will help you set up an integrated development environment for Python on a computer running the Windows operating system.

    transforming get-ready python 2012-07-17 1
  • Moritz Mähr

    Working with batches of PDF files

    Learn how to perform OCR and text extraction with free command line tools like Tesseract and Poppler and how to get an overview of large numbers of PDF documents using topic modeling.

    transforming data-manipulation data-management 2020-01-30 2
  • William J. Turkel and Adam Crymble

    Working with Text Files in Python

    In this lesson you will learn how to manipulate text files using Python.

    transforming python 2012-07-17 2
  • William J. Turkel and Adam Crymble

    Downloading Web Pages with Python

    This lesson introduces Uniform Resource Locators (URLs) and explains how to use Python to download and save the contents of a web page to your local hard drive.

    acquiring python 2012-07-17 2