Donate to The Programming Historian today!

Lesson Index

Our lessons are organized by typical phases of the research process, as well as general topics. If you can’t find a skill, technology, or tool you’re looking for, please let us know!

  • Andrew Akhlaghi

    OCR and Machine Translation

    This lesson covers how to convert images of text into text files and translate those text files. The lesson will also cover how to organize and edit images to make the conversion and translation of whole folders of text files easier and more accurate. The lesson concludes with a discussion of the shortcomings of automated translation and how to overcome them.

    transforming data-manipulation 2021-01-06 2
  • Matthew J. Lavin

    Analyzing Documents with TF-IDF

    This lesson focuses on Term Frequency-Inverse Document Frequency (tf-idf), a foundational natural language processing and information retrieval method. It explores the foundations of tf-idf and introduces some of the questions and concepts of computationally oriented text analysis.

    analyzing distant-reading 2019-05-13 2
  • Kellen Kurschinski

    Applied Archival Downloading with Wget

    Now that you have learned how Wget can be used to mirror or download specific files from websites via the command line, it’s time to expand your web-scraping skills through a few more lessons that focus on other uses for Wget’s recursive retrieval function.

    acquiring web-scraping 2013-09-13 2
  • Ian Milligan

    Automated Downloading with Wget

    Wget is a useful program, run through your computer’s command line, for retrieving online material.

    acquiring web-scraping 2012-06-27 1
  • Taylor Arnold and Lauren Tilton

    Basic Text Processing in R

    Learn how to use R to analyze high-level patterns in texts, apply stylometric methods over time and across authors, and use summary methods to describe items in a corpus.

    analyzing distant-reading 2017-03-27 2
  • Brad Rittenhouse, Ximin Mi, and Courtney Allen

    Beginner's Guide to Twitter Data

    Learn how to acquire Twitter data and process them to make them usable for further analysis.

    acquiring data-manipulation api 2019-10-16 1
  • Amanda Visconti

    Building a static website with Jekyll and GitHub Pages

    This lesson will help you create an entirely free, easy-to-maintain, preservation-friendly, secure website over which you have full control, such as a scholarly blog, project website, or online portfolio.

    presenting website data-management 2016-04-18 1
  • Seth van Hooland, Ruben Verborgh, and Max De Wilde

    Cleaning Data with OpenRefine

    This tutorial focuses on how scholars can diagnose and act upon the accuracy of data.

    transforming data-manipulation 2013-08-05 2
  • Laura Turner O'Hara

    Cleaning OCR’d text with Regular Expressions

    Optical Character Recognition (OCR)—the conversion of scanned images to machine-encoded text—has proven a godsend for historical research. This lesson will help you clean up OCR’d text to make it more usable.

    transforming data-manipulation 2013-05-22 2
  • William J. Turkel and Adam Crymble

    Code Reuse and Modularity in Python

    Computer programs can become long, unwieldy and confusing without special mechanisms for managing complexity. This lesson will show you how to reuse parts of your code by writing functions and break your programs into modules, in order to keep everything concise and easier to debug.

    transforming python 2012-07-17 2
  • Amanda Visconti, Brandon Walsh, and Scholars' Lab Community

    Running a Collaborative Research Website and Blog with Jekyll and GitHub

    In this lesson you will be introduced to the challenges and opportunities that Jekyll, a popular static site generator, offers for publishing collaborative, ongoing research online.

    presenting website data-management 2020-11-23 2
  • John R. Ladd

    Understanding and Using Common Similarity Measures for Text Analysis

    This lesson introduces three common measures for determining how similar texts are to one another: city block distance, Euclidean distance, and cosine distance. You will learn the general principles behind similarity, the different advantages of these measures, and how to calculate each of them using the SciPy Python library.

    analyzing distant-reading 2020-05-05 2
  • Heather Froehlich

    Corpus Analysis with Antconc

    Corpus analysis is a form of text analysis which allows you to make comparisons between textual objects at a large scale (so-called ‘distant reading’).

    analyzing distant-reading 2015-06-19 1
  • Ryan Deschamps

    Correspondence Analysis for Historical Research with R

    This tutorial explains how to carry out and interpret a correspondence analysis, which can be used to identify relationships within categorical data.

    analyzing data-manipulation network-analysis 2017-09-13 3
  • William J. Turkel and Adam Crymble

    Counting Word Frequencies with Python

    Counting the frequency of specific words in a list can provide illustrative data. This lesson will teach you a simple way to count such frequencies in Python.

    analyzing python 2012-07-17 2
  • Miriam Posner and Megan R. Brett

    Creating an Omeka Exhibit

    Now that you’ve added items to your Omeka site and grouped them into collections, you’re ready for the next step: taking your users on a guided tour through the items you’ve collected.

    presenting website 2016-02-24 1
  • William J. Turkel and Adam Crymble

    Creating and Viewing HTML Files with Python

    Here you will learn how to create HTML files with Python scripts, and how to use Python to automatically open an HTML file in Firefox.

    presenting python website 2012-07-17 2
  • Patrick Smyth

    Creating Web APIs with Python and Flask

    Learn how to set up a basic Application Programming Interface (API) to make your data more accessible to users. This lesson also discusses principles of API design and the benefits of APIs for digital projects.

    presenting api data-management 2018-04-02 2
  • Jacob W. Greene

    Creating Mobile Augmented Reality Experiences in Unity

    This lesson serves as an introduction to creating mobile augmented reality applications. Augmented reality (AR) can be defined as the overlaying of digital content (images, video, text, sound, etc.) onto physical objects or locations, and it is typically experienced by looking through the camera lens of an electronic device such as a smartphone, tablet, or optical head-mounted display.

    presenting website mapping 2018-08-10 2
  • Marten Düring

    From Hermeneutics to Data to Networks: Data Extraction and Network Visualization of Historical Sources

    Network visualizations can help humanities scholars reveal hidden and complex patterns and structures in textual sources. This tutorial explains how to extract network data (people, institutions, places, etc.) from historical sources through the use of non-technical methods developed in Qualitative Data Analysis (QDA) and Social Network Analysis (SNA), and how to visualize this data with the platform-independent and particularly easy-to-use Palladio.

    transforming network-analysis 2015-02-18 2
  • Caleb McDaniel

    Data Mining the Internet Archive Collection

    The collections of the Internet Archive include many digitized historical sources. Many contain rich bibliographic data in a format called MARC. In this lesson, you’ll learn how to use Python to automate the downloading of large numbers of MARC files from the Internet Archive and the parsing of MARC records for specific information such as authors, places of publication, and dates. The lesson can be applied more generally to other Internet Archive files and to MARC records found elsewhere.

    acquiring web-scraping 2014-03-03 2
  • Nabeel Siddiqui

    Data Wrangling and Management in R

    This tutorial explores how scholars can organize ‘tidy’ data, use R packages to manipulate data, and conduct basic data analysis.

    transforming data-manipulation data-management distant-reading 2017-07-31 2
  • Jon MacKay

    Dealing with Big Data and Network Analysis Using Neo4j

    In this lesson we will learn how to use a graph database to store and analyze complex networked information. This tutorial will focus on the Neo4j graph database, and the Cypher query language that comes with it.

    analyzing network-analysis 2018-02-20 3
  • Adam Crymble

    Downloading Multiple Records Using Query Strings

    Downloading a single record from a website is easy, but downloading many records at a time – an increasingly frequent need for a historian – is much more efficient using a programming language such as Python. In this lesson, we will write a program that will download a series of records from the Old Bailey Online using custom search criteria, and save them to a directory on our computer.

    acquiring web-scraping 2012-11-11 2
  • Brandon Walsh

    Editing Audio with Audacity

    In this lesson you will learn how to use Audacity to load, record, edit, mix, and export audio files.

    transforming data-manipulation 2016-08-05 1
  • John R. Ladd, Jessica Otis, Christopher N. Warren, and Scott Weingart

    Exploring and Analyzing Network Data with Python

    This lesson introduces network metrics and how to draw conclusions from them when working with humanities data. You will learn how to use the NetworkX Python package to produce and work with these network statistics.

    analyzing network-analysis 2017-08-23 2
  • Stephen Krewson

    Extracting Illustrated Pages from Digital Libraries with Python

    Machine learning and API extensions by HathiTrust and the Internet Archive are making it easier to extract page regions of visual interest from digitized volumes. This lesson shows how to efficiently extract those regions and, in doing so, prompt new, visual research questions.

    acquiring api 2019-01-14 2
  • Adam Crymble

    Using Gazetteers to Extract Sets of Keywords from Free-Flowing Texts

    This lesson will teach you how to use Python to extract a set of keywords very quickly and systematically from a set of texts.

    acquiring data-manipulation 2015-12-01 2
  • Evan Peter Williamson

    Fetching and Parsing Data from the Web with OpenRefine

    OpenRefine is a powerful tool for exploring, cleaning, and transforming data. In this lesson you will learn how to use OpenRefine to fetch URLs and parse web content.

    acquiring data-manipulation web-scraping api 2017-08-12 2
  • William J. Turkel and Adam Crymble

    From HTML to List of Words (part 1)

    In this two-part lesson, we will build on what you’ve learned about Downloading Web Pages with Python, learning how to remove the HTML markup from the webpage of Benjamin Bowsey’s 1780 criminal trial transcript. We will achieve this by using a variety of string operators, string methods, and close reading skills. We introduce looping and branching so that programs can repeat tasks and test for certain conditions, making it possible to separate the content from the HTML tags. Finally, we convert content from a long string to a list of words that can later be sorted, indexed, and counted.

    transforming python 2012-07-17 2
  • William J. Turkel and Adam Crymble

    From HTML to List of Words (part 2)

    In this lesson, you will learn the Python commands needed to implement the second part of the algorithm begun in the lesson ‘From HTML to List of Words (part 1)’.

    transforming python 2012-07-17 2
  • Jon Crump

    Generating an Ordered Data Set from an OCR Text File

    This tutorial illustrates strategies for taking raw OCR output from a scanned text, parsing it to isolate and correct essential elements of metadata, and generating an ordered data set (a Python dictionary) from it.

    transforming data-manipulation 2014-11-25 3
  • Justin Colson

    Geocoding Historical Data using QGIS

    Learn how to use QGIS to convert lists of place names into geographic coordinates, allowing you to map them.

    transforming mapping 2017-01-27 2
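
The three measures named in John R. Ladd's lesson "Understanding and Using Common Similarity Measures for Text Analysis" (city block, Euclidean, and cosine distance) can be computed by hand on small word-count vectors. The following is a minimal pure-Python sketch, not code from that lesson. The lesson itself uses SciPy, whose scipy.spatial.distance module provides cityblock, euclidean, and cosine functions. This sketch assumes a naive lowercase whitespace tokenizer and raw word counts. The helper names (term_counts, to_vectors, and so on) and the sample sentences are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def term_counts(text):
    """Lowercase, split on whitespace, and count tokens (a deliberately naive tokenizer)."""
    return Counter(text.lower().split())


def to_vectors(a, b):
    """Align two Counters on their combined vocabulary so they become equal-length vectors."""
    vocab = sorted(set(a) | set(b))
    return [a[w] for w in vocab], [b[w] for w in vocab]


def city_block(u, v):
    # Sum of absolute differences (Manhattan / L1 distance).
    return sum(abs(x - y) for x, y in zip(u, v))


def euclidean(u, v):
    # Square root of the sum of squared differences (L2 distance).
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))


def cosine_distance(u, v):
    # 1 minus cosine similarity: depends on the angle between vectors, not their length.
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine distance is undefined for an all-zero vector")
    return 1 - dot / (norm_u * norm_v)


# Hypothetical sample texts that differ by a single word.
doc_a = "the cat sat on the mat"
doc_b = "the cat sat on the hat"
u, v = to_vectors(term_counts(doc_a), term_counts(doc_b))
print(city_block(u, v), euclidean(u, v), round(cosine_distance(u, v), 4))
```

For these two sentences, the sketch gives a city block distance of 2, a Euclidean distance of √2 (about 1.414), and a cosine distance of 0.125. Because cosine distance depends only on the direction of the count vectors, a text compared with itself repeated twice has a cosine distance of (nearly) zero. Its city block and Euclidean distances, by contrast, grow with the difference in length. This is one reason cosine distance is often preferred when comparing texts of very different sizes.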