We invite you to browse around. If you can’t find a skill, technology, or tool covered here, please let us know!
Application Programming Interfaces (APIs)
APIs let you programmatically request specific information from a website. Learn how to use them.
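As a rough illustration of what a programmatic request looks like, the sketch below builds a request URL from query parameters and parses a JSON reply. The endpoint `https://api.example.org/search`, the parameter names, and the sample reply are all hypothetical, made up for this example; a real API documents its own URL scheme and response format.

```python
import json
from urllib.parse import urlencode

# Hypothetical endpoint, for illustration only (not a real service).
BASE_URL = "https://api.example.org/search"


def build_request_url(base_url, params):
    """Return base_url with params encoded as a query string."""
    return base_url + "?" + urlencode(params)


def extract_titles(json_text):
    """Parse a JSON reply and return the title of each record."""
    reply = json.loads(json_text)
    return [record["title"] for record in reply["records"]]


# Assumed parameter names: "q" for the search term, "rows" for the result count.
url = build_request_url(BASE_URL, {"q": "census 1901", "rows": 2})

# A made-up reply standing in for what a server might send back.
sample_reply = '{"records": [{"title": "Census of 1901"}, {"title": "Census returns"}]}'
titles = extract_titles(sample_reply)

print(url)
print(titles)
```

No network call is made here; in practice the URL would be fetched with a tool such as `urllib.request` and the reply passed to the parsing step.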
You put a lot of effort into your research. Make sure that effort lasts by applying sustainable strategies to your code, your data, and your research processes. A little bit of planning can save you a lot of time.
Just like it sounds, learn how to use programming to change, move, clean, or count data. These are essential techniques for preparing data to be used in various tools.
- Introduction to the Bash Command Line
- Counting and mining research data with Unix
- Cleaning Data with OpenRefine
- Understanding Regular Expressions
- Cleaning OCR’d Text with Regular Expressions
- Transliterating non-ASCII Characters with Python
- Generating an Ordered Data Set from an OCR Text File
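To give a flavour of cleaning and counting in practice, here is a minimal Python sketch that normalizes a short string with a regular expression and counts word frequencies. The sample text and the function names are invented for illustration; the lessons listed above cover their own tools in more depth.

```python
import re
from collections import Counter


def clean_text(text):
    """Lowercase text, replace anything that is not a-z or whitespace
    with a space, then collapse runs of whitespace.

    Note: this simple pattern also discards accented letters and digits.
    """
    lowered = text.lower()
    letters_only = re.sub(r"[^a-z\s]", " ", lowered)
    return re.sub(r"\s+", " ", letters_only).strip()


def count_words(text):
    """Return a Counter mapping each cleaned word to its frequency."""
    return Counter(clean_text(text).split())


# Invented sample with stray punctuation, spacing, and mixed case.
sample = "The  trial, the verdict; THE sentence."
counts = count_words(sample)

print(clean_text(sample))
print(counts.most_common(1))
```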
Getting Ready to Program
If you’re new to programming in Python, you’ll first need to set up a programming environment. For the most flexibility, we recommend that you follow these instructions on setting up Python on the command line.
Many of the tutorials require that you install one or more Python modules to save you time coding. If this is a new concept for you, read about how to Install Python Modules.
Mapping and GIS
Mapping can be an effective way to visualize and interpret historical data. These lessons introduce historical geographic information systems (GIS) using open source software.
- Intro to Google Maps and Google Earth
- Installing QGIS 2.0 and Adding Layers
- Creating New Vector Layers in QGIS 2.0
- Georeferencing in QGIS 2.0
Omeka Exhibit Building
Learn how to present historical materials online.
A topic modeling tool takes a single text (or corpus) and looks for patterns in the use of words; it is an attempt to inject semantic meaning into vocabulary. It can help you quickly find ‘topics’ in a large corpus of texts.
Learn how to use programming to download material from the Internet in a controlled, semi-automated manner.
- Datamining the Internet Archive Collection
- Automated Downloading with Wget
- Applied Archival Downloading with Wget
- Intro to Beautiful Soup
- Downloading Multiple Records Using Query Strings
The Original Programming Historian
The Programming Historian was originally written as a series of lessons that were intended to be followed in sequence. The other lessons on the site are mostly independent of one another, and can be followed in any order.
- Python Introduction and Installation
- Installation Instructions for Mac, Linux, or Windows
- Viewing HTML Files
- Working with Text Files
- Code Reuse and Modularity
- Working with Web Pages
- Manipulating Strings in Python
- From HTML to a List of Words (part 1)
- From HTML to a List of Words (part 2)
- Normalizing Data
- Counting Frequency
- Creating and Viewing HTML Files with Python
- Output Data as an HTML File
- Keywords in Context (Using n-grams)
- Output Keywords in Context in HTML File