September 8, 2021

Call for Papers for Computational analysis skills for large-scale humanities data

James Baker and Anna-Maria Sichani

This Call for Papers is now closed

Many thanks to all of those who showed an interest in our first Call for Papers, and especially those who submitted proposals for exciting new lessons. Keep an eye out for the results of this initiative in 2022.

The Programming Historian invites proposals of new tutorials dealing with computational analysis of large-scale digital collections, as a part of a special series developed in partnership with the National Archives and Jisc.

O Programming Historian convida à apresentação de novos tutoriais relacionados com a análise computacional de coleções digitais em grande escala, como parte de uma série especial desenvolvida em parceria com os National Archives e o Jisc.

Le Programming Historian lance un appel à contributions pour de nouveaux tutoriels qui traitent de l’analyse computationnelle de grands corpus de collections numériques, dans le cadre d’une série spéciale développée en partenariat avec les Archives nationales du Royaume-Uni (The National Archives) et l’organisation Jisc.

Programming Historian los invita a presentar propuestas para lecciones nuevas relacionadas con el análisis computacional de colecciones digitales a gran escala, como parte de la serie especial desarrollada en colaboración con los National Archives y Jisc.

Scholarly research has changed thanks to the proliferation of digital collections and the rapid emergence of computational methodologies and tools.

Awareness of digital methods is growing within the humanities. However, there is more work to be done to bring together scholars and cultural heritage organisations, especially with regard to aligning the skills of researchers with the size and characteristics of digital collections.

To address these challenges, The National Archives (a leading national archive), Jisc (an established provider of digital services for higher education), and the Programming Historian (a publisher of multilingual tutorials that support humanists in learning digital tools and methods) have formed a partnership that aims to publish a series of articles to aid humanities researchers wishing to use digital tools and methods in their analysis of large-scale digital collections.

As a result of this partnership we are delighted to invite authors to submit proposals for article-length tutorials on the computational analysis of large-scale digital collections. We anticipate that proposed articles will seek to achieve one or more of the following:

Teach humanities scholars how to solve humanities problems related to working with digital data;
Use digital collections as test beds for explaining a computational technique, and/or workflow;
Show how a computational methodology or technique can be applied to a digital collection in order to generate initial findings about it as a precursor to in-depth research;
Demystify ‘big data’ analysis techniques for a humanities audience;
Describe methods that advance humanities research questions through the analysis of large-scale digital collections;
Demonstrate ‘Minimal Computing’ approaches to the analysis of large-scale digital collections and thereby meet the needs of scholars working ‘under some set of significant constraints of hardware, software, education, network capacity, power, or other factors’.

Examples of the kind of large-scale collections that would be in scope are digitised texts, email archives, social media data, web archives, bibliographic datasets, image collections, and catalogue data. This is not exhaustive, however, and no type of large-scale research collection is a priori excluded.

The deadline for submitting proposals is Friday 8th October 2021. Proposals can be for articles to be written in any language currently supported by the Programming Historian (English, Spanish, French, Portuguese). Proposals will be reviewed by a panel convened by the Programming Historian team. The panel will recommend those proposals most suitable to go forward for publication. Authors of both successful and unsuccessful proposals will be notified circa Monday 25th October 2021.

We aim to select up to 7 original articles to go forward for publication. Authors whose proposals are accepted to go forward for publication will receive an honorarium of £500. Articles selected to go forward for publication must be submitted by January 24th 2022 using the Programming Historian publication workflow (see the Programming Historian Author Guidelines, available in four languages). Publication of articles is subject to peer review. All published articles will be published under a CC-BY license. All published articles will be translated into a second language by a translator.

To submit a proposal, email proghist.data.cfp@gmail.com with the following details (which need be no longer than 1-2 pages):

## About You
Your name
Your primary email address

## Tutorial Metadata
Submission Language (delete as appropriate) English / Español / Français / Português
Proposed Tutorial Title
Tutorial Abstract (3-4 sentences)
Case Study Description (details about your historical example or problem)
Learning Outcomes (between 2 and 3)
Research Phase most relevant to your tutorial (delete as appropriate) Acquire / Transform / Analyze / Present / Sustain
Research Area most relevant to your tutorial (delete as appropriate) APIs / python / data management / data manipulation / distant reading / set up / linked open data / mapping / network analysis / web scraping / digital publishing / other
Primary dataset(s) or research collection(s) on which the proposed tutorial is based, including current access and licensing conditions (see Note below)
Intended Submission Date [which should be no later than January 24th 2022]
Tutorial will use open technology and data at no cost to the reader Yes / No
Any other comments

Prospective authors are encouraged to consult the Programming Historian Author Guidelines and articles already published by the Programming Historian to get a sense of what makes a good Programming Historian article. There will be an author event on Thursday 23rd September 2021 at 14:00 BST (places can be booked via Eventbrite) at which questions/queries can be raised. If you are unable to attend the event, questions/queries can also be directed to proghist.data.cfp@gmail.com.

Note: Two of the project partners - The National Archives and Jisc - are major providers of large-scale digital collections, and enhanced support would be available to authors who elected to base their article on one or more of these (some of which are listed below). However, authors are not required to base their proposed articles around datasets provided by The National Archives or Jisc, and the selection of proposals to go forward for publication will not favour articles that use these particular datasets.

The project would particularly welcome lessons that engage with web archives or large email corpora, since these are currently especially difficult for researchers to work with. Datasets available from The National Archives and Jisc include:

Archives Hub (https://archiveshub.jisc.ac.uk/)
the UK Medical Heritage library, on the Jisc Historical Texts platform (https://ukmhl.historicaltexts.jisc.ac.uk/home)
British Library 19th Century Books, also on the Jisc Historical Texts platform (https://historicaltexts.jisc.ac.uk)
The National Archives’ Discovery platform (https://discovery.nationalarchives.gov.uk/). A sandbox API is available at https://discovery.nationalarchives.gov.uk/API/sandbox/index.
The UK Government Web Archive (http://www.nationalarchives.gov.uk/webarchive/)

Other useful sources of data include:

The UK Web Archive (http://data.webarchive.org.uk/opendata/)
The GeoCities special collection at the Internet Archive (https://archive.org/web/geocities.php)
The Enron email corpus (https://www.cs.cmu.edu/~enron/)
Library of Congress Web Archive datasets (https://labs.loc.gov/work/experiments/webarchive-datasets/)
The GeoCities archive at Arquivo.pt (https://arquivo.pt/searchGeocities/)
The Portuguese Web Archive Arquivo.pt through its API tools (https://www.arquivo.pt/api)

Frequently Asked Questions (FAQ)

Why does the Call for Papers focus on large-scale data?

We believe there is an appetite for learning how to work with large-scale (or big) data in the humanities and a lack of resources that teach people how to do so. The aim of this call for papers is to fill that gap.

Does my lesson need to make use of Jisc/TNA datasets?

No. There is absolutely no obligation to use datasets offered by Jisc or TNA, and lessons that use them will receive no preferential treatment in the selection process.

What should I do if I have more than one idea I would like to submit?

If you drop us a line at proghist.data.cfp@gmail.com, we’d be happy to give you some feedback as to which idea would be better suited for the call.

Where can I find guidance for writing my lesson?

You can read the Programming Historian Author Guidelines, which are available in all four of our languages. You can also read the Editor and Reviewer guidelines to get a sense of what we are looking for in a Programming Historian lesson.

How will I know you have received my application?

We will reply to your email application within two working days, acknowledging its receipt.

What are the selection criteria?

Proposals will be selected based on how well they fit the remit of the call (the computational analysis of large-scale digital collections), and their originality in relation to other Programming Historian lessons (that is, your lesson can overlap and build upon existing lessons, but should not simply repeat their methods on a large dataset).

Who makes up the selection panel?

The panel is likely to be made up of members of The Programming Historian and representatives from Jisc and TNA (though the exact composition is to be decided).

When will I hear if my lesson has been accepted?

We aim to inform all applicants of the outcome of their proposals by Monday 25th October 2021.

Can I appeal the decision?

The panel decision is final and there is no appeal process. However, keep in mind that this call will only result in (up to) seven accepted lessons and the decision is not, therefore, a reflection on the quality of your proposal, but rather of its suitability to the call.

Can I still publish my lesson in The Programming Historian even if it hasn’t been selected for this call?

Yes! In fact, we encourage you to do so. If your proposal is not selected but you would still like to submit it for publication, follow the regular proposal submission process, also available in French, Spanish, and Portuguese.

Are you able to take personal circumstances (e.g. parental leave) into account for deadlines?

We understand that not everyone has the same time availability and that this might have an impact on the deadlines we sketched above. If your proposal is accepted, get in touch and we will try to find an agreement that works for everyone.

Will I need to translate the lesson (or arrange for its translation) myself?

No. If your proposal is accepted, we will find someone to translate your work into another of The Programming Historian’s languages.

Do I need a GitHub account or to learn Markdown?

All of our publication workflow happens on GitHub, so you will need a GitHub account and a basic understanding of the platform. Similarly, all our articles are written in Markdown - which is surprisingly simple and, if you need an extra hand, there is a Programming Historian lesson to help you with it.

Why is the peer review process open?

Transparency is an important value of The Programming Historian. Our entire workflow is openly available, and that includes the peer review process. In keeping this normally secretive process open, we aim to foster an environment in which the review is as supportive and useful as possible. Read more about our peer review process in the Author Guidelines.

What should I do if my dataset is too big?

We encourage all of our authors to think about the sustainability and reproducibility of their lessons. For this call in particular, those concerns may arise because a dataset is too large (occupying a large amount of storage) or because processing it is too resource-intensive (as is the case for many AI methodologies). If you think your dataset might be too big, consider providing only the minimum subset of data necessary to demonstrate your method.
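As an illustration only, one way to produce a small, reproducible subset is to sample rows at random with a fixed seed. The sketch below is not a required workflow: it uses reservoir sampling from the Python standard library, and the function name `sample_rows`, the column names, and the in-memory stand-in for a large catalogue file are all hypothetical.

```python
import csv
import io
import random


def sample_rows(rows, k, seed=0):
    """Return up to k rows chosen uniformly at random from an iterable.

    Uses reservoir sampling, so the full dataset is never held in memory.
    A fixed seed means the same subset is produced on every run.
    """
    rng = random.Random(seed)
    reservoir = []
    for i, row in enumerate(rows):
        if i < k:
            # Fill the reservoir with the first k rows.
            reservoir.append(row)
        else:
            # Replace an existing entry with probability k / (i + 1).
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                reservoir[j] = row
    return reservoir


# Hypothetical stand-in for a large catalogue export; a real lesson
# would open a file on disk instead of building one in memory.
fake_csv = io.StringIO(
    "id,title\n" + "".join(f"{n},Record {n}\n" for n in range(1000))
)
reader = csv.DictReader(fake_csv)
subset = sample_rows(reader, k=10, seed=42)
print(len(subset))
```

Because the rows are read one at a time, the same approach should work on files too large to load at once. Whether a random sample is appropriate depends on the method being taught; some lessons may need a deliberately chosen subset instead.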

Can I make use of cloud services (e.g. Google Colab or Kaggle) for data processing in my lesson?

The sustainability of our lessons is very important to The Programming Historian and, as a rule, we tend to advise our authors not to base a lesson on an external service that might not be available a few years from the date of publication. Whenever possible, we prefer that our readers be able to work with any software that can be locally installed or configured on their machine. Given the nature of this call, however, this might not be possible: if that is the case for you, get in touch with us and we will discuss the situation in order to reach the best result for your lesson and for our readers.

This call for papers is supported by the project ‘Programming Historian publications: developing computational skills for digital collections’, a partnership between Jisc, the Programming Historian, and The National Archives. For more information on the partnership see the partnership announcement.

About the authors

James Baker is Professor of Digital Humanities at the University of Southampton.

Anna-Maria Sichani is a literary and cultural historian and a Digital Humanist. She is currently a postdoctoral Research Associate in Digital Humanities at Digital Humanities Research Hub, School of Advanced Study, University of London.
