
The Machine Reading the Archive programme offers support for new and existing digital archives projects through a series of mentoring sessions which we hope will enable researchers to explore new methods and access expert advice on digital project design.

Short descriptions of selected projects in our 2019/20 cohort are below.

Eoin Carter

My project looks at the development of the intellectual culture of the 'zetetic societies', a subset of British working-class radicalism of the 1820s, much of which was conducted by correspondence by members imprisoned for blasphemy and sedition. The key datasets are the personal papers of one leader, Richard Carlile (c. 550 letters, fully transcribed by me), as well as the correspondence and articles he printed in his journals (published weekly through the 1820s). As part of this I want to: map Carlile's personal correspondence spatially and diachronically to show his importance as a central node in British radicalism of the period, even while in prison; map his correspondents’ locations against historical population to argue that the movement was, contrary to stereotype, as much a rural as an urban phenomenon; and, having devised an article classification schema (public address, editorial, correspondence, etc.), show quantitatively that earlier historiography has mischaracterised Carlile's journals as mere vehicles for his own voice, when in fact they were much more collaborative spaces for radical discourse.

Eoin Carter is a PhD candidate at the Department of History and Philosophy of Science, University of Cambridge.

Andrew Corrigan

I would like to explore the potential of automated image analysis to improve the accessibility of digitised archive collections, using machine learning methods to extract data from images and focusing on four potential themes: geometric shapes; anthropomorphic or zoomorphic forms; colour and tone; and similarity. In doing so, I aim to explore whether this data is suitable to be fed back into the images to enhance the experience of interacting with digital archives. I also aim to demonstrate the potential and current limitations of these tools and methods for more advanced academic purposes through the case study generated by this project.

Andy Corrigan is the Digital Library Coordinator for Cambridge University Library.

Ying Dai

My PhD project aims to establish the occupational structure of the Yangzi Valley of China in the long twentieth century using individual-level data from genealogy books. The dataset will provide a quantitative description of China’s economy in this period, respond to the debate concerning foreign influence on China’s economy before the 1950s, contribute to comparative research on global industrialisation, and reveal new phenomena and expose new questions. Producing the dataset involves much labour in transcribing the books, which would take about a whole year if done manually. Moreover, this source is available in other regions of China, and I plan to extend the study regions in my research after the PhD, so there would be a substantial amount of transcribing work. It will therefore be beneficial to use ‘digital’ methods to reduce manual labour in producing the dataset. I am now using OCR to turn the pictures into machine-readable texts, and it works well with a lot of the books. I then aim to obtain individuals’ birth year, sex, occupation and location from the structured and unstructured texts. I also hope to develop a database that 1) links the transcribed data reliably with the original photos; and 2) can be connected to ArcGIS to produce maps.

Ying Dai is a PhD candidate at the Faculty of History, University of Cambridge.

Sam Kennerley

This project aims to develop digital approaches to the correspondence of Marcello II Cervini (1501-1555). Cervini was one of the most influential figures of the early Reformation, whose correspondence is stored in 75 volumes at the Archivio di Stato in Florence. However, this material has been little drawn upon by historians, due first to issues of access, and second to the palaeographical problems posed by hundreds of sixteenth-century hands. In June 2018 I visited the Archivio di Stato and took over 7000 photographs of Cervini’s correspondence. This material has been of immense value to my research, sparking a desire to make it more widely accessible to scholars. I have two primary goals for the digital use of this material. First, I aim to create an online, annotated edition of selected correspondence that is of particular importance in this collection. More precisely, I aim to digitise the letters that Cervini exchanged with a single correspondent as a test case for a much bigger project to digitise the correspondence as a whole. Second, Cervini’s secretaries were exceptionally scrupulous, recording the date that letters were dispatched, received, and answered, and on occasion even the route and messenger that conveyed them. I aim to exploit this data through GIS to create a detailed map of postal routes, which would be of interest to any scholar concerned with the communication of news and information in sixteenth-century Europe.

Sam Kennerley is a research fellow at the Faculty of History, University of Cambridge.

Chris Schaefer

My dissertation is a transatlantic cultural and intellectual history of American foreign relations from the Vietnam War until the Iraq War. Of particular interest is The International Herald Tribune, an American newspaper edited in Paris, France from 1887 until 2013. From the 1970s on, it used facsimile and then satellite transmission technology to open remote distribution sites across Europe and then eventually the entire world. In the years before the internet, it was read by Anglophone business, political, and diplomatic elites, which included both American expatriates and non-Americans. 

In its second-to-last instantiation from 1967 until 2002, which is of most relevance to my project, Paris-based editors re-edited news copy from two of its joint owners, the New York Times and the Washington Post. As such, its corpus provides an excellent opportunity to investigate differences between the original articles from the New York Times and the Washington Post, written for Americans, and the re-edited International Herald Tribune versions, edited by long-time American expatriates for a mixed non-American and American expatriate audience. By placing American coverage of the same events in these three newspapers side by side, my project should reveal something about how different understandings of the world are produced and sustained inside and outside the United States.

Chris Schaefer is a PhD candidate at the Faculty of History, University of Cambridge.

Liz Stevenson

During my PhD I aim to use quantitative analysis methods focused on R and text mining methodologies to explore the level to which the perceived gender of a character matches their hierarchical position in gendered and non-gendered relationships through an examination of pronouns, gendered linguistic conventions, and gendered linguistics demonstrated in character language. This exploration will be combined with contemporary scholarship and historical perspectives on these questions in stage literature, such as in historical diarized accounts of the topics being studied. One of the key aims of this project is to reveal new dimensions of early modern scholarship on gender using technical and digital methods that have only previously been applied to later time periods. As the project is currently in its early stages, this year I will be examining the relationship between gendered authority, types, plots and genres of plays, and use of specific language in the form of word-type categories. I envision using R to study the extent to which there is – or is not – a provable relationship between the level to which authority is gendered according to conventional mores of the time period, and the level to which a character spends the main body of a play or piece of literature acting out a traditional gender role. Examples of study for this are characters such as Viola, Rosalind and Coriolanus.

Liz Stevenson is an English, Renaissance candidate at the Faculty of English, University of Cambridge.

Future sessions

There are currently no upcoming events.

Past sessions

| Title | Start date | Location |
|---|---|---|
| Digital research project design for beginners | Tuesday, 17 October, 2017 - 14:00 | S2, Alison Richard Building |
| Curating your own digital archive | Thursday, 16 November, 2017 - 11:00 | S3, Alison Richard Building |
| Webscraping for beginners | Tuesday, 21 November, 2017 - 14:00 | B4, Criminology |
| How to get bulk data from websites | Tuesday, 16 January, 2018 - 11:00 | S3, Alison Richard Building |
| Turn your PDFs into searchable text | Tuesday, 23 January, 2018 - 14:00 | |
| Beyond words (2): challenges in reading historical document collections at scale | Tuesday, 6 February, 2018 - 11:30 | Raleigh Seminar Room, Maxwell Centre, Cavendish Laboratory |
| Automatic Text Recognition: an introduction to Transkribus | Monday, 26 March, 2018 - 14:00 | S2, Alison Richard Building |
| Automatic Text Recognition: Diving into the background | Tuesday, 27 March, 2018 - 11:00 | S1, Alison Richard Building |
| Text-mining the archive 1 | Tuesday, 24 April, 2018 - 11:00 | S2, Alison Richard Building |
| Text-mining the archive 2 | Tuesday, 1 May, 2018 - 11:00 | S2, Alison Richard Building |