20 Mar 2023 - 24 Mar 2023 Cambridge, UK


This Data School has unfortunately had to be cancelled due to industrial action. You can find out more about the UCU strikes here:

The Cultural Heritage Data School is a teaching programme which aims to bring together participants from the wider Galleries, Libraries, Archives and Museums (GLAM) sector and academia to explore the methods used to create, visualise and analyse digital archives and collections.

At the data school you will learn new methods and theory in Digital Humanities that will inform and enrich your current research or practice. The school is intensive, but it includes time to review what you have learned in the taught sessions. We encourage anyone working with cultural heritage data to apply!

Q&A (25 January, 2pm GMT)
Join us for this Q&A with the school’s convenors to learn more about the content of the data school and the application process. Registration is now closed.

Theme: Digital Image Curation

At this school we will focus on the curation of digital images, from the process of digitisation to some of the most dazzling ways of presenting these materials, such as online content curation and 3D models. Students will learn from leading researchers whose projects are at the forefront of these issues. Sessions will tackle them in practical ways while also critically examining the theoretical assumptions behind these practices, equipping participants with the tools, methods and ideas they need to meet the challenges GLAM institutions face in an ever-changing digital environment.

Other subjects relevant to people working in the Digital Humanities sector will also be addressed, including text processing and encoding, and machine learning for collections. Participants will also have the opportunity to discuss ideas and network with peers and with researchers from the University of Cambridge.

We will also visit the Digital Content Unit at Cambridge University Library, work through hands-on exercises with teachers on site to help, and go out for dinner.

Modules will cover the following content:

  • Machine Learning for Large-scale Image Collections
  • Principles of Galleries, Libraries, Archives, and Museums (GLAM) Imaging
  • Managing 3D Content
  • Automating the Archive: From Card Catalogues to Computer Bots
  • Geospatial DH with QGIS
  • Named Entity Recognition with Python
  • Digital text markup and TEI
  • Workshop by the (Anti) Colonial Archives Working Group

You can view the full programme and module descriptions here: CHDS Programme

Note: content and timings may be subject to change.

Who can apply?

The school welcomes applications from all backgrounds.

You might be in galleries, museums, archives, libraries, higher education, research – or a different sector entirely. Anyone who works with cultural heritage data is welcome to apply.

No previous coding experience is required and there are no specific academic requirements; however, the course content is broadly suitable for those with an undergraduate degree or equivalent professional experience. The School is taught in English. You will need to bring your own laptop, on which you are able to download free, open software for use during the school.

We are committed to facilitating participation by women and by black and minority ethnic candidates, as they have historically been under-represented in the technology and data science sector. We also welcome applications from outside the UK, provided that applicants can travel to Cambridge for the week of the data school. Sessions will not be streamed or recorded, so live attendance is required.

When and Where

The school will be held in person in Cambridge, UK, on 20-24 March 2023. Sessions will take place between 9am and 5pm daily in the central University buildings on the Sidgwick Site:

You will need to book your own travel and accommodation for the school. You can look for accommodation on the following sites:


Teaching will be by University of Cambridge staff and industry professionals.

The full teaching team will be published in January.


  • £695 per person

This fee covers around 20 hours of sessions, access to resources, discussion groups with top practitioners and technical drop-in sessions.

Cambridge Digital Humanities is committed to democratising access to digital methods and tools, and is offering subsidised participation fees to encourage applications from those who do not normally have access to this type of training. A limited number of concessionary places are available for unemployed people, researchers on community or unfunded projects, and residents of the Global South. In addition, a small number of full bursaries are available to those who can demonstrate financial need. You can apply for concessionary and bursary places on the application form, but we may not be able to give concessions to everyone who applies. If you have applied for a concession and we are not able to offer one, we will assume that you will not be able to take up the place.

We are not able to cover accommodation, transport or visa costs.

How to apply

Applications will be considered on a rolling basis until all the places are filled, so we encourage you to fill out your application as soon as possible. The application form will close on February 15th. You will hear by February 20th at the latest whether or not your application was successful.

The Cultural Heritage Data School has limited places. During your application you should make best use of the free text sections to explain your current experience, and what you would get out of attending the school.

This school has been cancelled due to industrial action.



Modules will take place between 10am and 5pm each day.

In-person CHDS March 2023 Modules

Machine Learning for Large-scale Image Collections

with Dr Leo Impett, convenor of the MPhil in Digital Humanities at CDH

(More information coming soon)


Principles of GLAM Imaging  

with Maciej Pawlikowski

This session will cover the principles of archival imaging standards and a practical approach to taking images that are fit for a project's purpose. It will give participants the basic vocabulary and an understanding of a methodological approach to digitisation that applies to any project. It may also address more advanced imaging topics, such as image stitching, Optical Character Recognition, multispectral imaging or photogrammetry, if these are of interest to participants.


Managing 3D Content 

with Andy Corrigan, Cambridge Digital Library Coordinator, Cambridge University Library

The materiality of our collections is often overlooked by digitisation, which can remove the sensations of touch, smell and sound. Limiting the data we present to simple flat image and text formats impedes our ability to engage fully with our objects, no matter how flat you think they are. But the complexity of 3D imaging techniques can be a daunting minefield. In this module, we’ll first explore some of the different modalities, such as RTI (reflectance transformation imaging), CT scanning and photogrammetry. Hosting these digital objects so that users can find and engage with them is the next step, and we’ll look at two different platforms: Sketchfab (https://sketchfab.com/) and MorphoSource (https://www.morphosource.org/). Finally, we’ll discover how interoperability can help us engage users with complex imaging data, and explore some ongoing initiatives that will help you keep an eye on developments in this fast-paced area. We’ll use past and live projects as case studies, such as the Dimensions of Darwin initiative (https://www.cdh.cam.ac.uk/media/blog/dimensions-of-darwin/), and after the session you will have the opportunity to go away and create your own virtual story with mixed digital media on the Exhibit.so platform (https://www.exhibit.so/).


Automating the Archive: From Card Catalogues to Computer Bots

with Dr Siddharth Soni, Isaac Newton Trust Research Fellow at CDH

The archive is, after all, a technology. Its methods and processes are designed to record experiential memory and preserve it in matter. As archivists and knowledge workers, we engage with the technology behind the archive all the time. In this lecture, I explore some of these technologies, from card catalogues to computer bots, and examine their origins in wartime bureaucracy and their basis in colonial conceptions of social memory, governmentality and rule.


Geospatial DH with QGIS 

with Dr Jess Parr, Northeastern University

(More information coming soon)


Named Entity Recognition with Python

with Mary Chester-Kadwell, Senior Software Engineer at Cambridge University Library

Text mining is the extraction of information from unstructured text, that is, text that has not been encoded with semantic markup. In this module we will look at one way of extracting information from unstructured text: recognising named entities automatically. A named entity is any type of real-world object or concept, such as a person, organisation, location or date. Using the example of letters from the 19th-century botanist John Stevens Henslow, we will show how to recognise and visualise named entities using machine learning, create training data to improve the results, and link named entities to existing knowledge bases.

Participants will be able to choose either a ‘no code’ or a ‘Python’ track for this module. Everyone will join the same virtual sessions and have access to the same self-paced study materials and exercises, but the suggested directions will differ depending on which track you follow. For those with experience in Python, the materials include a set of Jupyter notebooks using the spaCy NLP library, but prior knowledge of Python is not required to complete the module.


Digital text markup and TEI

with Huw Jones, Head of the Digital Library Unit, and Yasmin Faghih, Head of the Near and Middle Eastern Department at Cambridge University Library

The TEI (Text Encoding Initiative, https://tei-c.org/) is a standard for the transcription and description of text-bearing objects, and it is very widely used in the digital humanities, from digital editions and manuscript catalogues to text mining and linguistic analysis. This module will take you through the basics of the TEI (what it is and what it can be used for), with a particular focus on uses in research, paths to publication (both web and print) and the use of TEI documents as a dataset for analysis. There will be a chance to create some TEI yourself as well as to look at existing projects and examples. The module will take place over two sessions: an introductory taught session and, after a chance to work on TEI records yourself, a review and discussion session.
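As a rough illustration of what treating TEI documents as a dataset can look like, the sketch below uses only Python's standard library (`xml.etree.ElementTree`) to count the people, places and dates tagged in a TEI file. The sample document, the names in it and the function `count_tagged_entities` are invented for this example and are not course material. Real projects may encode entities differently, for example by using `ref` attributes that point to authority records.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# The TEI P5 namespace URI, declared on the root element of TEI documents.
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# A tiny, invented TEI document used only for illustration (not a real transcription).
SAMPLE_TEI = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc>
      <titleStmt><title>Sample letter</title></titleStmt>
      <publicationStmt><p>Unpublished example</p></publicationStmt>
      <sourceDesc><p>Invented for illustration</p></sourceDesc>
    </fileDesc>
  </teiHeader>
  <text>
    <body>
      <p>Dear <persName>Mary Smith</persName>, I write from
      <placeName>Cambridge</placeName> on <date when="1830-05-01">1 May 1830</date>.
      Please give my regards to <persName>Mary
      Smith</persName>'s family.</p>
    </body>
  </text>
</TEI>"""


def count_tagged_entities(tei_xml: str) -> dict[str, Counter]:
    """Count the text content of persName, placeName and date elements (hypothetical helper)."""
    root = ET.fromstring(tei_xml)
    counts: dict[str, Counter] = {}
    for tag in ("persName", "placeName", "date"):
        texts = []
        # ".//tei:<tag>" searches the whole tree for elements in the TEI namespace.
        for element in root.findall(f".//tei:{tag}", TEI_NS):
            # Join nested text and collapse line breaks and runs of spaces.
            texts.append(" ".join("".join(element.itertext()).split()))
        counts[tag] = Counter(texts)
    return counts


if __name__ == "__main__":
    for tag, counter in count_tagged_entities(SAMPLE_TEI).items():
        print(tag, dict(counter))
```

Because the markup already records which strings are people, places and dates, no guessing is needed here. This is the main difference from the unstructured text used in the Named Entity Recognition module.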


Workshop by the (Anti) Colonial Archives Working Group 

(More information coming soon)

Cambridge Digital Humanities

Tel: +44 1223 766886
Email: enquiries@crassh.cam.ac.uk