From Corpus to Context: Word Embeddings as a Digital Humanities Research Methodology

10 May 2023

13:00 - 17:00

Milstein Room, University Library

Description

Speaker: Mark Algee-Hewitt, Associate Professor of English and Director of the Stanford Literary Lab.

About this Methods workshop

At the heart of many of the current computational models of language usage, from generative A.I. to recommendation engines, are large language models that relate hundreds of thousands, or millions, of words to each other based on shared contexts. Mysterious products of complex modelling algorithms, these objects raise a number of practical (and ethical) questions for Humanities scholars: How are these language models created? What kinds of relationships does their math encode? How do biases in the corpus affect the model? And how can we effectively use them to answer humanities-based questions?

In this workshop, we will explore these questions using a medium-sized word embedding model trained on a corpus of novels. Using approachable code in the R software environment, participants will learn how to manipulate a model, assess similarities and differences within it, visualise relationships between words, and even train their own embeddings.

About the speaker

Mark’s research uses quantitative and statistical methods to explore questions of humanities interest, particularly around the literature and philosophy of the long eighteenth century in Britain and Germany. His current project uses word embedding models to study the history and evolution of concepts in the eighteenth and nineteenth centuries, focusing on the aesthetic theories that formed around the nascent study of literature. In the Stanford Literary Lab, he has led a variety of collaborative projects that use computational methods to explore textual data from the late medieval period to the present, including a project on the formal causes of suspense in literary texts, a project on the evolution of disciplinary writing styles, a project on the use of neologisms for world-building in contemporary science fiction, and a project on the development of the short story in twentieth-century women’s magazines. He holds degrees in English literature, literary theory, and computer science.

He has taught graduate and undergraduate courses in literary study, environmental humanities, and humanities computation/digital humanities. He currently leads a multi-institutional collaborative project that seeks to understand how effective novels that centre climate change can be for large-scale public science education. He has held grants from the National Endowment for the Humanities, the Social Sciences and Humanities Research Council of Canada, and the Human Centered Artificial Intelligence Initiative at Stanford University. He has been involved in the international digital humanities community since 2012 and sits on the editorial board of the Journal of Cultural Analytics.

This in-person workshop is open to graduate students and staff at the University of Cambridge. Early career researchers are particularly encouraged to apply.

Cambridge Digital Humanities