7 Dec 2020
11:30 - 15:30
Methods Workshop: Automated writing in the age of Machine Learning
Professor Caroline Bassett (Director Cambridge Digital Humanities) and Dr Anne Alexander (Director of Learning, Cambridge Digital Humanities)
Computer programmes which predict the likely next words in sentences are a familiar part of everyday life for billions of people who encounter them in auto-complete tools for search engines and the predictive keyboards used by mobile phones and word processing software. These tools rely on “language models”, developed by researchers in fields such as natural language processing (NLP) and information retrieval, which assign probabilities to words in a sequence based on a specific set of “training data” (in this case a collection of texts where the frequencies of word pairings or three-word phrases have been calculated in advance).
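As a rough illustration of how word-pair frequencies can be turned into next-word probabilities, here is a minimal Python sketch of a "bigram" model. It is not taken from the workshop materials: the function names and the toy training sentences are invented for this example, and real predictive keyboards use far larger datasets and more sophisticated models.

```python
from collections import Counter, defaultdict


def train_bigram_model(texts):
    """Count how often each word is followed by each other word."""
    counts = defaultdict(Counter)
    for text in texts:
        words = text.lower().split()
        # Pair every word with the word that comes directly after it.
        for prev, nxt in zip(words, words[1:]):
            counts[prev][nxt] += 1
    return counts


def next_word_probabilities(counts, word):
    """Convert the raw follower counts for `word` into probabilities."""
    followers = counts.get(word.lower())
    if not followers:
        return {}  # the word never appeared (with a follower) in the training data
    total = sum(followers.values())
    return {nxt: n / total for nxt, n in followers.items()}


# Toy "training data", invented purely for illustration.
training_texts = [
    "the horse has two eyes",
    "the horse has four legs",
    "the cat has two eyes",
]

model = train_bigram_model(training_texts)
probs = next_word_probabilities(model, "has")

# Print candidate next words, most likely first.
for candidate, p in sorted(probs.items(), key=lambda item: -item[1]):
    print(f"has -> {candidate}: {p:.2f}")
```

In this toy data "has" is followed by "two" in two of three cases, so the model would suggest "two" first. A word that never occurs in the training data yields no suggestions at all, which hints at why the choice and size of the training data matter so much.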
Recent developments in machine learning have led to the creation of general language models trained on extremely large datasets, which can now produce ‘synthetic’ texts, answer questions and summarise information without the need for lengthy or costly processes of training for each new task. The difficulty of distinguishing the outputs of these language models from texts written by humans has provoked widespread interest in the media. Researchers have experimented with prompting GPT-3, a language model developed by OpenAI, to write short stories, answer philosophical questions and apparently propose potential medical treatments, although GPT-3 did have some difficulty with the question “how many eyes does a horse have?”. Meanwhile, The Guardian ‘commissioned’ an op-ed from GPT-3.
This Methods Workshop will explore the generation of ‘synthetic’ texts through presentations, discussion and demonstrations of text generation techniques which participants will be encouraged to try out for themselves during the sessions. We will also report back from the Ghost Fictions Guided Project, organised by Cambridge Digital Humanities Learning Programme in October and November this year. The project looks at how ideas about the distinction between ‘fact’, ‘fiction’ and ‘nonfiction’ are shaping the reception of text generation methods and aims to stimulate deeper critical engagement with machine learning by humanities researchers.
Prior knowledge of programming, computer science or Machine Learning is not required. In order to try out the text generation techniques demonstrated during the course you will need access to Google Drive (accessible via Raven login for University of Cambridge users).
The workshop will be delivered live online on 7 December via Zoom over two sessions with a lunch break in between.
Session 1 – 11.30am-1pm
Lunch break: 1-2pm
Session 2 – 2-3.30pm
The course is open to graduate students and staff at the University of Cambridge. Spaces are limited and must be booked in advance.
Click here to register.