This event spans multiple dates:
11 Mar 2021 11:00 to 12:00 Online event
18 Mar 2021 11:00 to 12:00 Online event


Methods Workshop: Introduction to Text-mining with Python
Mary Chester-Kadwell (CDH Methods Fellow)

Please note this workshop has limited spaces and an application process in place. Application forms should be returned to CDH Learning by Monday, 22 February 2021. Successful applicants will be notified by end-of-day Thursday, 25 February 2021. Preparatory material will be released on Thursday, 4 March 2021, one week in advance of the first session.
Text-mining is extracting information from unstructured text, such as books, newspapers, and manuscript transcriptions. This foundational course is aimed at students and staff who are new to text-mining, and presents a basic introduction to text-mining principles and methods, with coding examples and exercises in Python. To discuss the process, we will walk through a simple example of collecting, cleaning and analysing a text.
If you are interested in attending this course, please fill in the application form. Places will be prioritised for students and staff in the schools of Arts & Humanities, Humanities & Social Sciences, libraries and museums. If you study or work in a STEM department and use humanities or social sciences approaches you are also welcome to apply.
By the end of this course you should be able to:
Understand, in broad overview, the different text-mining methods and their uses.
Plan a basic text-mining pipeline for your work.
Expand your skills in using Python and Jupyter Notebooks into text-mining.
We will cover:
What text-mining is for and what text-mining methods are available (including topic modelling, sentiment analysis, named entity recognition).
The text-mining pipeline and the five steps of text-mining: choosing and collecting text, cleaning and preparing, exploring, analysing, and presenting results.
Revision of basic Python:
Working with text using strings and manipulating lists of strings;
Importing code and calling functions;
Using Jupyter notebooks.

Methods for:
Harvesting text from the web;
Reading from and saving text to files;
Working with TEI-XML;
Cleaning up text (normalising);
Splitting strings into words and sentences (tokens);
Removing unwanted words (stopwords);
Counting tokens (frequency analysis);
Visualising results.

Next steps: resources and directions.
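As a rough illustration of how the cleaning, tokenising, stopword-removal and counting steps listed above fit together, here is a minimal sketch using only the Python standard library. It is not taken from the course materials: the function names, the sample sentence and the very short stopword list are all assumptions made for this example, and a real project would likely use a fuller stopword list and a more careful tokeniser.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list for illustration only.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "it", "was"}


def normalise(text):
    """Lower-case the text and collapse runs of whitespace into single spaces."""
    return re.sub(r"\s+", " ", text.lower()).strip()


def tokenise(text):
    """Split normalised text into word tokens, keeping internal apostrophes."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text)


def remove_stopwords(tokens):
    """Drop any token that appears in the stopword list."""
    return [token for token in tokens if token not in STOPWORDS]


def count_tokens(tokens):
    """Count how often each token occurs (frequency analysis)."""
    return Counter(tokens)


# Assumed sample text; in practice this would be read from a file or the web.
sample = "It was the best of times,\n it was the worst of times."

tokens = remove_stopwords(tokenise(normalise(sample)))
frequencies = count_tokens(tokens)

# Show the most frequent remaining tokens with their counts.
print(frequencies.most_common(3))
```

Each step is a small function so that any one of them can be swapped out later, for example replacing the regular-expression tokeniser with one from a dedicated text-processing library.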
This course takes a ‘flipped classroom’ approach, in which much of the learning is self-paced and takes place in your own time. Preparatory material is released in the week before the course takes place. The course starts with a 1-hour remote video session to introduce the topics and materials, and ends with another 1-hour remote video session to discuss progress and next steps. Self-paced materials are provided to work through in between the sessions. A chat forum on Moodle will be available for asking and answering questions during the week.
Please make sure you can plan time in your schedule to complete the preparatory and self-paced materials in order to get the most out of the course. Time estimates for working through these materials are as follows:
Preparatory materials (total: 15 minutes-3 hours):
Introductory video: 15 minutes
Optional: Installing Python: 1 hour
Optional: Revision of basic Python: 1-2 hours

Self-paced Jupyter Notebooks (total: 2-4 hours) 
The amount of time you may wish to spend on the self-paced materials depends on your pre-existing experience and own personal goals.
We expect you to have some basic knowledge of Python, or coding in another language. At a minimum, we recommend that you have attended the CDH Basics session “First steps in coding and Jupyter Notebooks” and subsequently done some follow-on independent learning in basic Python. Alternatively, you may have equivalent basic coding experience in Python or a different language from another course of study.
If you are unsure whether your coding experience is sufficient, please apply anyway and we can talk about it together.
You will need a laptop or desktop computer to join the sessions and follow the self-paced materials. You will also need Python 3 and Jupyter installed; if you don’t already have them, full installation instructions will be provided in the preparatory materials.

Cambridge Digital Humanities

Tel: +44 1223 766886