This event spans multiple dates:
24 Mar 2022 11:00 to 12:00 Online event
31 Mar 2022 11:00 to 12:00 Online event


Please note this workshop has limited spaces and an application process in place.

Text-mining is the process of extracting information from unstructured text, such as books, newspapers, and manuscript transcriptions. This foundational course is aimed at students and staff who are new to text-mining. It offers a basic introduction to text-mining principles and methods, with coding examples and exercises in Python. To illustrate the process, we will walk through a simple example of collecting, cleaning and analysing a text.

Places will be prioritised for students and staff in the Schools of Arts & Humanities and Humanities & Social Sciences, and in libraries and museums. If you study or work in a STEM department and use humanities or social science approaches, you are also welcome to apply.

By the end of this course you should be able to:
— Understand, in broad terms, the main text-mining methods and their uses.
— Plan a basic text-mining pipeline for your work.
— Apply your Python and Jupyter Notebook skills to text-mining.

We will cover:
— What text-mining is for and what text-mining methods are available (including topic modelling, sentiment analysis, named entity recognition).
— The text-mining pipeline and its five steps: choosing and collecting text, cleaning and preparing, exploring, analysing, and presenting results.
— Revision of basic Python:
  — Working with text using strings and manipulating lists of strings;
  — Importing code and calling functions;
  — Using Jupyter notebooks.

Methods for:
— Harvesting text from the web;
— Reading from and saving text to files;
— Working with TEI-XML;
— Cleaning up text (normalising);
— Splitting strings into words and sentences (tokens);
— Removing unwanted words (stopwords);
— Counting tokens (frequency analysis);
— Visualising results.
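As a rough illustration of the cleaning, tokenising, stopword-removal and counting steps listed above, here is a minimal sketch using only the Python standard library. It is not taken from the course materials: the function names and the tiny stopword list are hypothetical, and the regular expressions are deliberately naive (for example, the normaliser keeps only the letters a-z, so accented characters and apostrophes are lost).

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list for illustration only;
# real projects usually use a fuller list.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "it", "was"}


def normalise(text: str) -> str:
    """Lower-case the text and replace every character that is not a-z or whitespace with a space."""
    return re.sub(r"[^a-z\s]", " ", text.lower())


def split_sentences(text: str) -> list[str]:
    """Naively split into sentences wherever '.', '!' or '?' is followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenise(text: str) -> list[str]:
    """Split a normalised string into word tokens on runs of whitespace."""
    return text.split()


def remove_stopwords(tokens: list[str]) -> list[str]:
    """Drop any token that appears in the stopword list."""
    return [t for t in tokens if t not in STOPWORDS]


def word_frequencies(text: str, n: int = 5) -> list[tuple[str, int]]:
    """Run normalise -> tokenise -> remove_stopwords, then return the n most common tokens."""
    tokens = remove_stopwords(tokenise(normalise(text)))
    return Counter(tokens).most_common(n)


if __name__ == "__main__":
    sample = "It was the best of times, it was the worst of times."
    print(split_sentences(sample))
    print(word_frequencies(sample, n=3))
```

Each step is a separate small function so that it can be inspected on its own in a Jupyter notebook cell before being chained into a pipeline.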

Next steps: resources and directions.

This course takes a ‘flipped classroom’ approach, in which much of the learning is self-paced and done in your own time. Preparatory material is released in the week before the course. The course starts with a 1-hour remote video session introducing the topics and materials, and ends with another 1-hour remote video session to discuss progress and next steps. In between, you work through the self-paced materials. A chat forum on Moodle will be available for asking and answering questions during the week.

Please make sure you can set aside time in your schedule to complete the preparatory and self-paced materials, so that you get the most out of the course. Estimated times for working through these materials are:
— Preparatory materials (total: 15 minutes-3 hours):
  — Introductory video: 15 minutes
  — Installing Python (optional): 1 hour
  — Revision of basic Python (optional): 1-2 hours
— Self-paced Jupyter Notebooks (total: 2-4 hours)

How much time you spend on the self-paced materials will depend on your prior experience and personal goals.

We expect you to have some basic knowledge of Python, or of coding in another language. At a minimum, we recommend that you have attended the CDH Basics session “First steps in coding and Jupyter Notebooks” and followed it up with some independent learning in basic Python. Alternatively, you may have equivalent basic coding experience in Python or another language from a different course of study.
If you are unsure whether your coding experience is sufficient, please apply anyway and we can talk about it together.

You will need a laptop or desktop computer to join the sessions and follow the self-paced materials. You will also need Python 3 and Jupyter installed; full installation instructions will be provided in the preparatory materials if you don’t already have them.

Application timeline
To apply, please request a place on UTBS and complete the application.
Applications close at end of day on Monday, 7 March 2022.
Applicants will be notified of the outcome by end of day on Thursday, 10 March 2022.

Cambridge Digital Humanities

Tel: +44 1223 766886