
Automatic Text Recognition: Diving into the background

When: Mar 27, 2018, from 11:00 AM to 1:00 PM
Where: S1, Alison Richard Building

Speaker: Professor Roger Labahn (University of Rostock, READ project)

Automatic Text Recognition (ATR) is becoming an essential core component of application software in Digital Humanities. After years of applying "classical" OCR (Optical Character Recognition) to printed texts, we are now seeing impressive results from applying ATR to demanding handwritten texts. The shift from OCR to ATR requires, however, the development of entirely new paradigms in algorithms and technology: rather than processing single characters, entire sequences have to be considered, e.g. words, lines, whole text cells or paragraph blocks, and even entire pages. Moreover, beyond traditional full-text reading, uses such as Keyword Searching / Spotting (KWS) and Advanced Text Investigation (ATI) are receiving increasing attention in both the technology and the application domains.

This presentation will offer a general survey of the foundations of these new approaches and explain in more detail selected basic algorithms of contemporary ATR technology, focussing on Machine Learning, mainly with Recurrent Neural Networks. We will also explore fundamental decoding ideas, i.e. how to move from the network's 'magic' output to meaningful recognition results, and provide a more detailed introduction to the technological background needed to use advanced KWS software successfully. Finally, we will demonstrate realistic application examples from and with the Transkribus platform.

Spaces are limited and should be booked in advance through Eventbrite.

A sandwich lunch will be provided - please email Michelle Maciejewska (mm405 @ cam.ac.uk) by 19 March if you have any specific dietary requirements. If you book a ticket and find you can no longer attend, please cancel through Eventbrite so that we can cater accurately.

What is CDH?

Cambridge Digital Humanities is a creative and collaborative space where students, researchers and international visitors can come together to engage in dialogue, experiment with technology and advance scholarship.
