Cambridge University Library has acquired digital archives from Gale Cengage, a publisher of large collections of primary source material, including historical documents and newspapers. These archives are now available in a new resource, the “Gale Digital Scholar Lab”, which is designed specifically for text mining and analysis.
In the Lab you can search the archives as you would on their native platforms and build content sets from the search results. You can create multiple content sets and analyse the corpus you assemble with the tools the Lab provides. The tools currently available are all open source, and the publisher intends to add more over time: Topic Modelling (MALLET); Frequencies (Lucene); Clustering (scikit-learn); Parts-of-Speech Tagger (spaCy); Sentiment Analysis (OpenNLP); Named Entity Recognition (spaCy); N-grams (Lucene).
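For readers new to these terms, the following is a minimal sketch of what frequency and n-gram counts compute. It is not the Lab's implementation (the Lab uses Lucene for these tools); it uses only the Python standard library, and the function names and the sample sentence are invented for illustration.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def ngrams(tokens, n):
    """Return every run of n consecutive tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def ngram_frequencies(text, n=1):
    """Count how often each n-gram occurs in the text."""
    return Counter(ngrams(tokenize(text), n))


# Hypothetical sample text standing in for a document from a content set.
sample = "The price of corn rose. The price of coal fell."

unigrams = ngram_frequencies(sample, 1)  # single-word frequencies
bigrams = ngram_frequencies(sample, 2)   # two-word sequences

print(unigrams.most_common(3))
print(bigrams.most_common(3))
```

A real OCR corpus would need more careful tokenisation (hyphenation, misread characters, stop words), which this simplified sketch ignores.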
The Lab promises to open up new possibilities for relative newcomers to digital scholarship in this area: it lets them apply natural language processing tools to raw OCR text, which may lead to new discoveries and insights. The Lab emphasises visualisation of results and data, and so lends itself to scholarly sharing and “bridging the gap between scholarly resources and faculty researchers/students”. It also supports the organisation of content sets, including renaming, duplicating and versioning, and identifies the searches used to create each content set, which makes sharing and reproducing research projects easier than is usually the case.
The archives in the Lab that Cambridge can access for analysis are:
17th and 18th century Burney collection
19th century UK periodicals
British Library newspapers
Economist historical archive, 1843–2014
Eighteenth century collections online
Illustrated London News historical archive, 1842–2003
Making of modern law: legal treatises, 1800–1926
Nineteenth century U.S. newspapers
Times digital archive
Times literary supplement historical archive
U.S. declassified documents online
Access to the Lab is on a trial basis, to help Cambridge assess its usefulness to practitioners and to promote the resource to digital humanities scholars in Cambridge generally. Access is available now, using the details below, until 31 December 2018. We have requested further “guide”-type materials to help the complete novice get started with the Lab and hope to be able to forward these soon.
The Gale Digital Scholar Lab can be accessed via the University of Cambridge. To obtain Lab access, please contact ejournals@lib.cam.ac.uk.