Gale Digital Scholar Lab

Back in January 2021, the University Library’s E-Resources Team signed a new agreement with Gale Cengage that not only increased the number of major digital archives available to members of the University of Cambridge, but also gave us access to Gale’s Digital Scholar Lab (DSL). This is an exciting development for budding DH-ers in Cambridge!

 

screenshot_2021-04-28_at_15.51.19.png

 

Built around a relatively simple “Build -> Clean -> Analyse” workflow, Gale’s DSL provides an interface and tools for undertaking text and data mining (TDM) on large datasets derived from digital archives, in a way that is easy to use and understand.

The primary goals of the Digital Scholar Lab are to tackle some of the barriers to TDM that many researchers might face. It aims to:

Help you create data to analyse

Keep the tools close to the data corpus

Handle larger and more complex corpora

Reduce the time it takes to build and prepare a corpus

Be easy to learn, with no need to use the command line or Python/R

Whilst Cambridge Digital Humanities offers training in a variety of formats on more detailed and nuanced TDM tools and methods, and importantly tackles potential ethical issues, Gale’s Digital Scholar Lab provides a concise environment in which you can begin to explore the possibilities of TDM, with plenty of clear advice and help to get you going. The platform allows a high level of granularity when curating your data set, with features such as a clear indication of OCR confidence level that will soon have you feeling positive about undertaking your first sentiment analysis!

 

Just having a quick play, I created two data sets: one based on a search for “Isaac Newton” and one for “Board of Longitude”. These are both large manuscript archive collections that we have digitised ourselves on Cambridge Digital Library, and I wondered whether searching Gale’s digitised newspaper archives and analysing the results might demonstrate any impact of our making those digital manuscripts available. Would the availability of our own digitised collections affect how often these subjects occur in news articles, and would there be any perceptible change in the sentiment with which they are discussed? Quite broad and ambitious goals… but after only a short time playing around with the DSL, I managed to get results that perhaps don’t show much, but do make me wonder enough to want to explore the idea further.

 

I first ran a sentiment analysis tool on a data set containing occurrences of the term “Board of Longitude” and this was the result:

 

[Screenshot: sentiment analysis result for the “Board of Longitude” data set]

 

This shows a marked decline in the sentiment score in more recent decades. I’ll be honest, I have no idea what that might mean, but it stands out enough to make me curious, and perhaps warrants further investigation to explore whether there is any evidence of a connection to our making the digitised archive of the Board of Longitude available back in 2013.
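For anyone curious about what a "sentiment score by decade" might involve under the hood, here is a minimal sketch in Python. It is not Gale's implementation: the tiny word list, its scores, the sample documents and the function names are all made up for illustration. The idea is simply to score each document against a lexicon of positive and negative words, then average those scores per decade.

```python
import re
from collections import defaultdict

# Hypothetical toy lexicon: words and scores are invented for illustration only.
LEXICON = {
    "brilliant": 3,
    "celebrated": 2,
    "success": 2,
    "dispute": -2,
    "failure": -2,
    "wasteful": -2,
}


def sentiment_score(text):
    """Return the mean lexicon score of the matched words, or 0.0 if none match."""
    words = re.findall(r"[a-z']+", text.lower())
    scores = [LEXICON[word] for word in words if word in LEXICON]
    return sum(scores) / len(scores) if scores else 0.0


def mean_sentiment_by_decade(documents):
    """Average document scores per decade.

    documents: iterable of (year, text) pairs.
    Returns a dict mapping the decade's first year (e.g. 1980) to a mean score.
    """
    by_decade = defaultdict(list)
    for year, text in documents:
        decade = year // 10 * 10
        by_decade[decade].append(sentiment_score(text))
    return {
        decade: sum(scores) / len(scores)
        for decade, scores in sorted(by_decade.items())
    }


# Invented sample articles, not real search results.
sample_documents = [
    (1985, "A celebrated success for the Board of Longitude."),
    (1988, "The brilliant scheme ended in dispute."),
    (2015, "A wasteful failure, critics said."),
]

result = mean_sentiment_by_decade(sample_documents)
print(result)
```

A sketch like this also hints at why a result can be hard to interpret: the score depends entirely on which words are in the lexicon and on how many articles fall into each decade, so a dip may say as much about the corpus as about attitudes to the subject.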

 

Our own Cambridge Digital Library platform was launched in late 2011, the first collection consisting of the manuscript archives of Isaac Newton. Next, I ran the same sentiment analysis tool on a data set containing occurrences of “Isaac Newton” published within the last four decades:

 

[Screenshot: sentiment analysis result for the “Isaac Newton” data set]

 

This seems to show a decline in sentiment towards Isaac Newton in the last decades of the 20th century, but several peaks occurring in recent years. Again, I’m not sure at this stage what that really means, but it does demonstrate that there is perhaps something worth pursuing?

 

Fortunately for me, Chris Houghton (Head of Digital Scholarship for Gale) will be joining us to deliver a suite of CDH Labs sessions throughout May 2021. In addition to some introductory sessions to Gale Digital Scholar Lab, together we will also explore the tools in greater depth. For more details of these sessions and to book a place, please see CDH’s list of upcoming events: https://www.cdh.cam.ac.uk/events  

 

A walk-through guide to the DSL and a complete list of the digital archives available to us in Cambridge from Gale Cengage are available via Cambridge University Library. If you would like to explore TDM further, there is also a Cambridge Libraries LibGuide to explore.

 

Andy Corrigan

Cambridge Digital Library Co-ordinator

  • Posted 4 May 2021
