15-28 June 2022 (online)


Overview

The Social Data School is an online summer school structured around the life-cycle of a digital research project.

It will cover:

  • principles of research design
  • data collection
  • cleaning and preparation
  • methods of analysis and visualisation
  • and data management and preservation practices.

The data school welcomes applications from all backgrounds, including journalists, NGOs, activists, trade unionists and members of civil society organisations.


Visualising Data/Investigating Images

This year we will extract data from multiple video and photographic sources to create spatial visualisations for forensic analysis and learn how computer vision algorithms are used to classify images and generate ‘deepfakes’.

We will explore how images can become places of inquiry and instruments of communication, and develop a critical toolkit for interrogating the image-based cultures of the digital age.

Modules will cover the following content:

  • Methodology for Digital Investigations
  • Computer Generated Imagery (CGI) processing and 3D visualisation
  • Geolocation and open source investigations
  • Critical Approaches to Data Visualisation
  • Computer vision, automated image generation and deepfakes
  • Introduction to Social Media data mining, analysis and visualisation

Online participation will be available for all sessions.


Q&A session

Join us for an online Q&A session with the convenors of this year’s Social Data School. Learn more about the school, and the application process, and ask any questions you have.

This session was held on 9 May 2022, and you can watch the recording here.


Organisers

Led by Cambridge Digital Humanities in association with the Minderoo Centre for Technology and Democracy:

www.mctd.ac.uk

The team introduces the school

Watch this series of videos of the teaching team introducing the Social Data School.





When & Where

Data School live sessions are timetabled daily 15-28 June online from 1–5pm (BST). To convert this to your timezone you can use this Time Zone Converter. The timetable is still being finalised, but you can view last year’s timetable as an example here.

During the course you will be provided with links to our virtual learning environment (Moodle) where we will publish course content and links to our online video delivery platform (Zoom) for teaching and social interactions.

We may be able to run the Data School in hybrid mode for the final two days (27 and 28 June 2022), with the option of attending project development group workshops, mentoring sessions and the final plenary in person in Cambridge. This option would involve full-day attendance in person on Monday 27 June, followed by a two-hour final plenary on the afternoon of Tuesday 28 June. However, all of these sessions will also be available online, enabling remote participation for the entire Data School.

Cost

What’s the cost

Cambridge Digital Humanities is committed to democratising access to digital methods and tools and is offering the following subsidised participation fees to encourage applications from those who do not normally have access to this type of training. The costs include all teaching costs.

  • £245 (Organisational Rate)
  • £45 (Concessionary Rate)
  • Free (Bursary)

Organisational Rate is applicable if you are part of an organisation with paid staff. However, on the application form you will have the opportunity to apply for a discounted rate if needed (for example if your organisation has a small number of paid staff or limited funding).

Concessionary Rate is for students, unemployed, community projects, unfunded projects, and Global South residents.

Bursary: We are pleased to be able to offer bursaries to the data school. You can apply for a bursary on the application form.

Please note that the costs of travel and accommodation for the in-person sessions are not included in the Data School fees. However, a small number of bursaries may be available to support participants who can demonstrate financial need.

The deadline for payment is two weeks before the start of the School (1 June 2022).

Requirements

  • No previous experience of coding is required and there are no specific academic requirements. However, the course content is broadly suitable for those with an undergraduate degree or equivalent professional experience.
  • The School is taught in English.
  • You will need a reliable internet connection to join in, and the ability to download free, open software for use during the School.

While we encourage applications from everyone, we particularly welcome applications from women and black and minority ethnic candidates, as they have historically been under-represented in the technology and data science sector. We also strongly welcome applications from outside the UK, provided applicants can attend the live workshop slots (1:30–4:30pm BST). Sessions will not be recorded, so live attendance is required.

We will give priority to those who can demonstrate engagement with civil society organisations or the media and who would find it difficult to access this kind of training through their own academic institution. As in previous years, we strongly encourage applications from candidates from the Global South.

Places are generally not open to Cambridge University students or staff, who should be redirected to the CDH Learning pages for information on year-round workshops. If you would like to request an exception to this rule, please get in touch: dataschool@cdh.cam.ac.uk

Teaching

The 2022 Social Data School teaching team includes:

Sessions will include live-taught instruction, demonstrations and discussions online, with access to self-paced study materials and support via email-based discussion groups between sessions. Participants will need a laptop or desktop computer and internet access to participate in the sessions. Some sessions will require software installation – full instructions will be provided but please ensure you have access rights to install software on the device you will be using.

Timetable

Timings are in BST (British Summer Time), and are subject to change.

 Date                Time              Topic
 Wednesday 15 June   1:30–2:00pm       Introduction and welcome
 Wednesday 15 June   2:00–3:00pm       Session 1: Methodology for Digital Investigations
 Wednesday 15 June   3:15–4:15pm       Session 2: Computer Vision: a critical introduction
 Thursday 16 June    2:00–3:00pm       Session 3: Introduction to Text Mining: Analysing Fake News with R I
 Thursday 16 June    3:15–4:15pm       Session 4: Visual AI, image generation and deepfakes
 Friday 17 June      2:00–3:00pm       Session 5: Social Network Analysis with Digital Data I
 Friday 17 June      3:15–4:15pm       Session 6: What makes for good and bad data visualisation? I
 Monday 20 June                        Self-paced Study Day
 Tuesday 21 June     2:00–3:00pm       Session 7: Introduction to Text Mining: Analysing Fake News with R II
 Tuesday 21 June     3:15–4:15pm       Session 8: Social Network Analysis with Digital Data II
 Wednesday 22 June   2:00–3:00pm       Session 9: The ethics of data collection and provenance
 Wednesday 22 June   3:30–5:00pm       Public Event I: Grassroots Data Wranglers
 Thursday 23 June    2:00–3:00pm       Public Event II: Spotlight on Geolocation (ft. OSINT for Academics)
 Thursday 23 June    3:15–4:15pm       Session 10: What makes for good and bad data visualisation? II
 Friday 24 June      4:00–5:00pm       Session 11: Data Spatialisation using Python and Blender I
 Friday 24 June      5:15–6:15pm       Session 12: Data Spatialisation using Python and Blender II
 Monday 27 June      10:00am–12:00pm   Investigation Incubators (Mentoring) AM
 Monday 27 June      2:00–4:00pm       Investigation Incubators (Mentoring) PM
 Tuesday 28 June     10:00am–12:00pm   Investigation Incubators (Mentoring) AM
 Tuesday 28 June     2:00–3:30pm       Session 13: Closing plenary and next steps

 

Modules

Methodology for Digital Investigations

(Irving Huerta)

This module addresses fundamental aspects of investigative practice in digital environments and dwells on the importance of using methodology(ies) for data inquiry. Researchers doing investigations using Open Source Intelligence (OSINT) tools, data collection and analysis, as well as developing automated tools for investigations will benefit from this module. It critically reflects on the essential phases of digital investigations at large: Identification of a Problem (formulation of hypotheses), Information Gathering, Preservation, Verification, Analysis, and Dissemination.

By the end of the module, participants will have the principles needed to conduct investigations that effectively identify, prove, and strategically disseminate issues in the public interest, with fairness and rigour. The module is meant to be applied alongside the tools and methods from the other SDS 2022 modules.


Computer vision, automated image generation and deepfakes 

Session 1 (Anne Alexander)

The first session in this module will provide an overview of the basic tasks of computer vision, such as Image Classification, Object Detection, Image-to-Image Translation, Image Captioning and Image Segmentation, with examples of their application in different sectors of the economy and society. We will introduce participants to a critical framework for understanding computer vision systems as interpretative rather than descriptive tools, exploring the history of their development through a case study investigating the genesis of ImageNet, a massive set of training data which was instrumental in creating conditions for breakthroughs in “deep learning” approaches to designing AI systems.

Session 2 (Leo Impett)

This session will look at AI-generated images, from Hollywood CGI to “deepfakes”. We’ll start by asking how we got here: looking at the first generative image models, including Eigenfaces (from the 1990s) and early computer graphics. We’ll look at what a generative AI model is formally, before looking at the most widely used image generation technique of the last decade, Generative Adversarial Networks (GANs). We’ll try to train and use some GANs, both for generating images from scratch and for converting between ‘image domains’ (a commonly used deepfake technique). Finally, we’ll look at what the future holds for image generation: image diffusion models and text-to-image models (e.g. OpenAI’s DALL·E 1 and 2).


Introduction to Text Mining: Analysing Fake News with R

(Meng Liu)

This module serves as an entry point into the world of text mining. Text mining, also called text data mining or quantitative text analysis, refers to the process of transforming unstructured textual data into a structured format to identify meaningful patterns and to generate new insights. We will go over fundamental concepts such as tokenisation, n-grams, document-term matrix, bag-of-words and tf-idf, with the aim to build a foundational understanding of quantitative text analysis.

We will be working with an open dataset of fake news as well as a variety of packages handy for text mining (e.g., tidyverse and tidytext). One of the outputs of this module will be a hands-on topic modelling project. Participants will have the opportunity to practise exploring and visualising text data. Study materials and exercises (with code scripts) will be provided to all participants. No prior experience with R is assumed: participants can decide whether to code along with the self-paced learning materials and exercises (solutions provided).
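
To give a flavour of the concepts named above (tokenisation, n-grams, bag-of-words and tf-idf), here is a minimal sketch. It is written in plain Python purely for illustration; the module itself is taught in R with packages such as tidytext. The toy corpus, the function names and the tokenisation rule are all assumptions made for this example and are not taken from the course materials.

```python
import math
import re
from collections import Counter


def tokenise(text):
    """Lowercase the text and split it into word tokens (a deliberately simple rule)."""
    return re.findall(r"[a-z']+", text.lower())


def ngrams(tokens, n):
    """Return every run of n consecutive tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def tf_idf(documents):
    """Score each term in each document.

    tf  = term count / number of tokens in the document
    idf = ln(number of documents / number of documents containing the term)
    """
    bags = [Counter(tokenise(doc)) for doc in documents]  # one bag-of-words per document
    n_docs = len(bags)
    doc_freq = Counter(term for bag in bags for term in bag)  # documents containing each term
    scores = []
    for bag in bags:
        total = sum(bag.values())
        scores.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in bag.items()
        })
    return scores


# Hypothetical toy corpus, invented for this example
docs = [
    "Breaking news: aliens land in London",
    "Local news: council approves new park",
    "Aliens deny landing in London",
]
scores = tf_idf(docs)
bigrams = ngrams(tokenise(docs[0]), 2)
```

In this sketch a word that appears in only one document (such as "breaking") scores higher than a word shared across documents (such as "news"), which is the intuition behind tf-idf.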


Social Network Analysis with Digital Data

(Hugo Leal)

“Social network” has become a catch-all term for the online spaces where we connect with other people and trade information in exchange for our personal data and attention. Considering the societal impacts of data-driven economics and politics, knowing how to reclaim and reappropriate these data to trace the form and content of online social networks is a vital skill for journalists, civil society and academics alike.

This module will provide a gentle introduction to the field of social network analysis (SNA) with digital data. Social Data School participants will be given the opportunity to “learn by doing” the process of digital data collection as well as the basics of social network visualisation and analysis. After being introduced to the fundamental concepts of SNA, the participants will explore all stages of a social network analysis project, including research design, data collection, data wrangling, graph visualisation, and analysis with essential network measures. The focus will be on the retrieval of electronic archival data (e.g., social media platforms) for non-programmers, and on practical examples of network analysis with specialised software (e.g., Gephi). At the end of the two sessions, participants will be equipped with the basic tools to perform meaningful visualisations and analyses of network data. Typical use cases of SNA range from investigative journalism to NGO monitoring and academic research.
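
As a rough illustration of what "essential network measures" can look like, the sketch below computes degree centrality and density for a tiny undirected network. It uses plain Python rather than the specialised software used in the module (e.g., Gephi), and the edge list and function names are hypothetical, chosen only for this example.

```python
from collections import defaultdict


def build_graph(edges):
    """Build an undirected adjacency map from (source, target) pairs."""
    adjacency = defaultdict(set)
    for a, b in edges:
        if a != b:  # ignore self-loops
            adjacency[a].add(b)
            adjacency[b].add(a)
    return adjacency


def degree_centrality(adjacency):
    """Each node's number of ties divided by the maximum possible (n - 1)."""
    n = len(adjacency)
    if n < 2:
        return {node: 0.0 for node in adjacency}
    return {node: len(nbrs) / (n - 1) for node, nbrs in adjacency.items()}


def density(adjacency):
    """Share of all possible undirected ties that are actually present."""
    n = len(adjacency)
    if n < 2:
        return 0.0
    edge_count = sum(len(nbrs) for nbrs in adjacency.values()) / 2
    return edge_count / (n * (n - 1) / 2)


# Hypothetical edge list, e.g. accounts that mentioned each other
edges = [("ana", "ben"), ("ana", "caz"), ("ana", "dev"), ("ben", "caz")]
graph = build_graph(edges)
centrality = degree_centrality(graph)
network_density = density(graph)
```

Here "ana" is tied to every other node and so has the highest degree centrality, while the density reports how much of the possible network is actually connected.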


What Makes for Good and Bad Data Visualisation?

(Tobias Lunde)

Data visualisation is a potentially very effective investigative tool which can be used to explore and ‘see’ data and identify patterns and structures hidden within it, and to communicate these to others.

This module will cover basic principles of what makes for good data visualisation, make suggestions for practical considerations like use of colour and type, and discuss how to use different types of graphs to effectively explore and visualise different types of data.

However, data visualisation can also easily deceive and mislead, and we will cover some common pitfalls and how to avoid these.

We will also look at some popular examples of data visualisations from the media, and discuss their strengths and weaknesses.


The ethics of data collection and provenance 

(Anne Alexander)

Where you find your data and how you capture it raises a number of ethical and legal issues for researchers, journalists and activists alike. This session will explore the implications for ethical project design of different methods for collecting data and introduce participants to good practice in documenting the provenance of data from online sources. We will also discuss how the architecture of digital platforms affects ethical data collection and consider data protection, intellectual property and other legal frameworks which are likely to affect your work.


Public event: Spotlight on Geolocation

(Laetitia Maurat and Nik Yazikov, from the Digital Verification Corps’s OSINT for Academics Course)

This session will cover geolocation, a crucial stage of any open-source investigation. Geolocation seeks to answer a key question: where did the events depicted happen? We will explore the basic principles of geolocation and introduce participants to a range of tools and techniques. We will cover essential resources including Google Earth Pro and Mapillary, and highlight the advantages of different data sources in a platform-agnostic manner.

This workshop aims to encourage a reflexive and critical approach to open-source data, introducing practical skills while emphasising the importance of ethical and transparent research methods. Drawing from the human rights sphere, this methodology is useful for scholars and citizens using open source data such as social media content, online databases and satellite images.

By the end of this session, participants will be able to identify useful clues in online content, perform reverse image searches and combine satellite information with street-view data.


Data Spatialisation using Python and Blender

Session 1 (Nicholas Masterton)

In the first session we will look at the interface of Blender, and talk about the various workspaces and 3D tools available and how they relate to data visualisation and spatialisation.  From here we will import a geographical shapefile using an addon called Blender GIS and manipulate it to create a 3D height field. We will use a node-based shader to apply a gradient to it.  Through this process we will develop an understanding of how to manipulate objects in 3D space and how to use colour and shading to communicate gradation within the data.

Session 2 (Nicholas Masterton)

In the second session we will look into the Blender text editor, the interactive console, and the system console to understand ways of working with python.  We will look at a script which is able to read a csv (comma-separated values) file. We will look at the process of iterating through columns and rows of data, using python to output a result into 3D space.  This will allow us to develop a methodology for spatialising datasets which are bespoke, hand-crafted, or obscure.
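
The idea of iterating through rows of a csv file and turning them into positions in 3D space can be sketched as follows. This is not the script used in the session: the column names, the sample data and the `rows_to_points` helper are assumptions made for illustration. The core runs in any Python interpreter; the commented lines at the end suggest one way the resulting points might be placed in a Blender scene.

```python
import csv
import io


def rows_to_points(csv_text, x_col, y_col, value_col, z_scale=1.0):
    """Read CSV text and return one (x, y, z) tuple per row.

    x and y come straight from the named columns; z is the value column
    multiplied by z_scale, so larger values sit higher in the scene.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    points = []
    for row in reader:
        x = float(row[x_col])
        y = float(row[y_col])
        z = float(row[value_col]) * z_scale
        points.append((x, y, z))
    return points


# Hypothetical dataset, invented for this example
sample = "x,y,count\n0,0,4\n1,0,10\n0,1,2\n"
points = rows_to_points(sample, "x", "y", "count", z_scale=0.5)

# Inside Blender's text editor or Python console, the points could then be
# turned into objects, for example (requires Blender's bundled bpy module):
#
#   import bpy
#   for location in points:
#       bpy.ops.mesh.primitive_cube_add(size=0.2, location=location)
```

Keeping the csv parsing separate from the Blender calls means the data handling can be checked outside Blender before anything is drawn.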


This programme may be subject to changes.

FAQs

What will be the process for creating the project that we are asked to describe on the form? For example, will it be by an individual or in a team? 

Participants will work on a project that they bring to the Data School, rather than working with other Data School participants on a joint project. However, this project might be one which you are working on with a team of others in your organisation or between several organisations.


How many scholarships do you offer? How likely is it to be chosen? In case of receiving it, what type of commitments must the scholarship holders fulfill?

We don’t have a set number of concessionary and bursary places and we make the awards based on our assessment of the quality of the applications and the likely impact that participation in the Data School will have for the individuals involved and their wider social or organisational networks. All participants in the Data School are asked to commit to attending the full event live and to participate as much as possible in the programme. We don’t ask anything additional from participants who have been awarded a bursary.


Would it be possible to know the list of required software in advance, so that institutional IT departments can take care of installation?

We will provide details of software which needs to be pre-installed in advance of the Data School (we will aim to provide this at least a week beforehand). All software is free to use and open source, and for those installing software themselves we will provide individual troubleshooting sessions and remote support if needed.


Is it mandatory to present the finished proposed project?

We don’t require a finished project by the end of the Data School and it isn’t mandatory to present. We encourage all participants to take part in the final plenary and share what they have been working on and/or reflect on what they have learnt during the Data School.


Are there suggested readings and preparation that could be done in advance to support our learning in the program? 

There are no required readings in advance of the Data School. Participants will have access to Moodle (our virtual learning platform) around a week before the School starts, so you will be able to see the resources for most of the modules in advance. You are of course welcome to read through these before the sessions.


Will there be group discussions or support sessions outside the timetabled schedule?

We will be running an online social space during the Data School. This can be used by participants to organise informal gatherings for discussion and interaction. The Moodle also includes an email-based chat forum which we encourage participants to use to ask questions and discuss in between the live sessions.


Will the technical background of the applicant be taken into consideration during selection irrespective of the industry they are from? 

Yes, we will look at the applicant’s individual experience and assess their technical knowledge based on what they put in the application form.


In the form I did not see any section to attach files. Is it not necessary to attach a CV, certificate of active student, etc? 

No, there is no need to submit a CV or any certificates.


Should the research projects be based on the primary data collected or can they be based on pre-generated data?

Your project can use primary or secondary data (data that you collect directly yourself, or data that others have collected).


Timings might be an issue given the timezone differences. Would it be possible to follow the sessions and deliver the assignments after the live working hours?

We ask that all participants are available to follow the live sessions, rather than catching up afterwards. The sessions are highly interactive and only involve around 20 participants so we prioritise applications from people who are able to attend and engage during the live sessions. Of course we understand that sometimes circumstances beyond your control might mean you have to miss a session, but the expectation is that all participants will be attending live. We do not record sessions.


Might there be a waiting list for the program? Will candidates who are initially not selected have a chance to participate if anyone shortlisted decides that they are unable to join?

Yes we will operate a waiting list. After applications close we will make offers to our first choice of around 20 participants. If they are unable to join the Data School we will offer places to those on the waiting list.


I have already submitted my application, but reading more makes me uncertain I wrote the best application. Can I apply again?

You are welcome to apply again up to the deadline. If you are resubmitting, please indicate on the form which submission is your final version.

Apply

How to apply

Fill in the application form by 15 May 2022. You will hear whether your application was successful or not by 24 May 2022.

The Social Data School is application-only with limited places. During your application you should make best use of the free text sections to explain your current experience, and what you would get out of attending the School.

Contact

Dr Anne Alexander

Learning Director

Dr Irving Huerta

Researcher, Data Schools Convenor

Heather Stallard

Communications and Events Coordinator

Cambridge Digital Humanities

Tel: +44 1223 766886
Email: enquiries@crassh.cam.ac.uk