Datasheets for Datasets: Describing Web Archives Collections Workshop

When: Friday 26 May 2023

Time: 1:30-4pm

Where: British Library, Staff Entrance, Midland Road, NW1 2DB

Capacity: 16 people (Registration closes 17th May)

This workshop explores how web archives collections can be described using the Datasheets for Datasets framework.

Significant work in web archives scholarship focuses on the description and provenance of collections and their data. Looking beyond the worlds of libraries, archives and cultural heritage can provide valuable alternative approaches, which we can experiment with and use. Datasheets for Datasets is a method for describing large datasets from the field of machine learning, which uses a standard set of questions arranged by stages of the data lifecycle.

During this workshop, participants will discuss how web archives collections can be described using the Datasheets for Datasets framework, specifically a datasheet template arranged into nine sections. This template asks questions about a dataset, focusing on the specific needs of machine learning researchers. More information on these questions can be found here: https://www.microsoft.com/en-us/research/project/datasheets-for-datasets/

Participants will consider how these questions can be adopted for describing web archives datasets, assessing how each question might be adapted and applied to datasets from UK Web Archive curated collections.

After a description of the Datasheets for Datasets framework, there will be a group card-sorting exercise. Each group will evaluate a set of questions using the MoSCoW technique, sorting them into the categories Must have, Should have, Could have, and Won't have. Groups will then report back in a facilitated discussion about the priorities and resources available for generating descriptive metadata and documentation for public web archives datasets.

About the Instructors:

Emily Maemura is an Assistant Professor in the School of Information Sciences at the University of Illinois Urbana-Champaign. Her research focuses on data practices and the activities of curation, description, characterization, and re-use of archived web data.

Helena Byrne is the Curator of Web Archives at the British Library. She was the Lead Curator on the IIPC Content Development Group 2022, 2018 and 2016 Olympic and Paralympic collections.

These workshops will be held in person only, due to the format of the activity, and they will not be recorded.
