June 03, 2025
Introduction to Data and Metadata
Digitization does not equal access. The mere act of creating digital copies of collection materials does not make those materials findable, understandable, or utilizable to our ever-expanding audience of online users. But digitization combined with the creation of carefully crafted metadata can significantly enhance end-user access—and our users are the primary reason we create digital resources.
Murtha Baca, in Introduction to Metadata v.3, published by the Getty Research Institute (2016).
Pre-Activities
Assignment
Please bring one or two hats to today’s session, but keep them hidden until we begin the hat exercise. If you do not own a hat and cannot borrow one, simply bring any accessory that could be worn on your head. The hat might have an origin story that you wouldn’t mind sharing in class, or it can be completely utilitarian. The more varied the headgear, the more meaningful the exercise, so the weirder the hat, the better.
Readings
Briefly skim:
- Wikipedia’s entry on Musical Instrument Classification
- “FoodOn: a harmonized food ontology to increase global food traceability, quality control and data integration”; see also the Apple Product Diagram.
Read:
- “What Metadata Is and Why It Matters,” (2016). Chapter 1 in Metadata: Shaping Knowledge from Antiquity to the Semantic Web, by Richard Gartner
- Humanities Data: A Necessary Contradiction (2015) by Miriam Posner
Optional:
- What do we mean by “Collections As Data” (CAD)? by Cory Lampert & Emily Lapworth
- Dorothy Porter, Archives, and the Preservation of Black Studies, Derrion Arrington, Black Perspectives, African American Intellectual History Association (2024). (See especially the second half of the blog post.)
- “Metadata as Ideology,” (2016). Chapter 5 in Metadata: Shaping Knowledge from Antiquity to the Semantic Web, by Richard Gartner
Session
Defining Your Data in the Humanities
As you work on your projects this summer, you’ll become more and more familiar with the objects before you, how they relate to one another, and what they mean in an assemblage. In order to get a good grasp on your objects from a data perspective, it’s often useful to figure out what you have.
Using a writing implement and a piece of paper, take the next 5 minutes and write down what data you have. How much data do you have (and how much of each type)? What kinds of data do you have (think broadly, like media type, and narrowly, like file type)? Why have you collected the types of data you have?
As we go through the process of thinking about how to categorize our collection data with metadata, keep in mind the basics of what data you actually have.
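If your data already lives in a folder on your computer, the paper inventory above can be cross-checked with a few lines of Python. This is only a minimal sketch: it assumes your files sit under a single folder and that the file extension is a reasonable stand-in for file type. The function names are made up for illustration.

```python
from collections import Counter
from pathlib import Path


def inventory_by_extension(folder):
    """Count files under `folder` (recursively), grouped by lowercase extension."""
    counts = Counter()
    for path in Path(folder).rglob("*"):
        if path.is_file():
            # Files without an extension are grouped under "(none)".
            counts[path.suffix.lower() or "(none)"] += 1
    return counts


def format_inventory(counts):
    """Return one line per file type, most common first."""
    return [f"{ext}: {n}" for ext, n in counts.most_common()]
```

For example, `format_inventory(inventory_by_extension("my_project_data"))` would list how many files of each type sit in a (hypothetical) folder named `my_project_data`. A count like this answers "how much of each type," but not "why did I collect it," which is still a question for pen and paper.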
Class Exercise #1: Ontology of Jars
Class Exercise #2: Ontology of Hats
Some Digital Collections to examine
- Hip Hop Party and Event Flyers, Cornell University Library
- Queer Digital History Project
Metadata Application Profiles (MAPs): some examples
- The Mountain West Digital Library MAP
- Cornell University Library’s Digital Collections Portal Style Guide (Requires login with Cornell netID.)
- Dublin Core
- VRA Core
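To make the idea of a MAP more concrete, here is a minimal sketch of how a profile's rules could be written down and checked in Python. The field names follow Dublin Core element names, but the profile itself (which fields are required, which are repeatable) is an assumption invented for illustration and is not taken from any of the MAPs listed above.

```python
# Hypothetical profile: field names follow Dublin Core element names, but
# the "required" and "repeatable" rules are assumptions for illustration.
PROFILE = {
    "title":   {"required": True,  "repeatable": False},
    "creator": {"required": False, "repeatable": True},
    "date":    {"required": True,  "repeatable": False},
    "subject": {"required": False, "repeatable": True},
    "rights":  {"required": True,  "repeatable": False},
}


def check_record(record, profile=PROFILE):
    """Return a list of problems found when comparing one record to the profile."""
    problems = []
    for field, rule in profile.items():
        value = record.get(field)
        # A required field must be present and non-empty.
        if rule["required"] and not value:
            problems.append(f"missing required field: {field}")
        # A list of values is only allowed for repeatable fields.
        if isinstance(value, list) and not rule["repeatable"]:
            problems.append(f"field is not repeatable: {field}")
    # Flag any field the profile does not define.
    for field in record:
        if field not in profile:
            problems.append(f"field not in profile: {field}")
    return problems
```

A real MAP usually says much more than this (data types, vocabularies, input guidelines), so treat the sketch as a way to think about what "required" and "repeatable" mean rather than as a working validator for your collection.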
Controlled Vocabularies: some examples
- Getty Vocabularies, a collection of controlled vocabularies related to art, artists, cultural objects, and geographic names.
- Homosaurus: An International LGBTQ+ Linked Data Vocabulary
- Thesaurus for Graphic Materials, Library of Congress
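A controlled vocabulary works by mapping the many ways people name a thing onto one preferred term. The sketch below shows that idea in Python with a tiny, made-up hat vocabulary. The terms and variants are hypothetical and are not drawn from the Getty Vocabularies, Homosaurus, or the Thesaurus for Graphic Materials.

```python
# Hypothetical mini-vocabulary: each preferred term lists the variant
# spellings that should map to it. Not drawn from any real thesaurus.
VOCABULARY = {
    "baseball cap": ["ball cap", "ballcap"],
    "beanie": ["knit cap", "toque", "tuque"],
    "sun hat": ["sunhat"],
}


def build_lookup(vocabulary):
    """Map every preferred term and variant (case-folded) to its preferred term."""
    lookup = {}
    for preferred, variants in vocabulary.items():
        for term in [preferred, *variants]:
            lookup[term.casefold()] = preferred
    return lookup


def normalize(term, lookup):
    """Return the preferred term, or None if the term is not in the vocabulary."""
    # Collapse extra whitespace and ignore case before looking the term up.
    return lookup.get(" ".join(term.split()).casefold())
```

With `lookup = build_lookup(VOCABULARY)`, calling `normalize("Toque", lookup)` returns the preferred term `"beanie"`, while a term outside the vocabulary returns `None`. Deciding which term counts as "preferred" is exactly the kind of choice the hat exercise is meant to surface.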
Introduction to Collections as Data
As the Collections cohort of this fellowship, you all are working with some sort of data that can be defined as a collection. Given that a collection can mean many different things, let’s explore different kinds of collections. In pairs, take a look at the following collections and consider:
- What makes this dataset or project a collection? What unifies the items in this collection?
- What types of data are in this collection? How can you tell the types of data apart?
- How could you use digital tools or computation to analyse this collection? If you need inspiration, take a peek at our Introduction to Collections as Data.
List of Collections:
- Truckee Meadows Regional Planning Agency Oral History Transcripts
- Revisualize Archives
- Greek Medieval Texts
Post-Activities
Daily Comment
Please post a reflection on something you learned today or that you would still like to learn.
Some questions to consider:
- What do you wish you understood better?
- What have you learned today that was or might be especially useful?
- How will you define the universe of objects in your collection?
- How would you like users to interact with those objects?
- What kinds of metadata will you need to allow the types of interactions you want to enable?
- What are some elements and standards that your MAP will include?
Prep Assignment
In preparation for our web programming session tomorrow, please complete the “Build & Deploy Your First Website” section of Scrimba’s Learn HTML & CSS lesson. If you encounter any issues accessing the scrim, please email Kiran to let her know.