June 10, 2025

Maintenance and Stewardship

Pre-Activities

Reflection Post

  1. See the discussion page for the day’s written reflection assignment.
  2. If you haven’t completed the June 9 investigation and reflection, please do so.

Readings:

Session Activities

Discussion

The readings talked about three main themes: the impermanence of online content, sustainable/minimal computing, and the static web. Here are a couple of questions we can work through together:

  • Did anything surprise you about the Pew reading? Why or why not?
  • The Pew study gives a lot of data about how much content is disappearing, but it doesn’t say why. Any guesses or ideas about why online content disappears? What factors do you think contribute to whether something slips from the web or not?
  • Wikle and Williamson introduce the idea of planning for neglect — that is, anticipating that a research project will likely have tons of upfront work, and then largely be forgotten afterward. What do you think about this approach?
  • If what Wikle and Williamson say about the minimal and sustainable maintenance of static sites is true, why do you think dynamic site platforms (like Wix, WordPress, Squarespace, etc.) are still so popular?

Making a Timeline

Click on this link to a spreadsheet. When prompted, make a copy of the spreadsheet in your Google Drive. Now from this spreadsheet, create a public-facing interactive timeline. Yes, these are all the instructions.

Let’s debrief: how many of you successfully launched a timeline? Why or why not? For those who did, how did you figure it out?

Let’s try again. This time, follow this link to create a timeline. Follow the steps in the documentation (1-4). When you’re done, raise your hand.

Developing a Maintenance Plan

One of the most important steps for ensuring that your project survives is to develop a maintenance plan BEFORE you start building it. As Wikle and Williamson underscore, certain technologies and tools are better suited to smaller teams with fewer resources and less time for maintenance, while others require constant, hands-on work to ensure that the content and functionality are preserved in the long term.

During this time, consider the following questions in relation to your project data and write down your answers (on paper or on your computer):

  • What are your central research questions?
    • Looking at your data outline from Day 2, what pieces or parts of your data are most essential to answering these questions?
    • Go through each type of data you have in your project. Rank these data types from high priority to low priority for answering your research questions. If you’re having trouble ranking them, write down why!
  • What file format(s) is your data stored in? If you have multiple file formats, how might this impact preserving or maintaining your data?
  • How do you currently access your data (in folders, in a Google Drive, on your computer desktop, etc.)?
    • Do you have multiple access points for your data, or only one way to access your data?
  • What permissions might you need to use your data? What permissions might you give to others to use your data?
    • Will you need to renew or check in about these permissions, or will they remain the same?
    • Do you own each piece of data, or does someone else own or provide some of your data?
  • Do you want your data to be fully public or somewhat private?
    • Does your data contain any personally identifiable or sensitive information?
  • Will you need long-term support from anyone else (a software company, a specific platform, a group of contributors, crowd-sourcing, etc.) to preserve and share your data?
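
If your data already lives in a folder on your computer, a short script can help you answer the file format question above. The following is a minimal sketch, not a required step: the function name count_file_formats is made up for this illustration, and it assumes your data sits in one local folder (with or without subfolders).

```python
from collections import Counter
from pathlib import Path


def count_file_formats(folder):
    """Count the files under `folder` (including subfolders) by file extension."""
    counts = Counter()
    for path in Path(folder).rglob("*"):
        if path.is_file():
            # Lowercase the extension so ".CSV" and ".csv" count as one format.
            # Files with no extension are grouped under "(none)".
            counts[path.suffix.lower() or "(none)"] += 1
    return counts
```

For example, calling count_file_formats("my_project_data").most_common() (where "my_project_data" is a placeholder for your own folder name) lists each extension with its file count, most frequent first. A long list of formats is a signal to think about which ones you can realistically keep readable over time.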



Break (15 minutes)


Mini Lecture: Web Archiving

Web archiving is a process which can take many forms but most commonly involves making and storing “preserved copies of live web content collected for permanent retention and access”. Practically, this means creating a copy of all of the code behind a webpage and the way that code is displayed at a very specific point in time, with the intention of being able to access that capture of the webpage as-is in the future.
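
To make the "specific point in time" idea concrete, here is a minimal Python sketch that asks the Internet Archive's Wayback Machine availability API for the capture closest to a given date. The function names (build_availability_url, closest_snapshot, lookup) are invented for this illustration, and the sketch assumes the API answers in its documented JSON shape, with the closest capture listed under "archived_snapshots".

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"


def build_availability_url(page_url, timestamp=None):
    """Build a query URL for the Wayback Machine availability API.

    `timestamp` is optional, in YYYYMMDD or YYYYMMDDhhmmss form.
    """
    params = {"url": page_url}
    if timestamp:
        params["timestamp"] = timestamp
    return AVAILABILITY_ENDPOINT + "?" + urlencode(params)


def closest_snapshot(response):
    """Return (capture_url, capture_timestamp) from a parsed API response.

    Returns None when the response lists no available capture.
    """
    closest = response.get("archived_snapshots", {}).get("closest")
    if not closest or not closest.get("available"):
        return None
    return closest["url"], closest["timestamp"]


def lookup(page_url, timestamp=None):
    """Ask the API for the capture closest to `timestamp` (needs network access)."""
    with urlopen(build_availability_url(page_url, timestamp), timeout=30) as reply:
        return closest_snapshot(json.load(reply))
```

Calling lookup("example.com", "20250610") would return the address and timestamp of the nearest capture if one exists. The timestamp in the result is the moment the copy was made, which may be far from the date you asked for.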

Exploring Archived Datasets

Below are four different archived datasets. In pairs, go through each dataset and the original page it was located on. Determine whether the dataset has been fully archived or only partially archived. Either way, is there anything missing from the original page? Does the dataset have adequate context? Can you tell where the dataset is from and what organization, institution, or agency published it?

Exploring Archived Pages

Below are four different webpages. In pairs, go through each webpage and determine whether the page has been fully archived or only partially archived. If you think it’s only partially archived, what’s missing? Why might that feature or section be missing? How does that affect your ability to use the webpage?

Archiving Pages Using the Internet Archive, Perma.cc, ArchiveBox, and Conifer

There are a few platforms that support one-click web archiving. These platforms allow you to enter a link, press a button, and have the page you’d like to archive added to their servers and made publicly available for viewing. Try archiving a page with both the Internet Archive and Perma.cc. You can use the login credentials in this Box note for Perma.cc if you don’t want to make an account.

Next, try archiving a webpage using Conifer. You can use the login credentials in this Box note if you don’t want to make an account. Conifer lets you make both private and public collections, and lets you repeat captures and patch broken snapshots from your browser.

ArchiveBox Demo

Because ArchiveBox takes some setup and knowledge of the command line, Kiran’s going to give you a demo of what it looks like and how it works, in both the CLI and the web UI.

Post-Activities

Readings

In preparation for our session on Artificial Intelligence (AI), please read the following: