Break (15 minutes)
Web archiving can take many forms, but it most commonly involves making and storing “preserved copies of live web content collected for permanent retention and access”. Practically, this means creating a copy of all of the code behind a webpage, and of the way that code is displayed, at a very specific point in time, with the intention of being able to access that capture of the webpage as-is in the future.
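To make the "specific point in time" idea concrete, here is a minimal Python sketch of what a capture record might hold: the page's raw bytes, the URL they came from, a UTC timestamp, and a checksum for later integrity checks. The function name `make_capture_record` and the record fields are hypothetical, chosen for illustration. Real archiving tools typically store captures in the WARC format along with images, stylesheets, and scripts, which this sketch does not attempt.

```python
import hashlib
from datetime import datetime, timezone


def make_capture_record(url, body, captured_at=None):
    """Bundle a page's raw bytes with where and when they were captured.

    Hypothetical helper for illustration only; not a WARC writer.
    """
    if captured_at is None:
        captured_at = datetime.now(timezone.utc)
    return {
        "url": url,
        # 14-digit UTC timestamp (YYYYMMDDhhmmss), the style used in Wayback Machine URLs
        "timestamp": captured_at.astimezone(timezone.utc).strftime("%Y%m%d%H%M%S"),
        # Checksum so a later reader can confirm the bytes were not altered
        "sha256": hashlib.sha256(body).hexdigest(),
        "body": body,
    }


record = make_capture_record(
    "https://example.com/",
    b"<html><body>Hello</body></html>",
    datetime(2024, 1, 15, 9, 30, 0, tzinfo=timezone.utc),
)
print(record["timestamp"])  # prints 20240115093000
```

Keeping the timestamp and checksum next to the bytes is what lets a future reader say which version of the page they are looking at and confirm it has not changed since capture.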
Below are four different archived datasets. In pairs, go through each dataset and the original page it was located on. Determine whether the dataset has been fully archived or only partially archived. Either way, is there anything missing from the original page? Does the dataset have adequate context? Can you tell where the dataset is from and what organization, institution, or agency published it?
Below are four different webpages. In pairs, go through each webpage below and determine whether the page has been fully archived or only partially archived. If you think it’s only partially archived, what’s missing? Why might that feature or section be missing? How does that affect your ability to use the webpage?
There are a few platforms that support one-click web archiving. These platforms allow you to enter a link, press a button, and have the page you’d like to archive added to their servers and made publicly available for viewing. Try archiving a page with both the Internet Archive and Perma.cc. You can use the login credentials in this Box note for Perma.cc if you don’t want to make an account.
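After archiving a page, it can be useful to check whether the Internet Archive holds a snapshot of it. The Python sketch below uses the Wayback Machine availability endpoint (`https://archive.org/wayback/available`), which returns JSON describing the closest snapshot. The helper names `availability_url`, `closest_snapshot`, and `check_page` are hypothetical, and the sample response is hand-written to match the documented shape, not real data. Perma.cc is not covered here because its API requires an account key.

```python
import json
import urllib.parse
import urllib.request


def availability_url(page_url):
    """Build the availability query URL for a page."""
    query = urllib.parse.urlencode({"url": page_url})
    return "https://archive.org/wayback/available?" + query


def closest_snapshot(response):
    """Return the closest snapshot's URL from a parsed response, or None."""
    closest = response.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest.get("url")
    return None


def check_page(page_url):
    """Query the live endpoint (needs network access; not called below)."""
    with urllib.request.urlopen(availability_url(page_url), timeout=10) as resp:
        return closest_snapshot(json.load(resp))


# Hand-written sample response, for illustration only
sample = {
    "archived_snapshots": {
        "closest": {
            "available": True,
            "url": "http://web.archive.org/web/20240115093000/https://example.com/",
            "timestamp": "20240115093000",
            "status": "200",
        }
    }
}
print(availability_url("https://example.com/page"))
print(closest_snapshot(sample))
```

Calling `check_page("https://example.com/")` on a machine with network access should return a snapshot URL if one exists and `None` otherwise.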
Next, try archiving a webpage using Conifer. You can use the login credentials in this Box note if you don’t want to make an account. Conifer lets you make both private and public collections, and lets you repeat captures and patch broken snapshots from your browser.
Because ArchiveBox takes some setup and knowledge of the command line, Kiran’s going to give you a demo of what it looks like and how it works, in both the CLI and the web UI.
In preparation for our session on Artificial Intelligence (AI), please read the following: