June 10, 2025

Word Vectors and Topic Modelling

Before the Session

“Topic Modeling – Overview” (Walsh, 2021)
Walsh, Melanie, and Maria Antoniak. “The Goodreads ‘Classics’: A Computational Study of Readers, Amazon, and Crowdsourced Amateur Criticism.” Post45: Peer Reviewed, Apr. 2021. post45.org.
Read through (skimming the code): “TF-IDF with Scikit-Learn” (Walsh, 2021)
Soni, Sandeep, et al. “Abolitionist Networks: Modeling Language Change in Nineteenth-Century Activist Newspapers.” Journal of Cultural Analytics, vol. 6, no. 1, Jan. 2021, p. 18841. culturalanalytics.org, https://doi.org/10.22148/001c.18841.

Similarities:

But there are also key differences:

Topic Modelling	Word Embeddings
* Exploratory method for identifying themes across documents in a corpus	* Looks at the relationships between words in a smaller window of context across the corpus
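To make the contrast concrete, here is a minimal sketch in plain Python. The toy corpus and the function names (`document_term_counts`, `window_cooccurrences`) are invented for illustration, and the code only counts words; it does not train a topic model or an embedding model. It shows the two kinds of context each method starts from: whole documents versus a small window of neighbouring words.

```python
from collections import Counter

# Toy corpus invented for illustration: each string stands in for one document.
docs = [
    "the ship sailed across the sea",
    "the sea was calm and the ship was fast",
    "the senate passed the bill",
]

def document_term_counts(documents):
    """Document-level view: one bag of words per document.

    Topic models start from counts like these, so two words are
    related if they tend to appear in the same documents.
    """
    return [Counter(doc.split()) for doc in documents]

def window_cooccurrences(documents, window=2):
    """Window-level view: count word pairs within `window` tokens of each other.

    Word embeddings are learned from local contexts like these, so two
    words are related if they tend to appear close together.
    """
    pairs = Counter()
    for doc in documents:
        tokens = doc.split()
        for i, target in enumerate(tokens):
            start = max(0, i - window)
            stop = min(len(tokens), i + window + 1)
            for j in range(start, stop):
                if j != i:
                    pairs[(target, tokens[j])] += 1
    return pairs

counts = document_term_counts(docs)
pairs = window_cooccurrences(docs, window=2)

# How many documents contain both "ship" and "sea"?
shared_docs = sum(1 for c in counts if c["ship"] and c["sea"])
print("documents containing both 'ship' and 'sea':", shared_docs)
print("window co-occurrences of ('ship', 'sea'):", pairs[("ship", "sea")])
print("window co-occurrences of ('ship', 'sailed'):", pairs[("ship", "sailed")])
```

In this toy corpus, "ship" and "sea" share two documents but never fall within two tokens of each other, so the document-level view links them while the window-level view does not. Widening `window` moves the second view closer to the first.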

Let’s spend some time working with Voyant, a web-based suite of tools for analyzing and visualizing texts.

Content to cover:

You can start by exploring tools that correspond to the methods we’ve been discussing in our recent sessions:

Or, you can browse the visualizations and their documentation in the Voyant documentation, on the sidebar under “topics.”

Consider the following when looking at the texts:

What questions do you have as you explore the visualizations?
Are the visualizations effective in helping you learn about the texts?
What stopwords might help or hinder your exploration of the texts?
Overall, do the tools help you learn anything new or unexpected about your texts?

Text as Data:
- Ch. 7: The Vector Space Model and Similarity Metrics
- Ch. 8: Distributed Representations of Words
- Ch. 13: Topic Models