June 10, 2025
Word Vectors and Topic Modelling
Before the Session
Readings
- “Topic Modeling – Overview” (Walsh, 2021)
- Walsh, Melanie, and Maria Antoniak. “The Goodreads ‘Classics’: A Computational Study of Readers, Amazon, and Crowdsourced Amateur Criticism.” Post45: Peer Reviewed, Apr. 2021. post45.org.
- Read through (skimming the code): “TF-IDF with Scikit-Learn” (Walsh, 2021)
- Soni, Sandeep, et al. “Abolitionist Networks: Modeling Language Change in Nineteenth-Century Activist Newspapers.” Journal of Cultural Analytics, vol. 6, no. 1, Jan. 2021, p. 18841. culturalanalytics.org, https://doi.org/10.22148/001c.18841.
Tutorials
- If you have time, watch Python for DH lessons 9-12 by William Mattingly.
Hour one: TF-IDF, Word Vectors and Topic Modelling
Comparing the methods
Similarities:
- Both methods are based on co-occurrence patterns
- Both are unsupervised methods in text analysis
But there are also key differences:
| Topic Modelling | Word Embeddings |
| --- | --- |
| Exploratory method for identifying themes across documents in a corpus | Looks at the relationships between words in a smaller window of context across the corpus |
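To make the difference in scale concrete, here is a minimal sketch that counts co-occurrence in two ways: across a whole document (roughly the kind of signal a topic model draws on) and within a small window of neighbouring words (roughly the signal behind word embeddings). The toy corpus, the function names, and the window size of 2 are all assumptions chosen for illustration. Neither function is a topic model or an embedding model; they only show what "co-occurrence" means at each scale.

```python
from collections import Counter
from itertools import combinations

# Toy corpus (hypothetical): each string stands in for one document.
docs = [
    "the whale swam past the ship",
    "the ship sailed past the harbour",
    "the garden was full of roses",
]


def document_cooccurrence(docs):
    """Count word pairs that appear anywhere in the same document."""
    counts = Counter()
    for doc in docs:
        vocab = sorted(set(doc.split()))
        # Every unordered pair of distinct words in the document counts once.
        counts.update(combinations(vocab, 2))
    return counts


def window_cooccurrence(docs, window=2):
    """Count word pairs that appear within `window` tokens of each other."""
    counts = Counter()
    for doc in docs:
        tokens = doc.split()
        for i, word in enumerate(tokens):
            # Look only at the next `window` tokens; pairs are stored
            # unordered, so looking forward covers both directions.
            for other in tokens[i + 1 : i + 1 + window]:
                if other != word:
                    counts[tuple(sorted((word, other)))] += 1
    return counts


doc_pairs = document_cooccurrence(docs)
win_pairs = window_cooccurrence(docs, window=2)

# "ship" and "whale" share a document but are never within two tokens.
print(doc_pairs[("ship", "whale")])  # 1
print(win_pairs[("ship", "whale")])  # 0
```

The same pair of words can count as related at the document level and unrelated at the window level, which is one reason the two methods surface different patterns in the same corpus.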
Readings
Hour two: Voyant Tools
Topic Modelling
Voyant Tools
Let’s spend some time working with Voyant, a web-based suite of tools for text analysis and visualization.
Content to cover:
- Uploading files
- Including and editing stopwords
You can start by exploring tools that correspond to the methods we’ve been discussing in our recent sessions:
- TF-IDF tool: Document Terms
- Word vector tool:
- Topic modelling tool: Topics
- Words as networks: Collocates Graph and Mandala
- Named entity recognition
Or, you can browse the visualizations and their documentation in the Voyant documentation, on the sidebar under “topics.”
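If you would like a sense of what a TF-IDF score measures before trying a tool, here is a minimal sketch using only the Python standard library. It uses a plain textbook formulation (term frequency times the log of inverse document frequency); this is an assumption for illustration, not a description of how Voyant computes its scores. Libraries such as scikit-learn apply smoothing and normalization by default, so their numbers will differ. The toy documents and the `tf_idf` function name are hypothetical.

```python
import math
from collections import Counter

# Toy corpus (hypothetical): document name -> text.
docs = {
    "doc_a": "the whale chased the ship",
    "doc_b": "the ship reached the harbour",
    "doc_c": "the roses filled the garden",
}


def tf_idf(docs):
    """Return {document: {term: score}} using tf * log(N / df)."""
    tokenized = {name: text.split() for name, text in docs.items()}
    n_docs = len(tokenized)

    # Document frequency: in how many documents does each term appear?
    df = Counter()
    for tokens in tokenized.values():
        df.update(set(tokens))

    scores = {}
    for name, tokens in tokenized.items():
        tf = Counter(tokens)
        scores[name] = {
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in tf.items()
        }
    return scores


scores = tf_idf(docs)
for term, score in sorted(scores["doc_a"].items(), key=lambda kv: -kv[1]):
    print(f"{term}: {score:.3f}")
```

In this toy corpus, "the" appears in every document and so scores zero, while "whale", which appears in only one document, scores higher than "ship", which appears in two.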
Consider the following when looking at the texts:
- What questions do you have as you explore the visualizations?
- Are the visualizations effective in helping you learn about the texts?
- What stopwords might help or hinder your exploration of the texts?
- Overall, do the tools help you learn anything new or unexpected about your texts?
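For the stopwords question above, a small sketch can show why the choice matters: the most frequent terms in a text change entirely once common function words are filtered out. The sample sentence, the stopword list, and the `top_terms` helper are all hypothetical and chosen only for illustration; Voyant's own stopword lists are considerably longer.

```python
from collections import Counter

# Hypothetical sample text and a deliberately tiny stopword list.
text = "the whale and the ship and the sea"
stopwords = {"the", "and", "a", "of"}


def top_terms(text, stopwords=frozenset(), n=3):
    """Return the n most frequent tokens, skipping any stopwords."""
    tokens = [t for t in text.lower().split() if t not in stopwords]
    return Counter(tokens).most_common(n)


print(top_terms(text))             # dominated by "the" and "and"
print(top_terms(text, stopwords))  # only content words remain
```

Removing a word is an interpretive choice: a term that is noise in one corpus (a character's name in a single novel, say) may be exactly what you want to track in another.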
Further learning
- Text as Data:
- Ch. 7: The Vector Space Model and Similarity Metrics
- Ch. 8: Distributed Representations of Words
- Ch. 13: Topic Models