Adapted from Jorge Moraleda

Turning it in: You may do this assignment alone or in a group of 2–3 people. Turn in a single write-up as a PDF (one write-up per group).


This assignment involves analyzing dialogue text. Dialogue text is common in forums, chats, and phone transcripts. For this assignment, however, you’ll analyze dialogue from media: a play, movie, TV show, podcast, or similar. Specifically, you’ll select your favorite script for which you can get the text, analyze it in some interesting way (using whatever software you please), and write up a report. The report should describe your methods, results, and interpretation, and should include a visual representation of at least some of the results.


The grading will be based on:

  • How interesting are your questions?
  • Are your methods appropriate to your questions?
  • Did you implement/interpret your methods correctly?
  • How extensive is your analysis?
  • How useful / appealing are the visualizations you present?
  • Does the report offer enough interpretation and conclusions, or is it mostly just raw facts?
  • Is the interpretation too speculative, needing more facts or analysis to back it up? (Some speculation is encouraged, so long as it’s clear that it’s speculative!)


There are no restrictions on, or requirements for, what you analyze. To help you get started, here are a few ideas:

  • What are the topics of the script? You could make a breakdown by character. Do these topics evolve over time (e.g., by act)?
  • Who talks (spends time) with whom? Does it change over time? What fraction of the talk does each speaker contribute?
  • What is the mood of each speaker (e.g., the average sentiment of the words they utter)? Does it change over time? Does it depend on who they are talking to or who or what they are talking about?
  • Who or what does each speaker talk about? What is the speaker’s sentiment toward each entity (i.e., what is the sentiment of the words that appear near it in the dialogue)? You will probably want to do coreference resolution (to identify which pronouns refer to which nouns) when identifying who talks about what or whom. In dialogues, participants agree about the antecedents of pronouns, so you may want to process consecutive utterances from the various participants as a single unit of text for the purpose of coreference resolution.
  • What are the Named Entities (e.g., people, places, organizations) that appear in your script? Are there any generalizations about when or where they appear?
  • What are the similarities between the characters in the script (e.g., defined in terms of vector similarity between each character’s ‘corpus’)? Which characters are the most and least similar to each other, and do these results have an intuitive explanation?
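
To make two of these ideas concrete (each speaker's fraction of the talk, and vector similarity between characters), here is a minimal Python sketch using only the standard library. It is one possible starting point, not a required method. It assumes each line of the script looks like `SPEAKER: text`, which many real scripts do not follow, and it uses raw word counts with no stopword removal. The toy script and all function names are made up for illustration.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical toy script; a real one would be read from a file.
# ASSUMPTION: every turn is on one line, formatted "SPEAKER: text".
SCRIPT = """\
ALICE: Where is the map? We need the map tonight.
BOB: The map is gone. I sold the map.
ALICE: You sold it? We need it!
CAROL: Dinner is ready.
"""

# An upper-case speaker name, a colon, then the utterance.
LINE_RE = re.compile(r"^([A-Z][A-Z ]*):\s*(.+)$")


def parse_script(text):
    """Return (speaker, utterance) pairs; lines that do not match are skipped."""
    turns = []
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            turns.append((match.group(1).strip(), match.group(2)))
    return turns


def tokenize(utterance):
    """Lower-case the text and keep runs of letters and apostrophes."""
    return re.findall(r"[a-z']+", utterance.lower())


def word_counts_by_speaker(turns):
    """Build one bag-of-words Counter (a small 'corpus') per speaker."""
    counts = defaultdict(Counter)
    for speaker, utterance in turns:
        counts[speaker].update(tokenize(utterance))
    return counts


def talk_share(counts):
    """Fraction of all spoken words contributed by each speaker."""
    totals = {speaker: sum(c.values()) for speaker, c in counts.items()}
    grand_total = sum(totals.values())
    return {speaker: n / grand_total for speaker, n in totals.items()}


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


turns = parse_script(SCRIPT)
counts = word_counts_by_speaker(turns)

for speaker, share in sorted(talk_share(counts).items()):
    print(f"{speaker}: {share:.2f} of all words")

speakers = sorted(counts)
for i, first in enumerate(speakers):
    for second in speakers[i + 1:]:
        print(f"{first} vs {second}: {cosine(counts[first], counts[second]):.2f}")
```

With raw counts, common function words such as "the" and "is" dominate the similarity scores, so a real analysis would likely remove stopwords or switch to TF-IDF weights or embeddings. Whichever you choose, say so in the methods section of your report.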

Have fun!