Ideas for POCS final project pieces #14

Open
jwzimmer-zz opened this issue Oct 13, 2021 · 2 comments

@jwzimmer-zz
Owner

Goal: do things that capitalize on Denis and Tyler's skills while they're available : ) <3

Ideas:

  • We want to test the stability of the dimensions we're seeing, as in, we want to see that they're a genuine reflection of the space and not an artifact of the specific way we've set things up and chosen to interpret the result.
    • One way to do this is to change the matrix we're running SVD on.
      • The simplest way is to drop characters or traits with low numbers of ratings and re-run SVD. We could do this sequentially, e.g. raise the lower threshold by 10 at a time (so, first run it with all characters, then with only the characters that have at least 10 ratings for every trait, etc.).
      • How do we show that the dimensions are "the same" (or not)?
        • Are the vectors for the first few dimensions pointing in the same directions? Comparing inner products seems like the natural check; I haven't done this before, but it sounds fairly straightforward. We probably don't care much about magnitude, and as long as we don't scale by the sigma values, the rows of V^T are unit vectors (in that space) anyway, so the inner product of two of them is just the cosine of the angle between them. The sign of each singular vector is arbitrary, so we should compare absolute values.
        • We could do some kind of string comparison between a subset of the traits (e.g. highest magnitude)?
          • Alternatively, but kind of the same idea, we could use word2vec or similar (or PDS)?
        • All the traits are in every vector, but they can change order by magnitude, so if we ranked them that way we could make an allotaxonomothing showing how they change in importance from one version of the first dimension to another version of the first dimension (etc).
  • We want to label the main dimensions.
    • This is very fun to do using phrases, words, and idioms, but I would like to explore more systematic approaches.
    • Word2vec could bring up possible synonyms that are closest to e.g. the highest magnitude traits?
  • We want to justify paying attention to only the first few dimensions.
  • We want to examine specific works.
  • We want to compare to personality models like FFM, HEXACO, and the Dark Triad.
  • We want to explore the vowel space analogy.
  • Should we try rotating the dimensions? Is there another orientation that splits the variance more evenly across the first few dimensions?
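The threshold sweep and the "same direction?" check from the stability idea above could be sketched roughly like this. This is only a sketch under assumptions: `scores` (a characters × traits array) and `min_ratings` (for each character, the rating count of its least-rated trait) are hypothetical names, not anything from our actual code, and the SVD is run on the matrix exactly as given, so any centering or scaling is left to the caller.

```python
import numpy as np

def leading_dimensions(matrix, k=3):
    """Return the first k rows of V^T from an SVD of the matrix as given."""
    _, _, vt = np.linalg.svd(matrix, full_matrices=False)
    return vt[:k]

def dimension_similarity(vt_a, vt_b):
    """Absolute cosine similarity between every pair of dimensions.

    Rows of V^T are unit vectors, so the inner product is the cosine.
    The absolute value is taken because the sign of a singular vector
    is arbitrary.
    """
    return np.abs(vt_a @ vt_b.T)

def stability_sweep(scores, min_ratings, thresholds, k=3):
    """Compare the full-matrix dimensions with those from filtered subsets.

    For each threshold, keep only characters whose least-rated trait has
    at least that many ratings, re-run SVD, and record the similarity of
    dimension i in the subset to dimension i in the full matrix.
    """
    baseline = leading_dimensions(scores, k)
    results = {}
    for t in thresholds:
        keep = min_ratings >= t  # boolean mask over characters (rows)
        subset_dims = leading_dimensions(scores[keep], k)
        results[t] = np.diag(dimension_similarity(baseline, subset_dims))
    return results
```

Values near 1 would mean a dimension barely moved; values near 0 would mean it now points somewhere unrelated. Taking only the diagonal assumes the dimensions keep their order, so looking at the full matrix from `dimension_similarity` is a way to catch two dimensions swapping places.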
@jwzimmer-zz
Owner Author

Instructions for Denis & Tyler

@jwzimmer-zz
Owner Author

Current thoughts on project options

  • Math-related
    • Justification for the number of dimensions to look at, beyond variance explained
    • Justification for or against vowel-space analogy
    • Justification for mapping onto PDS space, and potentially other frameworks like FFM
  • Coding-related
    • Nice way to pass a pair of things into the allotaxonometer and get back plots: 2 characters; 2 groups of characters (e.g. all the characters from a given storyverse compared to another); or 2 dimensions (e.g. 2 versions of the first row of V^T from different subsets of the full matrix)
    • Nice visualization of characters from a storyverse in character space
    • Related to above: pick a few specific works and pull in info from IMDB or Gutenberg for a more detailed look... for sure we should include Pride & Prejudice, since there's a lot of info and analysis available we could use (e.g. story wrangler)
    • Try to find another dataset that we could run SVD on and compare the resulting dimensions?
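For the "how many dimensions, beyond variance explained" item, one possible approach (my suggestion, not something we've settled on) is a permutation null in the spirit of parallel analysis: shuffle each trait column independently to destroy the correlations between traits, re-run SVD many times, and keep only the leading dimensions whose singular values beat the shuffled ones. A rough sketch, where the function names, the column centering, and the 0.95 quantile are all assumptions:

```python
import numpy as np

def singular_value_null(matrix, n_shuffles=200, seed=0):
    """Observed singular values plus a null distribution from shuffled data."""
    rng = np.random.default_rng(seed)
    # Assumption: center each trait column so the first dimension
    # is not just capturing the column means.
    centered = matrix - matrix.mean(axis=0)
    observed = np.linalg.svd(centered, compute_uv=False)
    null = np.empty((n_shuffles, observed.size))
    for i in range(n_shuffles):
        # Shuffle every column independently: each trait keeps its own
        # distribution, but correlations between traits are broken.
        shuffled = rng.permuted(centered, axis=0)
        null[i] = np.linalg.svd(shuffled, compute_uv=False)
    return observed, null

def count_dimensions_above_null(observed, null, quantile=0.95):
    """Number of leading dimensions whose singular value beats the null."""
    threshold = np.quantile(null, quantile, axis=0)
    above = observed > threshold
    if above.all():
        return observed.size
    # Index of the first dimension that fails the test.
    return int(np.argmin(above))
```

If traits have very different variances, standardizing the columns first may be needed for the comparison to be fair; the sketch only centers them.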

More nebulous ideas

I think it would be really cool to translate some literary analysis into something we can look at in character space. For example, in an ideal world, the storyverses would include a bunch of really old Germanic and English texts, so we could check the influence of epics on Tolkien's writing by seeing whether the characters occupy the same locations in PDS space. (There's a source in my draft that references specific traits of what Tolkien thought of as "true heroism"; that might be the kind of thing we could find some support for in character space.) However, I don't think any of the relevant texts are in our dataset, so one thing we might do is try to come up with a similar question that we can check. For instance, we might be able to see whether works from the same time period or by the same author tend to surface the same sets of traits for their characters.

Related to this: I really like the idea of "translating" the main dimensions we have found from the whole dataset into trait adjectives specific to an author, time period, or genre. What I mean is, we know which traits best describe those dimensions over all of the storyverses in the dataset, but if we restricted the matrix to works by a certain author and then ran SVD, the traits highest in those dimensions would reflect that author's interpretation. We might need to do some pre-processing, like only using the traits their characters scored highest on in the first place, but if the dimensions we're seeing are genuinely universal rather than just arbitrary results from the specific dataset we have, then I think this general concept makes sense.
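A minimal sketch of the "restrict to one author, re-run SVD, read off the top traits" idea, assuming a characters × traits array `scores`, a list `trait_names`, and a boolean `mask` picking out one author's characters (all hypothetical names):

```python
import numpy as np

def top_traits(vt_row, trait_names, n=5):
    """The n traits with the largest absolute loading on one dimension."""
    order = np.argsort(-np.abs(vt_row))[:n]
    return [(trait_names[i], float(vt_row[i])) for i in order]

def dimensions_for_subset(scores, mask, trait_names, k=2, n=5):
    """Run SVD on the selected characters only and label each dimension.

    mask is a boolean array over rows (characters), e.g. True for every
    character from works by a given author.
    """
    _, _, vt = np.linalg.svd(scores[mask], full_matrices=False)
    return [top_traits(vt[d], trait_names, n) for d in range(k)]
```

Running this once with an all-True mask and once with an author mask gives two trait lists per dimension to put side by side. Note that an author with only a handful of characters gives a very small matrix, so the later dimensions may not mean much.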
