Ideas for POCS final project pieces #14

jwzimmer-zz · 2021-10-13T23:58:02Z

Goal: do things that capitalize on Denis and Tyler's skills while they're available : ) <3

Ideas:

We want to test the stability of the dimensions we're seeing, i.e. check that they're a genuine reflection of the space and not an artifact of the specific way we've set things up and chosen to interpret the results.
- One way to do this is to change the matrix we're running SVD on.
  - Simplest way is to drop characters or traits with low numbers of ratings and re-run SVD. We could do this sequentially, e.g. raise the minimum threshold by 10 at a time (so, first run it with all characters, then run it with only the characters that have at least 10 ratings for every trait, etc.).
  - How do we show that the dimensions are "the same" (or not)?
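    - Are the vectors for the first few dimensions pointing in the same directions? Inner products should answer this: if we don't pull in the sigma values, each row of V^T is a unit vector, so the inner product of matching rows from two runs is the cosine of the angle between them, and an absolute value near 1 means the dimension hasn't moved. (We need the absolute value because SVD only determines each singular vector up to sign, and the rows are only comparable if both runs keep the same traits in the same order.) We probably don't care about magnitude much anyway.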
    - We could compare which traits show up in a subset of each vector (e.g. the highest-magnitude traits), maybe with some kind of string comparison?
      - Alternatively, but in the same spirit, we could use word2vec or similar (or PDS)?
    - All the traits are in every vector, but their order by magnitude can change, so if we ranked them that way we could make an allotaxonomothing showing how they change in importance from one version of the first dimension to another version of the first dimension (and likewise for the other dimensions).
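A minimal sketch of the threshold sweep and the comparisons above, in Python with NumPy (our choice; nothing here says what language the pipeline uses). It assumes `X` is the characters x traits matrix and `N` is the matching matrix of rating counts (like the `June2021_df_n_original.json` dataframe), both already loaded as NumPy arrays with rows and columns in the same order. It only drops characters, so every run keeps the same traits and the rows of V^T stay directly comparable; dropping traits would mean comparing on the shared traits only. All function names are hypothetical, and the usage lines run on random data.

```python
import numpy as np


def top_dimensions(X, k):
    """First k rows of V^T from a thin SVD of X; each row is a unit vector over traits."""
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    return Vt[:k]


def filter_by_min_ratings(X, N, threshold):
    """Keep characters (rows of X) with at least `threshold` ratings on every trait.

    N is assumed to hold rating counts with the same shape and row/column order as X.
    """
    keep = (N >= threshold).all(axis=1)
    return X[keep]


def dimension_similarity(Vt_a, Vt_b):
    """|inner product| between every pair of rows; rows are unit vectors, so this is |cosine|.

    The absolute value is needed because SVD fixes each singular vector only up to sign.
    A diagonal near 1 means the dimensions line up; a large off-diagonal entry means two
    dimensions swapped order.
    """
    return np.abs(Vt_a @ Vt_b.T)


def top_trait_overlap(v_a, v_b, m):
    """Jaccard overlap between the m highest-|weight| traits of two vectors."""
    top_a = set(np.argsort(-np.abs(v_a))[:m].tolist())
    top_b = set(np.argsort(-np.abs(v_b))[:m].tolist())
    return len(top_a & top_b) / len(top_a | top_b)


def stability_sweep(X, N, thresholds, k=3, m=10):
    """Re-run SVD at each rating threshold and compare against the full matrix."""
    reference = top_dimensions(X, k)
    results = {}
    for t in thresholds:
        X_sub = filter_by_min_ratings(X, N, t)
        if X_sub.shape[0] < k:
            break  # too few characters left to estimate k dimensions
        sub = top_dimensions(X_sub, k)
        cosines = np.diag(dimension_similarity(reference, sub))
        overlaps = [top_trait_overlap(reference[i], sub[i], m) for i in range(k)]
        results[t] = (cosines, overlaps)
    return results


rng = np.random.default_rng(0)
X = rng.normal(size=(200, 30))            # synthetic stand-in for the 800 x 236 matrix
N = rng.integers(5, 200, size=(200, 30))  # synthetic rating counts
for t, (cosines, overlaps) in stability_sweep(X, N, thresholds=range(0, 50, 10)).items():
    print(t, cosines.round(2), overlaps)
```

On random data the dimensions have no reason to be stable; on the real matrix we'd hope the cosines stay near 1 as the threshold rises. Looking at the full `dimension_similarity` matrix (not just its diagonal) would also catch two dimensions trading places.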
We want to label the main dimensions.
- This is very fun to do using phrases, words, and idioms, but I would like to explore more systematic approaches.
- Word2vec could bring up possible synonyms that are closest to e.g. the highest magnitude traits?
We want to justify paying attention to only the first few dimensions.
We want to examine specific works.
We want to compare to personality models like FFM, Hexaco, the Dark Triad.
We want to explore the vowel space analogy.
Rotating it? Is there another orientation that splits the variance across the first dimensions more evenly?
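One standard way to try the rotation idea is a varimax rotation of the first k loadings, the same kind of rotation used in factor analysis. Varimax picks an orthogonal rotation that makes each axis load heavily on a few traits; it doesn't directly target an even split of variance, but the sum of squared loadings per axis usually becomes less lopsided, so it seems like a reasonable first check. The sketch below is the commonly used SVD-based varimax iteration, swapped in as an example technique; scaling the loadings by the singular values (Phi = V_k diag(d_k)) is our assumption, and the data is random.

```python
import numpy as np


def varimax(Phi, gamma=1.0, max_iter=100, tol=1e-6):
    """Orthogonal varimax rotation of a (traits x k) loading matrix Phi.

    Returns the rotated loadings and the k x k rotation matrix R.
    """
    p, k = Phi.shape
    R = np.eye(k)
    crit = 0.0
    for _ in range(max_iter):
        crit_old = crit
        L = Phi @ R
        # Gradient-like target for the varimax criterion (gamma=1 gives varimax).
        target = L**3 - (gamma / p) * L @ np.diag(np.sum(L**2, axis=0))
        u, s, vh = np.linalg.svd(Phi.T @ target)
        R = u @ vh  # nearest orthogonal matrix, so R stays a pure rotation/reflection
        crit = np.sum(s)
        if crit_old != 0 and crit / crit_old < 1 + tol:
            break
    return Phi @ R, R


rng = np.random.default_rng(1)
X = rng.normal(size=(100, 12))  # synthetic stand-in for the centered matrix
_, d, Vt = np.linalg.svd(X, full_matrices=False)
k = 3
Phi = Vt[:k].T * d[:k]          # loadings scaled by the singular values
rotated, R = varimax(Phi)
print("variance per axis before:", (d[:k] ** 2).round(1))
print("variance per axis after: ", (rotated**2).sum(axis=0).round(1))
```

Because R is orthogonal, the total variance in the first k dimensions is unchanged; only its split across axes moves. The rotated axes are no longer ordered by variance, and the character coordinates (U scaled by d) would need the same rotation to stay consistent.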

jwzimmer-zz · 2021-10-15T21:20:27Z

Instructions for Denis & Tyler

The data is from https://openpsychometrics.org/_rawdata/
- Specifically, https://openpsychometrics.org/_rawdata/characters-aggregated.zip
- We dropped the trait differentials that had emojis in them (about 30) and we fixed a few typos
- A dict mapping the trait names ("BAP1, BAP2, ...") to the trait differentials ("happy<->sad", etc.): https://github.com/jwzimmer/tv-tropening/blob/main/June2021_column_dict_original.json
- IIRC I fixed typos in this one: https://github.com/jwzimmer/tv-tropening/blob/main/July2021_cleaned_column_dict.json
- The original data saved as a dataframe: https://github.com/jwzimmer/tv-tropening/blob/main/June2021_df_original.json
- The number of ratings saved as a dataframe: https://github.com/jwzimmer/tv-tropening/blob/main/June2021_df_n_original.json
Currently we're subtracting 50 (the theoretical mean) from every trait score to center the scores before running SVD
- The 800 characters x 236 trait-differentials matrix: https://github.com/jwzimmer/tv-tropening/blob/main/x_with_theoretical_mean_removed.json
- Output from SVD
  - U: https://github.com/jwzimmer/tv-tropening/blob/main/u_with_theoretical_mean_removed.json
  - d, the singular values (the non-zero diagonal entries of Sigma): https://github.com/jwzimmer/tv-tropening/blob/main/d_with_theoretical_mean_removed.json
  - Sigma: https://github.com/jwzimmer/tv-tropening/blob/main/sig_with_theoretical_mean_removed.json
  - V^T: https://github.com/jwzimmer/tv-tropening/blob/main/v_with_theoretical_mean_removed.json
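A small sketch of the preprocessing and SVD step described above: subtract 50 from every score, then take a thin SVD so that X = U Sigma V^T. It uses NumPy's `np.linalg.svd`, which returns the singular values (d) in descending order. `centered_svd` is a hypothetical helper, the scores are random numbers with the real shape, and we haven't checked whether the stored Sigma is the square 236 x 236 version built here or the full 800 x 236 one, so treat that as an assumption.

```python
import numpy as np

THEORETICAL_MEAN = 50  # the value subtracted from every trait score


def centered_svd(scores):
    """Subtract the theoretical mean from a characters x traits score matrix and take a thin SVD.

    Returns X (the centered matrix), U, d (singular values), Sigma = diag(d), and V^T.
    """
    X = np.asarray(scores, dtype=float) - THEORETICAL_MEAN
    U, d, Vt = np.linalg.svd(X, full_matrices=False)
    Sigma = np.diag(d)
    return X, U, d, Sigma, Vt


rng = np.random.default_rng(2)
scores = rng.uniform(0, 100, size=(800, 236))  # synthetic stand-in with the real shape
X, U, d, Sigma, Vt = centered_svd(scores)
print(U.shape, d.shape, Sigma.shape, Vt.shape)
# prints (800, 236) (236,) (236, 236) (236, 236)

# The factors should rebuild the centered matrix.
assert np.allclose(U @ Sigma @ Vt, X)
```

Subtracting 50 centers on the theoretical midpoint rather than on each trait's observed mean; swapping in `X = scores - scores.mean(axis=0)` would give the usual PCA-style column centering, which might be worth comparing against.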

jwzimmer-zz · 2021-10-22T19:51:13Z

Current thoughts on project options

Math-related
- Justification for the number of dimensions to look at, beyond variance explained
- Justification for or against vowel-space analogy
- Justification for mapping onto PDS space, and potentially other frameworks like FFM
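One option for choosing the number of dimensions beyond variance explained is parallel analysis (Horn's method): compare each singular value of the real matrix with the singular values of copies in which every trait column has been shuffled independently, which keeps each trait's distribution but breaks the correlations between traits. Dimensions whose singular values beat, say, the 95th percentile of the shuffled ones are hard to explain as noise. This is a sketch of that standard technique, not something we've run on our data; the function name, permutation count, and quantile are arbitrary choices, and the usage data is synthetic.

```python
import numpy as np


def parallel_analysis(X, n_perm=100, quantile=0.95, seed=0):
    """Count leading dimensions whose singular values beat a column-shuffled null.

    Returns (n_keep, real singular values, null threshold per dimension).
    """
    rng = np.random.default_rng(seed)
    d_real = np.linalg.svd(X, compute_uv=False)
    null = np.empty((n_perm, d_real.size))
    for i in range(n_perm):
        # Shuffle each trait column independently: each trait keeps its own
        # distribution, but correlations between traits are destroyed.
        X_perm = np.column_stack([rng.permutation(col) for col in X.T])
        null[i] = np.linalg.svd(X_perm, compute_uv=False)
    threshold = np.quantile(null, quantile, axis=0)
    above = d_real > threshold
    # Keep dimensions up to the first one that fails to beat the null.
    n_keep = above.size if above.all() else int(np.argmin(above))
    return n_keep, d_real, threshold


rng = np.random.default_rng(3)
u = rng.normal(size=(300, 1))
# One strong dimension shared by all 20 traits, plus a little noise.
X = 10 * u @ np.ones((1, 20)) + 0.1 * rng.normal(size=(300, 20))
n_keep, d_real, threshold = parallel_analysis(X, n_perm=20)
print(n_keep, d_real[:3].round(1), threshold[:3].round(1))
```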
Coding-related
- Nice way to pass [2 characters, 2 groups of characters e.g. all the characters from a given storyverse compared to another, 2 dimensions e.g. 2 versions of the first row of V^T from different subsets of the full matrix] into allotaxonometer and get back plots
- Nice visualization of characters from a storyverse in character space
- Related to the above: pick a few specific works and pull in info from IMDB or Gutenberg for a more detailed look... for sure we should include Pride & Prejudice, since there's a lot of info and analysis available that we could use (e.g. story wrangler)
- Try to find another dataset that we could run SVD on and compare the resulting dimensions?
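For the allotaxonometer and storyverse-plot items, a sketch of two helper pieces: ranking traits by |weight| in two versions of a dimension (rank-ordered lists are the kind of input a rank-based comparison works from, but we haven't checked the exact format the allotaxonometer expects), and getting character coordinates along chosen dimensions as U scaled by d, which equals projecting the rows of X onto V. The helper names are hypothetical, and the weights in the usage line are toy values attached to the first three trait codes.

```python
import numpy as np


def rank_by_magnitude(weights, names):
    """Map each trait name to its rank in one dimension (1 = largest |weight|)."""
    order = np.argsort(-np.abs(np.asarray(weights)), kind="stable")
    return {names[i]: rank + 1 for rank, i in enumerate(order)}


def rank_shifts(weights_a, weights_b, names, top=10):
    """Traits sorted by how far their rank moves between two versions of a dimension.

    Each row is (trait, rank in a, rank in b, rank in b minus rank in a).
    """
    ranks_a = rank_by_magnitude(weights_a, names)
    ranks_b = rank_by_magnitude(weights_b, names)
    rows = [(n, ranks_a[n], ranks_b[n], ranks_b[n] - ranks_a[n]) for n in names]
    rows.sort(key=lambda row: abs(row[3]), reverse=True)  # stable, so ties keep input order
    return rows[:top]


def character_coordinates(U, d, rows, dims=(0, 1)):
    """Coordinates of the given characters (row indices) along the given dimensions: U[r, j] * d[j]."""
    dims = list(dims)
    return U[np.ix_(rows, dims)] * d[dims]


names = ["BAP1", "BAP2", "BAP3"]  # toy weights below, not real loadings
print(rank_shifts([0.9, -0.5, 0.1], [0.1, -0.5, 0.9], names))
# prints [('BAP1', 1, 3, 2), ('BAP3', 3, 1, -2), ('BAP2', 2, 2, 0)]
```

`character_coordinates(U, d, rows)` for the rows belonging to one storyverse gives the x/y values for a scatter plot in the first two dimensions.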

More nebulous ideas

I think it would be really cool to translate some literary analysis into something we can look at in character space. For example, in an ideal world, the storyverses would include a bunch of really old Germanic and English texts, so we could verify the influence of epics on Tolkien's writing by seeing whether the characters occupy the same locations in PDS space (for example, there's a source in my draft that references specific traits of what Tolkien thought of as "true heroism"... that might be the kind of thing we could find some support for in character space). However, I don't think any of the relevant texts are in our dataset... one thing we might do is come up with a similar claim that we could check with the works we do have. For example, we might be able to see whether works from the same time period or by the same author tend to surface the same sets of traits for their characters.

Related to this: I really like this idea of "translating" the main dimensions we have found from the whole data set into trait adjectives specific to an author, time period, or genre. What I mean is, we know which traits best describe those dimensions over all of the storyverses in the data set, but if we restricted the matrix to only the works by a certain author and then ran SVD, the traits highest in those dimensions would reflect that author's interpretation of them. We might need to do some pre-processing, like only using the traits their characters scored highest on in the first place, but if the dimensions we're seeing are genuinely universal rather than arbitrary results of the specific dataset we have, then I think this general concept makes sense.
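A sketch of the per-author "translation" idea: run SVD on only one author's characters, match each dimension from the full matrix to the author's dimension with the largest |cosine|, flip its sign to agree, and list the highest-magnitude traits. The dataset as described here doesn't include authors, so `is_author` (a boolean mask over characters) is an assumption, as are the function name and the placeholder trait names. The matching is greedy, so two global dimensions can map to the same author dimension, which would itself hint that the space looks different for that author.

```python
import numpy as np


def author_dimensions(X, is_author, trait_names, k=3, m=5):
    """Label the first k global dimensions using SVD on one author's characters only.

    Returns, for each global dimension i, a tuple
    (matched author dimension j, |cosine| between them, [(trait, weight), ...]).
    """
    _, _, Vt_all = np.linalg.svd(X, full_matrices=False)
    _, _, Vt_sub = np.linalg.svd(X[is_author], full_matrices=False)
    overlap = Vt_all[:k] @ Vt_sub.T  # cosine between each global and each author dimension
    labels = []
    for i in range(k):
        j = int(np.argmax(np.abs(overlap[i])))
        # Flip the author's vector so its sign agrees with the global dimension.
        v = Vt_sub[j] * np.sign(overlap[i, j])
        top = np.argsort(-np.abs(v))[:m]  # highest-magnitude traits, either pole
        labels.append(
            (j, float(abs(overlap[i, j])), [(trait_names[t], float(v[t])) for t in top])
        )
    return labels


rng = np.random.default_rng(4)
X = rng.normal(size=(60, 8))                    # synthetic characters x traits
trait_names = [f"trait_{i}" for i in range(8)]  # hypothetical placeholder names
is_author = rng.random(60) < 0.5                # hypothetical author mask
for j, cos, traits in author_dimensions(X, is_author, trait_names):
    print(j, round(cos, 2), traits[:3])
```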
