Label the darn axes, NO BAD IDEAS #12
Want to make lists / word clouds based on the traits that have the most positive, most neutral, and most negative weights in each of the first 3 dimensions -- this should lead to a D&D-style alignment chart (3x3), which will hopefully show a clear pattern? Maybe also do this with characters? Pseudocode:
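A rough sketch of what that pseudocode could look like (the function name, parameters, and approach are my guesses, since the original pseudocode isn't reproduced above; it assumes V holds the dimension-by-trait weights and trait_names is the matching list of trait labels):

```python
import numpy as np

def alignment_chart_lists(V, trait_names, n_dims=3, n_words=10):
    """For each of the first n_dims rows of V, collect the traits with the
    most negative, most neutral (closest to zero), and most positive weights."""
    chart = []
    for d in range(n_dims):
        weights = V[d]
        order = np.argsort(weights)                  # most negative -> most positive
        neutral_order = np.argsort(np.abs(weights))  # closest to zero first
        chart.append({
            "negative": [trait_names[i] for i in order[:n_words]],
            "neutral":  [trait_names[i] for i in neutral_order[:n_words]],
            "positive": [trait_names[i] for i in order[-n_words:][::-1]],
        })
    return chart
```

The three lists per dimension would then feed the 3x3 chart (rows = dimensions, columns = negative / neutral / positive).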
Using this tutorial: https://towardsdatascience.com/how-to-make-word-clouds-in-python-that-dont-suck-86518cdcb61f
Saving visualizations here as I make them so I can hopefully tell / remember what they are in the future: https://docs.google.com/presentation/d/1_kc36iI6B2OmsZlbMxLB0xiT0ePaQQ2qh7NykKqsefI/edit?usp=sharing
I'm using a function in the file nextstep.py to make very basic word clouds: it gives the wordcloud Python package the scores for each trait in each row of V as if they were "frequencies" (even though they are not) and lets the built-in generate_from_frequencies method interpret that however it may.
This seems to cause Spyder to crash pretty often, so I made the clouds lower quality and limited them to fewer words:
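Since the actual snippet isn't shown above, here is a minimal sketch of that kind of helper, assuming a {trait: weight} dict as input; the small canvas and the max_words cap are the "lower quality, fewer words" part. generate_from_frequencies expects non-negative values, so this sketch takes absolute values of the weights, which is an assumption about how the scores get fed in:

```python
import matplotlib.pyplot as plt
from wordcloud import WordCloud

def simple_wordcloud(freq_dict, max_words=50):
    """Render a basic word cloud from a {trait: weight} dict."""
    # Absolute values, since generate_from_frequencies wants non-negative numbers.
    freqs = {trait: abs(w) for trait, w in freq_dict.items()}
    wc = WordCloud(width=400, height=300, max_words=max_words,
                   background_color="white")
    wc.generate_from_frequencies(freqs)
    plt.imshow(wc, interpolation="bilinear")
    plt.axis("off")
    plt.show()
```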
For the first 6 rows of V, that results in the word clouds saved in the slides linked above. For reference, the weights in the relevant sigma matrix get pretty small by around the 15th value; the first few are [4571.60069027, 3977.77079978, 3148.95421275, 2330.72490479, ...]. Moving on to a more D&D-like chart...
Call it like this for e.g. the 3rd row of V with means removed: d1_2, d2_2, d3_2 = make_dd_wordcloud_dicts(V2, 2, col2), then do simple_wordcloud(d3_2), etc. (a sketch of the function is below). That produces the charts in the slides for the first 3 rows of V with means removed, dividing the traits into ordered thirds: the first 89 with the most negative / least positive weights, then the next 89 middle traits, then the final 88 most positive traits; those get put into 3 dicts by the above function. The way they are ordered in the chart is the opposite, with the MOST POSITIVE at the top, etc. Maybe that isn't a good way to do it, because the spread isn't the same for every row of V, so I should come up with a new strategy for that?
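A sketch of what make_dd_wordcloud_dicts is probably doing, based on the description above (the signature matches the call, but the body and the order in which the dicts are returned are my guesses):

```python
import numpy as np

def make_dd_wordcloud_dicts(V2, row, trait_names):
    """Split the traits in one row of V2 into ordered thirds by weight.

    For 266 traits this gives chunks of 89 / 89 / 88: most negative,
    middle, and most positive. Each dict maps trait -> weight.
    """
    weights = V2[row]
    order = np.argsort(weights)  # most negative -> most positive
    negative_idx, middle_idx, positive_idx = np.array_split(order, 3)
    d_neg = {trait_names[i]: weights[i] for i in negative_idx}
    d_mid = {trait_names[i]: weights[i] for i in middle_idx}
    d_pos = {trait_names[i]: weights[i] for i in positive_idx}
    return d_neg, d_mid, d_pos

# e.g. for the 3rd row of V2 with means removed:
# d1_2, d2_2, d3_2 = make_dd_wordcloud_dicts(V2, 2, col2)
# simple_wordcloud(d3_2)
```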
Other visualization ideas
Meeting with Dodds notes
Looking at how the values in the rows of V2 (V^T) are distributed (per the above comment / conversation), using the version of V2 from running SVD with the overall mean (roughly 49.65) removed; a rough sketch of that step is below. My interpretation: actually, I think this might be easier to see as a scatter plot.
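A hedged sketch of the mean-removal / SVD step and a quick scatter of one row of V2 (the ratings matrix here is a random placeholder with made-up shape, just so the snippet runs):

```python
import numpy as np
import matplotlib.pyplot as plt

# Placeholder for the real character-by-trait ratings matrix (0-100 scale);
# the real data would be loaded instead of generated.
rng = np.random.default_rng(0)
ratings = rng.uniform(0, 100, size=(800, 266))

overall_mean = ratings.mean()  # roughly 49.65 on the real data
U, S, V2 = np.linalg.svd(ratings - overall_mean, full_matrices=False)

# Scatter of the sorted trait weights in one row of V2, to see how they spread.
row = 0
plt.scatter(range(V2.shape[1]), np.sort(V2[row]), s=5)
plt.xlabel("trait rank within row")
plt.ylabel(f"weight in row {row} of V2")
plt.show()
```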
Looking at trait magnitude in the rows of V (V2), with the overall mean removed (code in commit 1c5699d). Rendering the bar charts with something along the lines of the sketch below:
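A rough reconstruction of the bar-chart rendering (the actual code is in 1c5699d; the function name and defaults here are placeholders):

```python
import numpy as np
import matplotlib.pyplot as plt

def trait_bar_chart(V2, row, trait_names, n=20):
    """Bar chart of the n largest-magnitude traits in one row of V2."""
    weights = V2[row]
    top = np.argsort(np.abs(weights))[-n:]  # largest-magnitude traits
    top = top[np.argsort(weights[top])]     # order them by signed weight
    plt.barh([trait_names[i] for i in top], weights[top])
    plt.xlabel(f"weight in row {row} of V2")
    plt.tight_layout()
    plt.show()
```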
Interpretation: there do seem to be some patterns in the lower dimensions with "outlying" traits, e.g. a sort of leadership-style component (captain <-> first-mate seems to come up a lot) and a physical component (thick <-> thin and tall <-> short). And, in the lower dimensions, gender, sexuality, and procreation seem to come up a lot. To make a chart for a specific row of V, call it as in the example below.
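The exact call isn't preserved above; with the sketch from the previous comment it would look something like this (col2 again assumed to be the trait-name list):

```python
# e.g. the chart for the 3rd row of V2:
trait_bar_chart(V2, 2, col2)
```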
Meeting with Dodds
To do:
Notes from talking with Dodds Oct 12
From trying to come up with what visuals I want in the paper, it has become clear I absolutely can't avoid labeling the axes anymore. I keep not doing it because I'm worried I'll do it wrong. So this is the No Bad Ideas version. If it's stupid I'm sure Dodds will let me know.
Basic idea:
Which traits are most important to each "dimension"? Those are the traits with the most extreme weights in each ROW of V.
Which characters best exemplify each "dimension"? Those are the characters with the most extreme weights in each COLUMN of U.
How much more important is the first "dimension" compared to the second? That is given by the relevant WEIGHT in Sigma.
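A minimal sketch of that recipe (the function and variable names are mine; it assumes U, S, V2 come from the SVD of the character-by-trait matrix, with trait_names and character_names as matching label lists):

```python
import numpy as np

def label_dimension(U, S, V2, dim, trait_names, character_names, n=10):
    """Most extreme traits (row of V2) and characters (column of U) for one
    dimension, plus its weight relative to the next dimension."""
    trait_order = np.argsort(V2[dim])
    char_order = np.argsort(U[:, dim])
    print("most negative traits:", [trait_names[i] for i in trait_order[:n]])
    print("most positive traits:", [trait_names[i] for i in trait_order[-n:][::-1]])
    print("most negative characters:", [character_names[i] for i in char_order[:n]])
    print("most positive characters:", [character_names[i] for i in char_order[-n:][::-1]])
    if dim + 1 < len(S):
        print(f"weight vs next dimension: {S[dim] / S[dim + 1]:.2f}")
```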