
Codebook/ what are all the things #13

Open
jwzimmer-zz opened this issue Oct 27, 2020 · 11 comments

Comments

@jwzimmer-zz
Owner

It's probably a good idea to keep track of what everything is so we don't forget, as Prof Cheney said this morning.

@jwzimmer-zz
Owner Author

So far we are using the titles of the trope pages as unique trope identifiers, stored as strings in lists or dicts. We assume the filename of each trope article is that article's title.
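A minimal sketch of the filename-as-identifier assumption, in Python (the language of the repo's scripts). The function name `trope_ids_from_dir` is hypothetical. It assumes every file in the folder is one trope article whose name, minus any extension, is the trope title; a title that itself contains a dot would be truncated by `Path.stem`.

```python
from pathlib import Path


def trope_ids_from_dir(folder: str) -> list[str]:
    """Return one trope identifier per file in `folder`, sorted.

    Assumes each file's name, minus any extension, is the trope's title.
    Subdirectories are skipped.
    """
    return sorted(p.stem for p in Path(folder).iterdir() if p.is_file())
```

For example, `trope_ids_from_dir("trope_list/tropes")` would return the masterlist titles as plain strings, ready to go into lists or dicts.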

(1) https://github.com/jwzimmer/tv-tropes/blob/main/trope_list/tropes/tropes_dict.json: a single list containing one dict for every trope in https://github.com/jwzimmer/tv-tropes/tree/main/trope_list/tropes, giving every trope that trope links to on its article page
(2) https://github.com/jwzimmer/tv-tropes/tree/main/trope_list/tropes: the "master list" of all tropes. The TV Tropes community made this list of every article that was (I believe manually) tagged as a trope, so we used it as the consensus list of all tropes. Many other articles on the website could also be considered tropes.
(3) https://github.com/jwzimmer/tv-tropes/tree/main/Indices: a folder of index pages. If I recall correctly, I downloaded these manually to help check that nothing we cared about was missing after we scraped the site with wget.
(4) https://github.com/jwzimmer/tv-tropes/blob/main/indextree.py: script for pulling the names of the tropes listed on each index page (used on https://github.com/jwzimmer/tv-tropes/tree/main/Indices)
(5) https://github.com/jwzimmer/tv-tropes/blob/main/individualtropepage.py: script for pulling the names of the tropes linked to within each individual trope page (used on https://github.com/jwzimmer/tv-tropes/tree/main/trope_list/tropes)
(6) Files starting with linked_trope_dict: one file per dict, covering every trope in https://github.com/jwzimmer/tv-tropes/tree/main/trope_list/tropes and every trope it links to on its article page; HOPEFULLY equivalent to (1)
(7) Files starting with txt_dict: one file per index from (3), listing the tropes in that index.
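The link-pulling step behind (5) could look roughly like the sketch below. This is not the code in individualtropepage.py; it uses Python's standard-library `html.parser` and assumes that trope links in the saved pages contain the path `/pmwiki/pmwiki.php/Main/<TropeName>`. The names `MainLinkCollector` and `linked_tropes` are hypothetical. It collects links from anywhere on the page and keeps only masterlist tropes.

```python
from html.parser import HTMLParser

# Assumed link format for Main-namespace articles on TV Tropes.
MAIN_PREFIX = "/pmwiki/pmwiki.php/Main/"


class MainLinkCollector(HTMLParser):
    """Collect the names of Main-namespace pages linked anywhere in a page."""

    def __init__(self) -> None:
        super().__init__()
        self.targets: set[str] = set()

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href") or ""
        # Accept both absolute (https://tvtropes.org/...) and relative hrefs.
        idx = href.find(MAIN_PREFIX)
        if idx != -1:
            # Drop any query string or #fragment after the page name.
            name = href[idx + len(MAIN_PREFIX):].split("?")[0].split("#")[0]
            if name:
                self.targets.add(name)


def linked_tropes(html: str, masterlist: set[str]) -> set[str]:
    """Return the masterlist tropes that `html` links to."""
    parser = MainLinkCollector()
    parser.feed(html)
    parser.close()
    return parser.targets & masterlist
```

Running this over every page in trope_list/tropes and storing the results under each page's title would give a dict keyed by trope, with links to non-trope pages (indices, works, and so on) filtered out by the intersection.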

@jwzimmer-zz
Owner Author

It turns out (6) is definitely not equivalent to (1); we interpreted the task differently:
(1) is every time any trope in the masterlist links to any other trope in the masterlist, anywhere on the page
(6) is the links embedded in the text of the article for each trope in the masterlist

That might turn out to be useful, though: it complicates sanity checking a little, but it lets us compare what the community of contributors explicitly considers a related trope with what they relate the trope to while writing about it.
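Since (1) and (6) now mean different things, a per-trope diff is a simple way to see where the two interpretations disagree. A sketch, assuming both have already been loaded into plain dicts mapping each trope title to a list of linked trope titles; the `compare_link_dicts` name and its output keys are hypothetical.

```python
def compare_link_dicts(
    anywhere: dict[str, list[str]], in_text: dict[str, list[str]]
) -> dict[str, dict[str, set[str]]]:
    """For each trope, report links found by only one of the two methods.

    Tropes whose link sets agree are left out of the report.
    """
    report: dict[str, dict[str, set[str]]] = {}
    for trope in anywhere.keys() | in_text.keys():
        a = set(anywhere.get(trope, []))
        b = set(in_text.get(trope, []))
        if a != b:
            report[trope] = {"only_anywhere": a - b, "only_in_text": b - a}
    return report
```

The `only_anywhere` sets are the links that appear elsewhere on a trope's page (for example in "related" sections) but not in the article text, which is the difference the comment above suggests studying.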

@jwzimmer-zz
Owner Author

Relevant to issue #7 too:

[Image: ttvtropestructure]

@jwzimmer-zz
Owner Author

My script in the picture above now also captures links in lists in the main article, not just links in paragraphs.
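A sketch of the "links in paragraphs and lists" rule, using the standard-library `html.parser`. It is not the script in the picture; it assumes trope links contain `/pmwiki/pmwiki.php/Main/` and that `<p>` and `<li>` tags are explicitly closed (implicitly closed tags would leave the depth counter too high). `TextLinkCollector` and `text_links` are hypothetical names.

```python
from html.parser import HTMLParser

MAIN_PREFIX = "/pmwiki/pmwiki.php/Main/"  # assumed trope link format
TEXT_TAGS = {"p", "li"}  # paragraphs and list items


class TextLinkCollector(HTMLParser):
    """Collect Main-namespace links that sit inside <p> or <li> elements."""

    def __init__(self) -> None:
        super().__init__()
        self.depth = 0  # number of open <p>/<li> elements around the cursor
        self.targets: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in TEXT_TAGS:
            self.depth += 1
        elif tag == "a" and self.depth > 0:
            href = dict(attrs).get("href") or ""
            idx = href.find(MAIN_PREFIX)
            if idx != -1:
                # Drop any query string or #fragment after the page name.
                name = href[idx + len(MAIN_PREFIX):].split("?")[0].split("#")[0]
                if name:
                    self.targets.append(name)

    def handle_endtag(self, tag):
        if tag in TEXT_TAGS and self.depth > 0:
            self.depth -= 1


def text_links(html: str) -> list[str]:
    """Return linked trope names in first-seen order, without duplicates."""
    parser = TextLinkCollector()
    parser.feed(html)
    parser.close()
    return list(dict.fromkeys(parser.targets))
```

Links in navigation bars or other wrappers outside `<p>`/`<li>` are ignored, which is what distinguishes this "in the text" view from collecting every link on the page.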

@jwzimmer-zz
Owner Author

After more discussion and comparison, we have decided that, for our purposes, a "trope" is a page in https://github.com/jwzimmer/tv-tropes/tree/main/trope_list/tropes. This is the list TV Tropes identifies as tropes (https://tvtropes.org/pmwiki/pagelist_having_pagetype_in_namespace.php?n=Main&t=trope), reached via https://tvtropes.org/pmwiki/pmwiki.php/Administrivia/NotATrope > https://tvtropes.org/pmwiki/pmwiki.php/Main/Trope > https://tvtropes.org/pmwiki/pmwiki.php/Main/Tropes > https://tvtropes.org/pmwiki/pagelist_having_pagetype_in_namespace.php?n=Main&t=trope.

All the pages in that folder (i.e., the masterlist) are listed in: https://github.com/jwzimmer/tv-tropes/blob/main/in_Masterlist.json

All the pages in the Main folder, which also includes tropes not in the masterlist, metatropes, indices, and other article types, are listed in: https://github.com/jwzimmer/tv-tropes/blob/main/in_pmwiki_Main.json
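With both files, the Main pages that are not masterlist tropes fall out as a set difference. A sketch assuming each JSON file holds a flat array of page titles (the actual layout of in_Masterlist.json and in_pmwiki_Main.json may differ); `load_titles` and `split_main_pages` are hypothetical helpers.

```python
import json


def load_titles(path: str) -> set[str]:
    """Load a JSON file assumed to contain a flat array of page titles."""
    with open(path, encoding="utf-8") as f:
        return set(json.load(f))


def split_main_pages(
    masterlist: set[str], main_pages: set[str]
) -> tuple[set[str], set[str]]:
    """Return (Main pages that are not masterlist tropes,
    masterlist tropes missing from Main).

    The second set should be empty if the masterlist is a subset of Main.
    """
    return main_pages - masterlist, masterlist - main_pages
```

Usage: `extra, missing = split_main_pages(load_titles("in_Masterlist.json"), load_titles("in_pmwiki_Main.json"))`. A non-empty `missing` would flag a scraping gap worth checking.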

@jwzimmer-zz
Owner Author

Description of the dicts of links within each trope article: #12 (comment)

@jwzimmer-zz
Owner Author

jwzimmer-zz commented Oct 28, 2020

A GML file (for Gephi) of the network on the Sister Trope page (https://tvtropes.org/pmwiki/pmwiki.php/Main/SisterTrope). It has unweighted, undirected edges between every pair of tropes listed as "sister tropes" in the Examples section of the page, including only tropes in the trope masterlist:
https://github.com/jwzimmer/tv-tropes/blob/main/sistertropes_inmasterlist.gml

(This version includes every link in the Examples section, whether or not it is a trope in the masterlist: https://github.com/jwzimmer/tv-tropes/blob/main/sistertropes.gml)
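A sketch of how such a GML file could be generated without extra libraries. It assumes the Examples section has already been parsed into groups of mutually "sister" tropes (one list per example) and that trope titles contain no double quotes; `sister_trope_gml` is a hypothetical name, not the code that produced the files above. Passing `masterlist=None` corresponds to the all-links version.

```python
from itertools import combinations
from typing import Iterable, Optional


def sister_trope_gml(
    groups: Iterable[list[str]], masterlist: Optional[set[str]] = None
) -> str:
    """Return GML text for an undirected, unweighted graph.

    Every pair of tropes within the same group gets one edge. When
    `masterlist` is given, tropes outside it are dropped before pairing.
    """
    nodes: set[str] = set()
    edges: set[tuple[str, str]] = set()
    for group in groups:
        kept = sorted({t for t in group if masterlist is None or t in masterlist})
        nodes.update(kept)
        # Sorted input means each pair is stored once as (smaller, larger).
        edges.update(combinations(kept, 2))
    ids = {name: i for i, name in enumerate(sorted(nodes))}
    lines = ["graph [", "  directed 0"]
    for name, i in ids.items():
        lines.append(f'  node [ id {i} label "{name}" ]')
    for a, b in sorted(edges):
        lines.append(f"  edge [ source {ids[a]} target {ids[b]} ]")
    lines.append("]")
    return "\n".join(lines)
```

Saving the returned string with a `.gml` extension gives a file Gephi can import; storing pairs in a set keeps the graph unweighted even if two examples name the same pair.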

@jwzimmer-zz
Owner Author

A GML file (for Gephi) of the network on the Super Trope page (https://tvtropes.org/pmwiki/pmwiki.php/Main/SuperTrope). It has unweighted, undirected edges from a "Super Trope" root node to each example in the "samples" section of the page, and from each example to its listed subtropes. The edge list is in #16 (comment). (I included super tropes NOT in the trope masterlist, but left out subtropes that are not in the masterlist.)
https://github.com/jwzimmer/tv-tropes/blob/main/supertropes.gml
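The filtering rule for this edge list (keep every super trope, drop subtropes outside the masterlist) can be sketched as follows. It assumes the "samples" section has already been parsed into a dict mapping each super trope to its listed subtropes; `supertrope_edges` is a hypothetical name.

```python
def supertrope_edges(
    subtropes_of: dict[str, list[str]],
    masterlist: set[str],
    root: str = "Super Trope",
) -> list[tuple[str, str]]:
    """Build the root -> super trope -> subtrope edge list."""
    edges: list[tuple[str, str]] = []
    for sup, subs in subtropes_of.items():
        # Every super trope is kept, whether or not it is in the masterlist.
        edges.append((root, sup))
        for sub in subs:
            # Subtropes are kept only if they are masterlist tropes.
            if sub in masterlist:
                edges.append((sup, sub))
    return edges
```

The pairs are written as (parent, child) for readability, but since the graph is undirected the order carries no meaning once loaded into Gephi.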

@jwzimmer-zz
Owner Author

The list of all the tropes (in the masterlist) and the tropes they link to: github.com/jzimmer/tv-tropes/all-tropes-with-links.json

@jwzimmer-zz
Owner Author

jwzimmer-zz commented Jan 22, 2021

For imperfect answers to questions from the Datasheets for Datasets paper (https://arxiv.org/abs/1803.09010), see jwzimmer-zz/tv-tropening#3 (comment).
