Lab notes about data exploration and visualization of Wikipedia page corpora.
All the initial content and the produced datasets are stored as csv and gexf files in the data folder of each topic folder.
This analysis covers the set of pages imported from the Wikipedia List of geometry topics page. All the heavy lifting for data retrieval is done by the wekeypedia/python-toolkit/analysis-data-retrieval.py macro.
- pages: 303
- editors: 15857
- revisions: 101462
- multivariate analysis: main exploration and results from the geometry pages dataset
- time series: pageviews and revisions time series for all pages (a pandas sketch follows this list)
- page explorer: a notebook to explore data about individual pages. Very helpful if you are looking for local relationships. It includes all the analytics used by knowledge path reconstruction
- construction of the hyperlink graph: reconstruction of the natural hyperlink network between the corpus pages. It also adds an extension of the graph based on page name appearances (a networkx sketch follows this list)
- reading map based on a reduced graph of hyperlink terms
- construction of the pages-editors bipartite graph: construction of a bipartite graph made of pages and editors. It also builds a page-page graph from the projection of this bipartite graph, where two pages are linked if they share an editor (a bipartite-graph sketch follows this list)
- building a reading map based on a reduced graph of co-edited pages: a first attempt at building a stronger reading map, using the co-edited pages network instead of the hyperlink network. This includes a fair amount of mid-range lifting with networkx
- index of bots: list of bots active in the corpus
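
As a rough illustration of the time series notebook, here is a minimal pandas sketch that turns a revision history into monthly revision counts per page. The file name `data/revisions.csv` and its `page`/`timestamp` columns are assumptions about the data layout, not the actual schema.

```python
import pandas as pd

# assumed layout: one row per revision, with "page" and "timestamp" columns
revisions = pd.read_csv("data/revisions.csv", parse_dates=["timestamp"])

# monthly number of revisions for each page
monthly = (
    revisions.set_index("timestamp")
             .groupby("page")
             .resample("M")
             .size()
             .rename("revisions")
)

print(monthly.head())
```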
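
The hyperlink graph construction can be sketched along these lines with networkx: restrict the raw link list to pairs of corpus pages, then save the result as gexf. The file names `data/pages.csv` and `data/hyperlinks.csv` are assumptions, not the actual files produced by the notebooks.

```python
import csv
import networkx as nx

# corpus page titles, one per line (assumed layout)
with open("data/pages.csv", newline="") as f:
    corpus = {row[0] for row in csv.reader(f) if row}

g = nx.DiGraph()
with open("data/hyperlinks.csv", newline="") as f:
    for source, target in csv.reader(f):
        # keep only links whose both endpoints belong to the corpus
        if source in corpus and target in corpus:
            g.add_edge(source, target)

nx.write_gexf(g, "data/hyperlink-graph.gexf")
```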
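
The pages-editors bipartite graph and its page-page projection can be sketched with networkx's bipartite module; the same sketch shows the kind of weight threshold used to reduce the co-edited graph into a reading map. The contributions file, its columns, and the threshold value are assumptions.

```python
import csv
import networkx as nx
from networkx.algorithms import bipartite

# assumed layout: one (page, editor) pair per contribution
b = nx.Graph()
with open("data/contributions.csv", newline="") as f:
    for page, editor in csv.reader(f):
        b.add_node(page, bipartite=0)
        b.add_node(editor, bipartite=1)
        b.add_edge(page, editor)

pages = {n for n, d in b.nodes(data=True) if d["bipartite"] == 0}

# page-page projection: edge weight = number of shared editors
pp = bipartite.weighted_projected_graph(b, pages)

# reduce the projection by keeping only strongly co-edited pairs
threshold = 5  # assumption: minimum number of shared editors
reduced = nx.Graph()
reduced.add_nodes_from(pages)
reduced.add_edges_from(
    (u, v, d) for u, v, d in pp.edges(data=True) if d["weight"] >= threshold
)

nx.write_gexf(b, "data/pages-editors-bipartite.gexf")
nx.write_gexf(reduced, "data/co-edited-reading-map.gexf")
```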
- pages: 1239
- editors: 81604
- revisions: 697011
- pages: 4
- parsing and decoding wikipedia page diffs: a guide to using the wekeypedia toolkit to make sense of diffs (a sketch using the public MediaWiki compare API follows this list)
- words of love and wisdom: numerical exploration of term distributions over the corpus pages (an NLTK sketch follows this list)
- find love on wikipedia with NLTK: builds on the diff work to extract signs of love on wikipedia
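
The wekeypedia toolkit itself is not reproduced here; as a rough equivalent, this sketch fetches and decodes a diff through the public MediaWiki compare API and pulls out the added and removed fragments. The revision ids are placeholders.

```python
import requests
from bs4 import BeautifulSoup

API = "https://en.wikipedia.org/w/api.php"
params = {
    "action": "compare",
    "fromrev": 123456789,  # placeholder revision ids
    "torev": 123456790,
    "format": "json",
}

# the compare endpoint returns the diff as an HTML table fragment
diff_html = requests.get(API, params=params).json()["compare"]["*"]
soup = BeautifulSoup(diff_html, "html.parser")

# text of the lines added and removed between the two revisions
added = [td.get_text() for td in soup.select("td.diff-addedline")]
removed = [td.get_text() for td in soup.select("td.diff-deletedline")]
```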
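
The term-distribution notebooks boil down to counting tokens over the corpus pages; here is a minimal NLTK sketch of that count. The input file `data/pages-content.csv` and its (title, text) layout are assumptions.

```python
import csv
import nltk
from nltk.corpus import stopwords
from nltk.probability import FreqDist

nltk.download("punkt")
nltk.download("stopwords")

stop = set(stopwords.words("english"))
dist = FreqDist()

# assumed layout: one (title, text) pair per corpus page
with open("data/pages-content.csv", newline="") as f:
    for title, text in csv.reader(f):
        tokens = nltk.word_tokenize(text.lower())
        dist.update(t for t in tokens if t.isalpha() and t not in stop)

# the 20 most frequent terms over the whole corpus
print(dist.most_common(20))
```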