formatting wiki pagerank #160

Open
havnar opened this issue Sep 9, 2014 · 0 comments

I couldn't find any links to the wiki dataset used, so I downloaded the dumps from Wikimedia.
When I run the PageRank example I get weird page titles, though, so midway through the code I checked which titles were being used. Is this normal?

(Also: where can I find the proper dataset used in the AMPLab?)

scala> vertices.take(50)

res14: Array[(org.apache.spark.graphx.VertexId, String)] = Array((0,""), (0,""), (0,""), (1728454431,* Toby, Marlene. ''A.A. Milne, Author of Winnie-the-Pooh''. Chicago: Childrens Press, 1995. ISBN), (124,|), (124,|), (124,|), (124,|), (124,|), (124,|), (124,|), (124,|), (124,|), (124,|), (124,|), (124,|), (124,|), (124,|), (124,|), (124,|), (124,|), (124,|), (124,|), (124,|), (124,|), (124,|), (124,|), (124,|), (103299066,|honorific-prefix), (117890311,|name), (191986873,|honorific-suffix), (-644639137,|image), (1667503647,|order1), (-188590855,|office1), (1077807430,|term_start1), (-866339411,|term_end1), (-1122312309,|monarch1), (1499463684,|governor-general1), (-1568689980,|predecessor1), (-217292473,|successor1), (1685055658,|birth_date), (708509579,|birth_place), (368907285,|death...
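
One way to read this output: the vertex ids shown match the plain JVM `String.hashCode` of the accompanying strings (for example `""` gives 0 and `"|"` gives 124). That would fit a loader that treats the first field of every input line as a page title and hashes it, so raw wikitext lines such as infobox fields end up as "titles". The sketch below only checks that arithmetic. `pageHash` is a hypothetical helper, and its lowercase/strip-spaces normalisation is an assumption, since the loading code is not shown in this issue.

```scala
// Hypothetical helper: assumes the vertex id is a string hash of the title.
// The normalisation (lowercase, remove spaces) is an assumption, not taken
// from the issue.
def pageHash(title: String): Long =
  title.toLowerCase.replace(" ", "").hashCode.toLong

// Strings copied from the vertices.take(50) output above.
val observedTitles = Seq("", "|", "|name")

// Pair each string with its hash, mirroring the (VertexId, String) tuples.
val hashed = observedTitles.map(t => (pageHash(t), t))

hashed.foreach(println)
```

If the printed ids agree with the ones in the REPL output, the input is probably being consumed line by line as raw wikitext rather than as one record per article.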

Final output:

printing the top 10 ranked pages:

''7.08: 0.15

color:oligocene bar:NAM21 from:: 0.15

|url = http://books.google.ca/books?id=aQ84ViBNkYwC&lpg=PR1&dq=Michael%20Jordan&pg=PR1#v=onepage&q&f=true|publisher=Greenwood Press |isbn=: 0.15

*Twelve Foot Change: 0.15

''(0.02/.08): 0.15

In mammals and birds, sleep is divided into two broad types: [[rapid eye movement sleep|rapid eye movement]](REM sleep) and [[non-rapid eye movement sleep|non-rapid eye movement]](NREM or non-REM sleep). Each type has a distinct set of physiological and neurological features associated with it. REM sleep is associated with the capability of dreaming.<ref name="National">{National Institute of Neurological Disorders and Stroke. (21 May 2007). Brain basics:: 0.15

*2035: 0.15

commands: 0.15
39411: 0.15

QJT 2½: 0.15

printing the most important page within the subgraph of Wikipedia that mentions Berkeley in the title:

By contrast, [[John von Neumann|von Neumann]] recommended against floating point for the 1951 [[IAS machine]], arguing that fixed point arithmetic was preferable.<ref>{{cite web|url=http://www.cs.berkeley.edu/~wkahan/SIAMjvnl.pdf|title=The: 0.15

Zuse also proposed, but did not complete, carefully rounded floating–point arithmetic that would have included ±∞ and NaNs, anticipating features of IEEE Standard floating–point by four decades.<ref name=kahansiam>{{cite web|url=http://www.cs.berkeley.edu/~wkahan/SIAMjvnl.pdf|title=The: 0.15

* [[Mary Elizabeth Barry|Berry, Mary Elizabeth]]. (2006). ''Japan in Print: Information and Nation in the Early Modern Period.'' Berkeley: University of California Press.: 0.15

* Glahn, Richard Von. (1996). ''Fountain of Fortune: Money and Monetary Policy in China, 1000-1700.'' Berkeley: University of California Press.: 0.15
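
Every rank above is exactly 0.15. In an un-normalised PageRank update with reset probability 0.15 (the form GraphX uses), that is the value a vertex gets when nothing links to it. The following is a minimal plain-Scala sketch of that update on a toy graph, not the GraphX implementation. The graph, the names, and the iteration count are illustrative assumptions.

```scala
// Assumption: update rule rank(v) = resetProb + (1 - resetProb) * sum of
// rank(src) / outDegree(src) over the edges pointing at v.
val resetProb = 0.15

// Toy graph (illustrative): edges are (src, dst); vertex 3 has no in-links.
val edges = Seq((1L, 2L), (2L, 1L), (3L, 1L))
val vertexIds = Seq(1L, 2L, 3L)
val outDegree = edges.groupBy(_._1).map { case (src, es) => src -> es.size }

def step(ranks: Map[Long, Double]): Map[Long, Double] =
  vertexIds.map { v =>
    // Sum the rank contributed along each incoming edge of v.
    val incoming = edges.collect {
      case (src, dst) if dst == v => ranks(src) / outDegree(src)
    }.sum
    v -> (resetProb + (1.0 - resetProb) * incoming)
  }.toMap

// Start every vertex at 1.0 and apply the update 20 times.
val initial = vertexIds.map(_ -> 1.0).toMap
val ranks = Iterator.iterate(initial)(step).drop(20).next()

ranks.toSeq.sortBy(_._1).foreach(println)
```

Vertex 3 stays at the reset probability because it has no incoming edges. Seeing 0.15 for every "top" page is therefore consistent with a graph in which few or no link targets resolve to existing vertex ids.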
