Counting and Visualizing CRAN Downloads with packageRank (with Caveats!) - R-hub blog #101
Comments
Very cool post! Another interesting metric is a package's PageRank in the dependency graph. See, for example, the code at the end of this post: https://blog.revolutionanalytics.com/2014/12/a-reproducible-r-example-finding-the-most-popular-packages-using-the-pagerank-algorithm.html
Thanks! For what it's worth, the name 'packageRank' is a nod to PageRank. But for my purposes, getting a "better" estimate of user (rather than developer) interest in a package, what I actually want is an "inverse" PageRank algorithm, which discounts rather than credits dependencies. A task for the future.
Nice work. I think there might also be some kind of server bias: IPs that download hundreds or thousands of packages a day might not represent real users.
Thanks! In the current development version of ‘packageRank’ I’ve been working on functions that try to do what you suggest. Among other things, they try to filter out log entries due to CI/unit testing and “unofficial” efforts to mirror CRAN. As far as server bias is concerned, tell me what you have in mind. Something along the lines of: people who use RStudio’s CRAN mirror, which generates the logs used to count package downloads, tend to do more testing, package development, etc. than those who use other mirrors?
It seems there are IPs that download the same number of packages day after day. The IP addresses are not real (they are coded) and change arbitrarily, but it seems odd that a fixed number of IPs (x-axis) download a fixed number of packages (y-axis) on different days (i.e., some peaks of this graph repeat on different days). Seems non-human.
Yes. It's not "human". There seem to be a lot of repeated, regularly scheduled, scripted downloads (probably due to cron jobs, AWS, Docker, CI/unit testing, unofficial CRAN mirroring, etc.). Because IP addresses are anonymized, identifying the specific culprit is not a trivial exercise (to me, this concern for privacy is perfectly understandable and should be respected). That said, even without that information, I think it's still possible to reduce the contribution of automated downloads to the overall package download count.
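As a rough illustration of the heuristic discussed in this thread (not packageRank's actual filtering), the sketch below flags daily download volumes that recur across days and drops the log entries behind them. It is written in Python for brevity. It assumes each log row is a dict with `date` and `ip_id` fields, and that `ip_id` values cannot be matched across days, so it compares per-IP daily volumes rather than the IDs themselves. The function names and the `min_days` and `min_downloads` thresholds are hypothetical.

```python
from collections import Counter, defaultdict


def daily_counts(rows):
    """Count log entries (downloads) per (date, ip_id) pair."""
    counts = Counter()
    for row in rows:
        counts[(row["date"], row["ip_id"])] += 1
    return counts


def recurring_volumes(rows, min_days=2, min_downloads=100):
    """Map each large daily volume to the dates on which some IP hit it.

    Only volumes of at least `min_downloads` that appear on at least
    `min_days` distinct dates are returned (both thresholds are assumptions).
    """
    dates_by_volume = defaultdict(set)
    for (date, _ip_id), n in daily_counts(rows).items():
        if n >= min_downloads:
            dates_by_volume[n].add(date)
    return {
        n: sorted(dates)
        for n, dates in dates_by_volume.items()
        if len(dates) >= min_days
    }


def drop_recurring(rows, min_days=2, min_downloads=100):
    """Remove entries from IPs whose daily volume recurs across days."""
    rows = list(rows)
    flagged = recurring_volumes(rows, min_days, min_downloads)
    counts = daily_counts(rows)
    return [r for r in rows if counts[(r["date"], r["ip_id"])] not in flagged]


# Toy log: an IP downloads exactly 3 packages on each of two days
# (under different anonymized IDs), alongside two lighter users.
toy_log = (
    [{"date": "2020-05-01", "ip_id": 1, "package": p} for p in "abc"]
    + [{"date": "2020-05-01", "ip_id": 2, "package": "a"}]
    + [{"date": "2020-05-02", "ip_id": 7, "package": p} for p in "abc"]
    + [{"date": "2020-05-02", "ip_id": 8, "package": p} for p in "ab"]
)
kept = drop_recurring(toy_log, min_days=2, min_downloads=3)
```

A volume that repeats by coincidence would also be dropped, so a filter like this trades some legitimate downloads for less scripted noise; the thresholds would need tuning against real logs.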
Counting and Visualizing CRAN Downloads with packageRank (with Caveats!) - R-hub blog
https://blog.r-hub.io/2020/05/11/packagerank-intro/