Capture (and possibly automate) traffic CSVs from readthedocs #14
@jtpio mentioned Chris Holdgraf's repo metrics notebooks; we can look at those for inspiration.
Previously I brought up adding a footer like on GitHub:
This is in progress here:
@blink1073 Thank you, it's much appreciated!
Two quick thoughts:

Inspiration via Jupyter Book

If you want some inspiration, I often use Jupyter Book for this kind of thing. For example, here's a dashboard I've used in the past for tracking activity within the Jupyter ecosystem (it's now out of date, so there's an error message, but you get the idea): https://chrisholdgraf.com/jupyter-activity-snapshot/jupyter.html#merged-pull-requests

That uses papermill to use a …

Plausible?

Historically, we've used Google Analytics to track user behavior across our websites, including docs. This was very useful for things like generating impact reports for grants. We moved away from Google Analytics for privacy reasons, but some folks mentioned that https://plausible.io/ was an attractive alternative that wouldn't have the same concerns. Would it be less work if Jupyter self-hosted a Plausible instance that generated dashboards for all of the subproject docs sites?

Apologies if this has already been discussed and decided on; I just wanted to throw it out there in case it creates an "ah-ha, that would be way easier" response.
@blink1073 AWESOME, thank you!
@choldgraf These look like just the thing (I was pondering something similar here); thanks for linking. I may contact you further down the line. @blink1073 Also, I just noticed, and I hate to bother you further, but there should be a second search CSV for all of those sites as well, if you're able to provide those 😅
@blink1073 Geez, this is fantastic, thanks for single-handedly knocking this problem out of the park :D
In case it's helpful @ericsnekbytes: here are the templates that are used to generate org-specific pages: https://github.com/choldgraf/jupyter-activity-snapshot/tree/main/monthly_update/templates

Specifically, here's the one that generates the org reports I mentioned before: https://github.com/choldgraf/jupyter-activity-snapshot/blob/main/monthly_update/templates/org_report.ipynb

You can see where the templates have variables to be inserted later within … You can then generate pages using that template with code like this:

```python
from pathlib import Path

import nbformat as nbf
import papermill as pm

# github_orgs (list of org names) and n_days (reporting window)
# are defined earlier in the original script.
path_book = Path("generated/book")
path_book.mkdir(parents=True, exist_ok=True)

for org in github_orgs:
    parameters = dict(github_org=org, n_days=n_days)
    path_out = path_book.joinpath(f"{org}.ipynb")
    # Execute the template with this org's parameters and save the result
    ntbk = pm.execute_notebook(
        "./templates/org_report.ipynb",
        str(path_out),
        parameters=parameters,
        nest_asyncio=True,
        cwd="./templates/",
    )
    # Remove the param cell so it doesn't show up
    # (.get() guards against cells that have no "tags" metadata)
    (param_cell,) = [
        cell
        for cell in ntbk.cells
        if "injected-parameters" in cell.metadata.get("tags", [])
    ]
    param_cell.metadata.tags.append("remove-cell")
    # Substitute the {{ github_org }} placeholder anywhere in the notebook text, then save
    nbs = nbf.writes(ntbk)
    nbs = nbs.replace("{{ github_org }}", org)
    path_out.write_text(nbs)
```

And then these two GitHub Actions steps are used in CI/CD to build the pages from a template and then build the book:

```yaml
- name: Generate book pages with latest data
  run: |
    papermill --cwd monthly_update monthly_update/run_template.ipynb -
  env:
    GITHUB_ACCESS_TOKEN: ${{ secrets.ACCESS_TOKEN }}

# Build the book
- name: Build the book
  run: |
    jb toc from-project monthly_update/generated/book -e .ipynb -e .md -e .rst --guess-titles > monthly_update/generated/book/_toc.yml
    jb build monthly_update/generated/book
```

I think that's the core of the logic there. A lot of the code there is very stale, which is why I'm trying to point out the details here. If you really wanna get fancy, you could also try the new MyST build engine at https://mystmd.org :-)
I grabbed the stats for the docs I have access to here: https://gist.github.com/minrk/c1df933c520f9a51ee2bf474817a20bb, including the notebook I used to get them. It seems the traffic data isn't in the API, so I needed to script it with Playwright.
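For reference, here is a minimal sketch of the same idea; it is not the notebook from the gist. It assumes a browser session that has already been logged in to ReadTheDocs and saved with Playwright's `context.storage_state(path=...)`, and it assumes the dashboard pages `/dashboard/<project>/traffic-analytics/` and `/dashboard/<project>/search-analytics/` return a CSV when requested with `?download=true`. Those URL patterns, the output file names, and the helpers `csv_url` and `download_csvs` are hypothetical and should be checked against the real dashboard.

```python
from pathlib import Path

# Assumption: analytics are served from the readthedocs.org dashboard.
RTD_BASE = "https://readthedocs.org"
KINDS = ("traffic", "search")


def csv_url(project: str, kind: str = "traffic") -> str:
    """Build the assumed dashboard URL that serves a project's analytics CSV."""
    if kind not in KINDS:
        raise ValueError(f"unknown analytics kind: {kind!r}")
    return f"{RTD_BASE}/dashboard/{project}/{kind}-analytics/?download=true"


def download_csvs(projects, out_dir="rtd_csvs", state_file="rtd_state.json"):
    """Save <project>-<kind>.csv for each project using a logged-in session."""
    # Imported here so csv_url() works even where Playwright is not installed.
    from playwright.sync_api import sync_playwright

    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    with sync_playwright() as p:
        browser = p.chromium.launch()
        # Reuse cookies saved earlier with context.storage_state(path=state_file).
        context = browser.new_context(storage_state=state_file)
        try:
            for project in projects:
                for kind in KINDS:
                    # context.request shares the context's cookies, so the
                    # admin-only page is fetched as the logged-in user.
                    resp = context.request.get(csv_url(project, kind))
                    if not resp.ok:
                        print(f"skipping {project} {kind}: HTTP {resp.status}")
                        continue
                    (out / f"{project}-{kind}.csv").write_bytes(resp.body())
        finally:
            browser.close()
```

Calling `download_csvs(["example-project"])` (a placeholder project slug) would then write `example-project-traffic.csv` and `example-project-search.csv`, covering both the traffic and search exports. Fetching through `context.request` rather than clicking download links keeps the script independent of page layout, at the cost of relying on the assumed URLs.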
@choldgraf Thanks for the additional details. @minrk I'll be digging through these and may ping you again for more info. Thanks for providing these 👍
Edit: see below.
@minrk @blink1073 We need another CSV dump (there's a ticket on RTD that would add an API call for this). Since that API call doesn't exist yet, we should also create a service account with permission to download these. I can't grant those permissions myself, but I can create the account and add it to the Jupyter password manager if one of you can grant it access...
I've updated the data in the gist today. If you create the bot account, I can add it to some projects.
I'm also happy to add the account to the projects I own.
@minrk @blink1073 The new GitHub user is @jupyterautomation (also jupyterautomation on RTD) and should be ready to be added to ReadTheDocs sites :) I've added that account, along with its underlying email address, to the Jupyter password manager. Thanks!
Okay, I added it to all the projects I maintain.
Sent all my invitations, too, I think.
@blink1073 @minrk Thank you! I'll share progress when I've got something up and running.
ReadTheDocs offers traffic and search stats that Jupyter subprojects can use to direct their docs improvement efforts. Right now, these metrics are not widely used (as indicated by discussions in group meetings) and are not easily accessible (they're locked behind an admin panel). They can be made easily available and usable from a central location so that subprojects can better benefit from the insights they contain.
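Once the CSVs are collected, a central summary can start as small as summing views per page for each project. This is a minimal sketch using only the standard library; it assumes the traffic export has `Path` and `Views` columns (check the header of an actual download first), and the names `top_pages` and `summarize` are hypothetical.

```python
import csv
import io
from collections import Counter


def top_pages(csv_text: str, n: int = 10) -> list[tuple[str, int]]:
    """Return the n most-viewed paths in one traffic CSV, summed over all rows."""
    views: Counter[str] = Counter()
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Assumed column names; adjust to the header of the real export.
        views[row["Path"]] += int(row["Views"])
    return views.most_common(n)


def summarize(exports: dict[str, str], n: int = 10) -> dict[str, list[tuple[str, int]]]:
    """Map each project name to its top pages so all projects can be viewed together."""
    return {project: top_pages(text, n) for project, text in exports.items()}


if __name__ == "__main__":
    # Made-up sample rows for illustration only.
    sample = (
        "Date,Version,Path,Views\n"
        "2024-01-01,latest,/index.html,5\n"
        "2024-01-02,latest,/index.html,3\n"
        "2024-01-01,stable,/install.html,4\n"
    )
    print(summarize({"example-project": sample}, n=2))
    # {'example-project': [('/index.html', 8), ('/install.html', 4)]}
```

Summing across dates and versions gives one "most visited pages" list per subproject, which is the kind of view that is hard to get from the per-project admin panel; a real dashboard would likely keep the date column to show trends over time.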