Create action/workflow for running Harvester periodically #5116
Conversation
steps:
  - uses: actions/checkout@v2
  - uses: pjquirk/harvest-action@main
This workflow should be moved somewhere other than my personal account, to avoid bus-factor issues. Since this is so tied into Linguist, you might prefer having it in this repo under a top-level /actions folder.
The workflow or action could do more as well, such as creating an issue once an extension crosses some threshold of unique repositories.
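For context, a periodic workflow along these lines might look like the following sketch. The cron schedule, file path, and `candidates` input name are assumptions for illustration, not the actual configuration from this PR:

```yaml
# Hypothetical sketch -- schedule, paths, and inputs are illustrative only
name: Harvest extension usage
on:
  schedule:
    - cron: '0 6 * * 1'   # assumed cadence: weekly, Monday 06:00 UTC
jobs:
  harvest:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - uses: pjquirk/harvest-action@main
        with:
          # assumed input name; check the action's README for the real one
          candidates: .github/harvester/candidates.json
```

A `schedule` trigger like this is how GitHub Actions runs workflows periodically without any push or PR event.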
Which parts were adapted, exactly? I don't recognise anything that resembles my own code… Anyway, you might be interested in adding feedback to Feedback and ideas welcome.
Mostly adapted the ideas/algorithm; I didn't copy any code verbatim, since it relied on being run in the browser rather than using the REST APIs.
Good call, I didn't see that issue. I definitely don't have the context for this script that you do, and my implementation is admittedly more focused on bulk extension monitoring rather than individual ones. I went with a rewrite since I didn't think that version was still being maintained, and it was a day-long learning project around running the script over dozens of extensions :D
Reporting extension usage is the primary use-case for the tool, and I plan on having an option to select output formats (markdown for GitHub discussion copy+pasta, null- or tab-separated values for easier CLI wrangling, etc). More ambitious options include mass-downloading of collected URLs and reusing locally-cached results. Having some way of going through search results to identify unrelated formats would greatly simplify the task of verifying a candidate language addition. Don't hesitate to suggest other functions that relate to extension/format research, since now's the time for that sort of discussion.
It's only updated whenever a change to GitHub's front-end code forces me to update the CSS selectors used to target relevant page elements (which in itself is rather tenuous, given the utter lack of ID attributes and semantic markup 😢). As a sidenote, if I haven't archived a repository, it means it's maintained (even if it's not been updated in a while). Anyway, enough rambling. I won't derail this thread any further. 👍
I've been thinking about this and the one thing that's kinda niggling me is the fact we've got to keep updating the input file. What would be really cool is if the action automatically ran against PRs that are adding a new language, possibly based on a label, using a field or comment in the PR.
@lildude I agree that'd be cool, though given existing PRs I don't see a great way to do that. I could take the diff of the PR. You could require via template some JSON or other data in the PR body that contains the same data as the candidates file above, and then make the action a check on the PR (I talked about that here), assuming the metrics can be defined. I didn't want to suggest that immediately as that's a bit of a workflow change for your team, but I'd be happy to modify the action to do that.
Yeah, I was thinking that maybe parsing a value from the template or a comment would do the trick. We can always go back and add it to already open PRs and manually trigger the workflow. |
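A label-triggered variant of the workflow could be sketched like this; the label name and job wiring are assumptions for illustration, not an agreed design:

```yaml
# Hypothetical sketch: run the harvester when a PR receives a specific label
name: Harvest on language PR
on:
  pull_request:
    types: [labeled]
jobs:
  harvest:
    if: github.event.label.name == 'new-language'   # assumed label name
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - uses: pjquirk/harvest-action@main
```

The `labeled` activity type fires each time a label is added, so the `if` guard keeps the job from running on unrelated labels.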
* Add CITATION.cff as YAML filename
* Add CITATION.cff sample
* Add CITATION variants to documentation
* Add CITATION(S) as plaintext filename

Co-authored-by: Colin Seymour <[email protected]>
This adds a workflow that runs an adapted version of "Harvester" to collect the number of hits/repos for a given extension/query.
Description
Keeping track of the usage of a language is a heavily manual process; see for example #4219. This monitoring is done by a tool known as Harvester, which uses the github.com search page to look for the total number of hits across unique repositories.
This workflow uses an action that does the same thing, using the actual REST APIs instead of the browser, which lets us automate the work more easily. The action expects a JSON file that contains the search terms to process, and generates a markdown file with the results.
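As a rough illustration of the flow described above, a harvester could query GitHub's code-search REST endpoint per extension and render the counts as a markdown table. The function names, input shape, and output layout here are assumptions for illustration, not the action's actual code:

```python
# Hypothetical sketch of the harvest flow; names, input shape, and
# output format are assumptions, not the action's actual code.
import json
import urllib.request

SEARCH_URL = "https://api.github.com/search/code?q=extension:{ext}"

def count_hits(ext):
    """Query GitHub's code-search REST API for files with an extension.

    Returns the total hit count reported by the API (network call;
    a real implementation would need auth and rate-limit handling).
    """
    with urllib.request.urlopen(SEARCH_URL.format(ext=ext)) as resp:
        return json.load(resp)["total_count"]

def render_markdown(results):
    """Render an {extension: hit_count} mapping as a markdown table."""
    lines = ["| Extension | Hits |", "| --- | --- |"]
    for ext, hits in sorted(results.items()):
        lines.append(f"| .{ext} | {hits} |")
    return "\n".join(lines)
```

The real action also tracks unique repositories, which this sketch omits for brevity.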
Example input:
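The original example did not survive in this thread; a hypothetical input file matching the description (a JSON list of search terms to process) might look like:

```json
[
  { "extension": "cff", "query": "extension:cff filename:CITATION" }
]
```

The field names here are invented for illustration; the action's README would define the real schema.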
Example output:
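Likewise, a hypothetical markdown output, with invented numbers:

```markdown
| Extension | Hits | Unique repositories |
| --- | --- | --- |
| .cff | 12,345 | 678 |
```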
This came out of a discussion in a PR.
Checklist:
N/A: I have added or updated the tests for the new or changed functionality.