Articles plagiarism check implementation (#451)
* finished hono service
* started coding a plagiarism-check github workflow
* finished workflow for plagiarism-check service
* refactored interface declarations for api results
* added a comment to clarify code
* rewrote github workflow, small changes to plagiarism check tool
* don't send the sentence back if it had no matches
* properly handle case where there are no results
* Fixed empty results appending to final json file
* added formatting to action output
* Fixed formatting
* tidied up the workflow file a bit
* refactored promise all to avoid possible race conditions
* added filtering out headers and complex sentence splitting, removed 10 character minimum limit for a sentence
* plagiarism percent formula fix
* added setup instructions
* returned permission check for github action
* fixed logic to not include sentences with no matches in the response
* rounded up results percentage to 2 digits after the decimal point
* moved formatting logic from github actions to worker, fixed some formatting issues
* cleaned up github actions, removed formatting logic
* fixed permission check and simplified worker
Showing 8 changed files with 1,592 additions and 0 deletions.
@@ -0,0 +1,61 @@
on:
  issue_comment:
    types: [created]

permissions:
  contents: read
  issues: read
  pull-requests: write

jobs:
  permission-check-job:
    runs-on: ubuntu-latest
    if: |
      github.event.issue.pull_request &&
      contains(github.event.comment.body, '/plagiarismcheck')
    outputs:
      permission: ${{ steps.permissions-check.outputs.defined }}
    steps:
      - name: Check for Secret availability
        id: permissions-check
        shell: bash
        run: |
          echo "defined=${{ contains(fromJSON(secrets.WIKI_REVIEWERS), github.actor) }}" >> $GITHUB_OUTPUT;

  plagiarism-check:
    runs-on: ubuntu-latest
    name: "Checks a new article from a PR for plagiarism"
    needs: [ permission-check-job ]
    if: needs.permission-check-job.outputs.permission == 'true'
    env:
      GH_TOKEN: "${{ secrets.GITHUB_TOKEN }}"

    steps:
      - name: Check out repository
        uses: actions/checkout@v4

      - name: Go to PR files
        run: gh pr checkout "${{ github.event.issue.number }}"

      - name: Save article contents
        run: |
          pr_number="${{ github.event.issue.number }}"
          file_path="$(gh pr diff --name-only $pr_number | grep '\.md' | head -n 1)"
          if [ -n "$file_path" ]; then
            cat "$file_path" > article.txt
          else
            gh pr comment "${{ github.event.issue.number }}" --body "No .md file found in the PR."
            exit 1
          fi
      - name: Check for plagiarism
        run: |
          content="$(cat article.txt)"
          escaped_content=$(jq -Rs . <<<"$content")
          result="$(curl -X POST "${{ secrets.WORKER_URL }}" -H "Content-Type: application/json" -d "{\"text\": $escaped_content}")"
          echo "$result" > results.txt
      - name: Format and post response
        run: |
          response=$(cat results.txt)
          results=$(echo "$response" | jq -r '.results')
          gh pr comment "${{ github.event.issue.number }}" --body "$results"
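
The "Save article contents" step selects the first changed path that matches `\.md` anywhere in its name, so a path such as `notes.mdx` would also qualify. Below is a minimal sketch of a stricter selection, assuming the changed paths arrive one per line on standard input (the format `gh pr diff --name-only` prints). The helper name `first_markdown_file` and the sample paths are hypothetical and not part of the commit.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the commit): print the first path on
# standard input whose name ends in ".md", or nothing if none does.
first_markdown_file() {
  # Anchoring with "$" rejects names such as "notes.mdx" or "draft.md.bak".
  # "|| true" keeps a no-match grep from failing the pipeline under pipefail.
  { grep -E '\.md$' || true; } | head -n 1
}

# Example with a made-up list of changed paths; prints "articles/intro.md".
printf '%s\n' 'package.json' 'notes.mdx' 'articles/intro.md' 'articles/other.md' \
  | first_markdown_file
```

In the workflow, the input could come from `gh pr diff --name-only "$pr_number"`, with the existing empty-result branch left unchanged.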
@@ -0,0 +1,33 @@
# prod
dist/

# dev
.yarn/
!.yarn/releases
.vscode/*
!.vscode/launch.json
!.vscode/*.code-snippets
.idea/workspace.xml
.idea/usage.statistics.xml
.idea/shelf

# deps
node_modules/
.wrangler

# env
.env
.env.production
.dev.vars

# logs
logs/
*.log
npm-debug.log*
yarn-debug.log*
yarn-error.log*
pnpm-debug.log*
lerna-debug.log*

# misc
.DS_Store
@@ -0,0 +1,28 @@
# Plagiarism Checker

This service runs plagiarism checks through a Cloudflare Worker.

### Setup

- Get a Google API key and a search engine ID from [here](https://developers.google.com/custom-search/v1/overview#api_key).

- Set up wrangler.toml according to your Cloudflare credentials and add the following two environment variables:

  - **GOOGLE_SEARCH_ENGINE_CX**

  - **GOOGLE_API_KEY**

- Install dependencies and deploy the Worker:

```bash
npm i
npm run deploy
```

- Save the URL of the deployed Worker.

- Add a **WORKER_URL** secret to your repository so that GitHub Actions can access the service.

### Usage

Leave a comment containing *"/plagiarismcheck"* on a pull request that adds a new article to activate the bot.
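
The workflow in this commit decides whether a comment activates the bot with the GitHub Actions `contains()` expression, which GitHub documents as case-insensitive, so the command may appear anywhere in the comment and in any letter case. The workflow also requires the commenter to be listed in the **WIKI_REVIEWERS** secret, which this sketch does not cover. The following is only a rough local approximation of the comment match, and `is_plagiarism_trigger` is a hypothetical helper name, not part of the commit.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the commit): succeed when the comment
# body contains "/plagiarismcheck" in any letter case.
is_plagiarism_trigger() {
  # Lowercase the body first to approximate the case-insensitive match.
  body_lower="$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')"
  case "$body_lower" in
    *"/plagiarismcheck"*) return 0 ;;
    *) return 1 ;;
  esac
}

# Example with made-up comment bodies.
for comment in 'Please run /plagiarismcheck' '/PlagiarismCheck' 'Looks good to me'; do
  if is_plagiarism_trigger "$comment"; then
    echo "trigger: $comment"
  else
    echo "ignored: $comment"
  fi
done
```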