Articles plagiarism check implementation (#451)
* finished hono service

* started coding a plagiarism-check github workflow

* finished workflow for plagiarism-check service

* refactored interface declarations for api results

* added a comment to clarify code

* rewrote github workflow, small changes to plagiarism check tool

* don't send the sentence back if it had no matches

* properly handle case where there are no results

* Fixed empty results appending to final json file

* added formatting to action output

* Fixed formatting

* tidied up the workflow file a bit

* refactored promise all to avoid possible race conditions

* added filtering out headers and complex sentence splitting, removed 10 character minimum limit for a sentence

* plagiarism percent formula fix

* added setup instructions

* returned permission check for github action

* fixed logic to not include sentences with no matches in the response

* rounded up results percentage to 2 digits after the decimal point

* moved formatting logic from GitHub Actions to worker, fixed some formatting issues

* cleaned up github actions, removed formatting logic

* fixed permission check and simplified worker
kol3x authored Oct 11, 2024
1 parent cb72207 commit e691468
Showing 8 changed files with 1,592 additions and 0 deletions.
61 changes: 61 additions & 0 deletions .github/workflows/plagiarism-check.yml
@@ -0,0 +1,61 @@
on:
  issue_comment:
    types: [created]

permissions:
  contents: read
  issues: read
  pull-requests: write

jobs:
  permission-check-job:
    runs-on: ubuntu-latest
    if: |
      github.event.issue.pull_request &&
      contains(github.event.comment.body, '/plagiarismcheck')
    outputs:
      permission: ${{ steps.permissions-check.outputs.defined }}
    steps:
      - name: Check for Secret availability
        id: permissions-check
        shell: bash
        run: |
          echo "defined=${{ contains(fromJSON(secrets.WIKI_REVIEWERS), github.actor) }}" >> $GITHUB_OUTPUT;

  plagiarism-check:
    runs-on: ubuntu-latest
    name: "Checks a new article from a PR for plagiarism"
    needs: [ permission-check-job ]
    if: needs.permission-check-job.outputs.permission == 'true'
    env:
      GH_TOKEN: "${{ secrets.GITHUB_TOKEN }}"

    steps:
      - name: Check out repository
        uses: actions/checkout@v4

      - name: Go to PR files
        run: gh pr checkout "${{ github.event.issue.number }}"

      - name: Save article contents
        run: |
          pr_number="${{ github.event.issue.number }}"
          file_path="$(gh pr diff --name-only $pr_number | grep '\.md' | head -n 1)"
          if [ -n "$file_path" ]; then
            cat "$file_path" > article.txt
          else
            gh pr comment "${{ github.event.issue.number }}" --body "No .md file found in the PR."
            exit 1
          fi
      - name: Check for plagiarism
        run: |
          content="$(cat article.txt)"
          escaped_content=$(jq -Rs . <<<"$content")
          result="$(curl -X POST "${{ secrets.WORKER_URL }}" -H "Content-Type: application/json" -d "{\"text\": $escaped_content}")"
          echo "$result" > results.txt
      - name: Format and post response
        run: |
          response=$(cat results.txt)
          results=$(echo "$response" | jq -r '.results')
          gh pr comment "${{ github.event.issue.number }}" --body "$results"
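
The last two steps of the workflow imply a small request/response contract: the article is sent as a JSON object with a `text` field, and the comment body is read from a `results` field of the reply. The sketch below restates that contract in TypeScript. The interface and function names are hypothetical, and the assumption that `results` is a string comes only from how the workflow uses it.

```typescript
// Request shape: the field name `text` comes from the workflow's curl call.
interface CheckRequest {
  text: string;
}

// Assumed response shape: the workflow only ever reads `.results`.
interface CheckResponse {
  results: string;
}

// Roughly what `jq -Rs .` plus the hand-built {"text": ...} wrapper produce:
// the whole article as one safely escaped JSON string.
function buildRequestBody(article: string): string {
  const payload: CheckRequest = { text: article };
  return JSON.stringify(payload);
}

// Counterpart of `jq -r '.results'`, but it throws on an unexpected body
// instead of passing it on to the PR comment.
function extractResults(rawBody: string): string {
  const parsed: unknown = JSON.parse(rawBody);
  if (
    typeof parsed !== "object" ||
    parsed === null ||
    typeof (parsed as Partial<CheckResponse>).results !== "string"
  ) {
    throw new Error("Worker response has no string `results` field");
  }
  return (parsed as CheckResponse).results;
}
```

Validating the reply before posting would keep a Worker error from reaching the pull request as a comment.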
33 changes: 33 additions & 0 deletions tools/plagiarism-checker/.gitignore
@@ -0,0 +1,33 @@
# prod
dist/

# dev
.yarn/
!.yarn/releases
.vscode/*
!.vscode/launch.json
!.vscode/*.code-snippets
.idea/workspace.xml
.idea/usage.statistics.xml
.idea/shelf

# deps
node_modules/
.wrangler

# env
.env
.env.production
.dev.vars

# logs
logs/
*.log
npm-debug.log*
yarn-debug.log*
yarn-error.log*
pnpm-debug.log*
lerna-debug.log*

# misc
.DS_Store
28 changes: 28 additions & 0 deletions tools/plagiarism-checker/README.md
@@ -0,0 +1,28 @@
# Plagiarism Checker

This service checks articles for plagiarism through a Cloudflare Worker.
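
The Worker source is not shown here, so the following is only a rough sketch of how such a check could score an article. It assumes the text is split into sentences, Markdown headers are skipped, and the score is the share of sentences with at least one match, rounded to two decimals. `splitSentences` and `plagiarismPercent` are hypothetical names, and the formula is an assumption rather than the deployed one.

```typescript
// Drop Markdown header lines, then split the remaining text into sentences.
// The split is deliberately naive: a sentence ends at '.', '!' or '?'.
function splitSentences(markdown: string): string[] {
  const body = markdown
    .split("\n")
    .filter((line) => !line.trimStart().startsWith("#"))
    .join(" ");
  const pieces = body.match(/[^.!?]+[.!?]*/g) ?? [];
  return pieces.map((s) => s.trim()).filter((s) => s.length > 0);
}

// Assumed formula: matched sentences over all sentences, as a percentage
// rounded to two decimal places. An empty article scores 0.
function plagiarismPercent(totalSentences: number, matchedSentences: number): number {
  if (totalSentences === 0) {
    return 0;
  }
  return Math.round((matchedSentences / totalSentences) * 10000) / 100;
}
```

For example, one matched sentence out of three would give `plagiarismPercent(3, 1)`, which evaluates to 33.33.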

### Setup

- Get a Google API key and a search engine ID from the [Custom Search JSON API overview](https://developers.google.com/custom-search/v1/overview#api_key).

- Set up `wrangler.toml` with your Cloudflare credentials and add the following two environment variables:

- **GOOGLE_SEARCH_ENGINE_CX**

- **GOOGLE_API_KEY**

- Install dependencies and deploy the Worker:

```bash
npm i
npm run deploy
```

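- Save the URL of the deployed Worker.

- Add that URL to your repository secrets as **WORKER_URL** so GitHub Actions can reach the service. The workflow also reads a **WIKI_REVIEWERS** secret, parsed as JSON, to decide which GitHub users may trigger the check.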

### Usage

Comment *"/plagiarismcheck"* on a pull request that adds a new article to run the check.
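
The conditions under which the workflow actually runs can be summarized in a small TypeScript sketch. It assumes the `WIKI_REVIEWERS` secret holds a JSON array of usernames; the `CommentEvent` type and `shouldRunCheck` function are hypothetical and only mirror the `if:` expressions in `plagiarism-check.yml`.

```typescript
// Minimal slice of the issue_comment event that the workflow looks at.
interface CommentEvent {
  isPullRequest: boolean; // stands in for `github.event.issue.pull_request` being set
  commentBody: string;
  actor: string; // the user who wrote the comment
}

const TRIGGER = "/plagiarismcheck";

// reviewersJson mirrors the WIKI_REVIEWERS secret (assumed: JSON array of usernames).
// Comparisons ignore case, as the `contains` expression in GitHub Actions does.
function shouldRunCheck(event: CommentEvent, reviewersJson: string): boolean {
  const reviewers: unknown = JSON.parse(reviewersJson);
  if (!Array.isArray(reviewers)) {
    return false;
  }
  const actor = event.actor.toLowerCase();
  const isReviewer = reviewers.some(
    (name) => typeof name === "string" && name.toLowerCase() === actor
  );
  return (
    event.isPullRequest &&
    event.commentBody.toLowerCase().includes(TRIGGER) &&
    isReviewer
  );
}
```

A comment from a user outside the reviewer list, or a comment on a plain issue, would leave the plagiarism job skipped.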