949 GitHub url check #998

Merged
merged 2 commits into from
Oct 2, 2023


ewan-escience
Collaborator

Better GitHub repo URL checks

Changes proposed in this pull request:

  • The scrapers now only attempt to scrape a GitHub repo if the URL points to the root of a single repository, reducing the amount of error logs generated.
  • The maintainer page now shows a warning if a GitHub repo URL does not appear to be valid.
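
For reviewers who want a feel for the rule, here is a minimal sketch of a "root of a single repository" check. This is not the code merged in this PR: the class name `GitHubUrlCheck`, the method `isRepoRoot`, and the regular expression are all hypothetical, and the sketch assumes that owner and repository names contain only letters, digits, `.`, `_` and `-`.

```java
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not the implementation in this PR.
class GitHubUrlCheck {
    // Matches https://github.com/<owner>/<repo> with an optional trailing slash.
    // Assumption: owner and repo consist of letters, digits, '.', '_' and '-'.
    private static final Pattern REPO_ROOT =
            Pattern.compile("^https://github\\.com/[\\w.-]+/[\\w.-]+/?$");

    // Returns true only if the URL has exactly two path segments (owner and repo).
    static boolean isRepoRoot(String url) {
        return url != null && REPO_ROOT.matcher(url.trim()).matches();
    }

    public static void main(String[] args) {
        String[] urls = {
            "https://github.com/research-software-directory/RSD-as-a-service",
            "https://github.com/research-software-directory/RSD-as-a-service/issues",
            "https://github.com/research-software-directory/RSD-as-a-service/tree/949-github-url-check",
            "https://github.com/research-software-directory"
        };
        for (String url : urls) {
            // Print the verdict next to each URL.
            System.out.println(isRepoRoot(url) + "  " + url);
        }
    }
}
```

Under these assumptions, only the first URL is accepted; the issues page, the branch URL, and the organisation URL are rejected because they have more or fewer than two path segments.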

How to test:

  • docker compose down --volumes && docker compose build --parallel && docker compose up --scale data-generation=0
  • Create a software page, publish it, and enter https://github.com/research-software-directory/RSD-as-a-service/issues as the repo URL. Saving should succeed, but a warning in orange text should be shown.
  • Run the various Git scrapers (or wait for them to run):
    • docker compose exec scrapers java -cp /usr/myjava/scrapers.jar nl.esciencecenter.rsd.scraper.git.MainProgrammingLanguages
    • docker compose exec scrapers java -cp /usr/myjava/scrapers.jar nl.esciencecenter.rsd.scraper.git.MainContributors
    • docker compose exec scrapers java -cp /usr/myjava/scrapers.jar nl.esciencecenter.rsd.scraper.git.MainBasicData
    • docker compose exec scrapers java -cp /usr/myjava/scrapers.jar nl.esciencecenter.rsd.scraper.git.MainCommits
  • Login as admin and check that no error logs exist.
  • Check that the errors at http://localhost/api/v1/repository_url make sense and that the scraped_at fields were set.
  • Change the repo URL to another invalid value, for example https://github.com/research-software-directory/RSD-as-a-service/tree/949-github-url-check or https://github.com/research-software-directory. Both should behave the same as above.
  • Now change the repo URL to be valid: https://github.com/research-software-directory/RSD-as-a-service.
  • Run the scrapers again. They should now run successfully (check the software page and http://localhost/api/v1/repository_url for the results).

Closes #949

PR Checklist:

  • Increase version numbers in docker-compose.yml
  • Link to a GitHub issue
  • Update documentation
  • Tests

Contributor

@dmijatovic dmijatovic left a comment

Nice work!

@sonarcloud

sonarcloud bot commented Oct 2, 2023

[scrapers] SonarCloud Quality Gate failed.

0 Bugs (rating A)
0 Vulnerabilities (rating A)
0 Security Hotspots (rating A)
14 Code Smells (rating B)

23.2% Coverage
0.0% Duplication

@sonarcloud

sonarcloud bot commented Oct 2, 2023

[rsd-frontend] SonarCloud Quality Gate failed.

0 Bugs (rating A)
0 Vulnerabilities (rating A)
0 Security Hotspots (rating A)
1 Code Smell (rating A)

88.9% Coverage
3.4% Duplication

@ewan-escience ewan-escience merged commit 1e4cb92 into main Oct 2, 2023
4 of 6 checks passed
@ewan-escience ewan-escience deleted the 949-github-url-check branch October 23, 2023 15:56
Development

Successfully merging this pull request may close these issues.

Checks for GitHub repository URLs
2 participants