Logo extraction with no slash #597

Open
dgarijo opened this issue Nov 30, 2023 · 2 comments
Labels: bug (Something isn't working)

Comments

@dgarijo
Collaborator

dgarijo commented Nov 30, 2023

Detected by @tpronk
I think there might be an issue with extracting a logo when the path to the logo contains no slash (/). To illustrate, below is a snippet of the README.md of the somef-demo-repo, followed by a snippet of the JSON output of SOMEF. Note that logo1.png is not recognized as a logo, but logo_directory/logo2.png is. I get the same result if I use logo.png, and if I leave logo_directory/logo2.png out of the README.md.

README.md

# Image
Images used to illustrate the software component.
![logo1.png](logo1.png)

# Logo
Main logo used to represent the target software component.
![logo2.png](logo_directory/logo2.png)

SOMEF Output

"logo": [
  {
    "result": {
      "type": "Url",
      "value": "https://raw.githubusercontent.com/tpronk/somef-demo-repo/main/logo_directory/logo2.png"
    },
    "confidence": 1,
    "technique": "regular_expression",
    "source": "https://raw.githubusercontent.com/tpronk/somef-demo-repo/main/README.md"
  }
],
"image": [
  {
    "result": {
      "type": "Url",
      "value": "https://raw.githubusercontent.com/tpronk/somef-demo-repo/main/logo1.png"
    },
    "confidence": 1,
    "technique": "regular_expression",
    "source": "https://raw.githubusercontent.com/tpronk/somef-demo-repo/main/README.md"
  }
]
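
The two URLs in the output above are consistent with resolving each relative image path against the raw URL of the README. Here is a minimal sketch of that resolution using only the Python standard library. It illustrates the expected mapping and is not SOMEF's actual code; `resolve_image` is a hypothetical helper name.

```python
from urllib.parse import urljoin

# Raw URL of the README, copied from the "source" field in the output above.
README_URL = "https://raw.githubusercontent.com/tpronk/somef-demo-repo/main/README.md"


def resolve_image(relative_path: str, base_url: str = README_URL) -> str:
    """Resolve a relative image path against the README's raw URL (hypothetical helper)."""
    return urljoin(base_url, relative_path)


logo1_url = resolve_image("logo1.png")
logo2_url = resolve_image("logo_directory/logo2.png")
print(logo1_url)
print(logo2_url)
```

At this step a path with no slash is not a special case: both paths resolve to the URLs reported in the output.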
dgarijo added the bug label Nov 30, 2023
dgarijo changed the title from "Logo extraction with no hash" to "Logo extraction with no slash" Nov 30, 2023
@tpronk
Contributor

tpronk commented Nov 30, 2023

As I kept adding fields for SOMEF to extract, I discovered this one might be a bit trickier.

  1. This version of the somef-demo-repo yields the same output as above.
  2. In a later version, I added some more fields to extract, including invocation, and moved logo2 below the invocation section. This version yields neither an image nor a logo. Instead, the markdown containing logo2 becomes part of the description field, and logo1 does not appear in the output at all.
  3. In a still later version, I added more fields, including package_distribution, related_documentation, and related_papers. Here, logo1 is again extracted into image, and logo2 is again extracted into logo.
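
To compare the versions listed above, it can help to check which output category each file ends up in. The sketch below assumes the output layout shown in the first comment (each category maps to a list of entries with a `result.value` field). `categories_mentioning` is a hypothetical helper, and `sample` is a trimmed copy of that reported output.

```python
from typing import List


def categories_mentioning(somef_output: dict, needle: str) -> List[str]:
    """Return the top-level categories whose result values contain `needle`."""
    hits = []
    for category, entries in somef_output.items():
        if not isinstance(entries, list):
            continue  # skip anything that is not a list of result entries
        for entry in entries:
            value = str(entry.get("result", {}).get("value", ""))
            if needle in value:
                hits.append(category)
                break  # one hit per category is enough
    return hits


# Trimmed copy of the output reported in the first comment.
BASE = "https://raw.githubusercontent.com/tpronk/somef-demo-repo/main/"
sample = {
    "logo": [{"result": {"type": "Url", "value": BASE + "logo_directory/logo2.png"}}],
    "image": [{"result": {"type": "Url", "value": BASE + "logo1.png"}}],
}

print(categories_mentioning(sample, "logo1.png"))  # ['image']
print(categories_mentioning(sample, "logo2.png"))  # ['logo']
```

Running the same check on the JSON from each version would show whether a file moved to `description` or dropped out entirely.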

@dgarijo
Collaborator Author

dgarijo commented Nov 30, 2023

The invocation part is a classifier error (we need to improve the training corpus). In theory, if something is called "logo" we should consider it a logo. But we may only allow one logo, which would explain why the other one is classified as an image. I will have to look into it.
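
The "anything called logo is a logo" rule could be sketched as follows. This is an illustration under stated assumptions, not the regular expression SOMEF currently uses: `IMAGE_PATTERN` and `split_logos_and_images` are hypothetical names, and the README text is the snippet from the first comment.

```python
import re

# Markdown image syntax: ![alt text](target). The target may or may not contain a slash.
IMAGE_PATTERN = re.compile(r"!\[([^\]]*)\]\(([^)\s]+)\)")


def split_logos_and_images(markdown: str):
    """Classify every image whose alt text or file name contains "logo" as a logo."""
    logos, images = [], []
    for alt, target in IMAGE_PATTERN.findall(markdown):
        file_name = target.rsplit("/", 1)[-1]  # works with or without a slash
        if "logo" in alt.lower() or "logo" in file_name.lower():
            logos.append(target)
        else:
            images.append(target)
    return logos, images


readme = """# Image
Images used to illustrate the software component.
![logo1.png](logo1.png)

# Logo
Main logo used to represent the target software component.
![logo2.png](logo_directory/logo2.png)
"""

logos, images = split_logos_and_images(readme)
print(logos)   # ['logo1.png', 'logo_directory/logo2.png']
print(images)  # []
```

Under this rule both files count as logos. If only one logo is allowed, some tie-break would still be needed, for example preferring the image that sits under a "Logo" heading.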

dgarijo added this to the v0.9.* milestone Jan 1, 2024