Badlist normalization is inconsistently used and causes misses #272

Closed
kam193 opened this issue Oct 1, 2024 · 2 comments
Labels: accepted (This issue was accepted, we will work on this at some point), bug (Something isn't working), core, In progress

Comments


kam193 commented Oct 1, 2024

Describe the bug
I was investigating why a URL, already included in the IoCs exported from my collection, wasn't recognized on a few submissions. It turned out that the URL contains some uppercase characters, while the version saved by the Badlist updater was normalized to lowercase... but the normalization is not performed when matching against the badlist. It looks like items added manually to the Badlist aren't normalized either.

To Reproduce
Steps to reproduce the behavior:

  1. Have a file that includes a URL in both its original and normalized form, e.g. a TXT file like:
    https://exampleee.com/SomeUpperCaseURL
    https://exampleee.com/someuppercaseurl
    
  2. Add the normalized form to the badlist manually.
  3. Submit the file and observe that only the normalized form is marked.
  4. Add the original form to a badlist update source and trigger a download.
  5. Resubmit the file and observe that only the normalized form is marked. In addition, the URL should appear in the badlist only in its normalized form.
  6. Add the original form to the badlist manually. The URL should now also appear in the badlist in its original form.
  7. Resubmit the file and observe that both URLs are marked.
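
The mismatch behind these steps can be modelled in a few lines. This is only a sketch: the set and the three functions are hypothetical stand-ins, not Assemblyline code, and it assumes (as described in this report) that the updater lowercases the whole value while manual additions and lookups use the value as-is.

```python
# Hypothetical in-memory stand-in for the badlist; not Assemblyline code.
badlist: set[str] = set()


def add_from_update_source(url: str) -> None:
    """Mimic the updater: lowercase before storing (assumed from the report)."""
    badlist.add(url.lower())


def add_manually(url: str) -> None:
    """Mimic a manual addition: the value is stored untouched."""
    badlist.add(url)


def is_badlisted(tag_value: str) -> bool:
    """Mimic the lookup: exact match, no normalization of the queried value."""
    return tag_value in badlist


original = "https://exampleee.com/SomeUpperCaseURL"
normalized = original.lower()

add_from_update_source(original)
print(is_badlisted(original))    # False: stored lowercased, queried as-is
print(is_badlisted(normalized))  # True

add_manually(original)
print(is_badlisted(original))    # True: only now does the original form match
```

Because storing and matching apply different rules, the original form only matches after it has been stored without normalization.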

Expected behavior
Tags are matched regardless of whether they are in normalized form.

Screenshots

Environment (please complete the following information if pertinent):

  • Assemblyline Version: 4.5.0.51, Badlist 4.5.0.19

Additional context

I found the normalization in the update server: https://github.com/CybercentreCanada/assemblyline-service-badlist/blob/3821ce750186704fd649609352ca822bf949877b/badlist/update_server.py#L158
but not in the Badlist client: https://github.com/CybercentreCanada/assemblyline-core/blob/06eb4c46f77be82e657de489d3d8d9350e709ea1/assemblyline_core/badlist_client.py#L139-L152
the service API: https://github.com/CybercentreCanada/assemblyline-v4-service/blob/7dbbddc1cbedc8c3324c8a936b255552bd62fde6/assemblyline_v4_service/common/api.py#L140-L152
or the Badlist service itself: https://github.com/CybercentreCanada/assemblyline-service-badlist/blob/3821ce750186704fd649609352ca822bf949877b/badlist/badlist.py#L97-L122

I suspect this may also be the case for file hashes and the safelist, but I haven't tested those cases.

kam193 added the assess (We still haven't decided if this will be worked on or not) and bug (Something isn't working) labels Oct 1, 2024
cccs-rs self-assigned this Oct 7, 2024
Contributor

cccs-rs commented Nov 28, 2024

Based on https://www.rfc-editor.org/rfc/rfc3986#section-6.2.2.1, it looks like we could normalize the domain part of URLs to lowercase, since that part is case-insensitive in practice. However, we probably shouldn't normalize the other parts of the URL, because /About and /about could route differently even on the same domain.
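
A minimal sketch of that idea, using only the Python standard library. The function name `normalize_url` is hypothetical and this is not the code that was eventually merged; it just lowercases the scheme and host and leaves the userinfo, path, query, and fragment as they were.

```python
from urllib.parse import urlsplit, urlunsplit


def normalize_url(url: str) -> str:
    """Lowercase the scheme and host only; keep path, query, and fragment as-is."""
    parts = urlsplit(url)  # urlsplit already returns the scheme lowercased
    # Split off any "user:password@" prefix so it is not lowercased with the host.
    userinfo, at, hostport = parts.netloc.rpartition("@")
    netloc = f"{userinfo}{at}{hostport.lower()}"
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))


print(normalize_url("HTTPS://ExampleEE.com/SomeUpperCaseURL"))
# https://exampleee.com/SomeUpperCaseURL
```

With this approach, https://exampleee.com/SomeUpperCaseURL and https://exampleee.com/someuppercaseurl would still be treated as two distinct URLs, which matches the concern about /About and /about.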

This would have to be specific to tags that involve a URL or domain (I think currently just network.(static|dynamic).(domain|uri)), and we'd probably perform this normalization in the client (likely in the exists_tags() call), since it's a common point for both the services and core where the normalization can take place before doing a search in ES.
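
As a rough illustration of normalizing by tag type before a lookup: the helper names and the shape of the tag map below are assumptions made for this sketch, and the real exists_tags() signature is not reproduced here.

```python
import re
from urllib.parse import urlsplit, urlunsplit

# Tag types named in the comment above; compiling them into one pattern is an assumption.
URL_OR_DOMAIN_TAG = re.compile(r"network\.(static|dynamic)\.(domain|uri)")


def _lower_host(url: str) -> str:
    """Lowercase only the host part of a URL (hypothetical helper)."""
    parts = urlsplit(url)
    userinfo, at, hostport = parts.netloc.rpartition("@")
    netloc = f"{userinfo}{at}{hostport.lower()}"
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))


def normalize_tag(tag_type: str, value: str) -> str:
    """Normalize a single tag value based on its type; other types pass through."""
    if not URL_OR_DOMAIN_TAG.fullmatch(tag_type):
        return value
    if tag_type.endswith(".domain"):
        return value.lower()
    return _lower_host(value)


def normalize_tags(tag_map: dict[str, list[str]]) -> dict[str, list[str]]:
    """Apply normalize_tag to every value in a {tag_type: [values]} mapping."""
    return {t: [normalize_tag(t, v) for v in values] for t, values in tag_map.items()}
```

Running the same function on values before they are stored and on values being queried would keep both sides consistent, which is the point of doing it at a single shared place.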

cccs-rs added the core, accepted, and In progress labels and removed the assess label Nov 29, 2024
cccs-rs added a commit to CybercentreCanada/assemblyline-service-badlist that referenced this issue Nov 29, 2024
cccs-rs added a commit to CybercentreCanada/assemblyline-service-badlist that referenced this issue Nov 29, 2024
Let the Badlist client perform data normalization of tags
Contributor

cccs-rs commented Nov 29, 2024

This should be resolved in the 4.5.0.61 release and the v4.5.0.22 release of the Badlist service.

cccs-rs closed this as completed Nov 29, 2024