
Malware - Stats #1

Open
rothoma2 opened this issue Jun 16, 2024 · 6 comments
Labels
good first issue · help wanted · top-level-task

Comments

@rothoma2

It is important to have statistics on some of the commonly observed Malicious Delivery Methods and file extensions.

Requirements.

A web scraper tool that scrapes publicly disclosed information from several sources (malware sandbox sites) and aggregates it to produce statistics such as file extensions, malware families, etc.

Sources

Things to explore.

  • Amount of pages that can be scraped before rate limits or captchas kick in.
  • Parse HTML pages and extract valuable data (see the sketch below).
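
A minimal sketch of that parsing step, using requests and BeautifulSoup. The URL and the row selector are placeholders, since every sandbox site needs its own selectors, and some (like app.any.run) render results with JavaScript and need a real browser instead, as discussed further down:

import requests
from bs4 import BeautifulSoup

def fetch_rows(url):
    """Fetch one listing page and return the text of each result row.
    The 'div.result-row' selector is a placeholder; each sandbox site needs its own."""
    resp = requests.get(url, headers={"User-Agent": "Mozilla/5.0"}, timeout=30)
    resp.raise_for_status()  # an HTTP 429 here means the rate limit kicked in
    soup = BeautifulSoup(resp.text, "html.parser")
    return [row.get_text(" ", strip=True) for row in soup.select("div.result-row")]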

Example.

Collect the last 10K malicious files (for Windows) reported on each site, and aggregate them per file extension.
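
As a rough sketch of the aggregation itself, assuming we already have the list of reported file names (the sample names below are made up purely for illustration):

import os
from collections import Counter

def count_extensions(filenames):
    """Aggregate reported file names by their extension."""
    counts = Counter()
    for name in filenames:
        _, ext = os.path.splitext(name)
        counts[ext.lower() or "<no extension>"] += 1
    return counts

# Made-up sample input, just to show the output shape:
print(count_extensions(["invoice.docx", "setup.exe", "update.exe", "shortcut.lnk"]))
# Counter({'.exe': 2, '.docx': 1, '.lnk': 1})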

rothoma2 added the top-level-task, good first issue, and help wanted labels on Jun 16, 2024
@poneoneo

To avoid barriers like captchas, we should really think about a dedicated tool like the Bright Data website. Some YouTubers offer $10 of free credit to try it; you should take a look at this.

@rothoma2
Author

@poneoneo maybe we should see first whether these pages have a captcha or rate limit at all. I think the project would be severely limited if we need to depend on a pay-per-use service such as the Bright Data website.

@poneoneo

Ok @rothoma2, I will check this out. Maybe tools like Playwright or Selenium will be enough to behave like a real browser and overcome the user-agent and captcha barriers.

@Ohnoimded
Member

SeleniumBase is working fine for it with the UC (undetected-chromedriver) driver.

@rothoma2
Author

Cool, maybe someone can upload some base code and we can start extending from there.

@Ohnoimded
Member

Ohnoimded commented Jun 27, 2024

The code will bypass all checks on app.any.run but can only get to page 5, as going further is restricted by the site.
The actual scraping part for the extracted rows still needs to be implemented.

from seleniumbase import SB
import time
import random

with SB(uc=True) as sb:  # uc=True enables undetected-chromedriver mode
    print("Entering Website")
    sb.open("https://app.any.run/submissions/")  # open the public submissions feed
    sb.click("#history-filterBtn")  # open the filter panel
    sb.click("div.btn-group:nth-child(1) > button:nth-child(1)")
    time.sleep(random.randrange(0, 2))  # short randomized pause
    sb.click("div.btn-group:nth-child(1) > div:nth-child(2) > ul:nth-child(1) > li:nth-child(1) > a")
    sb.click("#historySearchBtn")  # apply the filter
    time.sleep(random.randrange(2, 3))
    for i in range(5):  # pagination beyond page 5 is restricted by the site
        time.sleep(random.randrange(1, 2))
        soup = sb.get_beautiful_soup()  # parse the current page with BeautifulSoup
        extracted_rows = soup.select("div.history-table--content__row")
        # I haven't done the bs part yet. Something like this.
        sb.click(".history-pagination__next")  # go to the next results page
