New update for https://github.com/duckduckgo/tracker-radar/tree/main/entities #130

Open
pooneh-nb opened this issue Mar 22, 2022 · 5 comments

@pooneh-nb

pooneh-nb commented Mar 22, 2022

I was working on a project to identify tracking/advertising domains on the Alexa Echo device. I used https://github.com/duckduckgo/tracker-radar/tree/main/entities to find the parent company behind each domain name. Thanks for sharing such a great dataset!
I found that several domain names were missing from your dataset, so I looked them up manually using ICANN, crunchbase.com, or their websites. Since some of them are tracking/advertising websites, I think it would be good to update your database. Here is the update:

{'acsechocaptiveportal.com' : 'Amazon Technologies, Inc.',
'amazon-dss.com' : 'Amazon Technologies, Inc.',
'amazonalexa.com': 'Amazon Technologies, Inc.',
'amcs-tachyon.com' : 'Amazon Technologies, Inc.',
'fireoscaptiveportal.com' : 'Amazon Technologies, Inc.',
'chtbl.com' : 'Chartable Holding Inc',
'chrt.fm' : 'Chartable Holding Inc',
'dillilabs.com' : 'Dilli Labs LLC',
'megaphone.fm' : 'Spotify AB',
'omny.fm' : 'Triton Digital, Inc.',
'podtrac.com' : 'Podtrac Inc',
'voiceapps.com' : 'Voice Apps LLC',
'mittendorf.net' : 'individual',
'doctorpooch.com' : 'Dilli Labs LLC',
'kwimer.com' : 'Highwinds Network Group, Inc'}
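
Since the entities directory stores one record per company rather than one per domain, the mapping above can be inverted before comparing it with the dataset. This is a minimal sketch, not the maintainers' tooling: `group_by_owner` and `to_entity_record` are hypothetical helpers, and the field names `name`, `displayName`, and `properties` are assumptions about the entity file layout that should be checked against the repository before use.

```python
import json
from collections import defaultdict

# Domain -> owner mapping, copied verbatim from the list above.
DOMAIN_OWNERS = {
    'acsechocaptiveportal.com': 'Amazon Technologies, Inc.',
    'amazon-dss.com': 'Amazon Technologies, Inc.',
    'amazonalexa.com': 'Amazon Technologies, Inc.',
    'amcs-tachyon.com': 'Amazon Technologies, Inc.',
    'fireoscaptiveportal.com': 'Amazon Technologies, Inc.',
    'chtbl.com': 'Chartable Holding Inc',
    'chrt.fm': 'Chartable Holding Inc',
    'dillilabs.com': 'Dilli Labs LLC',
    'megaphone.fm': 'Spotify AB',
    'omny.fm': 'Triton Digital, Inc.',
    'podtrac.com': 'Podtrac Inc',
    'voiceapps.com': 'Voice Apps LLC',
    'mittendorf.net': 'individual',
    'doctorpooch.com': 'Dilli Labs LLC',
    'kwimer.com': 'Highwinds Network Group, Inc',
}


def group_by_owner(domain_owners):
    """Invert a domain -> owner mapping into owner -> sorted list of domains."""
    grouped = defaultdict(list)
    for domain, owner in domain_owners.items():
        grouped[owner].append(domain)
    return {owner: sorted(domains) for owner, domains in grouped.items()}


def to_entity_record(owner, domains):
    """Build an entity-like record (hypothetical helper).

    The keys below are assumed to mirror the entity files; displayName is
    simply set to the owner name here as a placeholder.
    """
    return {"name": owner, "displayName": owner, "properties": domains}


if __name__ == "__main__":
    for owner, domains in sorted(group_by_owner(DOMAIN_OWNERS).items()):
        print(json.dumps(to_entity_record(owner, domains), indent=2))
```

Grouping first makes it easy to see which owners need a new record and which domains would only extend an existing one.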

I'm gonna cite this dataset in our paper. Can I ask what the source of this dataset is?

@kdzwinel
Member

Hey Pouneh, thanks a lot for sharing your findings, we really appreciate it!

> I'm gonna cite this dataset in our paper. Can I ask what the source of this dataset is?

Not sure if I understand your question, but this repo is the source. You can reference it like this:

"DuckDuckGo Tracker Radar", [online] Available: https://github.com/duckduckgo/tracker-radar, Retrieved: March 2022.

@pooneh-nb
Author

Hey Konrad, thanks for your reply.
My question was: what is the source of this dataset? For example, did you query crunchbase.com or WHOIS to find the company behind each domain name?

@kdzwinel
Member

Ah, sorry for misunderstanding. We use public WHOIS data and SSL cert data, and do manual investigation (e.g. by reviewing privacy policies). We also do semi-automatic cleanup. A small portion of the data comes from outside contributors. LMK if that helps!
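
To illustrate the SSL cert signal mentioned above, here is a rough sketch of reading the organization name from a server certificate with Python's standard `ssl` module. It is not DuckDuckGo's actual pipeline: `organization_from_cert` and `fetch_cert` are hypothetical names, and many certificates (domain-validated ones in particular) carry no `organizationName` at all, in which case the function returns `None` and another source such as WHOIS would be needed.

```python
import socket
import ssl


def organization_from_cert(cert):
    """Return the subject's organizationName, or None if it is absent.

    `cert` is expected to have the dict shape returned by
    ssl.SSLSocket.getpeercert(): "subject" is a tuple of RDNs, and each
    RDN is a tuple of (key, value) pairs.
    """
    for rdn in cert.get("subject", ()):
        for key, value in rdn:
            if key == "organizationName":
                return value
    return None


def fetch_cert(hostname, port=443, timeout=5.0):
    """Open a TLS connection and return the validated peer certificate dict."""
    context = ssl.create_default_context()
    with socket.create_connection((hostname, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=hostname) as tls:
            return tls.getpeercert()


if __name__ == "__main__":
    # Requires network access; the hostname is taken from the list in this issue.
    print(organization_from_cert(fetch_cert("megaphone.fm")))
```

The parsing step is kept separate from the network call so it can be run against stored certificate data as well as live connections.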

@pooneh-nb
Author

I see, that makes sense. Thank you!

@Ohcora

Ohcora commented Sep 8, 2024

You're welcome. I will have to go back into the database to see what source it was.
