Detect tracking parameters in URLs #1998

Merged: 2 commits into osm-fr:dev from the strip_url_trackers branch, Sep 27, 2023


Famlam
Collaborator

@Famlam Famlam commented Aug 25, 2023

Implements #1950

This PR detects tracking parameters in URLs, such as fbclid or utm_campaign, and strips them from the URL.
It uses the list from https://github.com/duckduckgo/privacy-configuration/blob/main/features/tracking-parameters.json, which is CC BY-NC-SA 4.0 licensed. Based upon my understanding, we can use this for the detection as long as we refer to its origin (via resource=*) as we're not commercial (but mind, I'm not a legally educated person).
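For illustration only, here is a minimal sketch of the stripping step described above. It is not the code from this PR: the function name `strip_tracking_params` and the two-entry `TRACKING_PARAMS` set are hypothetical, and the real list comes from an external source.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical sample set; the PR takes its list from an external source.
TRACKING_PARAMS = {"fbclid", "utm_campaign"}


def strip_tracking_params(url):
    """Return url without known tracking parameters (illustrative sketch)."""
    parts = urlsplit(url)
    # keep_blank_values=True so parameters such as "?flag=" are not dropped.
    query = parse_qsl(parts.query, keep_blank_values=True)
    kept = [(key, value) for key, value in query
            if key.lower() not in TRACKING_PARAMS]
    if len(kept) == len(query):
        # Nothing to strip: return the input untouched to avoid re-encoding.
        return url
    return urlunsplit(parts._replace(query=urlencode(kept)))
```

Note that `urlencode` re-encodes the remaining parameters, so a stripped URL may differ from the original in percent-encoding details; returning the input unchanged when nothing matches limits that side effect.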

As the easiest way to parse the URL was to use urlsplit, I also added a warning (class 30933) for broken URLs that raise a ValueError in that function. These are pretty rare, however, so I suspect it'll barely detect anything. If you prefer, I can also skip over those.
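As a rough sketch of the broken-URL check (again not the PR's code; `is_parseable_url` is a hypothetical helper name), `urlsplit` raises `ValueError` for inputs such as an unbalanced IPv6 bracket in the host part:

```python
from urllib.parse import urlsplit


def is_parseable_url(url):
    """Return False when urlsplit rejects the URL with a ValueError."""
    try:
        urlsplit(url)
    except ValueError:
        # For example an unbalanced "[" in the network location.
        return False
    return True
```

`urlsplit` is lenient and accepts most malformed strings, which fits the expectation that this warning will rarely trigger.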

@frodrigo
Member

which is CC BY-NC-SA 4.0 licensed. Based upon my understanding, we can use this for the detection as long as we refer to its origin (via resource=*) as we're not commercial (but mind, I'm not a legally educated person).

I'm unsure about that, as OSM is not NC. Maybe it is better to ask?

cc @jocelynj

@jocelynj
Member

Are you sure about the license?

I see on https://github.com/duckduckgo/privacy-configuration/blob/main/LICENSE that it is using Apache license v2, which is an open-source license. CC BY-NC-SA 4.0 seems to be only about DuckDuckGo logos and marks.

@frodrigo
Member

Are you sure about the license?

I see on https://github.com/duckduckgo/privacy-configuration/blob/main/LICENSE that it is using Apache license v2, which is an open-source license. CC BY-NC-SA 4.0 seems to be only about DuckDuckGo logos and marks.

That is not the same as what is stated here: https://github.com/duckduckgo/privacy-configuration/tree/main#licensing

@jocelynj
Member

So we have two conflicting pieces of information about the license :(

It would be better to ask which license is applied, and if CC BY-NC-SA is confirmed, to ask for explicit authorization.

In the meantime, we could limit to utm_* variables, which can be found in a free license here: https://en.wikipedia.org/wiki/UTM_parameters
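A minimal sketch of that fallback, assuming a simple prefix match on `utm_` (the helper name `find_utm_params` is hypothetical and not taken from the PR):

```python
from urllib.parse import parse_qsl, urlsplit


def find_utm_params(url):
    """Return the names of query parameters that start with "utm_"."""
    query = urlsplit(url).query
    return [key for key, _ in parse_qsl(query, keep_blank_values=True)
            if key.lower().startswith("utm_")]
```

A prefix match needs no external list at all, at the cost of missing trackers such as fbclid.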

@Famlam
Collaborator Author

Famlam commented Aug 26, 2023

It would be better to ask which license is applied, and if CC BY-NC-SA is confirmed, to ask for explicit authorization.

Drafted an email on Matrix. If that's fine, we could leave this PR as a draft for a few weeks until we get a confirmation?
If we don't hear anything, I'll go for @jocelynj's suggestion to use the UTM parameters.

@Famlam Famlam marked this pull request as draft August 26, 2023 09:01
@Famlam Famlam force-pushed the strip_url_trackers branch 3 times, most recently from b3cfcef to be2a799 Compare September 26, 2023 09:49
@Famlam Famlam marked this pull request as ready for review September 27, 2023 10:36
@Famlam
Collaborator Author

Famlam commented Sep 27, 2023

New implementation using the tracking parameter list from https://github.com/mpchadwick/tracking-query-params-registry is ready :)

plugins/Website.py: 2 review threads (outdated, resolved)
@frodrigo frodrigo merged commit 8d4802c into osm-fr:dev Sep 27, 2023
3 checks passed
@frodrigo
Member

Looks OK to me. Thank you.

@Famlam Famlam deleted the strip_url_trackers branch September 27, 2023 20:45