Add filter for Nominet suspensions #91

Open
JimKillock opened this issue Dec 9, 2020 · 16 comments
@JimKillock
Member

See: https://happinessclasses.co.uk for the main pattern. Note:

  1. The pages identify the requesting authority; the vast majority are PIPCU, but there are others, see https://wiki.openrightsgroup.org/wiki/Nominet/Domain_suspension_statistics
  2. There are a lot of these; I suggest we place them at nominet.blocked.org.uk
  3. We should create an appeals process, as we intend to for copyright blocks, which can also be at a subdomain, e.g. copyright.blocked.org.uk
@carlheaton

carlheaton commented Dec 9, 2020

This is the easiest method I found thus far (in theory it misses domains which were already on Cloudflare, but we believe that would be a Department <> Cloudflare issue which is a different process to Department <> Nominet):

cf_domains() { grep -h "IN.*NS.*\.ns\.cloudflare\.com\." "$1"/* | cut -d$'\t' -f1 | cut -d" " -f1 | sort -u; }  # domains with Cloudflare NS records in directory $1
TARGETS=$(diff --unchanged-line-format= --old-line-format= --new-line-format='%L' <(cf_domains old) <(cf_domains new))  # keep only lines that are new today
TOTAL=$(printf '%s\n' "$TARGETS" | grep -c .)  # count non-empty lines, so an empty result reports 0
echo "Found $TOTAL"
while IFS= read -r line; do
  [ -n "$line" ] || continue
  if curl -m 3 --silent "https://$line" | grep -q "nominet"; then echo "$line"; fi
done <<< "$TARGETS"

Here old/ is the extracted ukdata-YYYYMMDD.zip from the previous day and new/ is the one from the current day. There are no changes prior to 20201122.

We've yet to see it rolled out to another department, but the "/img/colp-logo.jpg" is probably going to be the easiest identifier.
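A minimal sketch of that check as a reusable function, assuming the block page HTML references the logo path quoted above. The function name is hypothetical, and the looser "nominet" match simply mirrors the grep in the script above:

```bash
# Hypothetical helper: read HTML on stdin and print which marker matched.
# Assumption: a block page references /img/colp-logo.jpg (the identifier
# suggested above) or at least mentions "Nominet"; returns 1 if neither is found.
blockpage_marker() {
  local html
  html=$(cat)
  case "$html" in
    *"/img/colp-logo.jpg"*) echo "colp-logo" ;;
    *[Nn]ominet*)           echo "nominet" ;;
    *)                      return 1 ;;
  esac
}
```

Usage would be along the lines of `curl -m 3 --silent "https://$domain" | blockpage_marker`, which keeps the stricter logo match separate from the looser text match.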

@carlheaton

carlheaton commented Feb 18, 2021

The latest code has been running for some time; it just needs a diff adding to de-duplicate daily (or tracking of which dates have been processed or missed):
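One way the daily de-duplication might look, as a sketch only: it assumes the script emits one domain per line and that a plain text file of already-reported domains is acceptable. The function name and file layout are hypothetical and are not taken from the pasted code.

```bash
# Hypothetical helper: print only the domains from stdin that are not already
# listed in the "seen" file, then record them so that later runs skip them.
# Assumption: one domain per line; the seen-file path is chosen by the caller.
report_new_domains() {
  local seen_file="$1" new
  touch "$seen_file"
  # -v invert match, -x whole lines only, -F fixed strings, -f patterns from file
  new=$(sort -u | grep -vxFf "$seen_file")
  [ -n "$new" ] || return 0
  printf '%s\n' "$new" | tee -a "$seen_file"
}
```

Piping each day's results through `report_new_domains seen.txt` would then output a domain only the first time it appears.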

https://paste2.org/PmpNe1yy

@JimKillock
Member Author

Thanks, so this is extraction from Nominet's zone files?

@carlheaton

Yep. It includes the API calls to download the newest and the oldest-1 zone files from Nominet's BrickFTP-hosted zone file service, compares which domains had their NS moved to Cloudflare, visits those domains and looks for the Nominet logo, and outputs the results.

@JimKillock
Member Author

I see. @carlheaton, the problem we have is that we have access to the zone files (current week only) as a Nominet member, but the API is available to registrars only. We found about 4.5k domains that way but cannot automate retrieval.

We're thinking there may be some bulk WHOIS lookups that could find some other suspended domains, as the WHOIS records normally state that the domains are 'suspended'.
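A rough sketch of that WHOIS test, assuming only that the record contains the word "suspended" somewhere. The exact wording and layout of Nominet's WHOIS output are not given in this thread, so the match is deliberately loose and the function name is hypothetical:

```bash
# Hypothetical helper: succeed if the WHOIS text on stdin mentions a suspension.
# Assumption: the record includes the word "suspended" (any case); a stricter
# match on a specific field would need the real output format.
whois_says_suspended() {
  grep -qi 'suspended'
}
```

It could be used as `whois "$domain" | whois_says_suspended && echo "$domain"`.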

@JimKillock
Member Author

JimKillock commented Feb 19, 2021

(We do, IIRC, have an API for submissions though, so a willing partner could automatically submit likely suspended domains for us to test … we then make an HTTP request to the domains, match for the HTML on the block pages, and store a copy of that.)
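The match-and-store step could be sketched as follows. Everything here is illustrative: the marker string, the directory layout and the file naming are assumptions, since the real block-page HTML and storage scheme are not shown in this thread.

```bash
# Hypothetical helper: save the HTML from stdin as a dated copy if it contains
# the marker, and print the path written. Returns 1 (and writes nothing) otherwise.
# Assumptions: marker defaults to "nominet"; copies go to DIR/DOMAIN/YYYYMMDD.html.
archive_if_blockpage() {
  local domain="$1" archive_dir="$2" marker="${3:-nominet}" html out
  html=$(cat)
  printf '%s\n' "$html" | grep -qi -- "$marker" || return 1
  mkdir -p "$archive_dir/$domain"
  out="$archive_dir/$domain/$(date -u +%Y%m%d).html"
  printf '%s\n' "$html" > "$out"
  echo "$out"
}
```

For example, `curl -m 3 --silent "https://$domain" | archive_if_blockpage "$domain" ./blockpages` would keep a copy only for pages that match.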

@carlheaton

@JimKillock To be clear, I believe (even as a member) you should be able to create a FILESCOM_API_KEY via https://data.nominet.uk/profile/edit (generic API access to the SFTP provider, not any Nominet-specific API).

@carlheaton

NB. I think even with access to whois2 through a bunch of different registrars, or using whois through a bunch of distributed IPv4 scripts, the whois method (while being more accurate) will be too much volume and too slow to recognise these changes in any timely manner. I don't see them being in a rush to move away from Cloudflare; the guy who built this system left Nominet last month.

@JimKillock
Member Author

Hi @carlheaton, see "Retrieve .uk domains via Nominet API" (#102) for what we found with the Nominet API as a member: basically, we cannot get the changes to the zone file via the API, which registrars can. We do have access to the files, but need to download and upload them manually, AIUI.

@JimKillock
Member Author

JimKillock commented Feb 20, 2021

All this said, @dantheta @gwire, it's not impossible for ORG (or another like-minded commercial organisation, maybe?) to become a registrar.

@carlheaton

carlheaton commented Feb 21, 2021

I'm still confused, Jim, sorry; just to make sure neither of us has missed anything.

As a member of Nominet you get access to the zone files.

In the Nominet member control panel, you request access to the zone files, accept some additional terms, and are provided access to https://data.nominet.uk/

https://data.nominet.uk/ is provided by BrickFTP/files.com a third party, it is by default a web interface (via that URL) and also an SFTP service.

You can visit https://data.nominet.uk/, log in manually and download the files manually via https.

You can SFTP to sftp://user:[email protected]/ and automatically log in and automatically download the files over SFTP.
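For the SFTP route, a small sketch of generating a batch file for `sftp -b`. It assumes the archives sit in the login directory and that key-based (non-interactive) authentication is set up for the account, neither of which is confirmed in this thread; the function name is hypothetical.

```bash
# Hypothetical helper: print an sftp batch script that fetches the named files.
# Assumption: the archives are in the login directory; adjust the remote path
# if they live elsewhere.
sftp_batch_for() {
  local name
  for name in "$@"; do
    printf 'get %s\n' "$name"
  done
  echo 'bye'
}
```

Something like `sftp_batch_for ukdata-YYYYMMDD.zip > fetch.batch` followed by `sftp -b fetch.batch USER@HOST` would then download without prompts, where USER and HOST are placeholders for your own login details.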

You can also go to https://data.nominet.uk/profile/edit and create an API key.

This API key allows you to automate the https login / generate a token as shown in my script example, which allows you to automate the downloading of the zone files. THIS IS NOT A NOMINET API or KEY, THIS IS A BRICKFTP/FILES.COM API KEY.

In my example I download the newest zone file and the penultimate oldest zone file, then compare the two. I don't download the newest and the next newest, because I'm too lazy to work out when the zone files appear or when one is missed (they also sometimes leave files in there which shouldn't be), so the script needs a little work to de-duplicate.
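If comparing the two most recent files were preferred, picking them from a directory listing might look like the sketch below. It assumes the ukdata-YYYYMMDD.zip naming mentioned earlier, so that lexical order equals date order; the function name is hypothetical.

```bash
# Hypothetical helper: from a file listing on stdin, print the two most recent
# zone archives on one line as "older newer".
# Assumption: names follow ukdata-YYYYMMDD.zip; anything else left in the
# directory is ignored. With fewer than two matches it prints what it finds.
latest_pair() {
  grep -E '^ukdata-[0-9]{8}\.zip$' | sort -u | tail -n 2 | paste -sd' ' -
}
```

If a day's file is missing, the pair simply spans a longer interval, so changes are still picked up as long as both files are still available.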

The REST API you reference in #102 is really only for managing your own domains and not for querying other people's domains.

The APIs for querying other people's domains are too rate-limited to allow you to scan the entire zone on any frequent basis, but you could use those APIs or public whois services to validate each domain identified via the zone file method I presented.
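A sketch of that validation pass, throttled to stay under a rate limit. The checker is passed in as a command name, and the delay is a guess, since the actual limits are not stated in this thread; all names are hypothetical.

```bash
# Hypothetical helper: run a checker command on each domain read from stdin and
# print the domains for which it succeeds, pausing between calls.
# Assumptions: the checker exits 0 for "suspended"; delay is in seconds.
validate_candidates() {
  local checker="$1" delay="${2:-1}" domain
  while IFS= read -r domain; do
    [ -n "$domain" ] || continue
    # </dev/null stops the checker from consuming the list of domains
    if "$checker" "$domain" </dev/null; then echo "$domain"; fi
    sleep "$delay"
  done
}
```

The candidate list from the zone file comparison would be piped in, with the checker being whatever whois or API lookup is available.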

Apologies if I'm confused in the above, but to re-iterate, I do not use any Nominet API and believe you do not need any Nominet API or registrar status to do what I have done.

@JimKillock
Member Author

JimKillock commented Feb 21, 2021

That is very helpful: @dantheta maybe we can automate collection and checking domains this way?

@JimKillock
Member Author

JimKillock commented Mar 2, 2021

I believe this is all in progress now, @carlheaton; @dantheta has implemented import mechanisms.

BTW it seems Nominet are rolling out block pages in stages, and had only done this for CTIRU until now. They are now doing the same for MHRA, NCA and FCA, so we should see a few new variants from late Feb onwards. I don't know how easy it will be to catch these where the suspension is historic and applied to sites we have already looked at; it may be that nothing changes in the data to tell us that a block page may appear, for instance.

@carlheaton

Nice! The person who developed the system using Cloudflare Workers and NS left Nominet earlier this year. I've yet to see the new variants, but I'd be surprised if they were much different other than some preamble and the logo. Yes, we miss out on sites already on Cloudflare being suspended, but I expect Cloudflare's burden of proof to be higher than Nominet's.

@JimKillock
Member Author

JimKillock commented Mar 2, 2021

Sure: we have two problems though:

  1. Identify the block from the blockpage; as you say, trivial
  2. Identify the sites we need to check; we only have the Nominet ZF information to go on. This is where I think we may have some difficulty, as there could be no change to the ZF information to indicate a change from "suspended, no response" to "suspended, gives block page"
  3. Thus, we may want to re-test older data for block pages where we have found indications in the data that Nominet have treated the domain as "suspended, [and give] no response"
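Selecting the domains for such a re-test could be sketched as below. The storage format is purely an assumption for illustration (tab-separated "domain, status" lines with the status values "no-response" and "blockpage"); the real data model is not described in this thread.

```bash
# Hypothetical helper: list domains whose recorded result was
# "suspended, no response", so that they can be fetched again.
# Assumption: the results file holds tab-separated "domain<TAB>status" lines.
domains_to_retest() {
  awk -F '\t' '$2 == "no-response" { print $1 }' "$1" | sort -u
}
```

Running this periodically and re-fetching the listed domains would catch a move from "no response" to a block page even when the zone file data does not change.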

@carlheaton

carlheaton commented Mar 2, 2021

To do it properly, you basically need to query all the domains in the ZF against whois daily to determine their suspended status, then curl the suspended ones. Not even the domainers have built such massively parallel infrastructure (our need here is ~365x theirs). I'd be inclined to wait until we have a change of management to see if Nominet changes its misguided approach to policing-by-DNS (or at least becomes transparent about it).
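To put a rough number on the volume, this sketch computes the average lookup rate needed to check every domain once per day. The zone size is left as a parameter, because no figure is given in this thread, and the function name is hypothetical.

```bash
# Hypothetical helper: average lookups per second needed to check N domains
# once every 24 hours (86400 seconds). N is supplied by the caller.
daily_lookup_rate() {
  awk -v n="$1" 'BEGIN { printf "%.1f\n", n / 86400 }'
}
```

For instance, `daily_lookup_rate 1000000` prints 11.6, using a round illustrative figure rather than the actual size of the .uk zone.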

dantheta added nine commits that referenced this issue on Mar 20, 2021