Given a TLD rule list and a domain, parse the TLD from the domain.
Yes, it's more complicated than domain.split(".") ;)
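To see why, here is a minimal sketch of the naive approach. The function name naive_tld is made up for illustration and is not part of tld-parser.

```python
def naive_tld(domain: str) -> str:
    """Treat whatever follows the last dot as the TLD (the naive approach)."""
    return domain.rsplit(".", 1)[-1]


# Works for single-label suffixes...
print(naive_tld("example.com"))  # com
# ...but returns only the last label when the suffix has several, e.g. "co.uk".
print(naive_tld("some_subdomain.domain.co.uk"))  # uk
```

The second call gives "uk" where the suffix you actually want is "co.uk", which is why a rule list is needed.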
Parsing TLDs requires actually knowing all of the TLDs. These are maintained in a list online.
For some reason, all of the TLD parsers out there at the moment like to handle lookups to these lists internally, making them awkward to couple with whatever your flavour of application is.
This is a get-the-list-yourself situation.
pip install tld-parser
:)
You'll need access to the public suffix list:
Canonical: https://publicsuffix.org/list/public_suffix_list.dat
Git hosted: https://raw.githubusercontent.com/publicsuffix/list/master/public_suffix_list.dat
>>> from tld_parser import parse_rule_list, parse_domain
>>>
>>> from some_http_client import get
>>>
>>> list_url = "https://publicsuffix.org/list/public_suffix_list.dat"
>>> suffix_list = get(list_url).content
>>> # The parser expects a Sequence of rules in the same format as the public suffix list.
>>> suffix_list = suffix_list.decode().splitlines()
>>>
>>> tld_rules = parse_rule_list(suffix_list)
>>>
>>> parse_domain(tld_rules, "some_subdomain.domain.co.uk")
Result(registrable_part='some_subdomain.domain', tld='co.uk')
And out pops a Result object :)
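If you don't want to pull in an HTTP client just for this, the standard library can do the download. A rough sketch, assuming the list is served as UTF-8 text; fetch_suffix_list and decode_suffix_list are hypothetical helper names, not part of tld-parser.

```python
from typing import List
from urllib.request import urlopen

PSL_URL = "https://publicsuffix.org/list/public_suffix_list.dat"


def decode_suffix_list(raw: bytes) -> List[str]:
    """Turn the downloaded bytes into a list of lines, one rule or comment per line."""
    return raw.decode("utf-8").splitlines()


def fetch_suffix_list(url: str = PSL_URL, timeout: float = 10.0) -> List[str]:
    """Download the public suffix list and return it as a list of lines."""
    with urlopen(url, timeout=timeout) as response:
        return decode_suffix_list(response.read())
```

The list returned by fetch_suffix_list() can then be handed to parse_rule_list in the same way as suffix_list in the example above. Since you own the download, you also get to decide how often it happens and where the result is cached.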
- Raises TLDParserError on parse failure, rather than returning None, lol. Raises NoRegisterablePart when the TLD is valid but the domain given was just a TLD. Raises NoTLDMatch when no TLD could be found.
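If your application would still rather get None back than deal with exceptions, a thin wrapper is enough. This is only a sketch: parse_or_none is a hypothetical helper, and it takes the parser and the exception class as arguments so that it makes no assumptions about where tld-parser exports its error types from.

```python
from typing import Any, Callable, Optional, Type


def parse_or_none(
    parse: Callable[[str], Any],
    domain: str,
    error: Type[Exception],
) -> Optional[Any]:
    """Call parse(domain) and return None if it raises the given error type."""
    try:
        return parse(domain)
    except error:
        return None


# Intended use with tld-parser (commented out so the sketch runs without it,
# and assuming the names are importable in your code):
# result = parse_or_none(
#     lambda d: parse_domain(tld_rules, d),
#     "some_subdomain.domain.co.uk",
#     TLDParserError,
# )
```

Any exception that is not an instance of the class you pass in still propagates, so you can choose how broad the catch is.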