New snapshot? #282

Open
steve-mavens opened this issue Apr 24, 2023 · 6 comments

Comments

@steve-mavens

steve-mavens commented Apr 24, 2023

Apologies if I've failed to find this in the docs, but is there any official cadence for how often the PSL snapshot is updated and a new release made?

We tripped over this because of a material change under .museum: our online and offline tests now get different results for one of our test cases, which happens to use a domain under it.

Obviously our test case is our problem (and maybe offline tests of code that uses tldextract are not a great idea in the first place). But it would be useful to know if it's our problem for a while, or if you were due to update the snapshot fairly soon anyway.

@john-kurkowski
Owner

There's no cadence. It's easy for me to update, so I just did in 6f45fed.

$ curl https://publicsuffix.org/list/public_suffix_list.dat > tldextract/.tld_set_snapshot

@john-kurkowski
Owner

Some possible solutions.

  1. This project publishes a new release whenever the upstream list is updated.
  2. Vendor your copy of the suffix list and point to it in your tests (and/or application) via the suffix_list_urls or cache_dir kwargs, so tests no longer diverge online vs. offline.
  3. Test against suffixes that probably won't change in the upstream list, like example.com or example.probablyneverasuffix.
  4. Decouple from testing this library; assume it works; stub it in your tests.
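
Option 2 could look roughly like the sketch below. It is an illustration, not project code: the four-rule list, the file name short_suffix_list.dat, and the helper names write_suffix_list and make_offline_extractor are all invented here. It assumes suffix_list_urls accepts a file:// URL and that fallback_to_snapshot=False keeps the bundled snapshot out of the picture.

```python
import tempfile
from pathlib import Path

# Hypothetical miniature list in PSL syntax: one rule per line, "//" comments.
# This is NOT the real PSL, just enough rules for a handful of test cases.
SHORT_SUFFIX_LIST = """\
// Minimal hand-written suffix list for tests
com
uk
co.uk
probablyneverasuffix
"""


def write_suffix_list(directory: Path) -> str:
    """Write the miniature list to disk and return a file:// URL for it."""
    path = directory / "short_suffix_list.dat"
    path.write_text(SHORT_SUFFIX_LIST, encoding="utf-8")
    return path.resolve().as_uri()


def make_offline_extractor(suffix_list_url: str, cache_dir: str):
    """Build an extractor that reads only the vendored list (assumed usage)."""
    import tldextract  # imported lazily so the helper above works without it

    return tldextract.TLDExtract(
        suffix_list_urls=[suffix_list_url],  # the local file, not publicsuffix.org
        cache_dir=cache_dir,
        fallback_to_snapshot=False,  # do not silently use the bundled snapshot
    )
```

Building the extractor from the same vendored file in every run means online and offline tests consult the same rules, whatever the upstream list does.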

@steve-mavens
Author

Thanks very much!

(1) sounds like unwarranted effort for you (and might make the changelog a bit spammy?)

(2) is probably what I should do, or a short suffix list file would cover these tests.

(3) Turns out I'm not a perfect judge of what's probable! The case was chosen as a non-ASCII second-level domain listed in the PSL, and .museum seemed stable at the time. Until now that test file was unchanged since written in 2020, so it's not volatile enough to be a real problem.

(4) Would also work, but even when I have that fully isolated test I usually want the integration test as well, so it's a question of whether I can get away with that integration test being offline, or whether I need to be online in order to test that my understanding of tldextract is correct.
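
For reference, the fully isolated test in (4) might be sketched like this. The function site_key is hypothetical, standing in for whatever code calls tldextract, and the stub only mimics the subdomain, domain, and suffix attributes of an extraction result.

```python
from types import SimpleNamespace
from unittest.mock import Mock


def site_key(url, extract):
    """Hypothetical function under test: key a URL by domain and suffix."""
    result = extract(url)
    return f"{result.domain}.{result.suffix}"


def test_site_key_with_stub():
    # The stub stands in for tldextract.extract, so no suffix list is consulted.
    stub = Mock(
        return_value=SimpleNamespace(subdomain="www", domain="example", suffix="co.uk")
    )
    assert site_key("https://www.example.co.uk/page", stub) == "example.co.uk"
    stub.assert_called_once_with("https://www.example.co.uk/page")
```

Such a test cannot notice a PSL change at all, which is exactly why the integration test remains useful alongside it.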

Anyway I think in some sense (2) amounts to saying, "tldextract can be its own fake". It's isolatable enough, and it can be configured with any invented cases needed. So if I do that then it's kind of a semantic argument whether I have a true unit test of my function against a fake I didn't write myself, or an integration test of my function + tldextract with a lower-level dependency (the PSL) stubbed. My team doesn't do enough formal test design for that distinction to matter.

Btw, before I used tldextract I had a checked-in copy of the PSL and my own parser. My commit "Function to identify public suffix, from Mozilla's list of rules" was on 2011-02-11. So if I'd worked on other features for another 17 days I guess I could have saved that effort and used tldextract from the start!

@steve-mavens
Author

Oh, and I think another possible solution is to run a line of code to let tldextract get a fresh PSL in between installing the test environment (which obviously is an online operation) and running the offline tests (with pytest-socket to enforce offline-ness). I suppose arguably this is just (2) again, with the cache_dir arg rather than the suffix_list_urls arg. Or I could make the PSL's URL an exception to pytest-socket.
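
A minimal sketch of that warm-up step, assuming TLDExtract.update(fetch_now=True) downloads the list into the given cache_dir. The helper name warm_suffix_cache and its extractor_factory parameter are invented for illustration; the parameter exists only so the helper can be exercised with a fake.

```python
def warm_suffix_cache(cache_dir, extractor_factory=None):
    """Fetch a fresh PSL into cache_dir while the network is still available."""
    if extractor_factory is None:
        import tldextract  # lazy import so a fake factory can be injected

        extractor_factory = tldextract.TLDExtract
    extractor = extractor_factory(cache_dir=cache_dir)
    # Assumed behavior: clear any cached list and download a fresh copy now.
    extractor.update(fetch_now=True)
    return extractor
```

The offline tests would then need to build their extractor with the same cache_dir (and presumably the same suffix_list_urls) so that they read the cached copy instead of reaching for the network.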

@john-kurkowski
Owner

> So if I'd worked on other features for another 17 days I guess I could have saved that effort and used tldextract from the start!

😊

I like your breakdown. Yeah, there are tradeoffs in all directions, depending on how robust and formal you want your test suite. Your last suggestion with the test suite continually updating the PSL reminds me of this article on verified fakes.

@steve-mavens
Author

steve-mavens commented Apr 27, 2023

Yes, sounds good. I've also seen (but IIRC never implemented) a variant on that where you put an interception layer in, and generate stub responses by capturing the responses from the run of the live version of the test. So instead of the verified fake you have an "updateable stub". I say never implemented: many times I've set a breakpoint and dumped some HTTP response to disk for use as a test case, but I've never properly automated that. There's probably a framework for it.
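
The core of an "updateable stub" can be sketched in a few lines. The function name recorded and its parameters are invented for illustration; libraries such as VCR.py automate the same record-and-replay idea for HTTP traffic.

```python
from pathlib import Path
from typing import Callable


def recorded(fetch: Callable[[], str], capture_path: Path, live: bool) -> str:
    """Return fetch() in live mode, saving it; otherwise replay the saved copy."""
    if live:
        body = fetch()
        # Refresh the stub so later offline runs see the latest live response.
        capture_path.write_text(body, encoding="utf-8")
        return body
    # Offline: never call fetch, just replay what the last live run captured.
    return capture_path.read_text(encoding="utf-8")
```

Running the suite once with live=True refreshes the captured file; every offline run afterwards replays it without touching the network.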


2 participants