-
Notifications
You must be signed in to change notification settings - Fork 56
Canadian Digital Service changes #775
Comments
@evadb is also dev with CDS working on this project, so you are aware |
Hello friends o/ |
Hello!
That would be very useful. It has evolved a bit haphazardly, and is also very tough to debug against. For example, testing out one piece of the pipeline while mocking out the prior parts leading up to it, without having to wait minutes/hours for things to complete, is difficult (and is what motivated some of the haphazard flags, to try to cut through that).
That makes sense. And this backend data loading ETL part of Pulse is really quite separate from the front-end. You could theoretically separate the web front-end and data loading back-end relatively cleanly (which might be a useful mental frame for a "pulse lite" structure).
👍 👍
👍 👍
Yeah, this could be
An ideal outcome here would be a data/processing.py that didn't have to carve out specific code for each type of thing being monitored. There is a bit of super-core topic-specific code (the threshold-setting for the HTTPS/HSTS/TLS fields) that could be moved elsewhere, and ideally
Yes. One way to start might be just replacing
Do you mean eliminating the on-disk output of domain-scan as well? Having the files be somewhere is very helpful for debugging. I reference the S3 buckets we store raw results in all the time: But one area where domain-scan would benefit is having a small opinionated option to have its
I regret choosing TinyDB - it's slow and more cumbersome than I expected. If I could go back, I'd pick SQLite or some other small thing. I will say that having sub-objects inside documents/rows is very helpful, and potentially extra useful for a "pulse lite", where topic-specific fields don't need to be carved out in a database migration. That was part of the original motivation in choosing a schema-less database. I believe Postgres also supports things like that these days too, but that may be heavier infrastructure than either of us wants.
I would generally expect our visual presentations to diverge somewhat, at the bare minimum for branding purposes. What would be nice to figure out upfront is if we can at least keep the innards as a shared upstream.
Two possible places to split the app come to mind:
I think any of these likely means un-hardcoding out the topics being measured and have those be generatable from separately managed configuration. So there'd be some concept of an "HTTPS" topic that specifies what fields are expected from where, how to translate those input fields into display fields, and how to render those display fields effectively. This would strip HTTPS-specific code from the That's a lot on my end too. :) I'm all for just taking this in easy steps, but if there's any particularly compelling way to find a core we can all use, I'm all ears and ready to support. |
Man, this should be like 5 separate conversations in different places to allow for responses that don't take 20+ minutes to write. I'll be very brief, it's likely each of these sections can (should) be it's own conversation CLI: How dependent on that exact interface are you guys? There are really three problems I have with it.
The changes I'd want to do to fix those (mostly the third one) would break the interface. 🎉 Down with 👍 Generalizing
I do like me some document databases. I've only used Mongo, but I agree that they are very handy.
Yeah, I'll shelve my feelings on it for now, but I feel like there's got to be a better way (some logging type stuff to handle debugging or something). Small steps first. As a gut reaction I like completely splitting the backend ETL from the frontend entirely, but I can the downside of repeating ourselves a bit. |
Standard unix style arguments would be preferable, and we can break the interface to do it.
That's the ticket.
Would extending domain-scan to support output directly to S3 instead of local disk address this? Because there's a very clear argument to do that, as filed in 18F/domain-scan#197. I'm good breaking these into separate conversations, feel free to close this and we can advance any of them individually! |
Sounds good, I will close this issue, and likely open new ones for individual topics soon (sometime today) |
As per #774 let's discuss some changes.
I'm going to just spew out some of the ideas/plans we've been discussing over here, with the understanding that most/all of it is still in very early stages and not really solidified. The scope of work implications of each piece has only been briefly considered, so it may be some of these are just too much.
The CLI of the
data
module seems a bit unintuitive and difficult to understand from simply looking at the code, with handling of different options/arguments spread throughout the codebase. We'd like to consolidate it into one central CLI handler section that is explicit in all the arguments it supports and their effect. Something to the effect of the WIP PR https://github.com/cds-snc/pulse/pull/3/files#diff-bb1a6896d9e62542cd89ee8daa479277 in our fork.Going hand in hand with that is modularizing the code in
update.py
andprocessing.py
, as right now it's pretty densely packed intorun
methods that do a lot. In that same WIP PR I've broken things up a bit so that different functionality can be run independently of the others.No specific plans for it as of right now, but we'd like to eliminate
shell_out
entirely. A python implemented alternative for howcurl
is used (likely using something like therequests
module) seems like it would fit the bill, and oncedomain-scan
is turned into a python package (Which I see there is a PR for right now 👍 👍 👍 from us on that idea), it can be listed as a dependency and called directly instead. I think some AWS stuff is also done viashell_out
to a command line tool, I've not done much research but I would be surprised if there was not an existing python library for such a use case.As of right now, there's a decent chunk of Pulse that's not in scope for our work (primarily the "analytics" section), at least in this first pass, we're concerned with HTTPS, and SSL. As such we'll likely be stripping the analytics section (along with the a11y stuff) out entirely.
As you would expect, our needs for domain lists will be different than yours were/are, as such our equivalent to your
meta.yml
file will have to be different, along with the options passed along todomain-scan
. If we go the pulse-lite approach, it might be worthwhile to look into abstracting some of the URL functionality, to avoid other users of Pulse needing to hard code their own URLs. This could be helped by more extreme modularization of the existing steps into more pure functions, that way lists of domains could just be fed into the functions that actually operate on them instead of those functions fetching the domains from specific locations via things like the meta.yml file or environment variables. Ultimately the domains could still bet fetched from a file or whatever is a good choice, just not fetched in the functions that do the operating.I know this is approaching the realm of very large changes, but the thought had crossed my mind many times that I would love if there didn’t have to be so many intermediate steps of writing data to disk. I would like the database that feeds pulse to be the only place that data sits at rest. That probably has huge implications with how caching of expensive operations are currently handled.
We’re unlikely to stay with the decision to use TinyDB to feed pulse, I haven’t put much thought into it but something like sqlite seems appropriate.
I’m not sure to what extent, but I know the designer on our team is working on some changes to the visual look.
As to the idea of a shared core, some kind of pulse lite, that sounds like a good idea to me. Do you have any existing ideas in that vein?
Now after writing all that, I realize that there is a lot. Don’t feel like you need to respond specifically to all of it, this is mostly to inform you as to what we’re thinking, to inform you as to the discussion of a shared base.
The text was updated successfully, but these errors were encountered: