rework configuration files to more easily add or remove nodes #374

Open
jeanetteclark opened this issue Aug 15, 2023 · 1 comment

@jeanetteclark
Collaborator

Currently, all of the nodes that the quality engine runs on are listed in metadig.properties. When we had 2 or 3 nodes this wasn't a big deal, but now we have 8, and the number is growing rapidly. Similarly, the tasks for each of these nodes are listed in taskList.csv. Adding a new node requires a configuration change, which means a helm upgrade has to be run to install the new config files. This is not ideal, and the config files will quickly become unwieldy.

There are a few options here, I think, which I'll outline below:

1. Dynamic lookup for nodes and dynamic assignment of tasks

Using bookkeeper, we can look up the hosted repositories that metadig-engine should be running on. I think we would need to combine that list with a list of DataONE nodes funded through other sources (ADC, KNB, ESS-DIVE) that are listed in the metadig.properties file. This does keep some hard-coded values in metadig.properties, but that list would (maybe?) be much smaller and change less frequently than the list generated by bookkeeper.
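A minimal sketch of the combining step, assuming both sources can be reduced to plain lists of node IDs. How the hosted list is actually retrieved from bookkeeper is not shown, and the function name and the example node IDs are illustrative only:

```python
def merge_node_lists(hosted_nodes, static_nodes):
    """Return the union of two node ID lists, keeping first-seen order."""
    seen = set()
    merged = []
    for node_id in list(hosted_nodes) + list(static_nodes):
        if node_id not in seen:
            seen.add(node_id)
            merged.append(node_id)
    return merged


# Hypothetical inputs: `hosted` stands in for whatever bookkeeper returns,
# `static` for the short hard-coded list kept in metadig.properties.
hosted = ["urn:node:EXAMPLE_A", "urn:node:EXAMPLE_B"]
static = ["urn:node:ARCTIC", "urn:node:KNB", "urn:node:EXAMPLE_A"]
nodes = merge_node_lists(hosted, static)
```

A node that appears in both lists is kept only once, so overlap between the two sources would be harmless.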

From this list of nodes, we would assume that the tasks are all fairly similar. Each node will get a quality task with the DataONE FAIR suite and a member node assessment task. We can also do a regex search and assign additional quality tasks to nodes based on the node/suite name (e.g., urn:node:ARCTIC would match the arctic-data-center-suite). This way, custom suites can still be written and run without us having to hard-code every node/suite combination in the taskList file. This would effectively remove the node score and quality tasks from taskList.csv, leaving only the portal scoring tasks, the CN node listing task, and the data file acquisition tasks in the taskList file. I'm not sure how some of the parameters would be set, though: things like the formatId filter or harvest begin date would have to be generic and not configurable.
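The assignment rule described above could look roughly like this. It is only a sketch: `urn:node:ARCTIC` and `arctic-data-center-suite` come from this comment, while the task type names, the `FAIR-suite` ID, the `knb-suite` ID, and the rule that a suite name must start with the node's short name are all assumptions.

```python
import re

# Default tasks every node would receive. The names are placeholders.
DEFAULT_TASKS = [
    ("quality", "FAIR-suite"),    # placeholder ID for the DataONE FAIR suite
    ("node-assessment", None),    # member node assessment task, no suite
]

# Hypothetical list of suites known to the engine.
AVAILABLE_SUITES = ["FAIR-suite", "arctic-data-center-suite", "knb-suite"]


def tasks_for_node(node_id, suites=AVAILABLE_SUITES):
    """Return default tasks plus a quality task for each name-matched suite."""
    # "urn:node:ESS_DIVE" -> "ESS-DIVE", so it can match hyphenated suite names.
    short_name = node_id.rsplit(":", 1)[-1].replace("_", "-")
    # Case-insensitive match on suites whose name starts with the short name.
    pattern = re.compile(r"^" + re.escape(short_name) + r"\b", re.IGNORECASE)
    tasks = list(DEFAULT_TASKS)
    for suite in suites:
        if pattern.search(suite) and ("quality", suite) not in tasks:
            tasks.append(("quality", suite))
    return tasks


arctic_tasks = tasks_for_node("urn:node:ARCTIC")
other_tasks = tasks_for_node("urn:node:EXAMPLE")
```

With this rule, `urn:node:ARCTIC` picks up the extra arctic-data-center-suite quality task, while a node with no matching suite gets only the defaults. Per-node parameters such as the formatId filter are not handled here, which is the open question noted above.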

Overall, this solution only feels okay because we can't really get all of the nodes we need dynamically, unless there is some flag in the DataONE API that I'm not aware of.

2. Static lookup for nodes and dynamic assignment of tasks

Take the list of nodes out of metadig.properties and put it in some other config file. Instead of listing both the subjectId and subjectURL, just list the node ID and use the DataONE API to get the endpoint and subjectId.

The task list would be handled as in option 1.
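As a rough illustration of the lookup, the sketch below parses a DataONE node list document and maps each node ID to its base URL and subject. It assumes the CN node list is served at `https://cn.dataone.org/cn/v2/node` and that each `node` element carries `identifier`, `baseURL`, and `subject` children; the sample XML is hand-written for the example, not a real response, and the function names are hypothetical.

```python
import urllib.request
import xml.etree.ElementTree as ET

# Assumed location of the CN node list; adjust if the deployment differs.
CN_NODE_LIST_URL = "https://cn.dataone.org/cn/v2/node"


def local_name(tag):
    """Strip any '{namespace}' prefix from an ElementTree tag."""
    return tag.rsplit("}", 1)[-1]


def parse_node_list(xml_text):
    """Map node identifier -> {'baseURL': ..., 'subject': ...}."""
    nodes = {}
    for elem in ET.fromstring(xml_text).iter():
        if local_name(elem.tag) != "node":
            continue
        fields = {}
        for child in elem:
            name = local_name(child.tag)
            # Keep only the first occurrence of each field (first subject wins).
            if name in ("identifier", "baseURL", "subject") and name not in fields:
                fields[name] = (child.text or "").strip()
        if "identifier" in fields:
            nodes[fields["identifier"]] = {
                "baseURL": fields.get("baseURL"),
                "subject": fields.get("subject"),
            }
    return nodes


def fetch_node_list(url=CN_NODE_LIST_URL):
    """Download and parse the node list (performs a network request)."""
    with urllib.request.urlopen(url, timeout=30) as response:
        return parse_node_list(response.read())


# Hand-written sample document, used so the example runs offline.
SAMPLE = """<d1:nodeList xmlns:d1="http://ns.dataone.org/service/types/v2.0">
  <node type="mn" state="up">
    <identifier>urn:node:EXAMPLE</identifier>
    <name>Example Repository</name>
    <baseURL>https://example.org/metacat/d1/mn</baseURL>
    <subject>CN=urn:node:EXAMPLE,DC=dataone,DC=org</subject>
  </node>
</d1:nodeList>"""

lookup = parse_node_list(SAMPLE)
```

With a lookup like this, the config file would only need the node IDs, and the endpoint and subject could be resolved at startup.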

3. Static lookup for nodes and tasks

A static list for the nodes and tasks would make the tasks more fully configurable, but we would still be stuck with an unwieldy taskList file. I would like to consider refactoring taskList.csv into a JSON file, though, hopefully in a way that makes it a little more readable and parsable.
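To give a feel for what a JSON version might look like, here is a small conversion sketch. The column names, the `key=value;key=value` parameter encoding, and the sample row are all hypothetical and are not taken from the real taskList.csv:

```python
import csv
import io
import json

# Hypothetical CSV layout, for illustration only.
SAMPLE_CSV = """task-type,task-name,cron-schedule,params
quality,example-quality-task,0 0/1 * * * ?,suiteId=FAIR-suite;nodeId=urn:node:EXAMPLE
"""


def csv_to_tasks(csv_text):
    """Convert CSV rows into a list of dicts with params split into a mapping."""
    tasks = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Split "k=v;k=v" into a dict, ignoring fragments without '='.
        params = dict(
            item.split("=", 1) for item in row["params"].split(";") if "=" in item
        )
        tasks.append({
            "type": row["task-type"],
            "name": row["task-name"],
            "schedule": row["cron-schedule"],
            "params": params,
        })
    return tasks


tasks = csv_to_tasks(SAMPLE_CSV)
tasks_json = json.dumps(tasks, indent=2)
```

Named keys for each parameter are the main readability gain over a positional, delimiter-separated field.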

None of these solutions really seems ideal, though I think the second one might be the easiest (mostly since I'm unsure where bookkeeper is in terms of development).

Thoughts @mbjones ?

@jeanetteclark jeanetteclark added this to the 3.0 milestone Aug 15, 2023
@mbjones
Member

mbjones commented Aug 15, 2023

These all sound like good options, with bookkeeper being the preferred route but obviously the more complicated one. Peter did some initial work on incorporating bookkeeper into metadig that is currently turned off, so we should look into what he did and how it might contribute to a solution. As for KNB, ARCTIC, etc., I think those should go in bookkeeper as well, as valid quotas, so that we can use a single solution to know which sites can get which services.

As bookkeeper isn't yet ready for primetime, let's discuss a shorter term strategy.
