Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Convert the evaluation from Redis pubsub to Redis Streams #677

Conversation

christophertubbs
Copy link
Contributor

closes #336

Updated the Runner for the evaluation service to launch jobs coming from a redis stream rather than a pubsub and updated the submission process to send jobs through the the stream as well.

The bulk of the work is within runner.py

@christophertubbs christophertubbs added the enhancement New feature or request label Jul 12, 2024
@christophertubbs christophertubbs force-pushed the evaluation-runner-pubsub-to-stream branch from 0dd12ed to 7a30a76 Compare July 15, 2024 13:39
@@ -21,15 +22,38 @@
LOGGER = logging.ConfiguredLogger()


ASGIView = typing.Callable[
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not use a protocol instead?

class ASGIView(typing.Protocol):
    def __call__(
        self,
        scope: typing.Mapping,
        receive: typing.Callable[[bool | None, float | None], typing.Mapping],
        send: typing.Callable[[typing.Mapping], None],
    ) -> typing.Coroutine[typing.Any, typing.Any, None]: ...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the benefit of using a protocol in this instance? The purpose is to describe a hint for an expected signature for type hinting, not a promise for an object structure.

Would changing its name to ASGIViewSignature make more sense?

await self.send(bytes_data=message)
else:
await self.send(message)
while True:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this in unbounded, we should add the return type hint of -> typing.NoReturn

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not a bad idea. Might also be handy to detail how this is being managed and why to make it clear that this is a dependent operation that will be interrupted and won't necessarily create an infinite blocking loop.

@@ -137,8 +137,6 @@ async function submit_evaluation(event) {
editorData.view.setOption("readonly", true);
}

$(event.target).button("disable");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was this accidentally committed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, it's a fix for annoying behavior I encountered while working. You can submit an evaluation, but the button becomes permanently disabled. I removed the behavior. It will allow you to rapid fire identical evaluations (not ideal), but if something fails and you only need to change a part of the config, you no longer need to refresh the page.

return evaluation_template_file.read()


def generate_evaluation_id() -> str:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First question and it might be moot depending on the answer. Could / will a generated evaluation id ever make it into the db? I have a recollection that this was just for front end purposes, but just wanted to double check. If it is only for the front end, I think its probably fine. I would rather just return current_date.isoformat(timespec="seconds"), but honestly that is more of a nit than anything. If it could go into the db at some point:

My concern with this is that it is not unique enough. What are your thoughts on doing something like:

import secrets
from datetime import datetime

def generate_evaluation_id() -> str:
  now = datetime.now()
  salt = "%05x" % secrets.randombits(20)
  time = now.isoformat(timespec="seconds")
  # ex: '2024-08-06T10:49:04_f8c61'
  return f"{time}_{salt}"

The small amount of salt will decrease the chance of collisions. Its not perfect and using something like a ulid would be preferred, but just wanted to put it out there.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's just a human friendly default/example identifier, not anything for uniqueness. It'll show up on the screen in the input box and if a user is going to be lazy, it'll submit the job and name it by that name. Ideally, the user would change the value to be more reflective of their work.

It is there so if you looked at a list of all the evaluations run in the last x hours, you'd be able to find the one you named. Hashes, random bits, etc don't work great in this context because the average goober doesn't naturally link that sort of data to concrete entities, though natural language and dates usually do the trick.

A docker container name-like scheme might work better, but that would require the involvement of random phrase generators.

If it were to go into a database, it'd be in some name field, not a key.

from utilities.common import ErrorCounter
from utilities import streams

EXCEPTION_LIMIT: typing.Final[int] = 10
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

great to see this!

@christophertubbs christophertubbs force-pushed the evaluation-runner-pubsub-to-stream branch from 0c30a87 to 79eee88 Compare August 30, 2024 18:34
…low through a redis queue rather than a redis pubsub
…cently and added event handling to ensure that functions are when too many errors are hit. Fixed the parameters for the cleanup function and the return code for interrupt exits and added extra handling for keyboard interrupts that weren't being respected.
…moval of an unneeded RLock, added more messaging and error handling, removed the now unncessary TimedOccurenceWatcher, removed unneeded logging configurations that caused extra logging, added handling to the evaluation service logger to ensure that logged messages don't propagate to the root logger, wrote documentation for the runner, and updated the runner to use redis streams instead of redis pubsub.
…oid issues due to django service requirements and fixed a possible issue in the django test script.
…se more appropriate messaging levels, and added a more complete resource cleanup mechanism
@christophertubbs christophertubbs marked this pull request as ready for review September 10, 2024 21:09
Copy link
Contributor

@robertbartel robertbartel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am going to go ahead and approve this. I think at this point it's acceptable, and it's better to have it included than not. Any remaining items of concern going forward can have new issues created for them. Ideally, we should also catch any significant problems that hypothetically could be missed here with soon-to-be-create testing and validation processes, which will be created as part of the new QA/QC work.

@robertbartel robertbartel merged commit 7905fe8 into NOAA-OWP:master Oct 7, 2024
9 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Improve useage of Redis, Redis Streams, and other Redis components in inter-service communication
3 participants