-
Notifications
You must be signed in to change notification settings - Fork 9
IVS-556 Task pasta refactoring #204
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: development
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very nice
'header_syntax_validation_subtask': TaskConfig( | ||
type=ValidationTask.Type.HEADER_SYNTAX, | ||
increment=5, | ||
model_field='status_header_syntax', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of a string. Can these be the django model descriptors: ValidationTask.status_header_syntax.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
model_field='status_header_syntax', | ||
check_program=check_header_syntax, | ||
blocks=[ | ||
'header_validation_subtask', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also thinking for a way these not to be strings, which is a bit harder since they need to match the keys in the dicts.
Maybe it's an idea that we do not start from a dict, but rather as named instances.
header_validation_subtask = TaskConfig(
type=ValidationTask.Type.HEADER,
increment=10,
model_field='status_header',
check_program=check_validate_header,
blocks = [syntax_validation_subtask],
execution_stage="serial",
)
Later you can still wrap them in a list later when passing them to the registry.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done Also simplified it using a make_task
function.
'industry_practices_subtask', | ||
'instance_completion_subtask', | ||
], | ||
execution_stage="serial", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Enum?
(sorry, it's already really nice and organized)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With the refactored code we don't strings anymore, just Validation.Task.Task_Type's. I think it's a bit confusing that we use task types on the one hand with a text representation and a separate celery task name on the other hand. The function name of the celery task will still be used, but we should probably go for the ValidationTask.Type as the main reference.
class Type(models.TextChoices):
"""
The type of an Validation Task.
"""
SYNTAX = 'SYNTAX', 'STEP Physical File Syntax'
return [sys.executable, *args] | ||
|
||
def check_syntax(file_path: str, task_id: int) -> list: | ||
return execute_check("-m", "ifcopenshell.simple_spf", "--json", file_path) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think later we're going to try and reduce the usage of subprocesses. Cold start of a python process takes quite some time. I'd rather already at this stage not leak too much that they are subprocesses, but rather see them as functions (that at this moment still happen to invoke a subprocess).
'digital_signatures_subtask', | ||
'schema_validation_subtask', | ||
'normative_rules_ia_validation_subtask', | ||
'normative_rules_ip_validation_subtask', | ||
'industry_practices_subtask', | ||
'instance_completion_subtask' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should there be some grouping here as well? it's a bit repetitive and error prone.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes :) Similar comment as above
With the refactored code we don't strings anymore, just Validation.Task.Task_Type's. I think it's a bit confusing that we use task types on the one hand with a text representation and a separate celery task name on the other hand. The function name of the celery task will still be used, but we should probably go for the ValidationTask.Type as the main reference.
class Type(models.TextChoices):
"""
The type of an Validation Task.
"""
SYNTAX = 'SYNTAX', 'STEP Physical File Syntax'
backend/apps/ifc_validation/tasks.py
Outdated
merged_result = {} | ||
for result in prev_result: | ||
if isinstance(result, dict): | ||
merged_result.update(result) | ||
prev_result = merged_result |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
merged_result = {} | |
for result in prev_result: | |
if isinstance(result, dict): | |
merged_result.update(result) | |
prev_result = merged_result | |
prev_result = reduce(operator.or_, filter(lambda x: isinstance(x, dict), prev_result), {}) |
Opinions differ I guess which is more readable
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've added it, but it's not relevant anymore since I'd like to move away from prev_result (see comment above).
backend/apps/ifc_validation/tasks.py
Outdated
for blocker in task_registry.get_blockers_of(get_task_type(self.name)) | ||
) | ||
request = ValidationRequest.objects.get(pk=id) | ||
file_path = get_absolute_file_path(request.file.name) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can move to the if block below
backend/apps/ifc_validation/tasks.py
Outdated
) | ||
request = ValidationRequest.objects.get(pk=id) | ||
file_path = get_absolute_file_path(request.file.name) | ||
task = ValidationTask.objects.create(request=request, type=task_type) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks intentional, to create a non-initiated task for blocked tasks. But is it desirable, necessary? Should there be a comment as to why?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added a comment. The reason is more administrative; a task that is not created can't be marked as skipped. The comment;
# Always create the task record, even if it will be skipped due to blocking conditions,
# so it is logged and its status can be marked as 'skipped'
task = ValidationTask.objects.create(request=request, type=task_type)
backend/apps/ifc_validation/tasks.py
Outdated
prev_result_succeeded = prev_result is not None and prev_result[0]['is_valid'] is True | ||
if prev_result_succeeded: | ||
@validation_task_runner(ValidationTask.Type.INSTANCE_COMPLETION) | ||
def instance_completion_subtask(self, task, prev_result, request, file_path, *args, **kwargs): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's already greatly improved, but there a few things still:
- I hoped that the task logic function would have been agnostic of all the self, task, prev_result
@validation_task_runner(ValidationTask.Type.INSTANCE_COMPLETION)
def perform_instance_completion(file_path, request):
# no try-except: exception handling in the decorator (rereraise a more informed exception is also ok)
ifc_file = ifcopenshell.open(file_path)
# fetch and update ModelInstance records without ifc_type
with transaction.atomic():
model_id = request.model.id
model_instances = ModelInstance.objects.filter(model_id=model_id, ifc_type__in=[None, ''])
instance_count = model_instances.count()
logger.info(f'Retrieved {instance_count:,} ModelInstance record(s)')
for inst in model_instances.iterator():
inst.ifc_type = ifc_file[inst.stepfile_id].is_a()
inst.save()
# no exception means task can be marked as completed in decorator
return f'Updated {instance_count:,} ModelInstance record(s)'
And ideally this function would be referenced in the config as what is being executed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Almost there....
@validation_task_runner(ValidationTask.Type.INSTANCE_COMPLETION) | ||
def instance_completion_subtask(): pass |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry to be that guy, but this is now a bit strange: to have empty decorated functions where the decorator does all the work (it doesn't even call the func
passed to it).
I guess what you could do:
def validation_task_runner(task_type):
@shared_task(bind=True)
@log_execution
@requires_django_user_context
@functools.wraps(func)
def wrapper(self, *args, **kwargs):
...
return wrapper
(i.e remove the decorator, but just make it a higher order function that returns a function)
and then
instance_completion_subtask = validation_task_runner(ValidationTask.Type.INSTANCE_COMPLETION)
That still results in a name that the celery worker can serialize and is a bit more in line with the intent.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe then validation_task_runner
-> task_factory
, wrapper
-> task_function
or sth
Depending on: IfcOpenShell/step-file-parser#15
Remaining optional tasks: