Skip to content

IVS-556 Task pasta refactoring #204

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 20 commits into
base: development
Choose a base branch
from

Conversation

Ghesselink
Copy link
Contributor

@Ghesselink Ghesselink commented Jul 21, 2025

Depending on: IfcOpenShell/step-file-parser#15

Remaining optional tasks:

  • Task orchestration based on task configuration (not hardcoded in tasks.py)
  • Split functionalities into different files

Copy link
Collaborator

@aothms aothms left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very nice

'header_syntax_validation_subtask': TaskConfig(
type=ValidationTask.Type.HEADER_SYNTAX,
increment=5,
model_field='status_header_syntax',
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of a string. Can these be the django model descriptors: ValidationTask.status_header_syntax.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

model_field='status_header_syntax',
check_program=check_header_syntax,
blocks=[
'header_validation_subtask',
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also thinking for a way these not to be strings, which is a bit harder since they need to match the keys in the dicts.

Maybe it's an idea that we do not start from a dict, but rather as named instances.

header_validation_subtask = TaskConfig(
        type=ValidationTask.Type.HEADER,
        increment=10,
        model_field='status_header',
        check_program=check_validate_header,
        blocks = [syntax_validation_subtask],
        execution_stage="serial",
    )

Later you can still wrap them in a list later when passing them to the registry.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done Also simplified it using a make_task function.

'industry_practices_subtask',
'instance_completion_subtask',
],
execution_stage="serial",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Enum?

(sorry, it's already really nice and organized)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With the refactored code we don't strings anymore, just Validation.Task.Task_Type's. I think it's a bit confusing that we use task types on the one hand with a text representation and a separate celery task name on the other hand. The function name of the celery task will still be used, but we should probably go for the ValidationTask.Type as the main reference.
class Type(models.TextChoices):
"""
The type of an Validation Task.
"""
SYNTAX = 'SYNTAX', 'STEP Physical File Syntax'

return [sys.executable, *args]

def check_syntax(file_path: str, task_id: int) -> list:
return execute_check("-m", "ifcopenshell.simple_spf", "--json", file_path)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think later we're going to try and reduce the usage of subprocesses. Cold start of a python process takes quite some time. I'd rather already at this stage not leak too much that they are subprocesses, but rather see them as functions (that at this moment still happen to invoke a subprocess).

Comment on lines 106 to 111
'digital_signatures_subtask',
'schema_validation_subtask',
'normative_rules_ia_validation_subtask',
'normative_rules_ip_validation_subtask',
'industry_practices_subtask',
'instance_completion_subtask'
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should there be some grouping here as well? it's a bit repetitive and error prone.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes :) Similar comment as above

With the refactored code we don't strings anymore, just Validation.Task.Task_Type's. I think it's a bit confusing that we use task types on the one hand with a text representation and a separate celery task name on the other hand. The function name of the celery task will still be used, but we should probably go for the ValidationTask.Type as the main reference.
class Type(models.TextChoices):
"""
The type of an Validation Task.
"""
SYNTAX = 'SYNTAX', 'STEP Physical File Syntax'

Comment on lines 256 to 260
merged_result = {}
for result in prev_result:
if isinstance(result, dict):
merged_result.update(result)
prev_result = merged_result
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
merged_result = {}
for result in prev_result:
if isinstance(result, dict):
merged_result.update(result)
prev_result = merged_result
prev_result = reduce(operator.or_, filter(lambda x: isinstance(x, dict), prev_result), {})

Opinions differ I guess which is more readable

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added it, but it's not relevant anymore since I'd like to move away from prev_result (see comment above).

for blocker in task_registry.get_blockers_of(get_task_type(self.name))
)
request = ValidationRequest.objects.get(pk=id)
file_path = get_absolute_file_path(request.file.name)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can move to the if block below

)
request = ValidationRequest.objects.get(pk=id)
file_path = get_absolute_file_path(request.file.name)
task = ValidationTask.objects.create(request=request, type=task_type)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks intentional, to create a non-initiated task for blocked tasks. But is it desirable, necessary? Should there be a comment as to why?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a comment. The reason is more administrative; a task that is not created can't be marked as skipped. The comment;

# Always create the task record, even if it will be skipped due to blocking conditions,
# so it is logged and its status can be marked as 'skipped'
task = ValidationTask.objects.create(request=request, type=task_type)

prev_result_succeeded = prev_result is not None and prev_result[0]['is_valid'] is True
if prev_result_succeeded:
@validation_task_runner(ValidationTask.Type.INSTANCE_COMPLETION)
def instance_completion_subtask(self, task, prev_result, request, file_path, *args, **kwargs):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's already greatly improved, but there a few things still:

  • I hoped that the task logic function would have been agnostic of all the self, task, prev_result
@validation_task_runner(ValidationTask.Type.INSTANCE_COMPLETION)
def perform_instance_completion(file_path, request):
    # no try-except: exception handling in the decorator (rereraise a more informed exception is also ok)
    ifc_file = ifcopenshell.open(file_path)

    # fetch and update ModelInstance records without ifc_type
    with transaction.atomic():
        model_id = request.model.id
        model_instances = ModelInstance.objects.filter(model_id=model_id, ifc_type__in=[None, ''])
        instance_count = model_instances.count()
        logger.info(f'Retrieved {instance_count:,} ModelInstance record(s)')

        for inst in model_instances.iterator():
            inst.ifc_type = ifc_file[inst.stepfile_id].is_a()
            inst.save()

    # no exception means task can be marked as completed in decorator
    return f'Updated {instance_count:,} ModelInstance record(s)'

And ideally this function would be referenced in the config as what is being executed.

@Ghesselink Ghesselink marked this pull request as ready for review July 29, 2025 15:03
Copy link
Collaborator

@aothms aothms left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Almost there....

Comment on lines +205 to +206
@validation_task_runner(ValidationTask.Type.INSTANCE_COMPLETION)
def instance_completion_subtask(): pass
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry to be that guy, but this is now a bit strange: to have empty decorated functions where the decorator does all the work (it doesn't even call the func passed to it).

I guess what you could do:

def validation_task_runner(task_type):
        @shared_task(bind=True)
        @log_execution
        @requires_django_user_context
        @functools.wraps(func)
        def wrapper(self, *args, **kwargs):
            ...

         return wrapper

(i.e remove the decorator, but just make it a higher order function that returns a function)

and then

instance_completion_subtask = validation_task_runner(ValidationTask.Type.INSTANCE_COMPLETION)

That still results in a name that the celery worker can serialize and is a bit more in line with the intent.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe then validation_task_runner -> task_factory, wrapper -> task_function or sth

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants