IVS-556 Task pasta refactoring #204

Ghesselink · 2025-07-21T19:08:35Z

Depending on: IfcOpenShell/step-file-parser#15

Remaining optional tasks:

Task orchestration based on task configuration (not hardcoded in tasks.py)
Split functionalities into different files

aothms

Very nice

aothms · 2025-07-24T06:59:09Z

backend/apps/ifc_validation/task_configs.py

+    'header_syntax_validation_subtask': TaskConfig(
+        type=ValidationTask.Type.HEADER_SYNTAX,
+        increment=5,
+        model_field='status_header_syntax',


Instead of a string, can these be the Django model descriptors, e.g. ValidationTask.status_header_syntax?

aothms · 2025-07-24T07:09:37Z

backend/apps/ifc_validation/task_configs.py

+        model_field='status_header_syntax',
+        check_program=check_header_syntax,
+        blocks=[
+            'header_validation_subtask',


Also thinking of a way for these not to be strings, which is a bit harder since they need to match the keys in the dict.

Maybe it's an idea not to start from a dict, but rather from named instances.

header_validation_subtask = TaskConfig(
    type=ValidationTask.Type.HEADER,
    increment=10,
    model_field='status_header',
    check_program=check_validate_header,
    blocks=[syntax_validation_subtask],
    execution_stage="serial",
)

You can still wrap them in a list later when passing them to the registry.

Done. Also simplified it using a make_task function.
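To illustrate the "named instances" idea, here is a minimal sketch in plain Python. The TaskConfig fields are reduced to a few, the task names are only illustrative, and ALL_TASKS and the standalone get_blockers_of helper are assumptions made for this example rather than the actual registry code.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(eq=False)  # identity-based equality: two configs are equal only if they are the same object
class TaskConfig:
    name: str
    increment: int
    blocks: List["TaskConfig"] = field(default_factory=list)
    execution_stage: str = "serial"


# Blocked tasks are defined first so that blockers can reference them
# as Python names instead of strings; a typo becomes a NameError at import time.
schema_validation_subtask = TaskConfig(name="SCHEMA", increment=10)
header_validation_subtask = TaskConfig(
    name="HEADER", increment=10, blocks=[schema_validation_subtask]
)
header_syntax_validation_subtask = TaskConfig(
    name="HEADER_SYNTAX",
    increment=5,
    blocks=[header_validation_subtask, schema_validation_subtask],
)

# Only at the end are the instances wrapped in a list for the (hypothetical) registry.
ALL_TASKS = [
    header_syntax_validation_subtask,
    header_validation_subtask,
    schema_validation_subtask,
]


def get_blockers_of(task: TaskConfig, tasks: List[TaskConfig] = ALL_TASKS) -> List[TaskConfig]:
    """Return every task that lists `task` in its `blocks`."""
    return [t for t in tasks if task in t.blocks]
```

One consequence of this layout is that definition order matters: a task can only block tasks that were defined before it.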

aothms · 2025-07-24T07:10:00Z

backend/apps/ifc_validation/task_configs.py

+            'industry_practices_subtask',
+            'instance_completion_subtask',
+        ],
+        execution_stage="serial",


Enum?

(sorry, it's already really nice and organized)

With the refactored code we don't use strings anymore, just ValidationTask.Type values. I think it's a bit confusing that we use task types with a text representation on the one hand and a separate celery task name on the other. The function name of the celery task will still be used, but we should probably go for ValidationTask.Type as the main reference.

class Type(models.TextChoices):
    """
    The type of an Validation Task.
    """
    SYNTAX = 'SYNTAX', 'STEP Physical File Syntax'
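For the execution_stage value that prompted the "Enum?" question, a small sketch of what an enum could look like. ExecutionStage is a hypothetical name, and the "parallel" member is an assumption; only "serial" appears in the diff above.

```python
from enum import Enum


class ExecutionStage(str, Enum):
    """Hypothetical enum for the execution_stage field of a task config."""
    SERIAL = "serial"
    PARALLEL = "parallel"  # assumed second stage, not shown in the diff


def parse_stage(value: str) -> ExecutionStage:
    """Convert a raw string to an ExecutionStage; raises ValueError on typos."""
    return ExecutionStage(value)
```

Because the enum also subclasses str, existing comparisons against the raw string "serial" would keep working while misspelled values fail early.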

aothms · 2025-07-24T07:12:19Z

backend/apps/ifc_validation/task_configs.py

+    return [sys.executable, *args]
+
+def check_syntax(file_path: str, task_id: int) -> list:
+    return execute_check("-m", "ifcopenshell.simple_spf", "--json", file_path)


I think later we're going to try to reduce the usage of subprocesses, since the cold start of a Python process takes quite some time. Already at this stage I'd rather not leak too much that they are subprocesses, but rather see them as functions (that at this moment still happen to invoke a subprocess).
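A rough sketch of what "functions that happen to invoke a subprocess" could mean in practice: callers get a result back and never see an argument list. The run_check helper and the shape of the returned dict are assumptions for this example, not the code in the PR.

```python
import subprocess
import sys


def run_check(*args: str) -> dict:
    """Run `python <args>` and return its outcome.

    This is the only place that knows a subprocess is involved, so it can
    later be swapped for an in-process call without touching the callers.
    """
    proc = subprocess.run([sys.executable, *args], capture_output=True, text=True)
    return {"returncode": proc.returncode, "stdout": proc.stdout, "stderr": proc.stderr}


def check_syntax(file_path: str) -> dict:
    """Syntax check exposed as a plain function that returns a result."""
    return run_check("-m", "ifcopenshell.simple_spf", "--json", file_path)
```

With this split, the task configuration would reference check_syntax as a callable that does the work, rather than as a builder of command lines.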

aothms · 2025-07-24T07:37:16Z

backend/apps/ifc_validation/task_configs.py

+            'digital_signatures_subtask',
+            'schema_validation_subtask',
+            'normative_rules_ia_validation_subtask',
+            'normative_rules_ip_validation_subtask',
+            'industry_practices_subtask',
+            'instance_completion_subtask'


should there be some grouping here as well? it's a bit repetitive and error prone.

Yes :) Similar comment as above

aothms · 2025-07-24T07:46:35Z

backend/apps/ifc_validation/tasks.py

+                merged_result = {}
+                for result in prev_result:
+                    if isinstance(result, dict):
+                        merged_result.update(result)
+                prev_result = merged_result


Suggested change

-                merged_result = {}
-                for result in prev_result:
-                    if isinstance(result, dict):
-                        merged_result.update(result)
-                prev_result = merged_result
+                prev_result = reduce(operator.or_, filter(lambda x: isinstance(x, dict), prev_result), {})

Opinions differ, I guess, on which is more readable.

I've added it, but it's not relevant anymore since I'd like to move away from prev_result (see comment above).
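For reference, a small self-contained comparison of the two variants discussed in this thread. The sample prev_result list is made up, and the dict `|` operator behind operator.or_ requires Python 3.9 or newer.

```python
import operator
from functools import reduce


def merge_loop(results):
    """Original loop: later dicts overwrite keys of earlier ones."""
    merged = {}
    for result in results:
        if isinstance(result, dict):
            merged.update(result)
    return merged


def merge_reduce(results):
    """Suggested one-liner: fold the dicts together with `|`, skipping non-dicts."""
    return reduce(operator.or_, filter(lambda x: isinstance(x, dict), results), {})


# Made-up sample input: results of parallel tasks, some of which returned nothing.
prev_result = [{"is_valid": True}, None, {"reason": "ok"}, {"is_valid": False}]
```

Both functions give the same merged dict for this input; the reduce version builds a new dict at every step, which is irrelevant for a handful of small results.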

aothms · 2025-07-24T07:48:00Z

backend/apps/ifc_validation/tasks.py

+                    for blocker in task_registry.get_blockers_of(get_task_type(self.name))
+                )
+            request = ValidationRequest.objects.get(pk=id)
+            file_path = get_absolute_file_path(request.file.name)


Can move to the if block below

aothms · 2025-07-24T07:48:31Z

backend/apps/ifc_validation/tasks.py

+                )
+            request = ValidationRequest.objects.get(pk=id)
+            file_path = get_absolute_file_path(request.file.name)
+            task = ValidationTask.objects.create(request=request, type=task_type)


This looks intentional, to create a non-initiated task for blocked tasks. But is it desirable, necessary? Should there be a comment as to why?

Added a comment. The reason is more administrative: a task that is not created can't be marked as skipped. The comment:

# Always create the task record, even if it will be skipped due to blocking conditions,
# so it is logged and its status can be marked as 'skipped'
task = ValidationTask.objects.create(request=request, type=task_type)

aothms · 2025-07-24T08:10:04Z

backend/apps/ifc_validation/tasks.py

-    prev_result_succeeded = prev_result is not None and prev_result[0]['is_valid'] is True
-    if prev_result_succeeded:
+@validation_task_runner(ValidationTask.Type.INSTANCE_COMPLETION)
+def instance_completion_subtask(self, task, prev_result, request, file_path, *args, **kwargs):


It's already greatly improved, but there are a few things still:

I hoped that the task logic function would have been agnostic of all the self, task, prev_result arguments.

@validation_task_runner(ValidationTask.Type.INSTANCE_COMPLETION)
def perform_instance_completion(file_path, request):
    # no try-except: exception handling in the decorator (re-raising a more informed exception is also ok)
    ifc_file = ifcopenshell.open(file_path)

    # fetch and update ModelInstance records without ifc_type
    with transaction.atomic():
        model_id = request.model.id
        model_instances = ModelInstance.objects.filter(model_id=model_id, ifc_type__in=[None, ''])
        instance_count = model_instances.count()
        logger.info(f'Retrieved {instance_count:,} ModelInstance record(s)')

        for inst in model_instances.iterator():
            inst.ifc_type = ifc_file[inst.stepfile_id].is_a()
            inst.save()

    # no exception means task can be marked as completed in decorator
    return f'Updated {instance_count:,} ModelInstance record(s)'

And ideally this function would be referenced in the config as what is being executed.
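The decorator side of that split could look roughly like the sketch below, written in plain Python so it runs without Django or Celery. TaskRecord is a hypothetical in-memory stand-in for the ValidationTask model, and the status strings are assumptions.

```python
import functools
from dataclasses import dataclass


@dataclass
class TaskRecord:
    """Hypothetical stand-in for a ValidationTask database row."""
    type: str
    status: str = "PENDING"
    output: str = ""


def validation_task_runner(task_type):
    """Wrap a pure logic function with status bookkeeping and error handling."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(file_path, request):
            task = TaskRecord(type=task_type)
            try:
                # The logic function only sees what it needs.
                task.output = func(file_path, request)
                task.status = "COMPLETED"
            except Exception as exc:
                # Failures are recorded here, so the logic needs no try/except.
                task.status = "FAILED"
                task.output = f"{type(exc).__name__}: {exc}"
            return task
        return wrapper
    return decorator


@validation_task_runner("INSTANCE_COMPLETION")
def perform_instance_completion(file_path, request):
    if not file_path:
        raise ValueError("no file to process")
    return f"Processed {file_path}"
```

The logic function stays testable on its own, and every task gets the same completion and failure handling from one place.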

backend/apps/ifc_validation/tasks.py

aothms

Almost there....

aothms · 2025-08-05T12:23:03Z

backend/apps/ifc_validation/tasks/task_runner.py

+@validation_task_runner(ValidationTask.Type.INSTANCE_COMPLETION)
+def instance_completion_subtask(): pass


Sorry to be that guy, but this is now a bit strange: to have empty decorated functions where the decorator does all the work (it doesn't even call the func passed to it).

I guess what you could do:

def validation_task_runner(task_type):
    @shared_task(bind=True)
    @log_execution
    @requires_django_user_context
    def wrapper(self, *args, **kwargs):
        ...
    return wrapper

(i.e. remove the decorator, and just make it a higher-order function that returns a function)

and then

instance_completion_subtask = validation_task_runner(ValidationTask.Type.INSTANCE_COMPLETION)

That still results in a name that the celery worker can serialize and is a bit more in line with the intent.

Maybe then rename validation_task_runner -> task_factory, wrapper -> task_function or something similar.
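A Celery-free sketch of the factory idea, using the task_factory and task_function names proposed above. The CHECKS mapping and its lambda bodies are placeholders invented for this example; in the real code the lookup would presumably go through the task configuration.

```python
from typing import Callable, Dict

# Placeholder mapping from task type to the logic that does the work.
CHECKS: Dict[str, Callable[[str], str]] = {
    "INSTANCE_COMPLETION": lambda file_path: f"completed instances for {file_path}",
    "SYNTAX": lambda file_path: f"checked syntax of {file_path}",
}


def task_factory(task_type: str) -> Callable[[str], str]:
    """Build the task function for a task type; no empty decorated function needed."""
    check = CHECKS[task_type]  # unknown task types fail here, at definition time

    def task_function(file_path: str) -> str:
        return check(file_path)

    # Give each generated function a distinct name, since every function
    # produced by the factory would otherwise be called "task_function".
    task_function.__name__ = f"{task_type.lower()}_subtask"
    task_function.__qualname__ = task_function.__name__
    return task_function


instance_completion_subtask = task_factory("INSTANCE_COMPLETION")
syntax_subtask = task_factory("SYNTAX")
```

With Celery, the distinct name would likely have to be passed explicitly through the name option of shared_task, because the default task name is derived from the function's own name.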

Ghesselink added 2 commits July 21, 2025 18:47

refactor

e761515

add task configuration

1c1b14c

Ghesselink temporarily deployed to development July 21, 2025 19:08 — with GitHub Actions Inactive

tasks configuration

aa33ef0

Ghesselink temporarily deployed to development July 23, 2025 21:58 — with GitHub Actions Inactive

Ghesselink had a problem deploying to development July 23, 2025 21:58 — with GitHub Actions Failure

small DRY improvements

5e28a51

Ghesselink had a problem deploying to development July 23, 2025 23:39 — with GitHub Actions Failure

Ghesselink temporarily deployed to development July 23, 2025 23:39 — with GitHub Actions Inactive

Ghesselink requested review from rw-bsi and aothms July 23, 2025 23:41

aothms requested changes Jul 24, 2025


Ghesselink added 6 commits July 26, 2025 23:53

set model status to django model descriptors

e5d1ec9

simplify error handling

2716657

Add documentation for prev_result parallel tasks

e83cbad

update configs

dac01c1

update_progress inside single decorator

e478119

Move all tasks to separate folder

3a8eebe

Ghesselink had a problem deploying to development July 28, 2025 11:07 — with GitHub Actions Failure

Ghesselink temporarily deployed to development July 28, 2025 11:07 — with GitHub Actions Inactive

Ghesselink added 2 commits July 28, 2025 13:09

use db status rather than prev_result of for blocking tasks

7f6f993

rm internal results, rely on db

f46c779

Ghesselink had a problem deploying to development July 28, 2025 12:36 — with GitHub Actions Failure

Ghesselink temporarily deployed to development July 28, 2025 12:36 — with GitHub Actions Inactive

Ghesselink added 3 commits July 29, 2025 13:18

configuration improvements

3861e60

update ifcopenshell, align tests

fa85b85

trigger tests, rm pdb

54499a7

Ghesselink marked this pull request as ready for review July 29, 2025 15:03

Merge branch 'development' into IVS-556-task-pasta-refactoring

49573db

Ghesselink temporarily deployed to development July 29, 2025 15:05 — with GitHub Actions Inactive

remove lines of code

645868d

Ghesselink temporarily deployed to development August 2, 2025 17:31 — with GitHub Actions Inactive

add separate execution layer

4f2c1c0

Ghesselink temporarily deployed to development August 2, 2025 20:27 — with GitHub Actions Inactive

check_program bugfix

25489bb

Ghesselink temporarily deployed to development August 3, 2025 19:44 — with GitHub Actions Inactive

refactor processing layer, create task context

1fa7c38

aothms requested changes Aug 5, 2025
