`GpsBatchJobs._start_job` is a vital part of batch job handling in the geopyspark driver, but it has a very high tech debt score and (as far as I can tell) zero unit test coverage.
Some quick observations:

- 500+ lines of code
- a giant `if isKube: ... else: ...` construct, with roughly 200 LoC in the `if` branch and 100 LoC in the `else` branch
- a weird construction around `sparkapplication.yaml.j2`, where a Jinja template is used to generate YAML, which is then parsed and eventually converted to a dict. This feels a bit like overkill, but more importantly: a lot can go wrong there, and there is zero unit testing
- likewise, the `submit_batch_job_spark3.sh` code path is pretty cumbersome: a giant positional argument list with ad-hoc serialization and deserialization in bash (also see #627: Simplify "submit_batch_job_spark3.sh" workflow)
- a lot of hardcoded assumptions and VITO-specific references
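To illustrate the "overkill" point about the Jinja-to-YAML-to-dict round trip: since the end product is a Python dict anyway, the body could be built directly in Python, which is trivially unit-testable. A minimal, hypothetical sketch (the helper name and the trimmed-down field set are mine, not the real `sparkapplication.yaml.j2` contents):

```python
def spark_application_body(job_name: str, driver_memory: str = "4g") -> dict:
    """Hypothetical alternative: build the SparkApplication body directly as a dict,
    instead of rendering sparkapplication.yaml.j2 and parsing the YAML back.
    Field names here loosely follow the Spark operator CRD shape; the real
    template has many more fields."""
    return {
        "apiVersion": "sparkoperator.k8s.io/v1beta2",
        "kind": "SparkApplication",
        "metadata": {"name": job_name},
        "spec": {"driver": {"memory": driver_memory}},
    }


# A unit test for this seam needs no Kubernetes cluster at all:
def test_spark_application_body():
    body = spark_application_body("job-123", driver_memory="8g")
    assert body["kind"] == "SparkApplication"
    assert body["metadata"]["name"] == "job-123"
    assert body["spec"]["driver"]["memory"] == "8g"
```

Even if the template is kept, the render-then-parse step could be isolated behind a similar function and covered with the same kind of test.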
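Similarly, the fragile positional hand-off to `submit_batch_job_spark3.sh` could be collapsed into a single serialized argument, so bash only passes one opaque blob through instead of re-parsing a dozen positional fields. A rough sketch of that idea, assuming a hypothetical `--job-options` flag on the script (not the script's current interface):

```python
import json


def build_submit_command(job_options: dict) -> list:
    """Hypothetical alternative to the giant positional argument list:
    serialize all job options as one JSON argument. The receiving (Python)
    entry point then deserializes with a single json.loads, and bash never
    has to do ad-hoc parsing."""
    return [
        "./submit_batch_job_spark3.sh",
        "--job-options",
        json.dumps(job_options),
    ]


cmd = build_submit_command({"job_id": "j-42", "driver_memory": "4g"})
```

This also makes the command construction a pure function that can be asserted on in a unit test, without launching anything.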
FYI: the lack of test coverage makes it risky to add features here, e.g. as I did with #845/#914. I basically have to merge into master and wait for integration tests to discover a problem. This slows down my own work cycle, and I potentially break the cycle of other people/projects.
soxofaan changed the title from `GpsBatchJobs_start_job` to `GpsBatchJobs._start_job` on Oct 23, 2024