Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: add TES models with unit tests #32

Open
wants to merge 12 commits into
base: main
Choose a base branch
from
Open

feat: add TES models with unit tests #32

wants to merge 12 commits into from

Conversation

Karanjot786
Copy link
Member

@Karanjot786 Karanjot786 commented Oct 12, 2024

Introduce Pydantic models for Task Execution Services (TES) to enhance type safety and data validation, conforming to the GA4GH schemas. Additionally, provide comprehensive unit tests to ensure model integrity and functionality.

Changes:

  • Added Models:

    • crategen/models/tes_models.py: Contains all TES-related Pydantic models, including TESData, TESInput, TESOutput, TESExecutor, TESTaskLog, TESResources, TESExecutorLog, TESOutputFileLog, TESFileType, and TESState.
  • Added Unit Tests:

    • tests/unit/test_tes_models.py: Includes unit tests for all TES models, covering valid instances, validation errors, and edge cases.
  • Updated Package Initialization:

    • crategen/models/__init__.py: Added imports for TES models and defined __all__ to specify the public API of the models package.

Attribution:

  • Portions of the TES models were adapted from @SalihuDickson's implementations to maintain consistency and ensure alignment with project standards.

Documentation:

  • Added comprehensive docstrings to all public classes and methods in tes_models.py to improve code readability and maintainability.

Linting and Testing:

  • Resolved all linting errors using Ruff by adding necessary docstrings and replacing magic values with named constants.
  • Ensured all unit tests pass successfully.

Notes:

  • This PR focuses solely on adding TES models and their corresponding unit tests.
  • Integration of these models into converters (tes_converter.py and wes_converter.py) and addition of related tests will be handled in subsequent PRs to maintain manageable and focused code reviews.

Summary by Sourcery

Add Pydantic models for Task Execution Services (TES) to enhance type safety and data validation, along with comprehensive unit tests to ensure model integrity and functionality. Update package initialization and resolve linting issues.

New Features:

  • Introduce Pydantic models for Task Execution Services (TES) to enhance type safety and data validation, conforming to the GA4GH schemas.

Enhancements:

  • Update package initialization to include imports for TES models and define __all__ to specify the public API of the models package.

Documentation:

  • Add comprehensive docstrings to all public classes and methods in tes_models.py to improve code readability and maintainability.

Tests:

  • Add comprehensive unit tests for all TES models, covering valid instances, validation errors, and edge cases.

Chores:

  • Resolve all linting errors using Ruff by adding necessary docstrings and replacing magic values with named constants.

Copy link

sourcery-ai bot commented Oct 12, 2024

Reviewer's Guide by Sourcery

This pull request introduces Pydantic models for Task Execution Services (TES) along with comprehensive unit tests. The changes focus on enhancing type safety and data validation, conforming to the GA4GH schemas. The implementation includes new models, unit tests, and updates to the package initialization.

Class diagram for TES models

classDiagram
    class TESFileType {
        <<enumeration>>
        FILE
        DIRECTORY
    }

    class TESState {
        <<enumeration>>
        UNKNOWN
        QUEUED
        INITIALIZING
        RUNNING
        PAUSED
        COMPLETE
        EXECUTOR_ERROR
        SYSTEM_ERROR
        CANCELLED
    }

    class TESOutputFileLog {
        +str url
        +str path
        +str size_bytes
    }

    class TESExecutorLog {
        +Optional~datetime~ start_time
        +Optional~datetime~ end_time
        +Optional~str~ stdout
        +Optional~str~ stderr
        +int exit_code
    }

    class TESExecutor {
        +str image
        +list~str~ command
        +Optional~str~ workdir
        +Optional~str~ stdout
        +Optional~str~ stderr
        +Optional~str~ stdin
        +Optional~dict~str, str~~ env
    }

    class TESResources {
        +Optional~int~ cpu_cores
        +Optional~bool~ preemptible
        +Optional~float~ ram_gb
        +Optional~float~ disk_gb
        +Optional~list~str~~ zones
    }

    class TESInput {
        +Optional~str~ name
        +Optional~str~ description
        +Optional~AnyUrl~ url
        +str path
        +TESFileType type
        +Optional~str~ content
    }

    class TESOutput {
        +Optional~str~ name
        +Optional~str~ description
        +AnyUrl url
        +str path
        +TESFileType type
    }

    class TESTaskLog {
        +list~TESExecutorLog~ logs
        +Optional~dict~str, str~~ metadata
        +Optional~datetime~ start_time
        +Optional~datetime~ end_time
        +list~TESOutputFileLog~ outputs
        +Optional~list~str~~ system_logs
    }

    class TESData {
        +str id
        +Optional~str~ name
        +Optional~str~ description
        +Optional~datetime~ creation_time
        +Optional~TESState~ state
        +list~TESInput~ inputs
        +list~TESOutput~ outputs
        +list~TESExecutor~ executors
        +Optional~TESResources~ resources
        +Optional~list~str~~ volumes
        +Optional~list~TESTaskLog~ logs
        +Optional~dict~str, str~~ tags
    }

    TESData --> TESInput
    TESData --> TESOutput
    TESData --> TESExecutor
    TESData --> TESResources
    TESData --> TESTaskLog
    TESInput --> TESFileType
    TESOutput --> TESFileType
    TESTaskLog --> TESExecutorLog
    TESTaskLog --> TESOutputFileLog
    TESExecutorLog --> TESState
Loading

File-Level Changes

Change Details Files
Introduced TES Pydantic models
  • Created TESData model representing a TES task
  • Implemented TESInput and TESOutput models for handling task inputs and outputs
  • Added TESExecutor model for defining task executors
  • Created TESResources model for specifying task resource requirements
  • Implemented TESTaskLog and TESExecutorLog models for logging
  • Added TESFileType and TESState enums for file types and task states
crategen/models/tes_models.py
Added comprehensive unit tests for TES models
  • Implemented tests for TESInput model validation
  • Created tests for TESOutput model validation
  • Added tests for TESExecutor model
  • Implemented tests for TESData model
  • Created tests for edge cases and validation errors
tests/unit/test_tes_models.py
Updated package initialization
  • Added imports for TES models
  • Defined all to specify the public API of the models package
crategen/models/__init__.py

Tips and commands

Interacting with Sourcery

  • Trigger a new review: Comment @sourcery-ai review on the pull request.
  • Continue discussions: Reply directly to Sourcery's review comments.
  • Generate a GitHub issue from a review comment: Ask Sourcery to create an
    issue from a review comment by replying to it.
  • Generate a pull request title: Write @sourcery-ai anywhere in the pull
    request title to generate a title at any time.
  • Generate a pull request summary: Write @sourcery-ai summary anywhere in
    the pull request body to generate a PR summary at any time. You can also use
    this command to specify where the summary should be inserted.

Customizing Your Experience

Access your dashboard to:

  • Enable or disable review features such as the Sourcery-generated pull request
    summary, the reviewer's guide, and others.
  • Change the review language.
  • Add, remove or edit custom review instructions.
  • Adjust other review settings.

Getting Help

Copy link

@sourcery-ai sourcery-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @Karanjot786 - I've reviewed your changes and they look great!

Here's what I looked at during the review
  • 🟡 General issues: 1 issue found
  • 🟢 Security: all looks good
  • 🟡 Testing: 3 issues found
  • 🟢 Complexity: all looks good
  • 🟢 Documentation: all looks good

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.

crategen/models/tes_models.py Outdated Show resolved Hide resolved
tests/unit/test_tes_models.py Outdated Show resolved Hide resolved
tests/unit/test_tes_models.py Outdated Show resolved Hide resolved
tests/unit/test_tes_models.py Outdated Show resolved Hide resolved
crategen/models/tes_models.py Outdated Show resolved Hide resolved
tests/unit/test_tes_models.py Outdated Show resolved Hide resolved
@SalihuDickson
Copy link

SalihuDickson commented Oct 12, 2024

@Karanjot786, the generally accepted practice to split PRs like this is to cherry pick the commits and then make adjustments. Copying and pasting someones else's code (mine in this case) with no attribution is not very professional.

Of course I don't mind all that much, just for future reference : )

@Karanjot786
Copy link
Member Author

Hi @SalihuDickson,

Thank you for bringing this to my attention and for your understanding.

I acknowledge that I copied your code without proper attribution in the recent PR. My intention was solely to replicate the models accurately since we are working on the same project and ensuring consistency is crucial. I wanted to avoid potential discrepancies that might arise from rewriting the models from scratch.

I understand that copying code without attribution is not professional, regardless of the circumstances. In the future, I will adhere to best practices by cherry-picking commits and making the necessary adjustments to maintain both integrity and clarity in our work. I appreciate your guidance on this matter and will ensure that all contributions are properly attributed moving forward.

Once again, I apologize for any inconvenience this may have caused and thank you for your patience and support.

@uniqueg
Copy link
Member

uniqueg commented Oct 13, 2024

Hi @SalihuDickson,

Thank you for bringing this to my attention and for your understanding.

I acknowledge that I copied your code without proper attribution in the recent PR. My intention was solely to replicate the models accurately since we are working on the same project and ensuring consistency is crucial. I wanted to avoid potential discrepancies that might arise from rewriting the models from scratch.

I understand that copying code without attribution is not professional, regardless of the circumstances. In the future, I will adhere to best practices by cherry-picking commits and making the necessary adjustments to maintain both integrity and clarity in our work. I appreciate your guidance on this matter and will ensure that all contributions are properly attributed moving forward.

Once again, I apologize for any inconvenience this may have caused and thank you for your patience and support.

Note that since we squash and merge commits in a PR anyways, you can just attribute co-authorship of a commit with co-authored-by directive in the commit message for the squash. Let's just remember to do that when we are good to go. It's of course less granular than keeping git blame info on a line by line basis, but IMO that's just another reason why we should keep PRs small and crunchy :)

@uniqueg uniqueg requested a review from SalihuDickson October 13, 2024 21:53
Copy link
Member

@uniqueg uniqueg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks really good! :)

However, one major thing from looking at the nice summary that Sourcery provided: Are you sure you created the models for TES v1.1 as previously discussed? There are a few minor inconsistencies with the models, but maybe it's because you have used an older version?

Here's a list of the issues I found:

  • TESState (from tesState):
    • CANCELING is missing (added in v1.1)
    • PREEMPTED is missing (added in v1.1)
  • TESInput (from tesInput):
    • type is optional for creating tasks (requirement dropped in v1.1, I believe), but should be filled in once that information becomes available; probably not enforcable on the validation side
  • TESOutput (from tesOutput):
    • path_prefix is missing (added in v1.1); note that it is required if TESOutput.path contains wildcards; requires custom Pydantic validator
    • type as above
  • TESExecutor (from tesExecutor):
    • ignore_error is missing (added in v1.1)
  • TESData (from tesTask):
    • Only executors is required, inputs and outputs are not

Apart from that, a few minor comments:

  • Check the docstrings and make sure they are consistent with Google's docstring style and your other docstrings in the codebase (see individual comment for more info)
  • Address the open Sourcery comments

crategen/models/tes_models.py Outdated Show resolved Hide resolved
@Karanjot786
Copy link
Member Author

This looks really good! :)

However, one major thing from looking at the nice summary that Sourcery provided: Are you sure you created the models for TES v1.1 as previously discussed? There are a few minor inconsistencies with the models, but maybe it's because you have used an older version?

Here's a list of the issues I found:

* `TESState` (from `tesState`):
  
  * `CANCELING` is missing (added in v1.1)
  * `PREEMPTED` is missing (added in v1.1)

* `TESInput` (from `tesInput`):
  
  * `type` is optional for creating tasks (requirement dropped in v1.1, I believe), but should be filled in once that information becomes available; probably not enforcable on the validation side

* `TESOutput` (from `tesOutput`):
  
  * `path_prefix` is missing (added in v1.1); note that it is required if `TESOutput.path` contains wildcards; requires custom Pydantic validator
  * `type` as above

* `TESExecutor` (from `tesExecutor`):
  
  * `ignore_error` is missing (added in v1.1)

* `TESData` (from `tesTask`):
  
  * Only `executors` is required, `inputs` and `outputs` are not

Apart from that, a few minor comments:

* Check the docstrings and make sure they are consistent with Google's docstring style and your other docstrings in the codebase (see individual comment for more info)

* Address the open Sourcery comments

Thank you for your positive feedback and for taking the time to review my PR. I appreciate your thorough and detailed comments.

TES Version Alignment

You are correct; I inadvertently based the models on an older version of the TES schema. I apologize for this oversight. I will update the models to align with TES v1.1 as we talked about before. Specifically, I will address the following:

TESState:
Add the missing CANCELING and PREEMPTED states introduced in v1.1.

TESInput:
Update the type field to be optional during task creation, acknowledging that it may be filled in once the information becomes available.

TESOutput:
Add the path_prefix field, which is required if the path contains wildcards, and implement a custom Pydantic validator to handle this scenario.
Make the type field optional, consistent with v1.1.

TESExecutor:
Include the ignore_error field added in v1.1.

TESData:
Adjust the required fields so only executors are required, making inputs and outputs optional.

@Karanjot786
Copy link
Member Author

Objective

  • Introduce Pydantic models for Task Execution Services (TES) conforming to the GA4GH TES v1.1 schema.
  • Provide comprehensive unit tests to ensure model integrity and functionality.
  • Update docstrings to conform to Google style guidelines.
  • Resolve all test errors and ensure code quality.

Changes

Updated Models:

  • crategen/models/tes_models.py:
    • Added missing states CANCELING and PREEMPTED to TESState.
    • Made type field optional in TESInput and TESOutput.
    • Added path_prefix to TESOutput with appropriate validation.
    • Included ignore_error field in TESExecutor.
    • Adjusted TESData to make inputs and outputs optional, with only executors being required.

Updated Unit Tests:

  • tests/unit/test_tes_models.py:
    • Added tests for new fields and validators.
    • Refactored tests for better readability and coverage.
    • Ensured all tests pass successfully.

Documentation:

  • Revised docstrings in tes_models.py to align with Google style guidelines.
  • Removed emphasis and type information from docstrings.
  • Ensured consistency across all models.

Linting and Testing:

  • Resolved all linting issues using Ruff.
  • Confirmed that all unit tests pass.

@Karanjot786 Karanjot786 requested a review from uniqueg October 14, 2024 17:03
Co-authored-by: salihuDickson <[email protected]>
@SalihuDickson
Copy link

@uniqueg, I think this is just about ready to merge, maybe you want to do a quick read through of the recent changes.

@SalihuDickson
Copy link

SalihuDickson commented Oct 21, 2024

hi @uniqueg, just a quick reminder, please take a look at this whenever you have the time.

@Karanjot786
Copy link
Member Author

Hi @SalihuDickson, I've taken a look at the updates you've made, and they look great. I appreciate the example you've provided, and I will implement the same method in the WES models as well.

@SalihuDickson SalihuDickson dismissed uniqueg’s stale review October 25, 2024 22:27

the stated issues have been addressed

@SalihuDickson SalihuDickson self-requested a review October 25, 2024 22:27
@Karanjot786 Karanjot786 enabled auto-merge (squash) October 26, 2024 13:53
@Karanjot786 Karanjot786 disabled auto-merge October 26, 2024 13:53
@SalihuDickson
Copy link

SalihuDickson commented Nov 4, 2024

hey @uniqueg, just a gentle follow up on my earlier reminder. Need your approval to merge :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: In Progress
Development

Successfully merging this pull request may close these issues.

3 participants