-
Notifications
You must be signed in to change notification settings - Fork 7
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: GATK utils for common IO files #39
Conversation
Thanks for the contribution! |
I see what you mean, however, as these command line arguments are triggered by output files, I thought we would assume that user knows what they are doing and won't request files a GATK subcommand can't produce. Shall I turn these values to |
It just feels a bit safer to have everything disabled by default so that each wrapper has to specifically activate the options it needs. |
Warning Rate limit exceeded@fgvieira has exceeded the limit for the number of commits or files that can be reviewed per hour. Please wait 25 minutes and 15 seconds before requesting another review. ⌛ How to resolve this issue?After the wait time has elapsed, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout. Please see our FAQ for further information. 📝 Walkthrough📝 Walkthrough📝 Walkthrough📝 WalkthroughWalkthroughA new function named Changes
Sequence Diagram(s)sequenceDiagram
participant S as Snakemake
participant G as GATK Options Function
S->>G: Call get_gatk_opts(params)
G->>G: Initialize gatk_opts
G->>G: Check for argument file
alt Argument file provided
G->>G: Validate and append to gatk_opts
end
G->>G: Check for reference file
alt Reference file provided
G->>G: Validate and append to gatk_opts
end
G->>G: Check for BAM index
alt BAM index required
G->>G: Validate and append to gatk_opts
end
G->>G: Check for VCF index
alt VCF index required
G->>G: Validate and append to gatk_opts
end
G->>S: Return gatk_opts
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 4
🧹 Outside diff range and nitpick comments (2)
snakemake_wrapper_utils/gatk.py (2)
16-17
: Enhance function documentation.The current docstring is too brief. Consider adding:
- Parameter descriptions
- Return value description
- Usage example
- """Obtain gatk_opts from input, output, params""" + """ + Construct GATK command-line options from Snakemake inputs, outputs, and parameters. + + Args: + snakemake: Snakemake object containing workflow information + parse_arg_file (bool): Enable parsing of argument files + parse_bam_index (bool): Enable BAM index creation option + parse_bam_md5 (bool): Enable BAM MD5 creation option + parse_vcf_index (bool): Enable VCF index creation option + parse_vcf_md5 (bool): Enable VCF MD5 creation option + parse_ref (bool): Enable reference file parsing + parse_ref_dict (bool): Enable reference dictionary parsing + param_name (str): Name of the parameter containing extra options + + Returns: + str: Constructed GATK command-line options + + Example: + >>> opts = get_gatk_opts(snakemake, parse_ref=True, parse_bam_index=True) + """
1-99
: Well-structured implementation with good separation of concerns.The overall architecture is clean and maintainable:
- Each file type (BAM, VCF) has its own dedicated section
- Consistent pattern for handling conflicts with extra parameters
- Safe defaults as discussed in PR comments
- Clear separation between argument files, reference handling, and output options
Consider adding unit tests to verify the behavior of each option combination.
Would you like help creating comprehensive unit tests for this utility function?
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (1)
snakemake_wrapper_utils/gatk.py
(1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
snakemake_wrapper_utils/gatk.py (1)
Pattern **/*.py
: Do not try to improve formatting.
Do not suggest type annotations for functions that are defined inside of functions or methods.
Do not suggest type annotation of the self
argument of methods.
Do not suggest type annotation of the cls
argument of classmethods.
Do not suggest return type annotation if a function or method does not contain a return
statement.
🔇 Additional comments (2)
snakemake_wrapper_utils/gatk.py (2)
1-3
: LGTM! Clean and minimal imports.
The imports are appropriate and well-organized.
5-15
: LGTM! Well-structured function signature with safe defaults.
The parameter naming is clear and the default values (False) align well with the safety considerations discussed in the PR comments.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Outside diff range and nitpick comments (4)
snakemake_wrapper_utils/gatk.py (4)
5-16
: Enhance function documentation.The current docstring is too brief. Consider adding:
- Parameter descriptions
- Return value description
- Usage examples
- """Obtain gatk_opts from input, output, params""" + """ + Construct GATK command-line options based on Snakemake inputs, outputs, and parameters. + + Args: + snakemake: Snakemake object containing workflow information + parse_arg_file (bool): Enable argument file parsing + parse_bam_index (bool): Enable BAM index creation option + parse_bam_md5 (bool): Enable BAM MD5 creation option + parse_vcf_index (bool): Enable VCF index creation option + parse_vcf_md5 (bool): Enable VCF MD5 creation option + parse_ref (bool): Enable reference file parsing + parse_ref_dict (bool): Enable reference dictionary parsing + param_name (str): Name of the parameter containing extra options + + Returns: + str: Constructed GATK command-line options + + Example: + >>> opts = get_gatk_opts(snakemake, parse_ref=True, parse_bam_index=True) + """
43-60
: Standardize command string spacing.The command string construction has inconsistent spacing. Some options have spaces on both sides while others don't.
- gatk_opts += f" --reference {ref} " - gatk_opts += f" --sequence-dictionary {ref_dict} " + gatk_opts += f" --reference {ref}" + gatk_opts += f" --sequence-dictionary {ref_dict}"
64-79
: Standardize command string spacing.Similar to the reference handling, the command string construction has inconsistent spacing.
- gatk_opts += " --create-output-bam-index " - gatk_opts += " --create-output-bam-md5 " + gatk_opts += " --create-output-bam-index" + gatk_opts += " --create-output-bam-md5"
1-99
: Consider refactoring for better maintainability.While the implementation is functionally correct, consider these improvements:
- Create a helper function for option string construction to ensure consistent spacing
- Use string constants for repeated option names
- Consider using a dictionary to map output types to their corresponding GATK options
Example helper function:
def add_option(opts: str, option: str, value: str = "") -> str: """Add an option to the command string with consistent spacing.""" if value: return f"{opts} {option} {value}".strip() return f"{opts} {option}".strip()This would make the code more maintainable and less prone to spacing inconsistencies.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (1)
snakemake_wrapper_utils/gatk.py
(1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
snakemake_wrapper_utils/gatk.py (1)
Pattern **/*.py
: Do not try to improve formatting.
Do not suggest type annotations for functions that are defined inside of functions or methods.
Do not suggest type annotation of the self
argument of methods.
Do not suggest type annotation of the cls
argument of classmethods.
Do not suggest return type annotation if a function or method does not contain a return
statement.
🔇 Additional comments (2)
snakemake_wrapper_utils/gatk.py (2)
25-38
: 🛠️ Refactor suggestion
Fix typo and improve argument file handling.
The error message contains the same typo as previously noted ("infered" should be "inferred").
Additionally, the join operation should handle type conversion:
- arg_file = " --argument_file ".join(arg_file)
+ arg_file = " --argument_file ".join(str(f) for f in arg_file)
83-98
: Standardize command string spacing and verify VCF options.
- Similar to other sections, standardize command string spacing.
- gatk_opts += " --create-output-variant-index "
- gatk_opts += " --create-output-variant-md5 "
+ gatk_opts += " --create-output-variant-index"
+ gatk_opts += " --create-output-variant-md5"
- Let's verify that these are the correct GATK options for VCF handling:
Utilities for bam/vcf indexes, for argument file(s) and MD5 files.
Summary by CodeRabbit