[Code Refactor]: Orthogonalise the DNAm QC pipeline #257

Open
sof202 opened this issue Nov 19, 2024 · 1 comment
Labels
DNAm (DNA methylation array), enhancement (New feature or request), important (This is important), refactor (Code refactor)

Comments

@sof202
Collaborator

sof202 commented Nov 19, 2024

Path to affected file

array/DNAm/preprocessing/QC.rmd

Type of refactor

Code understandability improvement, Code readability improvement

Description of required code refactor

Currently, the DNAm QC pipeline is highly interdependent and non-orthogonal in design. Many of the bugs identified over the past year or so have been caused by this problem.

Many parts of the pipeline depend on the successful completion of other parts. As a result, a seemingly harmless change to one section of code can cause problems in a different, seemingly unrelated part of the pipeline.

Therefore, it would be beneficial to properly lay out what each part of the pipeline requires and what each part outputs/contributes. Doing this will allow us to disentangle and compartmentalise each section of the pipeline. This will make the pipeline easier to understand and therefore easier to contribute to in the future (without fear of breaking everything).

Steps to completion

  1. Agreement on what the sections of the DNAm QC pipeline actually are. The existing code blocks and headings provide a solid start (and may be all we need), but it would be useful to have this written down. See [Documentation Fault]: Create contracts for each stage/section of DNAm QC pipeline #259.
  2. A set of 'contracts' (see design by contract) detailing the requirements and expectations of each section of the pipeline. Completion of this step results in the resolution of issue [Documentation Fault]: Create contracts for each stage/section of DNAm QC pipeline #259.
    a) If any contract feels exceptionally long/complex, this implies that some abstraction or further compartmentalisation is required.
  3. Using these contracts, create a UML diagram (or similar) to help determine which processes/steps of the pipeline each section actually depends on.
    a) Add this UML diagram (or similar) to the documentation to help users understand what the pipeline actually does (mermaid may be useful for this). This step is given explicitly in [Documentation Fault]: Create a UML diagram explaining the steps for the DNAm QC pipeline #258 and should be completed separately from the other steps listed.
  4. Reflect the diagram from step 3 in the actual code. Ensure that each section receives the information it requires and outputs the information it is obligated to (ideally with no side effects), i.e. each section adheres to its proposed contract.
    a) It may be useful to move some processing to separate files, either rmarkdown children or new R scripts. This can help with navigation around the code base (though it might be better suited to a further issue).
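
As a rough sketch of how steps 2 and 3 could fit together: if each contract is written down as a list of the objects a section requires and provides, the dependencies between sections (and a mermaid graph of them) can be derived mechanically. Every section and object name below is hypothetical and chosen only for illustration; none of them are taken from QC.rmd.

```r
# Hypothetical contracts: section and object names are illustrative only.
contracts <- list(
  load_data = list(
    requires = character(0),
    provides = c("raw_betas", "sample_sheet")
  ),
  sample_filter = list(
    requires = c("raw_betas", "sample_sheet"),
    provides = "passing_samples"
  ),
  normalisation = list(
    requires = c("raw_betas", "passing_samples"),
    provides = "normalised_betas"
  )
)

# For each section, find the sections that provide the objects it requires.
section_dependencies <- function(contracts) {
  # Named vector mapping each provided object to the section providing it.
  providers <- unlist(lapply(names(contracts), function(section) {
    provided <- contracts[[section]]$provides
    setNames(rep(section, length(provided)), provided)
  }))
  lapply(contracts, function(contract) {
    unmet <- setdiff(contract$requires, names(providers))
    if (length(unmet) > 0) {
      stop("No section provides: ", paste(unmet, collapse = ", "))
    }
    unique(unname(providers[contract$requires]))
  })
}

dependencies <- section_dependencies(contracts)

# Turn the dependencies into mermaid flowchart lines for the documentation.
mermaid_lines <- c(
  "graph TD",
  unlist(lapply(names(dependencies), function(section) {
    sprintf("  %s --> %s", dependencies[[section]], section)
  }))
)
writeLines(mermaid_lines)
```

A requirement that no section provides raises an error here, which is one way an incomplete contract would surface before any refactoring starts.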
@sof202
Collaborator Author

sof202 commented Nov 19, 2024

Extra information

'Contracts' can be enforced in R code using the stopifnot() function.
This page might also prove useful.
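
A minimal sketch of what this could look like for a single section, with preconditions checked on entry and postconditions checked before returning. The function name, arguments and threshold are hypothetical and do not come from QC.rmd; the named-condition form of stopifnot() used for the error messages assumes R 4.0.0 or later.

```r
# Hypothetical section: names and threshold are illustrative only.
filter_samples_by_intensity <- function(intensities, threshold) {
  # Preconditions: what this section requires from earlier sections.
  stopifnot(
    "intensities must be a named numeric vector" =
      is.numeric(intensities) && !is.null(names(intensities)),
    "intensities must not contain missing values" = !anyNA(intensities),
    "threshold must be a single number" =
      is.numeric(threshold) && length(threshold) == 1
  )

  passing <- names(intensities)[intensities >= threshold]

  # Postconditions: what this section promises to later sections.
  stopifnot(
    "output must be a character vector" = is.character(passing),
    "output must be a subset of the input samples" =
      all(passing %in% names(intensities))
  )
  passing
}

passing_samples <- filter_samples_by_intensity(
  c(sample_a = 2500, sample_b = 400, sample_c = 1800),
  threshold = 1000
)
```

Because the function only reads its arguments and returns a value, it has no side effects on the rest of the pipeline, and a violated contract fails at the boundary of the section rather than somewhere downstream.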

sof202 modified the milestone: Agree on the sections of DNAm QC pipeline Nov 19, 2024