Add package: 2025_Lazaridis_Yamnaya #66

Draft
wants to merge 2 commits into
base: main

Kavlahkaff
Contributor

@Kavlahkaff Kavlahkaff commented Feb 26, 2025

Linked to #65

PR Checklist

  • Add the appropriate label to your PR (new package or package update).
  • The PR title is in the format Add/update package: {package_name}.
  • The PR description includes a link to the issue requesting the package or
    its update. (Add to Linked to #XXX above.)

If adding or updating a package:

SSF file Todo list

  • This PR contains a sequencingSourceFile (.ssf) for the requested
    package.
  • The name of the .ssf file(s) matches the package name (i.e.
    packages/2023_my_package/2023_my_package.ssf).
  • The .ssf file MUST contain a new line at the end of the file.
    A check for this exists in the CI. This check should pass before
    you continue with this list.
  • I confirm that the poseidon_IDs, udg, and library_built are filled
    and correct.
  • I made sure to leave notes where necessary to explain any special
    cases/judgement calls made for data entries.
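
The two mechanical items in the list above (file name matching the package name, and a trailing newline) can be checked locally before pushing. The following is only a minimal sketch: `check_ssf` is a hypothetical helper name, it is not the check that runs in the CI, and it assumes the `packages/{package_name}/{package_name}.ssf` layout shown above.

```python
from pathlib import Path


def check_ssf(ssf_path: Path) -> list[str]:
    """Return a list of problems found for one .ssf file (empty if none).

    Hypothetical local helper, not the repository's actual CI check.
    """
    problems = []

    # The file should be named after the package directory it lives in,
    # e.g. packages/2023_my_package/2023_my_package.ssf
    package_name = ssf_path.parent.name
    if ssf_path.suffix != ".ssf" or ssf_path.stem != package_name:
        problems.append(
            f"{ssf_path.name} does not match package name {package_name}"
        )

    # The file must end with a newline character.
    if not ssf_path.read_bytes().endswith(b"\n"):
        problems.append(f"{ssf_path.name} does not end with a newline")

    return problems
```

Calling `check_ssf(Path("packages/2023_my_package/2023_my_package.ssf"))` returns an empty list when both conditions hold.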

Recipe creation and validation

  • Comment @delphis-bot create recipe to this pull request to awaken
    Poseidon's trusty helper. (This should be repeated whenever changes are
    made to the SSF file contents).

After a few seconds, Delphis-bot will add a number of files to the PR.
Using the 'Files changed' tab, check that all of the following files were added:

  • The file packages/{package_name}/{package_name}.tsv was added to the PR.
  • The file packages/{package_name}/{package_name}.tsv_patch.sh was added
    to the PR from template.
  • The file packages/{package_name}/script_versions.txt was added to the
    PR.
  • The file packages/{package_name}/{package_name}.config was added to the
    PR from template.
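
As an alternative to scanning the 'Files changed' tab by eye, the four expected paths can be listed and compared against a local checkout. This is a rough sketch only: the function names are hypothetical and it assumes the branch with Delphis-bot's commits has already been pulled.

```python
from pathlib import Path


def expected_recipe_files(package_name: str) -> list[Path]:
    """Paths (relative to the repository root) listed in the checklist above."""
    base = Path("packages") / package_name
    return [
        base / f"{package_name}.tsv",
        base / f"{package_name}.tsv_patch.sh",
        base / "script_versions.txt",
        base / f"{package_name}.config",
    ]


def missing_recipe_files(repo_root: Path, package_name: str) -> list[Path]:
    """Return the expected files that are not present under repo_root."""
    return [
        path
        for path in expected_recipe_files(package_name)
        if not (repo_root / path).is_file()
    ]
```

For this PR, `missing_recipe_files(Path("."), "2025_Lazaridis_Yamnaya")` run from the repository root would list any of the four files that are still absent.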

Additional configuration

Additional configuration may be required when processing the data through nf-core/eager.
If you think this may be the case here, please either leave a comment about it in the PR or
add the relevant parameters within the params section at the end of the package config file.
For example, if the published data from the paper have internal barcodes, please mention that
in a comment, or provide the relevant nf-core/eager parameters in the params section.

  • If any nf-core/eager parameters need to be altered from their defaults, I
    have commented so in this PR (or added the relevant parameters within the
    params section at the end of the package config file).

@Kavlahkaff Kavlahkaff self-assigned this Feb 26, 2025
@Kavlahkaff
Contributor Author

@delphis-bot create recipe

@Kavlahkaff
Contributor Author

Hi @TCLamnidis, could you have a look at this? Just to double-check that I made the correct entries for udg and library. I have started with the janno but will need some time for it since it's quite large.

@TCLamnidis
Member

Hi @Kavlahkaff
After having a look, I don't think the UDG/Strandedness info is quite correct. If you look at column AK of OnlineTable1(inds.new.data) in the Supplementary Tables file, you can see the attributes for the different libraries merged for each individual. They are encoded as follows:

minus=no.damage.correction,
half=damage.retained.at.last.position,
plus=damage.fully.corrected,
ds=double.stranded.library.preparation,
ss=single.stranded.library.preparation

While you have correctly marked mixes of ds/ssDNA libraries as ds, any IDs that only contain ss.* entries should be marked as ss in the SSF.
Also, the UDG column should only be minus in cases where at least one of the libraries has received NO UDG treatment (i.e. minus). When combining multiple libraries, the column should contain the value of the least treated library. (So it should be half if no minus libraries were included and at least one half library got combined.)

Also, note that ss.USER in this case would be equivalent to ss.half, since in the paper they state:

We treated DNA extracts with USER (NEB) during library preparation to cut DNA at uracils; this treatment is inefficient at terminal uracils and leaves a damage pattern expected for ancient DNA at the terminal bases that can be filtered out for downstream analysis while allowing a library to be authenticated as old.
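
The rules described in this comment can be sketched in code. This is only an illustration under stated assumptions: it assumes each library attribute is a token of the form `strand.treatment` (such as `ds.half` or `ss.USER`), and that the tokens for one individual have already been split into a list. The exact layout of column AK is not reproduced here, and `summarise_libraries` is a hypothetical helper name.

```python
# Treatments ordered from least to most UDG-treated.
UDG_RANK = {"minus": 0, "half": 1, "plus": 2}

# Per the quoted methods text, USER treatment leaves terminal damage,
# so it is treated as equivalent to "half" here.
UDG_ALIASES = {"USER": "half"}


def summarise_libraries(attributes: list[str]) -> tuple[str, str]:
    """Return (udg, library_built) for one individual's merged libraries.

    Assumes tokens look like "ds.half" or "ss.USER" (an assumption about
    the table encoding, not a confirmed format).
    """
    if not attributes:
        raise ValueError("no library attributes given")

    strands = set()
    treatments = set()
    for attribute in attributes:
        strand, _, treatment = attribute.strip().partition(".")
        treatment = UDG_ALIASES.get(treatment, treatment)
        if strand not in {"ds", "ss"} or treatment not in UDG_RANK:
            raise ValueError(f"unrecognised library attribute: {attribute!r}")
        strands.add(strand)
        treatments.add(treatment)

    # Only individuals with exclusively ss libraries are marked ss;
    # ds-only and mixed ds/ss individuals are marked ds.
    library_built = "ss" if strands == {"ss"} else "ds"

    # The udg value is that of the least treated library.
    udg = min(treatments, key=UDG_RANK.__getitem__)
    return udg, library_built
```

For example, `summarise_libraries(["ss.USER", "ss.plus"])` gives `("half", "ss")`, while `summarise_libraries(["ds.minus", "ss.plus"])` gives `("minus", "ds")`.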
