Gen3-DRS files not found in manifest not being saved into a log file #306

Open
imendes93 opened this issue Feb 9, 2022 · 0 comments
Labels
enhancement New feature or request

Comments

Contributor

imendes93 commented Feb 9, 2022

Problem

When using the Gen3-DRS option, nothing is reported if a file listed in input.csv is not present in the manifest.json file.

Solution

Add a log file listing the file names in input.csv that are not present in the manifest.json file.

Implementation

This should be implemented in the filter_manifest.py helper script by comparing the file names in the resulting filtered manifest with those in the original input.csv file:

# Rows of input.csv whose file_name does not appear in the manifest
missing_df = reads_df[~reads_df['file_name'].isin(manifest_df['file_name'])]
if len(missing_df) > 0:
    print("The following file_name IDs were not found in manifest:")
    print(missing_df)
    missing_df.to_csv("not_found_GTEX_samples.txt", index=False)

This has been tried in tag Simplify-Gen3-DRS-7, corresponding to the failing run https://cloudos.lifebit.ai/public/jobs/6203e3cb91203701dcbcb686
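
For reference, the comparison described above could also be written as a small standalone helper that writes a real log file instead of a CSV. This is only a sketch under assumptions: the function name `log_missing_files`, the logger name, and the default log path `not_found_in_manifest.log` are hypothetical and do not come from filter_manifest.py.

```python
import logging
from typing import Iterable, List


def log_missing_files(requested: Iterable[str],
                      available: Iterable[str],
                      log_path: str = "not_found_in_manifest.log") -> List[str]:
    """Return the requested names absent from `available` and log each one.

    Hypothetical helper: the name and the default log path are assumptions.
    """
    available_set = set(available)
    # dict.fromkeys keeps the input order and drops duplicate names
    missing = [name for name in dict.fromkeys(requested)
               if name not in available_set]

    if missing:
        # The log file is only created when something is actually missing
        logger = logging.getLogger("filter_manifest")
        handler = logging.FileHandler(log_path, mode="w")
        handler.setFormatter(logging.Formatter("%(levelname)s: %(message)s"))
        logger.addHandler(handler)
        logger.setLevel(logging.WARNING)
        try:
            for name in missing:
                logger.warning("file_name not found in manifest: %s", name)
        finally:
            # Detach and close so repeated calls do not duplicate output
            logger.removeHandler(handler)
            handler.close()
    return missing
```

Because the helper accepts any iterable of names, the columns from the snippet above could be passed directly, for example `log_missing_files(reads_df['file_name'], manifest_df['file_name'])`, assuming both DataFrames have a `file_name` column as in the proposed check.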

@imendes93 imendes93 added the enhancement New feature or request label Feb 9, 2022