Entropy on strand #301

Open
Ge0rges opened this issue Nov 18, 2024 · 5 comments
Labels
enhancement New feature or request

Comments

Ge0rges commented Nov 18, 2024

Hey @ArtRand,

I wanted to request a feature for entropy: making it "strand aware" by letting us specify the strand for each region (-, +, or .).

It would also be convenient if the documentation specified the headers for the output files when --regions is specified.

Thanks!

ArtRand added the enhancement (New feature or request) label Nov 20, 2024

ArtRand commented Nov 20, 2024

Hello @Ge0rges,

I agree that the BED file should direct the strand to use. I'll be sure to add it along with the multi-base work.

Ge0rges commented Nov 20, 2024

Also, I've noticed that the output regions.bed is sometimes empty, with no error printed. Is that expected?

ArtRand commented Nov 20, 2024

The final log will report 0 regions processed successfully in this case. There is always a bit of a balance to strike with respect to informing the user why something was ineligible to calculate a result and making the logs very verbose and hard to follow. Perhaps a better solution is to tabulate how many regions failed and their reasons?

Ge0rges commented Nov 26, 2024

Hey @ArtRand, could you let me know what the schema for the output is when --regions is specified?

ArtRand commented Nov 27, 2024

Hello @Ge0rges,

The schema is:

| col | Name | Description | type |
|-----|------|-------------|------|
| 1 | chrom | chromosome of the region | str |
| 2 | start | 0-based start position of the region | int |
| 3 | end | 0-based end position of the region | int |
| 4 | region_name | name of the region from the input BED file | str |
| 5 | mean_entropy | average entropy of the passing windows included in the region | float |
| 6 | strand | strand of the region {+, -, .} | str |
| 7 | median_entropy | median entropy of the passing windows included in the region | float |
| 8 | min_entropy | minimum passing window entropy | float |
| 9 | max_entropy | maximum passing window entropy | float |
| 10 | mean_num_reads | average number of reads used in the passing windows' entropy calculation | float |
| 11 | min_num_reads | minimum number of reads used in the passing windows' entropy calculation | int |
| 12 | max_num_reads | maximum number of reads used in the passing windows' entropy calculation | int |
| 13 | successful_window_count | number of passing windows in the region | int |
| 14 | failed_window_count | number of failed windows in the region | int |

You can also pass the --header flag to get a header line in the output.

I'll add this to the documentation.
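
For anyone reading the regions output downstream, here is a minimal Python sketch that maps each row onto the 14 columns listed above. It is not part of the tool. It assumes the file is tab-separated, and that a header line (when --header is used) begins with "chrom" or "#chrom"; neither detail is confirmed in this thread. The function name `parse_regions` and the sample row are made up for illustration.

```python
import csv
import io
from typing import Iterator

# Column names and types copied from the schema in this comment.
COLUMNS = [
    ("chrom", str),
    ("start", int),
    ("end", int),
    ("region_name", str),
    ("mean_entropy", float),
    ("strand", str),
    ("median_entropy", float),
    ("min_entropy", float),
    ("max_entropy", float),
    ("mean_num_reads", float),
    ("min_num_reads", int),
    ("max_num_reads", int),
    ("successful_window_count", int),
    ("failed_window_count", int),
]


def parse_regions(handle) -> Iterator[dict]:
    """Yield one dict per region row (hypothetical helper, assumes tab-separated fields)."""
    for row in csv.reader(handle, delimiter="\t"):
        if not row:
            continue  # skip blank lines
        # Assumption: a header line starts with "chrom" or "#chrom".
        if row[0].lstrip("#") == "chrom":
            continue
        if len(row) != len(COLUMNS):
            raise ValueError(f"expected {len(COLUMNS)} columns, got {len(row)}")
        yield {name: cast(value) for (name, cast), value in zip(COLUMNS, row)}


# Made-up row, for illustration only.
example = "chr1\t100\t200\tregionA\t0.5\t+\t0.4\t0.1\t0.9\t12.5\t10\t15\t3\t1\n"
records = list(parse_regions(io.StringIO(example)))
```

An empty regions file simply yields no records here, so a caller can check `if not records:` to catch the silent empty-output case mentioned earlier in the thread.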
