How to interpret SATAY data in order to have meaningful information from it? #27
To help you further along: the actual cells with transposons are converted into reads in such a way that the reads follow a binomial distribution. We have the inverse problem, i.e. we want to go from reads back to the actual transposons. However, we cannot easily use the negative binomial distribution for that inversion because, as Wessel mentioned today, we don't know that probability. So I thought you could try something else, namely finding the best-fitting binomial distribution. Unfortunately, MATLAB's mle requires one of the binomial parameters to be fixed in advance, so I wrote a small MATLAB script that uses the generalized method of moments instead to fit simulated read data. This works reasonably well (run Reads_transposon_conversion_simulation_v2.m in the zip file). Two caveats could be: (Updated to v2 to resolve the first caveat:
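A minimal Python sketch of the moment-matching idea, assuming the reads are independent draws from Binomial(n, p) with both n (transposons) and p (read probability) unknown. It uses the plain method of moments (match the sample mean and variance) rather than a full generalized method of moments, and it is not a port of Reads_transposon_conversion_simulation_v2.m; the function names and the simulated parameters are hypothetical.

```python
import random
import statistics


def binomial_moment_estimates(samples):
    """Method-of-moments estimates (n_hat, p_hat) for Binomial(n, p), both unknown.

    Uses mean = n*p and variance = n*p*(1-p), hence p = 1 - var/mean and n = mean/p.
    """
    mean = statistics.fmean(samples)
    var = statistics.pvariance(samples)  # population (biased) variance
    if mean <= 0 or var >= mean:
        # A binomial is under-dispersed (var < mean); otherwise p_hat would be <= 0.
        raise ValueError("sample variance must be smaller than the mean")
    p_hat = 1.0 - var / mean
    n_hat = mean / p_hat
    return n_hat, p_hat


def simulate_reads(n_transposons, p_read, n_samples, seed=0):
    """Hypothetical simulation: each sample counts successes in n_transposons trials."""
    rng = random.Random(seed)
    return [
        sum(rng.random() < p_read for _ in range(n_transposons))
        for _ in range(n_samples)
    ]


if __name__ == "__main__":
    reads = simulate_reads(n_transposons=200, p_read=0.3, n_samples=5000)
    n_hat, p_hat = binomial_moment_estimates(reads)
    print(f"n_hat = {n_hat:.1f}, p_hat = {p_hat:.3f}")
```

The estimate of n is sensitive to noise in the sample variance, so it needs many observations to be stable; this is one reason the fit may behave worse on real data than on simulations.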
Very interesting, I will look into it! One thought I had about the probability is that we might be able to estimate the total number of cells during the SATAY experiment. I think Benoît also mentions a number in his paper; from this we would know how many transpositions have taken place. So maybe we can get a good estimate of the probability of actually reading a transposition.
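If the total number of transpositions can be estimated independently, the read probability follows from a simple ratio. A minimal sketch, assuming every transposition is detected independently with the same probability p, so the number detected is Binomial(total, p); the function name and the example numbers are hypothetical, not values from Benoît's paper.

```python
import math


def detection_probability(detected, total_transpositions):
    """Point estimate and standard error of the per-transposition detection probability.

    Assumes detected ~ Binomial(total_transpositions, p), so p_hat = detected / total
    with standard error sqrt(p_hat * (1 - p_hat) / total).
    """
    if total_transpositions <= 0 or not 0 <= detected <= total_transpositions:
        raise ValueError("need total_transpositions > 0 and 0 <= detected <= total")
    p_hat = detected / total_transpositions
    se = math.sqrt(p_hat * (1 - p_hat) / total_transpositions)
    return p_hat, se


if __name__ == "__main__":
    # Hypothetical numbers, for illustration only.
    p_hat, se = detection_probability(detected=300_000, total_transpositions=1_000_000)
    print(f"p_hat = {p_hat:.3f} +/- {se:.5f}")
```

The standard error gives a rough band for comparing this independent estimate with a p fitted from the read distribution.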
That sounds good; it would be reassuring to see a reasonable match with the fitted estimate.
I saw this paper that discusses normalization using various statistical approaches, for example the negative binomial distribution. Maybe it is useful.
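For comparison with the binomial fit above, read counts per site are often over-dispersed (variance larger than the mean), which is where a negative binomial model comes in. A minimal method-of-moments sketch, assuming a list of read counts; this is not the method from the paper, and the function name is hypothetical.

```python
import statistics


def negative_binomial_moment_fit(counts):
    """Method-of-moments fit of a negative binomial to read counts.

    Uses var = mu + mu**2 / r, so r = mu**2 / (var - mu), and p = r / (r + mu)
    in the (r, p) parameterization whose mean is r * (1 - p) / p.
    Only valid for over-dispersed data (var > mu).
    """
    mu = statistics.fmean(counts)
    var = statistics.pvariance(counts)
    if var <= mu:
        raise ValueError("counts are not over-dispersed (variance <= mean)")
    r = mu * mu / (var - mu)
    p = r / (r + mu)
    return mu, r, p


if __name__ == "__main__":
    mu, r, p = negative_binomial_moment_fit([0, 0, 10])
    print(f"mu = {mu:.3f}, r = {r:.3f}, p = {p:.3f}")
```

A small r means strong over-dispersion; as r grows, the negative binomial approaches a Poisson with the same mean.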
Could you download the paper? I could not...
Interesting that those papers, "Normalization of transposon-mutant library sequencing datasets to improve identification of conditionally essential genes" and "Statistical analysis of genetic interactions in Tn-Seq
@Gregory94, you should watch and take a look at the repo from the same author (Michael A. DeJesus): https://github.com/mad-lab/tools
Yes, indeed. But I think many of the tools they created are optimized for their experimental setup, which is different from ours. We should consider whether we want to use an experimental approach similar to theirs, or adapt their tools to our approach.
Yes, they are optimized for the type of data they get and for the way they want to analyze those datasets. However, they can still be useful in terms of how they are implemented, and some parts of the statistical analyses could simply be abstracted from their use case to ours. The code looks very organized at first glance, and in general it is always of great benefit to have good examples of well-organized and well-structured code that we can learn from, build on, and collaborate around.
Some additional comments on how to interpret the data were made in the meeting with Werner.