Division issue #2

Open
sinclaircooper opened this issue Dec 19, 2017 · 1 comment

@sinclaircooper

Hi,
I'm getting some division errors when trying to run PANDA.

/path/to/.local/lib/python2.7/site-packages/numpy/lib/function_base.py:3167: RuntimeWarning: invalid value encountered in true_divide
c /= stddev[:, None]
/path/to/.local/lib/python2.7/site-packages/numpy/lib/function_base.py:3168: RuntimeWarning: invalid value encountered in true_divide
c /= stddev[None, :]

This appears to be related to np.corrcoef(self.expression_matrix), i.e. there is something in my input counts matrix that prevents numpy from generating a proper correlation matrix.
I'm supplying a matrix of tissue-aware normalised counts (using YARN).
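A minimal reproduction of the warning, assuming (as the traceback suggests) that the expression matrix is passed to `np.corrcoef` with genes as rows and samples as columns. The toy matrix is made up for illustration: one gene has the same value in every sample, so its standard deviation is zero and `np.corrcoef` divides 0 by 0.

```python
import warnings

import numpy as np

# Toy expression matrix (hypothetical data): rows are genes, columns are samples.
# Gene g1 has the same value in every sample, so its standard deviation is zero.
expression = np.array([
    [5.0, 3.0, 8.0, 1.0],  # g0: varies across samples
    [4.0, 4.0, 4.0, 4.0],  # g1: constant -> zero variance
    [2.0, 6.0, 1.0, 7.0],  # g2: varies across samples
])

# Record the RuntimeWarning numpy emits for the 0/0 division inside corrcoef.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    corr = np.corrcoef(expression)

print(corr.round(2))  # the whole row and column for g1 are nan
print([str(w.message) for w in caught])
```

Only the constant gene's row and column are affected; correlations between the other genes are still well defined.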

Does PANDA expect normalised counts, TPMs, or log2 counts?

Cheers

@mararie

mararie commented Dec 19, 2017

Hi! PANDA starts by generating a gene co-expression (correlation) matrix from the expression data. This can be done on many different data types. We prefer to use normalized counts, but TPMs and log2 counts will work too.
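As a rough sketch of that first step, the co-expression matrix is a gene-by-gene Pearson correlation over samples. The helper below is hypothetical, not PyPanda's code; the optional log2 transform with a pseudocount of 1 is an illustrative assumption showing one of the input types mentioned above.

```python
import numpy as np


def coexpression(counts, log_transform=True, pseudocount=1.0):
    """Hypothetical helper: gene-by-gene Pearson correlation from a genes x samples matrix."""
    data = np.asarray(counts, dtype=float)
    if log_transform:
        # The pseudocount avoids log2(0) for unexpressed entries (an assumed choice).
        data = np.log2(data + pseudocount)
    # np.corrcoef treats each row as a variable, so the result is n_genes x n_genes.
    return np.corrcoef(data)


counts = np.array([
    [10, 20, 40, 80],  # rising across samples
    [80, 40, 20, 10],  # falling across samples
    [5, 9, 30, 70],    # rising across samples
])
corr = coexpression(counts)
print(corr.round(3))
```

Whichever scale is used, the correlation is computed per gene across samples, which is why a gene with no variation breaks it regardless of normalisation.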

The issue you're having can happen if your input data includes genes that show no variation in expression: such a gene has a standard deviation of zero, so its correlation with every other gene is a 0/0 division and comes out as NaN. In principle, YARN should filter out genes that are not expressed in a certain percentage of samples (depending on the thresholds you're using), so this should rarely happen. (It is still possible for a gene to have the same non-zero count in all samples, but this is rather unlikely.) However, you may be building your network on a subset of the samples, in which one or more genes are not expressed at all.
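A minimal sketch for spotting such genes, assuming the expression data is a genes x samples NumPy array with a parallel list of gene names. The function name, gene names and sample indices are hypothetical. Passing `sample_idx` checks only the subset of samples the network will actually be built on, which is where a gene that is fine on the full dataset can become constant.

```python
import numpy as np


def constant_genes(expression, gene_names, sample_idx=None):
    """Hypothetical helper: names of genes with no variation across the chosen samples."""
    data = np.asarray(expression, dtype=float)
    if sample_idx is not None:
        # Restrict to the samples used for this particular network.
        data = data[:, sample_idx]
    # A row is constant exactly when its maximum equals its minimum.
    flat = data.max(axis=1) == data.min(axis=1)
    return [name for name, is_flat in zip(gene_names, flat) if is_flat]


genes = ["GENE_A", "GENE_B", "GENE_C"]
expr = np.array([
    [0, 0, 3, 5],  # GENE_A: expressed only in the last two samples
    [2, 4, 6, 8],  # GENE_B: varies everywhere
    [7, 7, 7, 7],  # GENE_C: same non-zero value in every sample
])
print(constant_genes(expr, genes))                     # ['GENE_C']
print(constant_genes(expr, genes, sample_idx=[0, 1]))  # ['GENE_A', 'GENE_C']
```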

The easiest option is to filter out these genes before running PANDA. Another workaround is to modify the PyPanda code so that correlations returned as NaN are set to 0 (this is what we did in the MATLAB code we used to build networks on GTEx data).
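Both workarounds can be sketched with plain NumPy. The helpers below are hypothetical and are not part of PyPanda; where exactly to hook them into PyPanda (before the matrix is passed in, or around its `np.corrcoef` call) depends on the version you run. Note that the second option also sets a constant gene's diagonal entry to 0 rather than 1.

```python
import numpy as np


def drop_constant_genes(expression, gene_names):
    """Option 1 (hypothetical helper): remove zero-variance genes before building the network."""
    data = np.asarray(expression, dtype=float)
    keep = data.max(axis=1) != data.min(axis=1)
    kept_names = [name for name, k in zip(gene_names, keep) if k]
    return data[keep], kept_names


def corrcoef_nan_to_zero(expression):
    """Option 2 (hypothetical helper): correlate, then replace undefined (NaN) entries with 0."""
    # Silence the 0/0 RuntimeWarning, since the NaNs are handled explicitly below.
    with np.errstate(divide="ignore", invalid="ignore"):
        corr = np.corrcoef(np.asarray(expression, dtype=float))
    corr[np.isnan(corr)] = 0.0
    return corr


expr = np.array([
    [1.0, 2.0, 3.0, 4.0],
    [5.0, 5.0, 5.0, 5.0],  # constant gene
    [4.0, 3.0, 2.0, 1.0],
])
filtered, names = drop_constant_genes(expr, ["g0", "g1", "g2"])
print(names)                       # ['g0', 'g2']
print(corrcoef_nan_to_zero(expr))  # row and column 1 are all zeros
```

Filtering keeps the correlation matrix fully defined, while zeroing keeps every gene in the network at the cost of giving the constant ones no co-expression signal.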
