nnmf fails when noise leads to negative values #65

Open
chriswaudby opened this issue Jul 16, 2021 · 7 comments

@chriswaudby

I have a dataset that is physically strictly positive, but measurement noise can produce negative values. I'd like to factorise it, but nnmf won't run when the data matrix contains negative values. Setting these values to zero would be a quick workaround, but in general I think this would introduce bias. Are there any solutions to this problem?
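To make the bias concern concrete, here is a minimal Julia sketch. The signal level 0.1 and noise level 0.2 are illustrative assumptions, not values from this issue. Because `max.(x, 0)` can only move entries up, clipping Gaussian noise around a small positive value raises the average. For x ~ N(μ, σ²), E[max(x, 0)] = μΦ(μ/σ) + σφ(μ/σ), which is about 0.14 here.

```julia
using Random, Statistics

Random.seed!(42)
true_value = 0.1                       # small, strictly positive "physical" signal (illustrative)
σ = 0.2                                # standard deviation of the Gaussian noise (illustrative)
x = true_value .+ σ .* randn(100_000)  # noisy measurements; many fall below zero
clipped = max.(x, 0.0)                 # the "set negatives to zero" workaround

println(mean(x))        # close to 0.1: the raw noisy data is unbiased
println(mean(clipped))  # close to 0.14: clipping biases the mean upward
```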

@ghost ghost added the question label Jul 16, 2021
@ghost

ghost commented Jul 16, 2021

When a data matrix contains negative values, a simple solution is unit-range (min-max) normalization.
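Here is a minimal sketch of unit-range normalization in Julia. It assumes a single global minimum and maximum over the whole matrix, because the comment doesn't say whether per-column scaling is meant. `unit_range` is a hypothetical helper name.

```julia
# Min-max ("unit range") normalization over the whole matrix: maps every entry into [0, 1].
unit_range(X) = (X .- minimum(X)) ./ (maximum(X) - minimum(X))

X = [0.4 -0.2; 1.0 0.1]   # contains a negative entry
Y = unit_range(X)         # ≈ [0.5 0.0; 1.0 0.25]
```

With global scaling, the result is X / (max X - min X) minus a constant matrix. So it is a rescaled X plus a shift applied to every entry.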

@chriswaudby
Author

In the presence of noise, wouldn't that effectively add a constant to the matrix, increasing its rank? It's more attractive than clipping values to zero, but it doesn't feel like a fundamental solution.
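The rank argument can be checked directly. In this sketch, the matrices are illustrative and chosen so the arithmetic is exact, and `A` is an exact rank-2 product of nonnegative factors. Adding a constant `c` to every entry adds the rank-one matrix `c * ones(3) * ones(3)'`. The all-ones vector lies in neither the column space nor the row space of `A`, so the rank goes up by one.

```julia
using LinearAlgebra

W = [1.0 0.0; 0.0 1.0; 0.0 0.0]   # 3×2, nonnegative (illustrative)
H = [1.0 0.0 0.0; 0.0 1.0 0.0]    # 2×3, nonnegative (illustrative)
A = W * H                         # exact rank-2 product

c = 0.5                           # any nonzero shift, e.g. from min-max normalization
println(rank(A))                  # 2
println(rank(A .+ c))             # 3

# For c > 0 the shifted matrix is still an exact nonnegative product, but with one more component.
B = [W ones(3)] * [H; c * ones(3)']
println(B ≈ A .+ c)               # true
```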

@ghost

ghost commented Jul 17, 2021

I'm not familiar with NMF on data that contains negative values, so I can't suggest a more appropriate solution. Perhaps semi-NMF would be useful for you. Thanks.

[Screenshot 2021-07-17 9 17 42]
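For reference, semi-NMF relaxes the sign constraint on one factor: X ≈ F Gᵀ, with F of any sign and G ≥ 0. Below is a minimal Julia sketch of the alternating scheme from Ding, Li and Jordan's semi-NMF work, written with only the standard library. It updates F by least squares and G with a multiplicative rule. `semi_nmf`, `pospart` and `negpart` are hypothetical names, not an NMF.jl API.

```julia
using LinearAlgebra, Random

# Positive and negative parts: A == pospart(A) - negpart(A), both ≥ 0.
pospart(A) = (abs.(A) .+ A) ./ 2
negpart(A) = (abs.(A) .- A) ./ 2

# Semi-NMF sketch: X ≈ F * G' with F of any sign and G ≥ 0.
# X is p×n, F is p×k, G is n×k.
function semi_nmf(X::AbstractMatrix, k::Integer; iters::Integer = 200, seed::Integer = 0)
    rng = MersenneTwister(seed)
    G = rand(rng, size(X, 2), k) .+ 0.1   # strictly positive start
    F = zeros(size(X, 1), k)
    ϵ = 1e-12                             # guards the division below
    for _ in 1:iters
        F = X * G / (G' * G)              # unconstrained least-squares update of F
        XtF, FtF = X' * F, F' * F
        # Multiplicative update: the ratio is ≥ 0, so G stays nonnegative.
        G .*= sqrt.((pospart(XtF) .+ G * negpart(FtF)) ./
                    (negpart(XtF) .+ G * pospart(FtF) .+ ϵ))
    end
    return F, G
end

Random.seed!(1)
X = randn(6, 5)                           # mixed-sign data
F, G = semi_nmf(X, 2)
println(norm(X - F * G') / norm(X))       # relative reconstruction error
```

The trade-off is that F is no longer constrained to be nonnegative. That matters if both factors carry physical meaning.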

@chriswaudby
Author

Thanks for the suggestions. Physically, though, NMF is what makes the most sense for my problem. I've been looking at the code, and it seems that the objective is calculated with `sqL2dist`, which wouldn't cause any problems if the input contained negative numbers, right? I'm not an expert, so apologies if this is a silly question, but then what is the point of the initial check that the input matrix is non-negative?
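The point about the objective can be illustrated with the standard library alone. This sketch writes the squared-Euclidean objective ‖X - WH‖² as a hypothetical helper `frob_objective`; per the comment above, NMF.jl's own code uses `sqL2dist`. Nothing in the objective requires X ≥ 0.

```julia
# Squared Frobenius objective ‖X - W*H‖²; well defined for any real X.
frob_objective(X, W, H) = sum(abs2, X .- W * H)

X = [-1.0 2.0; 0.5 0.0]        # contains a negative entry
W = reshape([1.0, 0.5], 2, 1)  # 2×1, nonnegative
H = [0.0 2.0]                  # 1×2, nonnegative
println(frob_objective(X, W, H))  # 2.25
```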

@ghost

ghost commented Jul 17, 2021

I don't know whether negative values in the data matrix harm the accuracy of the approximation (X ≈ WH), but it is algorithmically possible to run NMF on such a matrix:

```julia
julia> using NMF

julia> X = randn(5, 4)
5×4 Matrix{Float64}:
  0.427738  0.668884   0.694885   0.29986
 -0.371153  0.652576   0.377217   1.53983
 -0.502157  1.23846    0.0284338  0.975191
 -0.191144  0.383284   0.638029   0.505248
 -0.478253  0.882959  -0.983354   0.0273086

julia> k = 2;

julia> W, H = NMF.nndsvd(X, k, variant=:ar);

julia> ret = NMF.solve!(NMF.GreedyCD{Float64}(maxiter=50), X, W, H);

julia> ret.W
5×2 Matrix{Float64}:
 0.510974  0.25225
 0.503171  0.767277
 0.954606  0.440184
 0.292119  0.339879
 0.679283  0.0

julia> ret.H
2×4 Matrix{Float64}:
 0.0  1.30007  0.0       0.0305617
 0.0  0.0      0.721782  1.89938

julia> ret.W * ret.H
5×4 Matrix{Float64}:
 0.0  0.6643    0.182069  0.494733
 0.0  0.654155  0.553807  1.47273
 0.0  1.24105   0.317717  0.86525
 0.0  0.379773  0.245319  0.654486
 0.0  0.883112  0.0       0.02076
```
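The all-zero first column of `ret.W * ret.H` has a simple explanation. With W fixed, each column of H solves a nonnegative least-squares problem: min ‖x - W h‖² subject to h ≥ 0. By the KKT conditions of this convex problem, h = 0 is optimal exactly when Wᵀx ≤ 0 componentwise. The quick check below uses the rounded values printed in the session above. It inspects the result only, not the solver's internals.

```julia
# Rounded values of ret.W copied from the session above.
W = [0.510974 0.25225;
     0.503171 0.767277;
     0.954606 0.440184;
     0.292119 0.339879;
     0.679283 0.0]
x = [0.427738, -0.371153, -0.502157, -0.191144, -0.478253]   # first column of X

# For min ‖x - W*h‖² subject to h ≥ 0, the gradient at h = 0 is -W'x.
# If W'x ≤ 0 componentwise, h = 0 satisfies the KKT conditions and is optimal.
g = W' * x
println(g)             # both entries negative (about -0.83 and -0.46)
println(all(g .<= 0))  # true
```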

@ghost

ghost commented Jul 17, 2021

In general, NMF takes a non-negative matrix as input, so the high-level interface nnmf checks for non-negativity.
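One concrete reason such a check can matter is that some NMF solvers only stay feasible when X ≥ 0. This is an assumption here, not something confirmed for NMF.jl. The classic Lee-Seung multiplicative update for H under the squared-error loss, H ← H ⊙ (WᵀX) ⊘ (WᵀWH), keeps H nonnegative only while WᵀX is nonnegative. `mu_update_H` below is a hypothetical standalone helper.

```julia
# One Lee-Seung multiplicative update for H under the squared-error loss.
# With X ≥ 0 and W, H ≥ 0 every factor is nonnegative, so H stays nonnegative;
# with negative entries in X, W'X can turn negative and H leaves the feasible set.
mu_update_H(X, W, H) = H .* (W' * X) ./ (W' * W * H)

W = reshape([1.0, 1.0], 2, 1)
H = [1.0 1.0]
X = [-2.0 -2.0; 0.5 0.5]

println(mu_update_H(X, W, H))        # [-0.75 -0.75]: negative after one step
println(mu_update_H(abs.(X), W, H))  # [1.25 1.25]: stays nonnegative when X ≥ 0
```

Projected and coordinate-descent solvers typically clamp each updated entry at zero instead. That is consistent with the `GreedyCD` run above returning nonnegative W and H even though X had negative entries.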

@chriswaudby
Author

Thanks - that last example is really helpful.

To give you an idea of where my problem comes from: I'm working on chemical kinetics, where W and H represent the concentrations (as a function of time) and absorption spectra (as a function of frequency) of the reaction components. Both are strictly positive, but the measurement process introduces Gaussian noise that can produce negative values in the observed data matrix X (i.e. X = WH + ε). This seems like a reasonable application of NMF to me, so I don't understand why the input matrix X must be non-negative.
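Here is a minimal sketch of that setup. It simulates X = WH + ε for a hypothetical A → B first-order reaction: two exponential concentration profiles and two Gaussian absorption bands. All shapes and the noise level are illustrative assumptions. It then fits the noisy X directly with HALS (hierarchical alternating least squares), a projected block-coordinate method swapped in here as a stdlib-only stand-in for NMF.jl's solvers. Only the factors are constrained; X is used as measured, without clipping. `hals_nmf` and `band` are hypothetical names.

```julia
using LinearAlgebra, Random

# HALS sketch for  min ‖X - W*H‖²  subject to  W ≥ 0, H ≥ 0.
# X may contain negative entries; only the factors are projected onto ≥ 0.
function hals_nmf(X::AbstractMatrix, k::Integer; iters::Integer = 500, seed::Integer = 0)
    rng = MersenneTwister(seed)
    m, n = size(X)
    W = rand(rng, m, k)
    H = rand(rng, k, n)
    ϵ = 1e-12                                   # avoids dividing by an exactly zero norm
    for _ in 1:iters
        # Update each row of H in turn (others fixed), then clamp at zero.
        WtX, WtW = W' * X, W' * W
        for j in 1:k
            H[j, :] .= max.(0.0, H[j, :] .+ (WtX[j, :] .- H' * WtW[:, j]) ./ (WtW[j, j] + ϵ))
        end
        # Same for each column of W.
        XHt, HHt = X * H', H * H'
        for j in 1:k
            W[:, j] .= max.(0.0, W[:, j] .+ (XHt[:, j] .- W * HHt[:, j]) ./ (HHt[j, j] + ϵ))
        end
    end
    return W, H
end

Random.seed!(2021)
t = range(0, 5; length = 40)                     # hypothetical time points
ν = range(0, 1; length = 30)                     # hypothetical frequency axis
C = hcat(exp.(-t), 1 .- exp.(-t))                # concentrations of A and B (40×2)
band(μ) = exp.(-((ν .- μ) ./ 0.05) .^ 2)         # Gaussian absorption band
S = permutedims(hcat(band(0.3), band(0.7)))      # spectra (2×30)
σ = 0.05
X = C * S .+ σ .* randn(size(C, 1), size(S, 2))  # X = WH + ε, with some negative entries

W, H = hals_nmf(X, 2)
println(count(<(0), X), " negative entries in X")
println(norm(X - W * H) / sqrt(length(X)))       # RMS residual; comparable to σ when the fit succeeds
```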
