
Bug with using LiNGAM #95

Open
mikelynch opened this issue Jun 24, 2020 · 8 comments

mikelynch commented Jun 24, 2020

I'm using the latest version of the package (1.2.1) and have come across what I think must be a bug when trying to use LiNGAM. I originally noticed this when looking at my own data, but this minimal script demonstrates the problem:

import numpy as np
import pandas as pd
import pycausal.pycausal
import pycausal.search


if __name__ == "__main__":
    pc = pycausal.pycausal.pycausal()
    pc.start_vm()

    tetrad = pycausal.search.tetradrunner()

    # Y is X plus independent (non-Gaussian) uniform noise,
    # so the true edge is X --> Y.
    rng = np.random.default_rng(1234)
    X = rng.normal(5, 1.5, 1000)
    Y = X + rng.uniform(-2, 2, 1000)

    df = pd.DataFrame({
        "X": X,
        "Y": Y,
    })
    # Save a copy so the same data can be fed to causal-cmd and the Tetrad GUI.
    df.to_csv("~/two_var.csv", index=False)

    tetrad.run(
        dfs=df, verbose=True, algoId="lingam",
        penaltyDiscount=1,
    )
    print(tetrad.getEdges())

    pc.stop_vm()

This gives the output:

graph Returning this graph: Graph Nodes:
X;Y

Graph Edges:
1. Y --> X

Graph Attributes:
BIC: -1110.654178

Graph Node Attributes:
BIC: [X: 155.157065;Y: -1265.811243]

['Y --> X']

Since Y is generated from X plus independent noise, the true edge is X --> Y, so I expected the discovered edge to point the other way. When I run this file with causal-cmd, I see:

~$ java -jar causal-cmd-1.1.3-jar-with-dependencies.jar --algorithm lingam --penaltyDiscount 1 --data-type continuous --dataset ~/two_var.csv --delimiter comma --skip-latest
Jun 24, 2020 4:42:31 AM java.util.prefs.FileSystemPreferences$6 run
WARNING: Prefs file removed in background /home/ubuntu/.java/.userPrefs/prefs.xml
W = 
       V1       V2
  -0.3731  -0.9278
   0.9278  -0.3731

WTilde before normalization = 
       V1       V2
  -0.3731   0.9278
  -0.9278  -0.3731

WTilde after normalization = 
       V1       V2
   1.0000  -0.9278
   2.4867   1.0000

B = 
       V1       V2
            0.9278
  -2.4867         

BTilde = 
       V1       V2
            0.9278
  -2.4867         

graph Returning this graph: Graph Nodes:
X;Y

Graph Edges:
1. X --> Y

Graph Attributes:
BIC: -1110.654178

Graph Node Attributes:
BIC: [X: -849.681780;Y: -260.972398]

as expected. When I run the same file in the Tetrad GUI (explicitly setting penaltyDiscount = 1), the graph edges match the causal-cmd output and the BIC values match exactly. I expected the same results from py-causal – am I doing something wrong here? I've not noticed discrepancies between Tetrad and py-causal when using other algorithms.

@chirayukong (Collaborator)

I think it was because the py-causal code base was a bit out of date. Let me check about this. Will be back to you shortly.

jdramsey (Contributor) commented Jun 25, 2020 via email

@mikelynch (Author)

Apologies for chasing this up, but did you get anywhere looking into this?

jdramsey (Contributor) commented Aug 10, 2020 via email

jdramsey (Contributor) commented Aug 10, 2020 via email

@jdramsey (Contributor)

OK, I finally had a chance to look at LiNGAM. Sorry it took me so long; I've been busy. Maybe I misunderstood; I had thought there was a problem in getting a good estimate of structure out of the algorithm. In one branch I was having that trouble. But a few days ago I made a fresh branch off of development, and when I tried it there the estimates were spot on. So now I need to find out whether this version has been pushed into publication or not. That could be the problem. We may just need to do another publication run. Unfortunately that's not immediate; we need to take stock of all the changes. But I think it will solve the problem.
Again, sorry for the delays. We're running on a skeleton crew at the moment.
@chirayukong What do you think? I think this was your suggestion anyway. Maybe we should wait a bit on this, as I have changes in this new branch I'd like to push, and I need to review them with someone.

@chirayukong (Collaborator)

@jdramsey Sorry for replying late. I'm working on the new release. I'm having a problem with the javabridge library in Python with the new jar file. I hope to fix it soon.

jdramsey (Contributor) commented Aug 20, 2020

Thanks @chirayukong. A couple of things I can say. (Sorry, this took me a long time too.) I had thought the problem was my translation of FastICA from the original into Java. That wasn't the problem. Details of the code aside, there's a simple check to see if FastICA is behaving itself. It's supposed to produce (for the square-matrix case at least) a matrix W such that X = W^-1 S, where X is the data, arranged row-wise, W is the unmixing matrix, and S is a row-wise matrix of independent vectors, which for a linear model means that SS' = I. You can just check that it's doing that; it is, out to at least 4 decimal places, which should be good enough. So whatever instability there was came from the rest of the algorithm.
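
That check is easy to demonstrate outside of Java. Here is a minimal sketch using scikit-learn's FastICA as a stand-in for the Java implementation under discussion; the sources and mixing matrix are made up for the example, and note that scikit-learn arranges data column-wise rather than row-wise, so the uncorrelatedness check applies to the columns of the recovered S:

import numpy as np
from sklearn.decomposition import FastICA

rng = np.random.default_rng(0)
# Two independent, non-Gaussian (uniform) sources, mixed linearly: X = S A'.
S_true = rng.uniform(-1, 1, size=(5000, 2))
A = np.array([[1.0, 0.5],
              [0.2, 1.0]])
X = S_true @ A.T

ica = FastICA(n_components=2, random_state=0)
S = ica.fit_transform(X)  # estimated sources, one per column

# The estimated sources should be mutually uncorrelated
# (SS' proportional to the identity, up to scaling)...
print(np.round(np.corrcoef(S, rowvar=False), 4))      # ~ identity matrix
# ...and the estimated mixing matrix should reconstruct the data.
print(np.allclose(X, S @ ica.mixing_.T + ica.mean_))  # should print True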

I spent some time cleaning up the rest of the algorithm so it produces the right answer. Getting the causal-order step to behave for the original LiNGAM algorithm was just a matter of following the specified algorithm without bugs, so that was not bad. The algorithm for this is Algorithm A in https://www.cs.helsinki.fi/group/neuroinf/lingam/JMLR06.pdf. There was a subsequent resampling pruning step in the algorithm, which I could not get to work; so far as I can tell it produces random adjacency results. So instead of doing that I use FGES with the causal order from the first step as knowledge. For toy cases, that produces very good results for linear, non-Gaussian models. In the future I may substitute some other adjacency step that is theoretically cleaner. The Wald method in the above article is rather difficult to implement in Java, and I didn't make a concerted effort to do that.
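
To make the causal-order step concrete, here is a minimal NumPy sketch of Algorithm A as I read it from the paper above. This is an illustration of the published algorithm, not the Tetrad code, and the brute-force permutation search is only sensible for toy problems:

import numpy as np
from itertools import permutations

def lingam_causal_order(W):
    """Causal order from an ICA unmixing matrix W (Algorithm A,
    Shimizu et al., JMLR 2006). Brute-force permutation search:
    fine for toy problems, exponential in the number of variables."""
    p = W.shape[0]
    idx = list(range(p))
    # 1. Permute the rows of W so the diagonal has no (near-)zero
    #    entries, by minimizing sum_i 1/|W_ii| over row permutations.
    row_perm = min(permutations(idx),
                   key=lambda pm: np.sum(1.0 / np.abs(W[list(pm), idx])))
    W_tilde = W[list(row_perm), :]
    # 2. Divide each row by its diagonal entry (diagonal becomes 1).
    W_tilde = W_tilde / np.diag(W_tilde)[:, None]
    # 3. Connection-strength estimates: B = I - W_tilde.
    B = np.eye(p) - W_tilde
    # 4. Find the variable ordering that makes B as close to strictly
    #    lower triangular as possible (minimize the squared mass
    #    strictly above the diagonal); that ordering is the causal order.
    def upper_mass(pm):
        Bp = B[np.ix_(pm, pm)]
        return float(np.sum(np.triu(Bp, k=1) ** 2))
    order = min(permutations(idx), key=lambda pm: upper_mass(list(pm)))
    return list(order), B

For larger numbers of variables, step 1 can be cast as a linear assignment problem rather than enumerated, but for two variables, as in this issue, brute force is fine.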

I think the version of LiNGAM in development is OK. It produces correct results for toy problems. I did some more work in a separate branch, but I did not change the algorithm from the above; I just pulled the pruning step code over in case I have an inspiration. Anyway, I'll push those changes to development as soon as the other changes in that branch (on another algorithm) are mature.
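
For anyone who wants to emulate that FGES-with-knowledge arrangement from the py-causal side, here is a hedged sketch continuing the reproduction script at the top of this issue. It assumes py-causal's prior-knowledge helper, prior.knowledge(addtemporal=...), as shown in the py-causal README, and the sem-bic score for continuous FGES; treat it as an illustration, not the fix itself:

from pycausal import prior

# Encode the causal order as temporal tiers: X in tier 0, Y in tier 1.
# FGES then searches only for adjacencies consistent with that order.
pk = prior.knowledge(addtemporal=[['X'], ['Y']])

tetrad.run(
    dfs=df, algoId='fges', scoreId='sem-bic',
    priorKnowledge=pk, penaltyDiscount=1, verbose=True,
)
print(tetrad.getEdges())  # should now respect the order, e.g. X --> Y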
