
Bug with using LiNGAM #95

Open
mikelynch opened this issue Jun 24, 2020 · 8 comments

mikelynch commented Jun 24, 2020

I'm using the latest version of the package (1.2.1) and have come across what I think must be a bug when trying to use LiNGAM. I originally noticed this when looking at my own data, but this minimal script demonstrates the problem:

import numpy as np
import pandas as pd
import pycausal.pycausal
import pycausal.search


if __name__ == "__main__":
    pc = pycausal.pycausal.pycausal()
    pc.start_vm()

    tetrad = pycausal.search.tetradrunner()

    # Y is X plus independent (non-Gaussian) uniform noise,
    # so the true edge is X --> Y.
    rng = np.random.default_rng(1234)
    X = rng.normal(5, 1.5, 1000)
    Y = X + rng.uniform(-2, 2, 1000)

    df = pd.DataFrame({
        "X": X,
        "Y": Y,
    })
    # Save a copy so the same data can be fed to causal-cmd and the Tetrad GUI.
    df.to_csv("~/two_var.csv", index=False)

    tetrad.run(
        dfs=df, verbose=True, algoId="lingam",
        penaltyDiscount=1,
    )
    print(tetrad.getEdges())

    pc.stop_vm()

This gives the output:

graph Returning this graph: Graph Nodes:
X;Y

Graph Edges:
1. Y --> X

Graph Attributes:
BIC: -1110.654178

Graph Node Attributes:
BIC: [X: 155.157065;Y: -1265.811243]

['Y --> X']

Since Y is generated from X plus independent noise, the true edge is X --> Y, so I expected the discovered edge to point the other way. When I run this file with causal-cmd, I see:

~$ java -jar causal-cmd-1.1.3-jar-with-dependencies.jar --algorithm lingam --penaltyDiscount 1 --data-type continuous --dataset ~/two_var.csv --delimiter comma --skip-latest
Jun 24, 2020 4:42:31 AM java.util.prefs.FileSystemPreferences$6 run
WARNING: Prefs file removed in background /home/ubuntu/.java/.userPrefs/prefs.xml
W = 
       V1       V2
  -0.3731  -0.9278
   0.9278  -0.3731

WTilde before normalization = 
       V1       V2
  -0.3731   0.9278
  -0.9278  -0.3731

WTilde after normalization = 
       V1       V2
   1.0000  -0.9278
   2.4867   1.0000

B = 
       V1       V2
            0.9278
  -2.4867         

BTilde = 
       V1       V2
            0.9278
  -2.4867         

graph Returning this graph: Graph Nodes:
X;Y

Graph Edges:
1. X --> Y

Graph Attributes:
BIC: -1110.654178

Graph Node Attributes:
BIC: [X: -849.681780;Y: -260.972398]

as expected. When I run the same file in the Tetrad GUI (explicitly setting penaltyDiscount = 1), the graph edges match the causal-cmd output and the BIC values match exactly. I expected the same results from py-causal – am I doing something wrong here? I've not noticed discrepancies between Tetrad and py-causal when using other algorithms.

@chirayukong (Collaborator)

I think it was because the py-causal code base was a bit out of date. Let me check about this. Will be back to you shortly.

jdramsey (Contributor) commented Jun 25, 2020 via email

@mikelynch (Author)

Apologies for chasing this up, but did you get anywhere looking into this?

jdramsey (Contributor) commented Aug 10, 2020 via email

jdramsey (Contributor) commented Aug 10, 2020 via email

@jdramsey (Contributor)

OK, I finally had a chance to look at LiNGAM. Sorry it took me so long; I've been busy. Maybe I misunderstood; I had thought there was a problem in getting a good estimate of structure out of the algorithm. In one branch I was having that trouble. But a few days ago I made a fresh branch off of development, and when I tried it there the estimates were spot on. So now I need to find out whether this version has been pushed into publication or not. That could be the problem. We may just need to do another publication run. Unfortunately that's not immediate; we need to take stock of all the changes. But I think it will solve the problem.
Again, sorry for the delays. We're running on a skeleton crew at the moment.
@chirayukong What do you think? I think this was your suggestion anyway. Maybe we should wait a bit on this, as I have changes in this new branch I'd like to push, and I need to review them with someone.

@chirayukong (Collaborator)

@jdramsey Sorry for replying late. I'm working on the new release. I'm having a problem with the javabridge library in Python with the new jar file. I hope to fix it soon.

jdramsey (Contributor) commented Aug 20, 2020

Thanks @chirayukong. A couple of things I can say. (Sorry, this took me a long time too.) I had thought the problem was my translation of FastICA from the original into Java. That wasn't the problem. Details of the code aside, there's a simple check to see if FastICA is behaving itself. It's supposed to produce (for the square-matrix case at least) a matrix W such that X = W^-1 S, where X is the data, arranged row-wise, W is the unmixing matrix, and S is a row-wise matrix of independent vectors, which for a linear model means that SS' = I. You can just check that it's doing that; it is, out to at least 4 decimal places, which should be good enough. So whatever instability there was came from the rest of the algorithm.
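
That check is easy to demonstrate outside of Java. Here is a minimal sketch using scikit-learn's FastICA as a stand-in for the Java implementation under discussion; the sources and mixing matrix are made up for the example, and note that scikit-learn arranges data column-wise rather than row-wise, so the uncorrelatedness check applies to the columns of the recovered S:

import numpy as np
from sklearn.decomposition import FastICA

rng = np.random.default_rng(0)
# Two independent, non-Gaussian (uniform) sources, mixed linearly: X = S A'.
S_true = rng.uniform(-1, 1, size=(5000, 2))
A = np.array([[1.0, 0.5],
              [0.2, 1.0]])
X = S_true @ A.T

ica = FastICA(n_components=2, random_state=0)
S = ica.fit_transform(X)  # estimated sources, one per column

# The estimated sources should be mutually uncorrelated
# (SS' proportional to the identity, up to scaling)...
print(np.round(np.corrcoef(S, rowvar=False), 4))      # ~ identity matrix
# ...and the estimated mixing matrix should reconstruct the data.
print(np.allclose(X, S @ ica.mixing_.T + ica.mean_))  # should print True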

I spent some time cleaning up the rest of the algorithm so it produces the right answer. Getting the causal-order step to behave for the original LiNGAM algorithm was just a matter of following the specified algorithm without bugs, so that was not bad. The algorithm for this is Algorithm A in https://www.cs.helsinki.fi/group/neuroinf/lingam/JMLR06.pdf. There was a subsequent resampling pruning step in the algorithm, which I could not get to work; so far as I can tell it produces random adjacency results. So instead of doing that I use FGES with the causal order from the first step as knowledge. For toy cases, that produces very good results for linear, non-Gaussian models. In the future I may substitute some other adjacency step that is theoretically cleaner. The Wald method in the above article is rather difficult to implement in Java, and I didn't make a concerted effort to do that.
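
To make the causal-order step concrete, here is a minimal NumPy sketch of Algorithm A as I read it from the paper above. This is an illustration of the published algorithm, not the Tetrad code, and the brute-force permutation search is only sensible for toy problems:

import numpy as np
from itertools import permutations

def lingam_causal_order(W):
    """Causal order from an ICA unmixing matrix W (Algorithm A,
    Shimizu et al., JMLR 2006). Brute-force permutation search:
    fine for toy problems, exponential in the number of variables."""
    p = W.shape[0]
    idx = list(range(p))
    # 1. Permute the rows of W so the diagonal has no (near-)zero
    #    entries, by minimizing sum_i 1/|W_ii| over row permutations.
    row_perm = min(permutations(idx),
                   key=lambda pm: np.sum(1.0 / np.abs(W[list(pm), idx])))
    W_tilde = W[list(row_perm), :]
    # 2. Divide each row by its diagonal entry (diagonal becomes 1).
    W_tilde = W_tilde / np.diag(W_tilde)[:, None]
    # 3. Connection-strength estimates: B = I - W_tilde.
    B = np.eye(p) - W_tilde
    # 4. Find the variable ordering that makes B as close to strictly
    #    lower triangular as possible (minimize the squared mass
    #    strictly above the diagonal); that ordering is the causal order.
    def upper_mass(pm):
        Bp = B[np.ix_(pm, pm)]
        return float(np.sum(np.triu(Bp, k=1) ** 2))
    order = min(permutations(idx), key=lambda pm: upper_mass(list(pm)))
    return list(order), B

For larger numbers of variables, step 1 can be cast as a linear assignment problem rather than enumerated, but for two variables, as in this issue, brute force is fine.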

I think the version of LiNGAM in development is OK. It produces correct results for toy problems. I did some more work in a separate branch, but I did not change the algorithm from the above; I just pulled the pruning step code over in case I have an inspiration. Anyway, I'll push those changes to development as soon as the other changes in that branch (on another algorithm) are mature.
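
For anyone who wants to emulate that FGES-with-knowledge arrangement from the py-causal side, here is a hedged sketch continuing the reproduction script at the top of this issue. It assumes py-causal's prior-knowledge helper, prior.knowledge(addtemporal=...), as shown in the py-causal README, and the sem-bic score for continuous FGES; treat it as an illustration, not the fix itself:

from pycausal import prior

# Encode the causal order as temporal tiers: X in tier 0, Y in tier 1.
# FGES then searches only for adjacencies consistent with that order.
pk = prior.knowledge(addtemporal=[['X'], ['Y']])

tetrad.run(
    dfs=df, algoId='fges', scoreId='sem-bic',
    priorKnowledge=pk, penaltyDiscount=1, verbose=True,
)
print(tetrad.getEdges())  # should now respect the order, e.g. X --> Y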
