BIC score for FCI #90

vlad43210 · 2019-09-20T19:58:10Z

Hi,

I have a similar issue to #86 -- I would like to get the BIC score for the FCI algorithm. I'm working with a team and they confirm that while FCI does not use the BIC for search, it does output it. However, when I run:

graph = tetrad.getTetradGraph()
print('Graph BIC: {}'.format(graph.getAttribute('BIC')))

I get:

Graph BIC: None

when I choose the fci algorithm. Any chance you could surface this score for me? I was told by the same team that TETRAD does output it for FCI, so it should be in the source somewhere.

jdramsey · 2019-09-20T21:07:48Z

The problem is that BIC scores only make sense if you have a DAG model, or can get one, since you need to know the parents of every node. For a PAG you don't necessarily know this, so BIC scores for FCI outputs usually don't make sense theoretically.

vlad43210 · 2019-09-20T23:14:37Z

So my colleague says:

BIC scores make sense if you have a likelihood that comes from a curved exponential family
FCI may certainly be applicable to data that comes from such a model
and in general you have no choice but to use FCI, because the only way you can use model scoring is if you had a tractable local search procedure, and such procedures are only known to exist for undirected and directed acyclic graph models
but not, for example, for MAGs
even if MAGs form a curved exponential family and thus have a very much defined BIC score, and this score is even consistent
to summarize:
(a) the BIC score may be defined on any model with a fixed dimension
(b) the BIC score is further consistent if the model is also curved exponential
(c) the BIC score may further be used to have a computationally tractable model selection procedure if it corresponds to directed acyclic, undirected, or bidirected graphs

Our work is in the above three settings, and it would be very useful to get the BIC score for DAGs generated by pycausal in these cases! Especially since it's already a feature in TETRAD, but our solution is programmatic and we don't want to manually fire up TETRAD every time we want to calculate a BIC.

To give a little more clarity, we are interested in calculating BICs for DAGs generated at various levels of confidence (alpha value) to select the optimal alpha value for a particular DAG generation process.
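A minimal sketch of that selection loop, in case it helps to make the request concrete. The callable `bic_for_alpha` is hypothetical (it is not a pycausal or TETRAD function): it stands for "run the search at this alpha and return the BIC of the result, or None if no score is available". The sketch also assumes a higher-is-better BIC.

```python
def select_alpha(alphas, bic_for_alpha):
    """Return (best_alpha, best_bic) over the candidate alphas.

    bic_for_alpha is a hypothetical callable supplied by the caller: it runs
    the search at the given alpha and returns a BIC score (assumed here to be
    higher-is-better) or None when no score could be computed.
    Returns None if no alpha produced a score.
    """
    best = None
    for alpha in alphas:
        score = bic_for_alpha(alpha)
        if score is None:
            # No score for this alpha; skip it rather than fail.
            continue
        if best is None or score > best[1]:
            best = (alpha, score)
    return best
```

If the score in use is lower-is-better instead, the comparison would need to be flipped.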

Hope this helps!

Vlad

jdramsey · 2019-09-21T00:22:21Z

OK, ask Ilya how to calculate it. :)

jdramsey · 2019-09-23T19:26:33Z

By the way, I was serious. If you or Ilya can say how to calculate BIC for an FCI output correctly, I'll run it by Peter Spirtes, and if he agrees, I'll code it up for you.

bja43 · 2019-09-23T19:39:41Z

Joe, isn't the iterative conditional fitting algorithm of Drton and Richardson implemented in TETRAD?

This can be used to calculate the maximum likelihood term of BIC for linear Gaussian data. Additionally, I believe the number of parameters is proportional to the number of edges. I would run this by Peter to be sure, but I think you can calculate BIC on a MAG (choose a MAG in the output PAG of FCI) and linear Gaussian data using these two quantities.

The only issue I can think of is that FCI doesn't guarantee a PAG output. That being said, if it does output a PAG, you should be able to calculate its BIC score.

bja43 · 2019-09-23T19:55:53Z

To be clear, this is what I suggest:

1. Run FCI.
2. If FCI does not return a PAG, then return none.
3. If FCI does return a PAG, then sample a MAG from that PAG.
4. Calculate the MLE covariance matrix for the MAG using (R)ICF.
5. Use the MLE covariance matrix to calculate the log-likelihood term.
6. BIC = log-likelihood - (num edges / 2) * log(num samples)
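The last two steps can be sketched roughly as follows. This is only an illustration under stated assumptions, not TETRAD code: it assumes centered linear Gaussian data, a sample covariance computed with divisor n, and a model covariance already fitted to the MAG (for example by ICF, which is not implemented here). The function names are made up for the example, and the penalty counts only edges, as in the formula above.

```python
import math


def cholesky(a):
    """Lower-triangular Cholesky factor of a symmetric positive definite matrix."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                d = a[i][i] - s
                if d <= 0.0:
                    raise ValueError("matrix is not positive definite")
                low[i][j] = math.sqrt(d)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low


def solve_spd(low, b):
    """Solve (L L^T) x = b for a vector b, given the Cholesky factor L."""
    n = len(low)
    y = [0.0] * n
    for i in range(n):  # forward substitution: L y = b
        y[i] = (b[i] - sum(low[i][k] * y[k] for k in range(i))) / low[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution: L^T x = y
        x[i] = (y[i] - sum(low[k][i] * x[k] for k in range(i + 1, n))) / low[i][i]
    return x


def gaussian_log_likelihood(sample_cov, model_cov, n_samples):
    """Log-likelihood of centered Gaussian data under model_cov.

    Uses -n/2 * (p*log(2*pi) + log det(Sigma) + trace(Sigma^-1 S)),
    where S is the sample covariance (divisor n) and Sigma the model covariance.
    """
    p = len(model_cov)
    low = cholesky(model_cov)
    log_det = 2.0 * sum(math.log(low[i][i]) for i in range(p))
    trace = 0.0
    for j in range(p):
        column = [sample_cov[i][j] for i in range(p)]
        trace += solve_spd(low, column)[j]
    return -0.5 * n_samples * (p * math.log(2.0 * math.pi) + log_det + trace)


def mag_bic(sample_cov, model_cov, n_samples, num_edges):
    """BIC = log-likelihood - (num edges / 2) * log(num samples); higher is better."""
    log_lik = gaussian_log_likelihood(sample_cov, model_cov, n_samples)
    return log_lik - (num_edges / 2.0) * math.log(n_samples)
```

Here `model_cov` would be the covariance returned by the fitting step for the sampled MAG, and `num_edges` the number of edges in that MAG.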

jdramsey · 2019-09-23T20:12:57Z

Thanks! I'll run that by Peter. Maybe I should get Ilya to agree as well? @vlad43210

vlad43210 · 2019-09-23T21:00:13Z

Can you say why model dimension is num_edges / 2, rather than num_edges or perhaps num_edges * 2 (depending on whether you standardize)? Vlad

bja43 · 2019-09-23T21:06:07Z

I am going off of the statement of BIC here: https://projecteuclid.org/download/pdf_1/euclid.aos/1176350709.

In our case, k is proportional to the number of edges and n is the number of samples so the parameter penalty in the criterion becomes (num edges / 2) log (num samples).

vlad43210 · 2019-09-23T22:11:41Z

Got it, thanks! Vlad

jdramsey · 2019-09-24T11:11:39Z

@bja43 @vlad43210 Of course all of our other BIC scores in Tetrad use the formula 2L - k ln n (higher is better).
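For comparison with the formula suggested earlier in this thread, and taking k to be the number of edges and L the maximized log-likelihood (both as assumed there), the two conventions differ only by a factor of 2, so they should rank models identically:

```latex
\begin{align*}
\mathrm{BIC}_{\text{Tetrad}} &= 2L - k \ln n \\
                             &= 2\left(L - \frac{k}{2} \ln n\right) \\
                             &= 2\,\mathrm{BIC}_{\text{thread}},
\qquad \text{where } \mathrm{BIC}_{\text{thread}} = L - \frac{k}{2}\ln n .
\end{align*}
```

In both forms higher is better; only the scale of the reported number changes.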

jdramsey · 2019-09-25T18:18:09Z

@bja43 @vlad43210 Peter agrees with Bryan. Yay! He says we should use Jiji's method for getting a MAG from a PAG. Bryan, do we have that?

bja43 · 2019-09-25T18:23:17Z

There is a pagToMag function in SearchGraphUtils.java but I have never confirmed that it is Jiji's method. Perhaps Peter knows?
