import numpy as np
from itertools import product, combinations
import os
class SVARRFCI():
r"""
This class implements an adaptation of the RFCI algorithm to stationary time series with the assumption of no selection variables. The RFCI algorithm was introduced in:
Colombo, D., Maathuis, M. H., Kalisch, M., and Richardson, T. S. (2012). Learning high-dimensional directed acyclic graphs with latent and selection variables. The Annals of Statistics, 40:294–321.
We note the following:
1) The algorithm is fully order-independent. This is achieved by two things. First, we use the majority rule to decide whether a given node is in a given separating set. Since the unshielded triple rule (given in Lemma 3.1 of the above reference) demands a minimal separating set, the majority vote is restricted to separating sets of minimal cardinality (this implies minimality). Second, we apply an orientation rule to the entire graph and resolve potential conflicts among the proposed orientations by means of the conflict mark 'x' before modifying the graph. This also applies to the discriminating path rule (given in Lemma 3.2 of the above reference).
2) Several control parameters modify the behavior of the algorithm, see below.
Parameters passed to the constructor:
- dataframe:
Tigramite dataframe object that contains the time series dataset \bold{X}
- cond_ind_test:
A conditional independence test object that specifies which conditional independence test CI is to be used
Parameters passed to self.run_svarrfci():
- tau_max:
The maximum considered time lag tau_max
- pc_alpha:
The significance level \alpha of conditional independence tests
- max_cond_px:
Consider a pair of variables (X^i_{t-\tau}, X^j_t) with \tau > 0. In the edge removal phase (here this is self._run_pc_removal_phase()), the algorithm does not test for conditional independence given subsets of the non-future adjacencies of X^i_{t-\tau} of cardinality higher than max_cond_px.
- max_p_global:
Restricts all conditional independence tests to conditioning sets with cardinality less than or equal to max_p_global
- max_q_global:
For each ordered pair (X^i_{t-\tau}, X^j_t) of adjacent variables and for each cardinality of the conditioning sets, test at most max_q_global conditioning sets (when summing over all tested cardinalities, more than max_q_global tests may be made)
- fix_all_edges_before_final_orientation (will be removed)
- verbosity:
Controls the verbosity of self.run_svarrfci() and the functions it calls.
Return value of self.run_svarrfci():
The estimated graph in the form of a link matrix. This is a numpy array of shape (self.N, self.N, self.tau_max + 1), where the entry array[i, j, \tau] is a string that visualizes the estimated link from X^i_{t-\tau} to X^j_t. For example, if array[0, 2, 1] = 'o->', then the estimated graph contains the link X^0_{t-1} o-> X^2_t. This numpy array is also saved as the instance attribute self.graph. Note that self.N is the number of observed time series and self.tau_max is the maximal considered time lag.
A note on middle marks:
Even if both max_p_global and max_q_global are np.inf, RFCI does not guarantee that all edges that remain in its graph after convergence (this is an RFCI-PAG) are also in the PAG. However, it does guarantee this for all edges that have a tail. We use the middle marks that we introduced for LPCMCI to explicate this distinction. In particular, we use the middle marks '?' and '' (empty). For convenience (to have strings of the same length) we here internally denote the empty middle mark by '-'. For post-processing purposes all middle marks are nevertheless set to the empty middle mark (here '-') by self._fix_all_edges() during post-processing, but if verbosity >= 1 a graph with the middle marks is printed out before.
A note on wildcards:
The middle mark wildcard \ast and the edge mark wildcard \ast are here represented as '*', the edge mark wildcard \star as '+'.
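Example (a minimal usage sketch; the synthetic data, the ParCorr import path, and the chosen parameter values are illustrative assumptions, not part of this module):
import numpy as np
import tigramite.data_processing as pp
from tigramite.independence_tests import ParCorr
data = np.random.randn(500, 3)
svarrfci = SVARRFCI(dataframe = pp.DataFrame(data), cond_ind_test = ParCorr())
graph = svarrfci.run_svarrfci(tau_max = 2, pc_alpha = 0.05)
# graph[i, j, tau] is then a string such as 'o->' describing the link from X^i_{t-tau} to X^j_t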
"""
def __init__(self, dataframe, cond_ind_test):
"""Class constructor. Store:
i) data
ii) conditional independence test object
iii) some instance attributes"""
# Save the time series data that the algorithm operates on
self.dataframe = dataframe
# Set the conditional independence test to be used
self.cond_ind_test = cond_ind_test
self.cond_ind_test.set_dataframe(self.dataframe)
# Store the shape of the data in the T and N variables
self.T, self.N = self.dataframe.values.shape
def run_svarrfci(self,
tau_max = 1,
pc_alpha = 0.05,
max_cond_px = 0,
max_p_global = np.inf,
max_q_global = np.inf,
fix_all_edges_before_final_orientation = True,
verbosity = 0):
"""Run an adaption of the RFCI algorithm to stationary time series without selection variables on the dataset and with the conditional independence test passed to the class constructor and with the options passed to this function."""
# Step 0: Initializations
self._initialize(tau_max, pc_alpha, max_cond_px, max_p_global, max_q_global, fix_all_edges_before_final_orientation, verbosity)
# Step 1: PC removal phase
self._run_pc_removal_phase()
# Step 2: RFCI orientation phase
self._run_rfci_orientation_phase()
# Post processing
self._fix_all_edges()
self.graph = self._dict2graph()
self.val_min_matrix = self._dict_to_matrix(self.val_min, self.tau_max, self.N, default = 0)
self.cardinality_matrix = self._dict_to_matrix(self.max_cardinality, self.tau_max, self.N, default = 0)
# Return the estimated graph
return self.graph
def _initialize(self,
tau_max,
pc_alpha,
max_cond_px,
max_p_global,
max_q_global,
fix_all_edges_before_final_orientation,
verbosity):
"""Function for
i) saving the arguments passed to self.run_svarrfci() as instance attributes
ii) initializing various memory variables for storing the current graph, sepsets etc.
"""
# Save the arguments passed to self.run_svarrfci()
self.tau_max = tau_max
self.pc_alpha = pc_alpha
self.max_cond_px = max_cond_px
self.max_p_global = max_p_global
self.max_q_global = max_q_global
self.fix_all_edges_before_final_orientation = fix_all_edges_before_final_orientation
self.verbosity = verbosity
# Initialize the nested dictionary for storing the current graph.
# Syntax: self.graph_dict[j][(i, -tau)] gives the string representing the link from X^i_{t-tau} to X^j_t
self.graph_dict = {}
for j in range(self.N):
self.graph_dict[j] = {(i, 0): "o?o" for i in range(self.N) if j != i}
self.graph_dict[j].update({(i, -tau): "o?>" for i in range(self.N) for tau in range(1, self.tau_max + 1)})
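# For example, with self.N = 2 and self.tau_max = 1 this initialization yields
# self.graph_dict[0] = {(1, 0): "o?o", (0, -1): "o?>", (1, -1): "o?>"}, i.e., every
# contemporaneous link starts as "o?o" and every lagged link as "o?>" (the arrowhead
# at X^j_t reflects that the present cannot cause the past), with unknown middle mark '?'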
# Initialize the nested dictionary for storing separating sets
# Syntax: self.sepsets[j][(i, -tau)] stores separating sets of X^i_{t-tau} to X^j_t. For tau = 0, i < j.
self.sepsets = {j: {(i, -tau): set() for i in range(self.N) for tau in range(self.tau_max + 1) if (tau > 0 or i < j)} for j in range(self.N)}
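# For example, self.sepsets[1][(0, -1)] collects pairs (frozenset(Z), status) for sets Z
# separating X^0_{t-1} from X^1_t, where status is 'm' (minimal), 'nm' (non-minimal),
# or '' (unlabeled), as written by self._save_sepset() throughout this file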
# Initialize dictionaries for storing known ancestorships, non-ancestorships, and ambiguous ancestorships
# Syntax: self.def_ancs[j] contains the set of all known ancestors of X^j_t. Equivalently for the others
self.def_ancs = {j: set() for j in range(self.N)}
self.def_non_ancs = {j: set() for j in range(self.N)}
self.ambiguous_ancestorships = {j: set() for j in range(self.N)}
# Initialize nested dictionaries for saving the minimum test statistic among all conditional independence tests of a given pair of variables, the maximum p-value, and the maximal cardinality of the known separating sets.
# Syntax: As for self.sepsets
self.val_min = {j: {(i, -tau): float("inf") for i in range(self.N) for tau in
range(self.tau_max + 1) if (tau > 0 or i < j)} for j in range(self.N)}
self.pval_max = {j: {(i, -tau): 0 for i in range(self.N) for tau in
range(self.tau_max + 1) if (tau > 0 or i < j)} for j in range(self.N)}
self.max_cardinality = {j: {(i, -tau): 0 for i in range(self.N) for tau in
range(self.tau_max + 1) if (tau > 0 or i < j)} for j in range(self.N)}
# Return
return True
def _run_pc_removal_phase(self):
"""Run the removal phase of the RFCI algorithm adapted to time series. This is essentially the skeleton phase of the PC algorithm"""
# Verbose output
if self.verbosity >= 1:
print("\n=======================================================")
print("=======================================================")
print("Starting removal phase")
# Remember all edges that are fully tested, even for finite max_p_global and max_q_global, as well as all edges that have not been fully tested
self._can_fix = set()
self._cannot_fix = set()
# Iterate until convergence
# p_pc is the cardinality of the conditioning set
p_pc = 0
while True:
##########################################################################################################
### Run the next removal iteration #######################################################################
# Verbose output
if self.verbosity >= 1:
if p_pc == 0:
print("\nStarting test phase\n")
print("p = {}".format(p_pc))
# Variable to check for convergence
has_converged = True
# Variable for keeping track of edges marked for removal
to_remove = {j: {} for j in range(self.N)}
# Iterate through all links
for (i, j, lag_i) in product(range(self.N), range(self.N), range(-self.tau_max, 1)):
# Decode the triple (i, j, lag_i) into pairs of variables (X, Y)
X = (i, lag_i)
Y = (j, 0)
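# For example, (i, j, lag_i) = (0, 1, -2) encodes the ordered pair X = X^0_{t-2}, Y = X^1_t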
######################################################################################################
### Exclusion of links ###############################################################################
# Exclude the current link if ...
# ... X = Y
if lag_i == 0 and i == j:
continue
# ... X > Y (so, in fact, we don't distinguish between both directions of the same edge)
if self._is_smaller(Y, X):
continue
# Get the current link from X to Y
link = self._get_link(X, Y)
# Also exclude the current link if ...
# ... X and Y are not adjacent anymore
if link == "":
continue
######################################################################################################
### Preparation of PC search sets ####################################################################
# Search for separating sets in the non-future adjacencies of Y, without X and Y themselves
S_search_YX = self._get_non_future_adj([Y]).difference({X, Y})
# Additionally search for separating sets in the non-future adjacencies of X, without X and Y themselves; this is always done if X and Y are contemporaneous and otherwise only if allowed by self.max_cond_px
test_X = True if (lag_i == 0 or (self.max_cond_px > 0 and self.max_cond_px >= p_pc)) else False
if test_X:
S_search_XY = self._get_non_future_adj([X]).difference({X, Y})
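# For example, with the default max_cond_px = 0 the search sets of lagged pairs are drawn
# from the adjacencies of Y only, whereas contemporaneous pairs are always tested from both sides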
######################################################################################################
### Check whether the link needs testing #############################################################
# If there are fewer than p_pc elements in both search sets, the link does not need further testing. If the pair (X, Y) has been fully tested, i.e., it has not been added to self._cannot_fix, we add it to self._can_fix. Then, later, in case one edge mark is set to a tail, we know that the link is part of the true MAG
if len(S_search_YX) < p_pc and (not test_X or len(S_search_XY) < p_pc):
if (X, Y) not in self._cannot_fix:
self._can_fix.add((X, Y))
continue
# Force-quit the while loop when p_pc exceeds the specified limit
if p_pc > self.max_p_global:
continue
# Since this link does need testing, the algorithm has not converged yet
has_converged = False
######################################################################################################
### Tests for conditional independence ###############################################################
# If self.max_q_global is finite, the below for loop may be broken earlier. To still guarantee order independence, the set from which the potential separating sets are created is ordered in an order independent way. Here, the elements of S_search_YX are ordered according to their minimal test statistic with Y
if not np.isinf(self.max_q_global):
S_search_YX = self._sort_search_set(S_search_YX, Y)
# q_count counts the number of conditional independence tests made for subsets of S_search_YX
q_count = 0
# Run through all cardinality p_pc subsets of S_search_YX
for Z in combinations(S_search_YX, p_pc):
# Stop testing if the number of tests exceeds the bound specified by self.max_q_global
q_count = q_count + 1
if q_count > self.max_q_global:
self._cannot_fix.add((X, Y))
break
# Test conditional independence of X and Y given Z. Correspondingly update self.val_min, self.pval_max, and self.max_cardinality
val, pval = self.cond_ind_test.run_test(X = [X], Y = [Y], Z = list(Z), tau_max = self.tau_max)
if self.verbosity >= 2:
print(" %s _|_ %s | S_pc = %s: val = %.2f / pval = % .4f" %
(X, Y, ' '.join([str(z) for z in list(Z)]), val, pval))
self._update_val_min(X, Y, val)
self._update_pval_max(X, Y, pval)
self._update_cardinality(X, Y, len(Z))
# Check whether the test result was significant
if pval > self.pc_alpha:
# Mark the edge from X to Y for removal, save Z as minimal separating set
to_remove[Y[0]][X] = True
self._save_sepset(X, Y, (frozenset(Z), ""))
# Verbose output
if self.verbosity >= 1:
print("({},{:2}) {:11} {} given {}".format(X[0], X[1], "independent", Y, Z))
# Break the for loop
break
# Run through all cardinality p_pc subsets of S_search_XY
if test_X:
if not np.isinf(self.max_q_global):
S_search_XY = self._sort_search_set(S_search_XY, X)
q_count = 0
for Z in combinations(S_search_XY, p_pc):
q_count = q_count + 1
if q_count > self.max_q_global:
self._cannot_fix.add((X, Y))
break
val, pval = self.cond_ind_test.run_test(X = [X], Y = [Y], Z = list(Z), tau_max = self.tau_max)
if self.verbosity >= 2:
print(" %s _|_ %s | S_pc = %s: val = %.2f / pval = % .4f" %
(X, Y, ' '.join([str(z) for z in list(Z)]), val, pval))
self._update_val_min(X, Y, val)
self._update_pval_max(X, Y, pval)
self._update_cardinality(X, Y, len(Z))
if pval > self.pc_alpha:
to_remove[Y[0]][X] = True
self._save_sepset(X, Y, (frozenset(Z), ""))
if self.verbosity >= 1:
print("({},{:2}) {:11} {} given {}".format(X[0], X[1], "independent", Y, Z))
break
# end for (i, j, lag_i) in product(range(self.N), range(self.N), range(-self.tau_max, 1))
##########################################################################################################
### Remove edges marked for removal in to_remove #########################################################
# Remove edges
for j in range(self.N):
for (i, lag_i) in to_remove[j].keys():
self._write_link((i, lag_i), (j, 0), "", verbosity = self.verbosity)
# Verbose output
if self.verbosity >= 1:
print("\nTest phase complete")
##########################################################################################################
### Check for convergence ################################################################################
if has_converged:
# If no link needed testing, this algorithm has converged. Therefore, break the while loop
break
else:
# At least one link needed testing, this algorithm has not yet converged. Therefore, increase p_pc
p_pc = p_pc + 1
# end while True
# Verbose output
if self.verbosity >= 1:
print("\nRemoval phase complete")
print("\nGraph:\n--------------------------------")
self._print_graph_dict()
print("--------------------------------")
# Return
return True
def _run_rfci_orientation_phase(self):
"""Run the orientation phase of the RFCI algorithm: Steps 2 and 3 of algorithm 3.2 in the RFCI paper"""
# Verbose output
if self.verbosity >= 1:
print("\n=======================================================")
print("=======================================================")
print("Starting RFCI orientation phase")
# Run the RFCI unshielded triple rule
M = set(self._find_triples(pattern_ij='***', pattern_jk='***', pattern_ik=''))
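# Here pattern_ij = pattern_jk = '***' match any present link, while pattern_ik = ''
# requires i and k to be non-adjacent, so M contains exactly the unshielded triples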
self._run_rfci_utr_rule(M)
# Remember whether the middle marks of all links have been set to '-' by force. This is done once, in the last iteration of the while loop, in case self.fix_all_edges_before_final_orientation is True
fixed_all = False
# Run until convergence
changed = True
while changed:
# Remember the current graph
old_graph_dict = {}
for j in range(self.N):
old_graph_dict[j] = {k: v for (k, v) in self.graph_dict[j].items()}
# Run Rules 1 - 3
self._run_orientation_phase(rule_list = [["R-01"], ["R-02"], ["R-03"]])
# Run the RFCI discriminating path rule
self._run_rfci_dpr_rule()
# Run Rules 8 - 10
self._run_orientation_phase(rule_list = [["R-08"], ["R-09"], ["R-10"]])
# Check whether there was a change
changed = False
for j in range(self.N):
for (k, v) in self.graph_dict[j].items():
if v != old_graph_dict[j][k]:
changed = True
# In case the corresponding option is chosen and the graph does not change anymore, set all middle marks to '-'
if not changed and self.fix_all_edges_before_final_orientation and not fixed_all:
self._fix_all_edges()
changed = True
fixed_all = True
# Fix all edges that have a tail
self._fix_edges_with_tail()
# Verbose output
if self.verbosity >= 1:
print("\nRFCI orientation phase complete")
print("\nFinal graph:\n--------------------------------")
print("--------------------------------")
self._print_graph_dict()
print("--------------------------------")
print("--------------------------------\n")
# Return True
return True
def _run_rfci_utr_rule(self, M):
"""Run the RFCI unshielded triple rule: Algorithm 4.4 of the RFCI supplement paper"""
# Verbose output
if self.verbosity >= 1:
print("\nStarting RFCI UTR-Rule:")
# Take care that not both (A, B, C) and (C, B, A) appear in M
M_unique = set()
for (A, B, C) in M:
if not (C, B, A) in M_unique:
M_unique.add((A, B, C))
M = M_unique
# Make a list of triples that will be tested for orientation (the list 'L' in the RFCI paper)
L = set()
# Run as long as there are unshielded triples in M
while len(M) > 0:
# Remember all unshielded triples
old_unshielded_triples = set(self._find_triples(pattern_ij='***', pattern_jk='***', pattern_ik=''))
# Make a list of edges that are marked for removal
to_remove = set()
# Run through all unshielded triples in M
for (A, B, C) in M:
# Unpack A, B, C
(i, lag_i) = A
(j, lag_j) = B
(k, lag_k) = C
# Get all minimal separating sets in SepSet(A, C)
sepsets = self._get_sepsets(A, C)
sepsets = {Z for (Z, status) in sepsets if status == "m"}
###############################################################################################################
###############################################################################################################
remove_AB = False
link_AB = self._get_link(A, B)
# Test A indep B given union(SepSet(A, C), intersection(def-anc(B), adj(B))) setminus{A, B} setminus{future of both A and B}
# Shift the lags appropriately
if lag_i <= lag_j:
X = (i, lag_i - lag_j) # A shifted
Y = (j, 0) # B shifted
delta_lag = lag_j
else:
X = (j, lag_j - lag_i) # B shifted
Y = (i, 0) # A shifted
delta_lag = lag_i
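# Worked example: A = (0, -2) and B = (1, -1) give lag_i <= lag_j, so X = (0, -1),
# Y = (1, 0) and delta_lag = -1; below, every (var, lag) in Z is shifted to
# (var, lag - delta_lag) = (var, lag + 1) so that Y is moved to lag 0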
# Run through all minimal separating sets of A and C
for Z in sepsets:
# Construct the conditioning set to test
# Take out future elements
Z_test = {(var, lag - delta_lag) for (var, lag) in Z if lag - delta_lag <= 0 and lag - delta_lag >= -self.tau_max}
# Test conditional independence of X and Y given Z_test
val, pval = self.cond_ind_test.run_test(X = [X], Y = [Y], Z = list(Z_test), tau_max = self.tau_max)
if self.verbosity >= 2:
print("UTR: %s _|_ %s | Z_test = %s: val = %.2f / pval = % .4f" %
(X, Y, ' '.join([str(z) for z in list(Z_test)]), val, pval))
# Update val_min and pval_max
self._update_val_min(X, Y, val)
self._update_pval_max(X, Y, pval)
self._update_cardinality(X, Y, len(Z_test))
# Check whether the test result was significant
if pval > self.pc_alpha:
# Mark the edge from X to Y for removal
remove_AB = True
# Store Z_test as a non-weakly-minimal separating set of X and Y
self._save_sepset(X, Y, (frozenset(Z_test), "nm"))
if remove_AB:
# Remember the edge for removal
to_remove.add((X, Y))
###############################################################################################################
###############################################################################################################
remove_CB = False
link_CB = self._get_link(C, B)
# Test C indep B given union(SepSet(A, C), intersection(def-anc(B), adj(B))) setminus{A, B} setminus{future of both C and B}
# Shift the lags appropriately
if lag_k <= lag_j:
X = (k, lag_k - lag_j)
Y = (j, 0)
delta_lag = lag_j
else:
X = (j, lag_j - lag_k)
Y = (k, 0)
delta_lag = lag_k
# Run through all minimal separating sets of A and C
for Z in sepsets:
# Construct the conditioning set to test
# Take out future elements
Z_test = {(var, lag - delta_lag) for (var, lag) in Z if lag - delta_lag <= 0 and lag - delta_lag >= -self.tau_max}
# Test conditional independence of X and Y given Z_test
val, pval = self.cond_ind_test.run_test(X = [X], Y = [Y], Z = list(Z_test), tau_max = self.tau_max)
if self.verbosity >= 2:
print("UTR: %s _|_ %s | Z_test = %s: val = %.2f / pval = % .4f" %
(X, Y, ' '.join([str(z) for z in list(Z_test)]), val, pval))
# Update val_min and pval_max
self._update_val_min(X, Y, val)
self._update_pval_max(X, Y, pval)
self._update_cardinality(X, Y, len(Z_test))
# Check whether the test result was significant
if pval > self.pc_alpha:
# Mark the edge from X to Y for removal
remove_CB = True
# Store Z_test as a non-weakly-minimal separating set of X and Y
self._save_sepset(X, Y, (frozenset(Z_test), "nm"))
if remove_CB:
# Remember the edge for removal
to_remove.add((X, Y))
###############################################################################################################
###############################################################################################################
if not remove_AB and not remove_CB and not link_AB[2] in ["-", "x"] and not link_CB[2] in ["-", "x"] and not (link_AB[2] == ">" and link_CB[2] == ">"):
L.add((A, B, C))
# end for (A, B, C) in M
###################################################################################################################
###################################################################################################################
# Remove edges marked for removal
for (X, Y) in to_remove:
self._write_link(X, Y, "", verbosity = self.verbosity)
# Make sepsets minimal (here, weakly minimal agrees with minimal)
for (X, Y) in to_remove:
# Read out all separating sets that were found in the rule phase, then consider only those of minimal cardinality
old_sepsets_all = {Z for (Z, _) in self._get_sepsets(X, Y)}
min_size = min({len(Z) for Z in old_sepsets_all})
old_sepsets_smallest = {Z for Z in old_sepsets_all if len(Z) == min_size}
# For all separating sets of minimal cardinality, find minimal separating subsets in an order independent way
self._delete_sepsets(X, Y)
self._make_sepset_minimal(X, Y, old_sepsets_smallest)
# Find new unshielded triples and determine the new "M"
new_unshielded_triples = set(self._find_triples(pattern_ij='***', pattern_jk='***', pattern_ik=''))
M = new_unshielded_triples.difference(old_unshielded_triples)
# Take care that not both (A, B, C) and (C, B, A) appear in M
M_unique = set()
for (A, B, C) in M:
if not (C, B, A) in M_unique:
M_unique.add((A, B, C))
M = M_unique
# end while len(M) > 0
#######################################################################################################################
#######################################################################################################################
# Remove all elements from L that are no longer part of an unshielded triple
L_final = {(A, B, C) for (A, B, C) in L if self._get_link(A, B) != "" and self._get_link(C, B) != ""}
# Run through all these triples and test for orientation as collider
to_orient = []
for (A, B, C) in L_final:
if self._B_not_in_SepSet_AC(A, B, C):
link_AB = self._get_link(A, B)
link_CB = self._get_link(C, B)
# Prepare the new links and save them to the output
if link_AB[2] != ">":
new_link_AB = link_AB[0] + link_AB[1] + ">"
to_orient.append(self._get_pair_key_and_new_link(A, B, new_link_AB))
new_link_CB = link_CB[0] + link_CB[1] + ">"
if link_CB[2] != ">":
to_orient.append(self._get_pair_key_and_new_link(C, B, new_link_CB))
# Verbose output
if self.verbosity >= 1:
print("\nUTR")
for ((i, j, lag_i), new_link) in set(to_orient):
print("{:10} ({},{:2}) {:3} ({},{:2}) ==> ({},{:2}) {:3} ({},{:2}) ".format("Marked:", i, lag_i, self._get_link((i, lag_i), (j, 0)), j, 0,i, lag_i, new_link, j, 0))
if len(to_orient) == 0:
print("Found nothing")
# Return if no orientations were found
if len(to_orient) == 0:
return False
# Aggregate orientations
new_ancs = {j: set() for j in range(self.N)}
new_non_ancs = {j: set() for j in range(self.N)}
for ((i, j, lag_i), new_link) in to_orient:
# The old link
old_link = self._get_link((i, lag_i), (j, 0))
# Assert that the later variable is not marked as an ancestor of a preceding variable
assert not (lag_i < 0 and new_link[2] == "-")
# New ancestral relation of (i, lag_i) to (j, 0)
if new_link[0] == "-" and old_link[0] != "-":
new_ancs[j].add((i, lag_i))
elif new_link[0] == "<" and old_link[0] != "<":
new_non_ancs[j].add((i, lag_i))
# New ancestral relation of (j, 0) to (i, lag_i), relevant only for lag_i == 0
if lag_i == 0:
if new_link[2] == "-" and old_link[2] != "-":
new_ancs[i].add((j, 0))
elif new_link[2] == ">" and old_link[2] != ">":
new_non_ancs[i].add((j, 0))
# Make the orientations
self._apply_new_ancestral_information(new_non_ancs, new_ancs)
# Return True
return True
def _run_rfci_dpr_rule(self):
"""Run the RFCI discriminating path rule: Lines 3 - 29 in algorithm 4.5 of the RFCI supplement paper"""
# Verbose output
if self.verbosity >= 1:
print("\nStarting RFCI DPR-Rule:")
# Find all relevant triangles W-V-Y
triangles = set(self._find_triples(pattern_ij='<**', pattern_jk='o*+', pattern_ik='-*>'))
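# Read as: W <-* V (arrowhead at W), V o-+ Y (circle mark at V), and W -*> Y (tail at W,
# arrowhead at Y); such a triangle closes a potential discriminating path ending at Y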
# Verbose output
if self.verbosity >= 1 and len(triangles) == 0:
print("\nFound no suitable triangles")
# Run through all triangles
while len(triangles) > 0:
# Remember all paths that qualify for the orientation test
paths_to_test_for_orientation = dict()
# Remember edges marked for removal
to_remove = set()
# Run through all of these triangles
for (W, V, Y_path) in triangles:
# Find all discriminating paths for this triangle, then consider only the shortest paths
discriminating_paths = self._get_R4_discriminating_paths_rfci((W, V, Y_path), max_length = np.inf)
# If there is no discriminating path, continue with the next triple
if len(discriminating_paths) == 0:
continue
# Only consider the shortest discriminating paths
min_len = min([len(path) for path in discriminating_paths])
shortest_discriminating_paths = [path for path in discriminating_paths if len(path) == min_len]
# Run through all shortest discriminating paths
for path in shortest_discriminating_paths:
path_disqualified = False
# Get the separating set between the end points
X_1 = path[-1]
all_sepsets = {Z for (Z, _) in self._get_sepsets(X_1, Y_path)}
# Run through all pairs of adjacent variables on the path
for i in range(min_len - 1):
# Read out the current pair of adjacent variables
(var_A, lag_A) = path[i]
(var_B, lag_B) = path[i + 1]
# Time shift accordingly
if lag_A <= lag_B:
X = (var_A, lag_A - lag_B) # A shifted
Y = (var_B, 0) # B shifted
delta_lag = lag_B
else:
X = (var_B, lag_B - lag_A) # B shifted
Y = (var_A, 0) # A shifted
delta_lag = lag_A
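# (Same shifting convention as in the UTR rule: the later variable is moved to lag 0
# and delta_lag records the shift that is also applied to the separating sets below)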
# Run through all sepsets
for S_ik in all_sepsets:
# Time shift the separating set
S_ik_shift = {(var, lag - delta_lag) for (var, lag) in S_ik if lag - delta_lag <= 0 and lag - delta_lag >= -self.tau_max}
# Run through all subsets of S_ik
for p in range(len(S_ik) + 1):
for Z in combinations(S_ik_shift, p):
# HACK
val, pval = self.cond_ind_test.run_test(X = [X], Y = [Y], Z = list(Z), tau_max = self.tau_max)
if self.verbosity >= 2:
print("DPR: %s _|_ %s | Z = %s: val = %.2f / pval = % .4f" %
(X, Y, ' '.join([str(z) for z in list(Z)]), val, pval))
# Update val_min and pval_max
self._update_val_min(X, Y, val)
self._update_pval_max(X, Y, pval)
self._update_cardinality(X, Y, len(Z))
# Check whether the test result was significant
if pval > self.pc_alpha:
# Mark the edge from X to Y for removal and store Z as a weakly-minimal separating set of X and Y
to_remove.add((X, Y))
self._save_sepset(X, Y, (frozenset(Z), "m"))
# Break the innermost for loops
path_disqualified = True
break
if path_disqualified:
break
# end for Z in combinations(S_ik_shift, p)
# end for p in range(len(S_ik) + 1)
# end for S_ik in all_sepsets
# If the path has been disqualified, break the for loop through adjacent pairs on the path
if path_disqualified:
break
# end for i in range(min_len - 1)
if not path_disqualified:
if (W, V, Y_path) in paths_to_test_for_orientation.keys():
paths_to_test_for_orientation[(W, V, Y_path)].append(path)
else:
paths_to_test_for_orientation[(W, V, Y_path)] = [path]
# end for path in shortest_discriminating_paths
# end for (W, V, Y) in triangles
# Remember unshielded triples at this point
old_unshielded_triples = set(self._find_triples(pattern_ij='***', pattern_jk='***', pattern_ik=''))
# Delete all edges that are marked for removal
for (X, Y) in to_remove:
self._write_link(X, Y, "", verbosity = self.verbosity)
# Determine the unshielded triples
new_unshielded_triples = set(self._find_triples(pattern_ij='***', pattern_jk='***', pattern_ik=''))
new_unshielded_triples = new_unshielded_triples.difference(old_unshielded_triples)
# Run the RFCI unshielded triple rule on the new unshielded triples
restart = self._run_rfci_utr_rule(new_unshielded_triples)
# Keep only those qualified paths that are still paths
final_paths = dict()
for (key, path_list) in paths_to_test_for_orientation.items():
disqualified = False
for path in path_list:
for i in range(len(path) - 1):
if self._get_link(path[i], path[i+1]) == "":
disqualified = True
break
if disqualified:
continue
if key in final_paths.keys():
final_paths[key].append(path)
else:
final_paths[key] = [path]
# Subject the surviving paths to the orientation test
to_orient = []
for (key, path_list) in final_paths.items():
for path in path_list:
# Get the initial triangle
Y = path[0]
V = path[1]
W = path[2]
# Get the end point node
X_1 = path[-1]
# Get the current link from W to V, which we will need below
link_WV = self._get_link(W, V)
# Check which of the two cases of the rule we are in, then append the appropriate new links to the output list
if self._B_in_SepSet_AC(X_1, V, Y):
# New link from V to Y
to_orient.append(self._get_pair_key_and_new_link(V, Y, "-->"))
elif link_WV != "<-x" and self._B_not_in_SepSet_AC(X_1, V, Y):
# New link from V to Y
to_orient.append(self._get_pair_key_and_new_link(V, Y, "<->"))
# If needed, also the new link from W to V
if link_WV != "<->":
to_orient.append(self._get_pair_key_and_new_link(W, V, "<->"))
# Verbose output
if self.verbosity >= 1:
print("\nDPR")
for ((i, j, lag_i), new_link) in set(to_orient):
print("{:10} ({},{:2}) {:3} ({},{:2}) ==> ({},{:2}) {:3} ({},{:2}) ".format("Marked:", i, lag_i, self._get_link((i, lag_i), (j, 0)), j, 0,i, lag_i, new_link, j, 0))
if len(to_orient) == 0:
print("Found nothing")
# Return if neither UTR nor DPR found anything
if not restart and len(to_orient) == 0:
return True
# Aggregate orientations
new_ancs = {j: set() for j in range(self.N)}
new_non_ancs = {j: set() for j in range(self.N)}
for ((i, j, lag_i), new_link) in to_orient:
# The old link
old_link = self._get_link((i, lag_i), (j, 0))
# Assert that the later variable is not marked as an ancestor of a preceding variable
assert not (lag_i < 0 and new_link[2] == "-")
# New ancestral relation of (i, lag_i) to (j, 0)
if new_link[0] == "-" and old_link[0] != "-":
new_ancs[j].add((i, lag_i))
elif new_link[0] == "<" and old_link[0] != "<":
new_non_ancs[j].add((i, lag_i))
# New ancestral relation of (j, 0) to (i, lag_i), relevant only for lag_i == 0
if lag_i == 0:
if new_link[2] == "-" and old_link[2] != "-":
new_ancs[i].add((j, 0))
elif new_link[2] == ">" and old_link[2] != ">":
new_non_ancs[i].add((j, 0))
# Make the orientations
self._apply_new_ancestral_information(new_non_ancs, new_ancs)
# Check for the new relevant triangles
new_triangles = set(self._find_triples(pattern_ij='<**', pattern_jk='o*+', pattern_ik='-*>'))
triangles = new_triangles.difference(triangles)
# end while len(triangles) > 0
def _make_sepset_minimal(self, X, Y, Z_list):
"""
X and Y are conditionally independent given every Z in Z_list. However, it is not yet clear whether any of these Z is a minimal separating set.
This function finds minimal separating subsets in an order independent way and writes them to the self.sepsets dictionary. Only those sets which are minimal separating sets are kept.
"""
# Base Case 1:
# Z in Z_list is minimal if len(Z) <= 1
any_minimal = False
for Z in Z_list:
if len(Z) <= 1:
self._save_sepset(X, Y, (frozenset(Z), "m"))
any_minimal = True
if any_minimal:
return None
# If not Base Case 1, we need to search for separating subsets. We do this for all Z in Z_list, and build a set sepsets_next_call that contains all separating sets for the next recursive call
sepsets_next_call = set()
for Z in Z_list:
# Test for the removal of each individual node A from Z
new_sepsets = []
val_values = []
for A in Z:
Z_A = [node for node in Z if node != A]
# Run the conditional independence test
val, pval = self.cond_ind_test.run_test(X = [X], Y = [Y], Z = Z_A, tau_max = self.tau_max)
if self.verbosity >= 2:
print("MakeMin: %s _|_ %s | Z_A = %s: val = %.2f / pval = % .4f" %
(X, Y, ' '.join([str(z) for z in list(Z_A)]), val, pval))
# Check whether the test result was significant
if pval > self.pc_alpha:
new_sepsets.append(frozenset(Z_A))
val_values.append(val)
# If new_sepsets is empty, then Z is already minimal
if len(new_sepsets) == 0:
self._save_sepset(X, Y, (frozenset(Z), "m"))
any_minimal = True
# If we did not yet find a minimal separating set
if not any_minimal:
# Append those separating sets in new_sepsets whose test statistic is maximal to sepsets_next_call. This i) guarantees order independence while ii) continues to test as few separating sets as possible
max_val = max(val_values)
for (val, sepset) in zip(val_values, new_sepsets):
if val == max_val:
sepsets_next_call.add(sepset)
# If we did not yet find a minimal separating set, make a recursive call
if not any_minimal:
self._make_sepset_minimal(X, Y, sepsets_next_call)
else:
return None
########################################################################################################################
########################################################################################################################
########################################################################################################################
def _run_orientation_phase(self, rule_list):
"""Function for exhaustive application of the orientation rules specified by rule_list."""
# Verbose output
if self.verbosity >= 1:
print("\nStarting orientation phase")
print("with rule list: ", rule_list)
# Run through all priority levels of rule_list
idx = 0
while idx <= len(rule_list) - 1:
# Some rules require that self._graph_full_dict is updated. Therefore, initialize this variable whenever the while loop (re-)starts at the first priority level
if idx == 0:
self._initialize_full_graph()
###########################################################################################################
### Rule application ######################################################################################
# Get the current rules
current_rules = rule_list[idx]
# Prepare a list to remember marked orientations
to_orient = []
# Run through all current rules
for rule in current_rules:
# Verbose output
if self.verbosity >= 1:
print("\n{}:".format(rule))
# Exhaustively apply the rule to the graph...
orientations = self._apply_rule(rule)
# Verbose output
if self.verbosity >= 1:
for ((i, j, lag_i), new_link) in set(orientations):
print("{:10} ({},{:2}) {:3} ({},{:2}) ==> ({},{:2}) {:3} ({},{:2}) ".format("Marked:", i, lag_i, self._get_link((i, lag_i), (j, 0)), j, 0,i, lag_i, new_link, j, 0))
if len(orientations) == 0:
print("Found nothing")
# ... and stage the results for orientation and removal
to_orient.extend(orientations)