Created sphinx documentation for join #6

Closed
wants to merge 5 commits into from
3 changes: 3 additions & 0 deletions .gitignore
@@ -129,3 +129,6 @@ dmypy.json
.pyre/

.idea/

# Generated by Mac Finder
.DS_Store
11 changes: 11 additions & 0 deletions documentation/source/functions.rst
@@ -0,0 +1,11 @@
Causaleffectpy Functions
========================

.. toctree::
   :maxdepth: 4
   :titlesonly:

   simplify
   join
   insert
   powerset
29 changes: 29 additions & 0 deletions documentation/source/index.rst
@@ -0,0 +1,29 @@
.. Causaleffectpy documentation master file, created by
   sphinx-quickstart on Tue Aug 13 12:31:43 2024.
   You can adapt this file completely to your liking, but it should at least
   contain the root `toctree` directive.

`Causaleffectpy` Documentation
==============================

This documentation provides an overview of `causaleffectpy`, which is derived from Santtu Tikka's `causaleffect` R package. It focuses on `simplify` and related functions, with the goal of integrating them into the open-source `y0` (Why Not?) Python package. For further information, see Tikka & Karvanen (2017), "Simplifying Probabilistic Expressions in Causal Inference".

.. toctree::
   :maxdepth: 2

   functions


References
===============

Hoyt, C.T., Zucker, J., & Parent, M-A. (2021). Y0 “Why Not?” for Causal Inference in Python (1.0) [Python package]. doi:10.5281/zenodo.4950768. https://github.com/y0-causal-inference/y0.

Tikka, S., & Karvanen, J. (2017). Simplifying probabilistic expressions in causal inference. *Journal of Machine Learning Research*, 18(36), 1-30.


Indices and tables
==================

* :ref:`genindex`
* :ref:`modindex`
* :ref:`search`
69 changes: 69 additions & 0 deletions documentation/source/insert.rst
@@ -0,0 +1,69 @@
Insert
======

The `insert` function inserts a missing variable into a joint distribution `P(J|D)`, using the d-separation criterion in a given graph `G`. It is called when the expression contains variables that have no corresponding terms.

Parameters
----------
J : list of str
    The set of variables representing the joint distribution.
D : list of str
    The set of variables representing the conditioning set of the joint distribution.
M : str
    The variable to be inserted.
cond : list of str
    The set of conditioning variables.
S : list of str
    The current summation variables.
O : list of str
    The set of observed variables.
G_unobs : y0.Graph
    A separate graph in which bidirected edges are replaced by explicit nodes for unobserved confounders.
G : y0.Graph
    Main graph `G`, which includes bidirected edges.
G_obs : y0.Graph
    A separate graph without bidirected edges; it contains only the directed edges between observed nodes.
topo : list of str
    The topological ordering of the vertices in graph `G`.

Returns
-------
dict
    A dictionary with the following keys:

    - `J_new`: list of str. The updated set of joint distribution variables.
    - `D_new`: list of str. The updated set of conditioning variables.
    - `M`: str. The inserted variable.
    - `ds_i`: list of str. The subset from the power set used in the insertion.

    If none of the conditions are met, `insert` returns the original `J` and `D`.


Examples
--------
Section in-progress.


See Also
--------
- :func:`join`
- :func:`simplify`
- :func:`wrap_dSep`
- :func:`powerset`

Keywords
--------
models, manip, math, utilities, graphs, methods, multivariate, distribution, probability

Concepts
--------
probabilistic expressions, graph theory, joint distribution, causal inference, d-separation

References
----------
Tikka, S., & Karvanen, J. (2017). Simplifying probabilistic expressions in causal inference. *Journal of Machine Learning Research*, 18(36), 1-30.

Author
------
Haley Hummel,
Psychology PhD student at Oregon State University
91 changes: 91 additions & 0 deletions documentation/source/join.rst
@@ -0,0 +1,91 @@
Join
====

The `join` function determines whether the terms of an atomic expression can be combined into a joint distribution.
It attempts to combine two terms: the joint term `P(J|D)` accumulated so far by `simplify()` and the term `P(V|C) := P(Vk|Ck)`
of the current iteration step. `join` iterates over candidate subsets to find one for which the variable `new_variable`
can be added to the joint distribution `joint_dist_variables`. During this process, `join` checks conditional
independencies involving both `joint_conditioning_set` and `prob_conditioning_set`. The goal is to determine whether the two
terms can be combined according to the d-separation criterion in the graph `G`.
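
`join` (and `insert`) decide whether terms can be combined by running d-separation tests over candidate subsets. The block below is a minimal, dependency-free sketch of that pattern, assuming a DAG stored as a `{node: [parents]}` dictionary in which latent confounders appear as explicit nodes (as in `G_unobs`). The names `d_separated` and `first_separating_subset` are hypothetical: they are not this package's `is_d_separated` or `wrap_dSep`, and the specific independence conditions that `join` checks (Tikka & Karvanen, 2017) are not reproduced here.

.. code-block:: python

    from itertools import chain, combinations


    def powerset(items):
        """Every subset of `items`, from the empty set up to the full set."""
        items = list(items)
        return [list(c) for c in chain.from_iterable(
            combinations(items, r) for r in range(len(items) + 1))]


    def ancestors_of(parents, nodes):
        """Return `nodes` together with all of their ancestors."""
        seen = set(nodes)
        stack = list(nodes)
        while stack:
            node = stack.pop()
            for parent in parents.get(node, ()):
                if parent not in seen:
                    seen.add(parent)
                    stack.append(parent)
        return seen


    def d_separated(parents, xs, ys, zs):
        """Test whether xs and ys are d-separated given zs in a DAG.

        `parents` maps each node to a list of its parents. Moralized ancestral
        graph criterion: keep only ancestors of xs | ys | zs, link co-parents,
        drop edge directions, delete zs, then check that no x reaches a y.
        """
        xs, ys, zs = set(xs), set(ys), set(zs)
        keep = ancestors_of(parents, xs | ys | zs)
        adj = {v: set() for v in keep}
        for child in keep:
            pars = list(parents.get(child, ()))
            for p in pars:  # undirected version of each parent -> child edge
                adj[child].add(p)
                adj[p].add(child)
            for a, b in combinations(pars, 2):  # moralize: marry co-parents
                adj[a].add(b)
                adj[b].add(a)
        frontier = [x for x in xs if x not in zs]
        reached = set(frontier)
        while frontier:  # graph search that never enters zs
            node = frontier.pop()
            if node in ys:
                return False
            for nbr in adj[node]:
                if nbr not in zs and nbr not in reached:
                    reached.add(nbr)
                    frontier.append(nbr)
        return True


    def first_separating_subset(parents, x, y, candidates):
        """Hypothetical helper: first subset Z of `candidates` separating x and y."""
        for subset in powerset(candidates):
            if d_separated(parents, [x], [y], subset):
                return subset
        return None


    # Example DAG: U -> X -> M -> Y and U -> Y, where U is an explicit latent
    # confounder node (the kind of graph the docs call G_unobs).
    parents = {"X": ["U"], "M": ["X"], "Y": ["M", "U"]}

With this example graph, `M` and `U` are d-separated given `{X}` but not given `{X, Y}`, because conditioning on the collider `Y` opens the path `M -> Y <- U`; `first_separating_subset(parents, "M", "U", ["X", "Y"])` therefore returns `["X"]`.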

Parameters
----------
joint_dist_variables : list of str
    Equivalent to `J` in Tikka's `causaleffect` R package.
    Variables of the existing joint term `P(J|D)`, i.e., the variables already processed and included
    in the joint distribution by previous `simplify` iterations. May be empty at the start, before any
    joint distribution has been formed. `new_variable` is added to this set (via `insert`) if the
    d-separation conditions are met.
joint_conditioning_set : list of str
    Equivalent to `D` in Tikka's `causaleffect` R package; the set `D` in the joint term `P(J|D)` of Tikka & Karvanen (2017).
    Conditioning set of the existing joint distribution `P(J|D)` over `joint_dist_variables`.
    As `join` iterates, `joint_conditioning_set` is modified to determine how `P(J|D)` can be updated to
    include `new_variable` while preserving the required conditional independencies.
new_variable : str
    Equivalent to `vari` in Tikka's `causaleffect` R package.
    The variable `V` of the current term `P(V|C)`, which is being considered for inclusion in `joint_dist_variables`.
    `join` attempts to add `new_variable` to define a new probabilistic term, provided the term still
    satisfies the required conditional independencies.
prob_conditioning_set : list of str
    Equivalent to `cond` in Tikka's `causaleffect` R package.
    Conditioning set `C` of the current term `P(vari|cond)`, i.e., the variables that condition `new_variable`.
    `join` uses `prob_conditioning_set` to evaluate conditional independence and decide whether `new_variable` can be added to `joint_dist_variables`.
summation_variables : list of str
    Equivalent to `S` in Tikka's `causaleffect` R package.
    Current summation variables. Not used directly in `join`.
inserted_variables : list of str
    Equivalent to `M` in Tikka's `causaleffect` R package.
    Missing variables (variables not contained in the expression).
observed_variables : list of str
    Equivalent to `O` in Tikka's `causaleffect` R package.
    Observed variables (variables contained in the expression).
G_unobs : `networkx.DiGraph` object
    A separate directed acyclic graph (DAG) that includes explicit nodes for unobserved confounders, created using :class:`networkx.DiGraph`.
G : `networkx.DiGraph` object
    Main graph `G`, which includes bidirected edges; created with :class:`networkx.DiGraph`.
G_obs : `networkx.DiGraph` object
    A DAG that includes only directed edges between observed variables, created using :class:`networkx.DiGraph`.
topo : list of nodes
    The topological ordering of the vertices in graph `G`, which can be obtained using :func:`networkx.topological_sort`.

Returns
-------
Section in-progress

Dependencies
------------
This function depends on the following functions:

- :func:`powerset`
- :func:`is_d_separated`
- :func:`insert`, which adds `new_variable` to `joint_dist_variables`

See Also
--------
- :func:`simplify`
- :func:`is_d_separated`
- :func:`insert`

Examples
--------
Section in-progress.


Keywords
--------
models, manip, math, utilities

Concepts
--------
probabilistic expressions, graph theory, causal inference

References
----------
Tikka, S. (2022). `causaleffect`: Deriving Expressions of Joint Interventional Distributions and Transport Formulas in Causal Models (1.3.15) [R package]. https://github.com/santikka/causaleffect/.

Tikka, S., & Karvanen, J. (2017). Simplifying probabilistic expressions in causal inference. *Journal of Machine Learning Research*, 18(36), 1-30.

Tikka, S., & Karvanen, J. (2018). Identifying causal effects with the R package causaleffect. arXiv preprint arXiv:1806.07161.

Author
------
Haley Hummel,
Psychology PhD student at Oregon State University

47 changes: 47 additions & 0 deletions documentation/source/powerset.rst
@@ -0,0 +1,47 @@
Powerset
========

The `powerset` function generates the power set of a given set. The power set is the set of all possible subsets of the original set, including the empty set and the set itself.

Parameters
----------
set : list
    A list representing the original set for which the power set will be generated. The set can contain elements of any type (e.g., numeric, string, or boolean).

Details
-------
The function computes all possible combinations of the elements of the input set. This includes the empty subset, individual elements, and all larger subsets up to and including the full set. The number of subsets in the power set of a set of size `n` is `2^n`.

Returns
-------
list of lists
    A list of lists, where each inner list is a subset of the original input set. The list contains `2^n` subsets, where `n` is the length of the input set. If the input set is empty, the function returns a list containing only the empty set.

Examples
--------
.. code-block:: python

    set_1 = ['a', 'b', 'c']
    powerset_result = powerset(set_1)
    # Output: [[], ['a'], ['b'], ['c'], ['a', 'b'], ['a', 'c'], ['b', 'c'], ['a', 'b', 'c']]
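
To reproduce the output above without the package, here is a minimal sketch built on the standard library's `itertools.combinations`. It is an illustration only, not necessarily how `powerset` is implemented here: it emits subsets in order of increasing size, which yields `2^n` subsets for an input of length `n` and matches the ordering shown in the example.

.. code-block:: python

    from itertools import chain, combinations


    def powerset(items):
        """Return every subset of `items` as a list, smallest subsets first."""
        items = list(items)
        # combinations(items, r) yields all r-element subsets in input order;
        # chaining r = 0..n covers the empty set through the full set.
        subsets = chain.from_iterable(
            combinations(items, r) for r in range(len(items) + 1)
        )
        return [list(s) for s in subsets]


    set_1 = ['a', 'b', 'c']
    powerset_result = powerset(set_1)
    # [[], ['a'], ['b'], ['c'], ['a', 'b'], ['a', 'c'], ['b', 'c'], ['a', 'b', 'c']]

An empty input returns `[[]]`, consistent with the Returns section above.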

See Also
--------
- :func:`join`: uses `powerset` to enumerate candidate subsets when checking conditional independence in probabilistic graphical models.

Keywords
--------
set theory, combinatorics

Concepts
--------
power set, subsets

References
----------
Tikka, S., & Karvanen, J. (2017). Simplifying probabilistic expressions in causal inference. Journal of Machine Learning Research, 18(36), 1-30.

Author
------
Haley Hummel,
Psychology PhD student at Oregon State University
59 changes: 59 additions & 0 deletions documentation/source/simplify.rst
@@ -0,0 +1,59 @@
Simplify
========

This function algebraically simplifies probabilistic expressions produced by the ID algorithm via :func:`identify`. It always attempts maximal simplification, meaning that as many variables of the summation set as possible are removed. If the simplification over the entire set cannot be completed, the intermediate result, with as many variables simplified as possible, is returned.

To use :func:`simplify`, first run :func:`identify` with the graph information, then pass its output as `P` to :func:`parse_causaleffect`, and finally pass the output of :func:`parse_causaleffect` as `P` to :func:`simplify`.

For further information, see Tikka & Karvanen (2017) "Simplifying Probabilistic Expressions in Causal Inference" Algorithm 1.


Parameters
----------
P : `sympy` expression or `y0` `Probability` object
    The probabilistic expression to be simplified, typically created from symbolic expressions in `sympy` or with `y0`'s `Probability` class.
topo : list of nodes
    The topological ordering of the vertices in graph `G`, which can be obtained using `networkx.topological_sort`.
G_unobs : networkx.DiGraph object
    A separate directed acyclic graph (DAG) that includes explicit nodes for unobserved confounders, created using `networkx.DiGraph`.
G : networkx.DiGraph object
    Main graph `G`, which includes bidirected edges, created with `networkx.DiGraph`.
G_obs : networkx.DiGraph object
    A DAG that includes only directed edges, representing observed variables, created using `networkx.DiGraph`.
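
`simplify` (like `join` and `insert`) takes three views of the same causal graph plus a topological order. The sketch below illustrates how they relate, assuming a plain edge-list representation instead of `networkx.DiGraph`. The helpers `split_graph` and `topological_order` and the latent-node naming scheme `U_a_b` are hypothetical, and Kahn's algorithm is used here as a dependency-free stand-in for `networkx.topological_sort`.

.. code-block:: python

    from collections import deque


    def split_graph(directed, bidirected):
        """Build hypothetical G_obs / G_unobs edge lists from a mixed graph G.

        directed:   list of (parent, child) edges between observed nodes
        bidirected: list of (a, b) pairs, each meaning a <-> b (latent confounder)
        """
        g_obs = list(directed)  # G_obs: drop the bidirected edges
        g_unobs = list(directed)
        for a, b in bidirected:
            u = f"U_{a}_{b}"  # G_unobs: explicit node for the unobserved confounder
            g_unobs += [(u, a), (u, b)]
        return g_obs, g_unobs


    def topological_order(nodes, edges):
        """Kahn's algorithm; a stand-in for networkx.topological_sort."""
        indegree = {v: 0 for v in nodes}
        children = {v: [] for v in nodes}
        for parent, child in edges:
            children[parent].append(child)
            indegree[child] += 1
        queue = deque(v for v in nodes if indegree[v] == 0)
        order = []
        while queue:
            v = queue.popleft()
            order.append(v)
            for c in children[v]:
                indegree[c] -= 1
                if indegree[c] == 0:
                    queue.append(c)
        if len(order) != len(nodes):
            raise ValueError("graph has a directed cycle")
        return order


    # Example graph G: X -> Z -> Y, plus a bidirected edge X <-> Y.
    observed = ["X", "Z", "Y"]
    directed = [("X", "Z"), ("Z", "Y")]
    bidirected = [("X", "Y")]
    g_obs, g_unobs = split_graph(directed, bidirected)
    topo = topological_order(observed, g_obs)  # ["X", "Z", "Y"]

Bidirected edges do not constrain the ordering, so here `topo` is computed from the directed edges of the observed nodes only.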

Details
-------
This function depends on several other functions and classes, including: :func:`parents`, :func:`ancestors`, :func:`parse_causaleffect`, :func:`is_d_separated`, and :class:`probability`.

Returns
-------
list
    Section in-progress.

References
----------
Tikka, S., & Karvanen, J. (2017). Simplifying probabilistic expressions in causal inference. Journal of Machine Learning Research, 18(36), 1-30.


See Also
--------
:func:`identify`, :func:`parse_causaleffect`, :func:`get.expression`, :class:`probability`

Examples
--------
Section in-progress.


Keywords
--------
models, manip, math, utilities

Concepts
--------
probabilistic expressions, graph theory, causal inference

Author
------
Haley Hummel,
Psychology PhD student at Oregon State University