diff --git a/.gitignore b/.gitignore
index b97131e..ef9a474 100644
--- a/.gitignore
+++ b/.gitignore
@@ -129,3 +129,6 @@ dmypy.json
 .pyre/
 
 .idea/
+
+# Generated by Mac Finder
+.DS_Store
\ No newline at end of file
diff --git a/documentation/source/functions.rst b/documentation/source/functions.rst
new file mode 100644
index 0000000..09b778d
--- /dev/null
+++ b/documentation/source/functions.rst
@@ -0,0 +1,11 @@
+Causaleffectpy Functions
+========================
+
+.. toctree::
+   :maxdepth: 4
+   :titlesonly:
+
+   simplify
+   join
+   insert
+   powerset
diff --git a/documentation/source/index.rst b/documentation/source/index.rst
new file mode 100644
index 0000000..aaac172
--- /dev/null
+++ b/documentation/source/index.rst
@@ -0,0 +1,29 @@
+.. Causaleffectpy documentation master file, created by
+   sphinx-quickstart on Tue Aug 13 12:31:43 2024.
+   You can adapt this file completely to your liking, but it should at least
+   contain the root `toctree` directive.
+
+`Causaleffectpy` Documentation
+==============================
+
+This documentation provides an overview of `causaleffectpy`, which is derived from Santtu Tikka's `causaleffect` R package. It focuses on `simplify` and related functions, with the goal of integrating them into the open-source `y0` (Why Not?) Python package. For further information, see Tikka & Karvanen (2017) "Simplifying Probabilistic Expressions in Causal Inference".
+
+.. toctree::
+   :maxdepth: 2
+
+   functions
+
+
+References
+===============
+
+- Hoyt, C.T., Zucker, J., & Parent, M-A. (2021). Y0 “Why Not?” for Causal Inference in Python (1.0) [Python package]. doi:10.5281/zenodo.4950768. https://github.com/y0-causal-inference/y0.
+- Tikka, S., & Karvanen, J. (2017). Simplifying probabilistic expressions in causal inference. Journal of Machine Learning Research, 18(36), 1-30.
+
+
+Indices and tables
+==================
+
+* :ref:`genindex`
+* :ref:`modindex`
+* :ref:`search`
\ No newline at end of file
diff --git a/documentation/source/insert.rst b/documentation/source/insert.rst
new file mode 100644
index 0000000..af70504
--- /dev/null
+++ b/documentation/source/insert.rst
@@ -0,0 +1,69 @@
+Insert
+======
+
+The `insert` function inserts a missing variable into a joint distribution `P(J|D)` using d-separation criteria in a given graph `G`. It is called when there are variables without corresponding terms in the expression.
+
+Parameters
+----------
+J : list of str
+    The set of variables representing the joint distribution.
+D : list of str
+    The set of variables representing the conditioning set of the joint distribution.
+M : str
+    The variable to be inserted.
+cond : list of str
+    The set of conditioning variables.
+S : list of str
+    The current summation variables.
+O : list of str
+    The set of observed variables.
+G_unobs : y0.Graph
+    Separate graph that turns bidirected edges into explicit nodes for unobserved confounders.
+G : y0.Graph
+    Main graph `G`. Includes bidirected edges.
+G_obs : y0.Graph
+    Separate graph that does not contain bidirected edges (only contains the directed edges with observed nodes).
+topo : list of str
+    The topological ordering of the vertices in graph `G`.
+
+Returns
+-------
+dict
+    A dictionary with the following keys:
+
+    - `J_new`: list of str. An updated set of joint distribution variables.
+    - `D_new`: list of str. An updated set of conditioning variables.
+    - `M`: str. The inserted variable.
+    - `ds_i`: list of str. The subset from the power set used in the insertion.
+
+    If none of the conditions is met, `insert` returns the original `J` and `D`.
+
+Examples
+--------
+Section in progress.
+
+
+
+See Also
+--------
+- :func:`join`
+- :func:`simplify`
+- :func:`wrap_dSep`
+- :func:`powerset`
+
+Keywords
+--------
+models, manip, math, utilities, graphs, methods, multivariate, distribution, probability
+
+Concepts
+--------
+probabilistic expressions, graph theory, joint distribution, causal inference, d-separation
+
+References
+----------
+Tikka, S., & Karvanen, J. (2017). Simplifying probabilistic expressions in causal inference. *Journal of Machine Learning Research*, 18(36), 1-30.
+
+Author
+------
+Haley Hummel,
+Psychology PhD student at Oregon State University
\ No newline at end of file
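Note on insert.rst: the three graph arguments (`G`, `G_obs`, `G_unobs`) can be illustrated with a minimal sketch. It assumes the causal graph is supplied as two plain edge lists (directed and bidirected) and uses stdlib dicts instead of `y0.Graph`. The helper `split_graph` and the latent-node names `U1`, `U2`, ... are hypothetical and are not part of `causaleffectpy` or `y0`.

```python
def split_graph(directed, bidirected):
    """Build child-adjacency dicts for G_obs and G_unobs (hypothetical helper).

    directed:   list of (parent, child) pairs
    bidirected: list of (left, right) pairs sharing an unobserved confounder
    """
    nodes = {node for edge in directed + bidirected for node in edge}

    # G_obs: observed nodes and directed edges only.
    g_obs = {node: set() for node in nodes}
    for parent, child in directed:
        g_obs[parent].add(child)

    # G_unobs: copy of G_obs, plus one explicit latent parent per bidirected edge.
    g_unobs = {node: set(children) for node, children in g_obs.items()}
    for index, (left, right) in enumerate(bidirected, start=1):
        latent = f"U{index}"  # assumed naming scheme for latent nodes
        g_unobs[latent] = {left, right}

    return g_obs, g_unobs


directed = [("X", "Z"), ("Z", "Y")]
bidirected = [("X", "Y")]
g_obs, g_unobs = split_graph(directed, bidirected)
```

Here `g_obs` holds only the chain X -> Z -> Y, while `g_unobs` additionally contains the latent node `U1` pointing at both `X` and `Y`.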
diff --git a/documentation/source/join.rst b/documentation/source/join.rst
new file mode 100644
index 0000000..b376843
--- /dev/null
+++ b/documentation/source/join.rst
@@ -0,0 +1,91 @@
+Join
+====
+
+The `join` function determines whether the terms of the atomic expression actually represent a joint distribution.
+It attempts to combine two terms: the joint term `P(J|D)` obtained from `simplify()` and the term `P(V|C) := P(Vk|Ck)` 
+of the current iteration step. `join` iterates over potential subsets to find a valid set where the variable `new_variable` 
+can be added to the joint distribution `joint_dist_variables`. During this process, `join` checks conditional 
+independencies using both `joint_conditioning_set` and `prob_conditioning_set`. The goal is to determine if these 
+terms can be combined based on the d-separation criteria in the graph `G`.
+
+Parameters
+----------
+joint_dist_variables : list of str
+    Equivalent to `J` in Tikka's `causaleffect` R package.
+    Existing joint set `P(J|D)`, already processed and included in the joint distribution
+    by a previous `simplify` iteration. May initially be empty, as the starting point of
+    the joint distribution. `new_variable` is added to expand it using `insert` if d-separation conditions are met.
+joint_conditioning_set : list of str
+    Equivalent to `D` in Tikka's `causaleffect` R package.
+    Conditioning set for the already existing joint distribution `P(J|D)`, used to condition the joint distribution over the set `joint_dist_variables`.
+    As `join` iterates, `joint_conditioning_set` is modified to determine how the joint distribution `P(J|D)` can be updated to
+    include the new variable `new_variable`, while preserving the required conditional independencies.
+new_variable : str
+    Equivalent to `vari` in Tikka's `causaleffect` R package.
+    New variable being considered for inclusion in the joint distribution `joint_dist_variables`.
+    `join` attempts to update `joint_dist_variables` by adding `new_variable` to define a new probabilistic term, provided the term still
+    satisfies the required conditional independencies. `insert` adds `new_variable` to `joint_dist_variables`.
+prob_conditioning_set : list of str
+    Equivalent to `cond` in Tikka's `causaleffect` R package.
+    Conditioning set for the current probabilistic term `P(vari|cond)`; the set of variables that condition the current variable `new_variable`.
+    `join` uses `prob_conditioning_set` to evaluate conditional independence and determine if `new_variable` can be added to `joint_dist_variables`.
+summation_variables : list of str
+    Equivalent to `S` in Tikka's `causaleffect` R package.
+    The current summation variables. Not used directly in `join`.
+inserted_variables : list of str
+    Equivalent to `M` in Tikka's `causaleffect` R package.
+    Missing variables (variables not contained within the expression).
+observed_variables : list of str
+    Equivalent to `O` in Tikka's `causaleffect` R package.
+    Observed variables (variables contained within the expression).
+G_unobs : `networkx.DiGraph` object
+    A separate directed acyclic graph (DAG) that includes explicit nodes for unobserved confounders, created using :class:`networkx.DiGraph`.
+G : `networkx.DiGraph` object
+    Main graph `G`, which includes bidirected edges, created using :class:`networkx.DiGraph`.
+G_obs : `networkx.DiGraph` object
+    A DAG that only includes directed edges, representing observed variables, created using :class:`networkx.DiGraph`.
+topo : list of nodes
+    The topological ordering of the vertices in graph `G`, which can be obtained using :func:`networkx.topological_sort`.
+
+Returns
+-------
+Section in progress.
+
+Dependencies
+------------
+This function depends on several other functions:
+:func:`powerset`, :func:`is_d_separated`, and :func:`insert`
+(which adds `new_variable` to `joint_dist_variables`).
+
+
+See Also
+--------
+- :func:`simplify`
+- :func:`is_d_separated`
+- :func:`insert`
+
+Examples
+--------
+Section in progress.
+
+
+
+Keywords
+--------
+models, manip, math, utilities
+
+Concepts
+--------
+probabilistic expressions, graph theory, causal inference
+
+References
+----------
+- Tikka, S. (2022). `causaleffect`: Deriving Expressions of Joint Interventional Distributions and Transport Formulas in Causal Models (1.3.15) [R package]. https://github.com/santikka/causaleffect/.
+- Tikka, S., & Karvanen, J. (2017). Simplifying probabilistic expressions in causal inference. Journal of Machine Learning Research, 18(36), 1-30.
+- Tikka, S., & Karvanen, J. (2018). Identifying causal effects with the R package causaleffect. arXiv preprint arXiv:1806.07161.
+
+Author
+------
+Haley Hummel,
+Psychology PhD student at Oregon State University
+
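Note on join.rst: the d-separation checks that `join` relies on can be sketched with a textbook test (restrict to the ancestral graph, moralize, then look for an undirected path that avoids the conditioning set). This is a stdlib-only illustration under the assumption that the DAG is given as a dict mapping each node to its parents; the function name `d_separated` is hypothetical and is not the `is_d_separated` referenced in the docs.

```python
def ancestors_of(parents, nodes):
    """Return `nodes` together with all of their ancestors."""
    seen, stack = set(), list(nodes)
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(parents.get(node, ()))
    return seen


def d_separated(parents, xs, ys, zs):
    """Return True if `zs` d-separates `xs` from `ys` in the DAG `parents`."""
    xs, ys, zs = set(xs), set(ys), set(zs)
    keep = ancestors_of(parents, xs | ys | zs)

    # Moralize the ancestral subgraph: link each child to its parents,
    # and link every pair of parents that share a child.
    neighbors = {node: set() for node in keep}
    for child in keep:
        pars = list(parents.get(child, ()))
        for parent in pars:
            neighbors[child].add(parent)
            neighbors[parent].add(child)
        for i, first in enumerate(pars):
            for second in pars[i + 1:]:
                neighbors[first].add(second)
                neighbors[second].add(first)

    # Search from xs for any node of ys without passing through zs.
    seen, stack = set(), [x for x in xs if x not in zs]
    while stack:
        node = stack.pop()
        if node in ys:
            return False
        if node in seen:
            continue
        seen.add(node)
        stack.extend(n for n in neighbors[node] if n not in zs and n not in seen)
    return True


# X -> Z -> Y with a latent confounder U1 -> X, U1 -> Y (as in a G_unobs-style graph).
parents = {"X": ["U1"], "Z": ["X"], "Y": ["Z", "U1"]}
blocked_by_z = d_separated(parents, {"X"}, {"Y"}, {"Z"})
blocked_by_z_and_u = d_separated(parents, {"X"}, {"Y"}, {"Z", "U1"})
```

Conditioning on `Z` alone leaves the path through `U1` open, so `blocked_by_z` is `False`; adding `U1` to the conditioning set closes it, so `blocked_by_z_and_u` is `True`.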
diff --git a/documentation/source/powerset.rst b/documentation/source/powerset.rst
new file mode 100644
index 0000000..eb6dd76
--- /dev/null
+++ b/documentation/source/powerset.rst
@@ -0,0 +1,47 @@
+Powerset
+========
+
+The `powerset` function generates the power set of a given set. The power set is the set of all possible subsets of the original set, including the empty set and the set itself.
+
+Parameters
+----------
+set : list
+    A list representing the original set for which the power set will be generated. The set can contain any type of elements (e.g., numeric, string, or boolean).
+
+Details
+-------
+The function computes all possible combinations of the elements of the input set. This includes the empty subset, individual elements, and all larger subsets up to and including the full set. The number of subsets in the power set of a set of size `n` is `2^n`.
+
+Returns
+-------
+list of lists
+    A list of lists, where each inner list is a subset of the original input set. The list contains `2^n` subsets, where `n` is the length of the input set. If the input set is empty, the function returns a list containing only the empty set.
+
+Examples
+--------
+.. code-block:: python
+
+    set_1 = ['a', 'b', 'c']
+    powerset_result = powerset(set_1)
+    # Output: [[], ['a'], ['b'], ['c'], ['a', 'b'], ['a', 'c'], ['b', 'c'], ['a', 'b', 'c']]
+
+See Also
+--------
+- `join`: for using powerset with conditional independence in probabilistic graphical models.
+
+Keywords
+--------
+set theory, combinatorics
+
+Concepts
+--------
+power set, subsets
+
+References
+----------
+Tikka, S., & Karvanen, J. (2017). Simplifying probabilistic expressions in causal inference. Journal of Machine Learning Research, 18(36), 1-30.
+
+Author
+------
+Haley Hummel,
+Psychology PhD student at Oregon State University
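Note on powerset.rst: the behaviour described there (all subsets, ordered by size, `2^n` in total) can be reproduced with a short sketch built on `itertools.combinations`. This is one possible implementation under that assumption, not necessarily the one used in `causaleffectpy`.

```python
from itertools import chain, combinations


def powerset(items):
    """Return every subset of `items` as a list, smallest subsets first."""
    items = list(items)
    subsets = chain.from_iterable(
        combinations(items, size) for size in range(len(items) + 1)
    )
    return [list(subset) for subset in subsets]


result = powerset(["a", "b", "c"])
print(result)
```

For the three-element input this yields the same eight subsets, in the same order, as the example in powerset.rst; an empty input yields a list containing only the empty list.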
diff --git a/documentation/source/simplify.rst b/documentation/source/simplify.rst
new file mode 100644
index 0000000..d207dd5
--- /dev/null
+++ b/documentation/source/simplify.rst
@@ -0,0 +1,59 @@
+Simplify
+========
+
+This function algebraically simplifies probabilistic expressions given by the ID algorithm from :func:`identify`. It always attempts maximal simplification, meaning that as many variables of the set as possible are removed. If simplification over the entire set cannot be completed, the intermediate result with as many variables simplified as possible is returned.
+
+Run :func:`identify` with the graph information first, then use the output of :func:`identify` as the `P` in :func:`parse_causaleffect`. Use the output from :func:`parse_causaleffect` as the `P` in :func:`simplify`.
+
+For further information, see Tikka & Karvanen (2017) "Simplifying Probabilistic Expressions in Causal Inference" Algorithm 1.
+
+
+Parameters
+----------
+P : `sympy` expression or `y0` `Probability` object
+    The probabilistic expression that will be simplified, typically created using symbolic expressions in `sympy` or using `y0`'s Probability class.
+topo : list of nodes
+    The topological ordering of the vertices in graph `G`, which can be obtained using `networkx.topological_sort`.
+G_unobs : networkx.DiGraph object
+    A separate directed acyclic graph (DAG) that includes explicit nodes for unobserved confounders, created using `networkx.DiGraph`.
+G : networkx.DiGraph object
+    Main graph `G`, which includes bidirected edges, created using `networkx.DiGraph`.
+G_obs : networkx.DiGraph object
+    A DAG that only includes directed edges, representing observed variables, created using `networkx.DiGraph`.
+
+Details
+-------
+This function depends on several other functions and classes, including: :func:`parents`, :func:`ancestors`, :func:`parse_causaleffect`, :func:`is_d_separated`, and :class:`probability`.
+
+Returns
+-------
+list
+    Section in progress.
+
+References
+----------
+Tikka, S., & Karvanen, J. (2017). Simplifying probabilistic expressions in causal inference. Journal of Machine Learning Research, 18(36), 1-30.
+
+
+See Also
+--------
+:func:`identify`, :func:`parse_causaleffect`, :func:`get.expression`, :class:`probability`
+
+Examples
+--------
+Section in progress.
+
+
+
+Keywords
+--------
+models, manip, math, utilities
+
+Concepts
+--------
+probabilistic expressions, graph theory, causal inference
+
+Author
+------
+Haley Hummel,
+Psychology PhD student at Oregon State University
\ No newline at end of file
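
Note on simplify.rst: the `topo` argument is a topological ordering of the vertices of `G`. The docs point to `networkx.topological_sort`; as a dependency-free stand-in, the sketch below uses the stdlib `graphlib` module (Python 3.9+) and assumes the graph is given as a dict mapping each node to its parents. The example graph is illustrative only.

```python
from graphlib import TopologicalSorter

# Each key maps to the set of its parents (predecessors): X -> Z -> Y.
parents = {"X": set(), "Z": {"X"}, "Y": {"Z"}}

# static_order() yields every node after all of its predecessors.
topo = list(TopologicalSorter(parents).static_order())
print(topo)
```

For this chain the ordering is unique: `X`, then `Z`, then `Y`. Graphs with several valid orderings may return any one of them.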