From 474c946c444fdc88f8df8df47466a24c55dc5b1c Mon Sep 17 00:00:00 2001 From: espinoj Date: Thu, 22 Feb 2024 23:48:22 -0500 Subject: [PATCH] broken link fixes --- docs/causal-cmd/index.html | 6 +++--- docs/index.html | 2 +- docs/py-causal/index.html | 2 +- docs/search/search_index.json | 2 +- docs/sitemap.xml.gz | Bin 288 -> 288 bytes docs_src/causal-cmd.md | 5 ++--- docs_src/py-causal.md | 2 +- 7 files changed, 9 insertions(+), 10 deletions(-) diff --git a/docs/causal-cmd/index.html b/docs/causal-cmd/index.html index 297927c..0bbe30a 100644 --- a/docs/causal-cmd/index.html +++ b/docs/causal-cmd/index.html @@ -128,8 +128,8 @@

Introduction

Causal discovery algorithms allow a user to uncover the causal relationships between variables in a dataset. These discovered causal relationships may be put to further use: understanding the underlying processes of a system (e.g., the metabolic pathways of an organism), generating hypotheses (e.g., variables that best explain an outcome), guiding experimentation (e.g., which gene knockout experiments should be performed), or making predictions (e.g., parameterizing the causal graph with data and then using it as a classifier).

Command Line Usage

Java 8 or higher is the only prerequisite to run the software. Note that by default Java will allocate the smaller of 1/4 of system memory or 1GB to the Java virtual machine (JVM). If you run out of memory (heap space) while running your analyses, you should increase the memory allocated to the JVM with the switch '-XmxXXG', where XX is the number of gigabytes of RAM you allow the JVM to utilize. For example, to allocate 8 gigabytes of RAM, you would add -Xmx8G immediately after the java command.
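For instance, an analysis run with an 8-gigabyte heap would look like the following (a generic sketch; substitute the jar version and algorithm switches for your own run):

java -Xmx8G -jar causal-cmd-<version number>-jar-with-dependencies.jar --algorithm fges --data-type continuous --dataset Retention.txt --delimiter tab --score sem-bic-score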

-

In this example, we'll use download the Retention.txt file, which is a dataset containing information on college graduation and used in the publication of "What Do College Ranking Data Tell Us About Student Retention?" by Drudzel and Glymour, 1994.

-

Keep in mind that causal-cmd has different switches for different algorithms. To start, type the following command in your terminal:

+

In this example, we'll use the Retention.txt file. +Keep in mind that causal-cmd has different switches for different algorithms. To start, type the following command in your terminal:

java -jar causal-cmd-<version number>-jar-with-dependencies.jar
 

** Note: causal-cmd-<version number>-jar-with-dependencies.jar stands for the actual executable jar of the specific version being used. **
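For example, with release 1.10.0 (the version used in the examples below), the placeholder resolves to:

java -jar causal-cmd-1.10.0-jar-with-dependencies.jar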

@@ -206,7 +206,7 @@

Command Line Usage

--timeLag <integer> A time lag for time series data, automatically applied (zero if none) --verbose Yes if verbose output should be printed or logged -

In this example, we'll be running the FGES algorith on the dataset Retention.txt.

+

In this example, we'll be running the FGES algorithm on the dataset Retention.txt.

$ java -jar causal-cmd-1.10.0-jar-with-dependencies.jar --algorithm fges --data-type continuous --dataset Retention.txt --delimiter tab --score sem-bic-score
 

By default, this command outputs a single file, fges_<unix timestamp>.txt, which contains a log of the algorithm's activity along with its result.
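If you prefer a machine-readable result and a predictable file name, the documented --json-graph, --out, and --prefix switches can be combined with the same run (a hypothetical invocation; the output directory and prefix names are illustrative):

$ java -jar causal-cmd-1.10.0-jar-with-dependencies.jar --algorithm fges --data-type continuous --dataset Retention.txt --delimiter tab --score sem-bic-score --json-graph --out results --prefix retention_fges

With --json-graph enabled, a JSON version of the learned graph is written alongside the text log, and --prefix replaces the clock-based portion of the default file name.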
diff --git a/docs/index.html b/docs/index.html index 374f2ca..80cfa67 100644 --- a/docs/index.html +++ b/docs/index.html @@ -205,5 +205,5 @@

Tools and Software

diff --git a/docs/py-causal/index.html b/docs/py-causal/index.html index fab5336..3bf3e3c 100644 --- a/docs/py-causal/index.html +++ b/docs/py-causal/index.html @@ -172,7 +172,7 @@

Running Jupyter/IPython

Anaconda/Jupyter

Installing Python with Anaconda and Jupyter may be easier for some users:
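A minimal sketch of that route is below. The exact steps were elided from this diff hunk, so the environment name, Python version, and install source here are illustrative assumptions, not the project's official instructions:

conda create -n py-causal python=3.6   # hypothetical environment name and Python version
conda activate py-causal
pip install git+https://github.com/bd2kccd/py-causal   # illustrative install source
jupyter notebook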

For OS X, this default install does not seem to work well; try the following instead:

diff --git a/docs/search/search_index.json b/docs/search/search_index.json index 58163cf..651ee26 100644 --- a/docs/search/search_index.json +++ b/docs/search/search_index.json @@ -1 +1 @@ -{"config":{"indexing":"full","lang":["en"],"min_search_length":3,"prebuild_index":false,"separator":"[\\s\\-]+"},"docs":[{"location":"","text":"Welcome to CCD Docs This site hosts documentation for the Center for Causal Discovery . Tools and Software causal-cmd - a Java API and command line implementation of algorithms for performing causal discovery on Big Data. Use this software if you are interested incorporating analysis via a shell script or in a Java-based program. The software currently includes Fast Greedy Search ( FGES ) for continuous or discrete variables \u2013 an optimized version of Greedy Equivalence Search ( GES ) tested with datasets that contain as many as 1 million continuous variables, and Greedy Fast Causal Inference ( GFCI ) for continuous or discrete variables. Download a release Report bugs or issues with the software Github project Tetrad - a Java API, and desktop environment for learning, performing analyses and experimenting with causal discovery algorithms. Download the Application Tetrad Project Website Causal Web App \u2013 (unsupported and no longer maintained) our user-friendly web-based graphical interface for performing causal discovery analysis on big data using large memory servers. Github project Causal REST API \u2013 (unsupported and no longer maintained) our RESTful API for Causal Web App. Once you create a new user account via Causal Web App, you can use this REST API to upload data files and run Causal Discovery Algorithms. Github project Cytoscape-tetrad - (unsupported and no longer maintained) a native cytoscape plugin that imports tetrad txt output files that contain the structure of a causal graph. It handles causal graphs and partial ancestral graphs. Github project Ccd-annotations-cytoscape - (unsupported and no longer maintained) a native cytoscape plugin that allows for annotating nodes and edges of any cytoscape graph. Github project Py-causal - (unsupported and no longer maintained) a python module that wraps algorithms for performing causal discovery on big data. The software currently includes Fast Greedy Search ( FGES ) for both continuous and discrete variables, and Greedy Fast Causal Inference ( GFCI ) for continuous and discretevariables. Note: This project uses a very old version of Tetrad and a method of connecting Python to Java, Javabridge, that's proven sometimes buggy and hard to install on some platforms, and so we are no longer recommending it. Please consider using py-tetrad instead. Py-tetrad uses JPype to bridge Python and Java, which has already shown itself to be much easier to install and use cross-platform. Also, it allows one to use the most recent version of Tetrad, and it has been well-tested. Github project Docker container of Jupyter Notebook with Py-causal configured R-causal - (unsupported and no longer maintained) an R module that that wraps algorithms for performing causal discovery on big data. The software currently includes Fast Greedy Search ( FGES ) for both continuous and discrete variables, and Greedy Fast Causal Inference ( GFCI ) for continuous variables. Note 2023-03-06: This version of RCausal uses an older version of Tetrad from at least 5 years ago. However, we have updated our Python integration to a much better version--see https://github.com/cmu-phil/py-tetrad . 
Updating our R integration is one of the next projects we will take up. Github project Docker container of Jupyter Notebook with R-causal configured If you use our software in your research, please acknowledge the Center for Causal Discovery, supported by grant U54HG008540 , in any papers, presentations, or other dissemination of your work. All software is open-source and released under a dual licensing model. For non-profit institutions, the software is available under the GNU General Public License (GPL) v2 license. For-profit organizations that wish to commercialize enhanced or customized versions of the software will be able to purchase a commercial license on a case-by-case basis. The GPL license permits individuals to modify the source code and to share modifications with other colleagues/investigators. Specifically, it permits the dissemination and commercialization of enhanced or customized versions as well as incorporation of the software or its pieces into other license-compatible software packages, as long as modifications or enhancements are made open source. By using software provided by the Center for Causal Discovery, you agree that no warranties of any kind are made by Carnegie Mellon University or the University of Pittsburgh with respect to the data provided by the software or any use thereof, and the universities hereby disclaim the implied warranties of merchantability, fitness for a particular purpose, and non-infringement. The universities shall not be liable for any claims, losses, or damages of any kind arising from the data provided by the software or any use thereof.","title":"Welcome to CCD Docs"},{"location":"#welcome-to-ccd-docs","text":"This site hosts documentation for the Center for Causal Discovery .","title":"Welcome to CCD Docs"},{"location":"#tools-and-software","text":"causal-cmd - a Java API and command line implementation of algorithms for performing causal discovery on Big Data. Use this software if you are interested incorporating analysis via a shell script or in a Java-based program. The software currently includes Fast Greedy Search ( FGES ) for continuous or discrete variables \u2013 an optimized version of Greedy Equivalence Search ( GES ) tested with datasets that contain as many as 1 million continuous variables, and Greedy Fast Causal Inference ( GFCI ) for continuous or discrete variables. Download a release Report bugs or issues with the software Github project Tetrad - a Java API, and desktop environment for learning, performing analyses and experimenting with causal discovery algorithms. Download the Application Tetrad Project Website Causal Web App \u2013 (unsupported and no longer maintained) our user-friendly web-based graphical interface for performing causal discovery analysis on big data using large memory servers. Github project Causal REST API \u2013 (unsupported and no longer maintained) our RESTful API for Causal Web App. Once you create a new user account via Causal Web App, you can use this REST API to upload data files and run Causal Discovery Algorithms. Github project Cytoscape-tetrad - (unsupported and no longer maintained) a native cytoscape plugin that imports tetrad txt output files that contain the structure of a causal graph. It handles causal graphs and partial ancestral graphs. Github project Ccd-annotations-cytoscape - (unsupported and no longer maintained) a native cytoscape plugin that allows for annotating nodes and edges of any cytoscape graph. 
Github project Py-causal - (unsupported and no longer maintained) a python module that wraps algorithms for performing causal discovery on big data. The software currently includes Fast Greedy Search ( FGES ) for both continuous and discrete variables, and Greedy Fast Causal Inference ( GFCI ) for continuous and discretevariables. Note: This project uses a very old version of Tetrad and a method of connecting Python to Java, Javabridge, that's proven sometimes buggy and hard to install on some platforms, and so we are no longer recommending it. Please consider using py-tetrad instead. Py-tetrad uses JPype to bridge Python and Java, which has already shown itself to be much easier to install and use cross-platform. Also, it allows one to use the most recent version of Tetrad, and it has been well-tested. Github project Docker container of Jupyter Notebook with Py-causal configured R-causal - (unsupported and no longer maintained) an R module that that wraps algorithms for performing causal discovery on big data. The software currently includes Fast Greedy Search ( FGES ) for both continuous and discrete variables, and Greedy Fast Causal Inference ( GFCI ) for continuous variables. Note 2023-03-06: This version of RCausal uses an older version of Tetrad from at least 5 years ago. However, we have updated our Python integration to a much better version--see https://github.com/cmu-phil/py-tetrad . Updating our R integration is one of the next projects we will take up. Github project Docker container of Jupyter Notebook with R-causal configured If you use our software in your research, please acknowledge the Center for Causal Discovery, supported by grant U54HG008540 , in any papers, presentations, or other dissemination of your work. All software is open-source and released under a dual licensing model. For non-profit institutions, the software is available under the GNU General Public License (GPL) v2 license. For-profit organizations that wish to commercialize enhanced or customized versions of the software will be able to purchase a commercial license on a case-by-case basis. The GPL license permits individuals to modify the source code and to share modifications with other colleagues/investigators. Specifically, it permits the dissemination and commercialization of enhanced or customized versions as well as incorporation of the software or its pieces into other license-compatible software packages, as long as modifications or enhancements are made open source. By using software provided by the Center for Causal Discovery, you agree that no warranties of any kind are made by Carnegie Mellon University or the University of Pittsburgh with respect to the data provided by the software or any use thereof, and the universities hereby disclaim the implied warranties of merchantability, fitness for a particular purpose, and non-infringement. The universities shall not be liable for any claims, losses, or damages of any kind arising from the data provided by the software or any use thereof.","title":"Tools and Software"},{"location":"causal-cmd/","text":"causal-cmd v1.10.x Introduction Causal-cmd is a Java application that provides a Command-Line Interface (CLI) tool for causal discovery algorithms produced by the Center for Causal Discovery . 
The application currently includes the following algorithms: boss, bpc, ccd, cpc, cstar, fas, fask, fask-pw, fci, fcimax, fges, fges-mb, fofc, ftfc, gfci, grasp, grasp-fci, ica-ling-d, ica-lingam, images, mgm, pag-sampling-rfci, pc, pc-mb, pcmax, r-skew, r3, rfci, skew, spfci, svar-fci, svar-gfci Causal discovery algorithms are a class of search algorithms that explore a space of graphical causal models, i.e., graphical models where directed edges imply causation, for a model (or models) that are a good fit for a dataset. We suggest that newcomers to the field review Causation, Prediction and Search by Spirtes, Glymour and Scheines for a primer on the subject. Causal discovery algorithms allow a user to uncover the causal relationships between variables in a dataset. These discovered causal relationships may be used further--understanding the underlying the processes of a system (e.g., the metabolic pathways of an organism), hypothesis generation (e.g., variables that best explain an outcome), guide experimentation (e.g., what gene knockout experiments should be performed) or prediction (e.g. parameterization of the causal graph using data and then using it as a classifier). Command Line Usage Java 8 or higher is the only prerequisite to run the software. Note that by default Java will allocate the smaller of 1/4 system memory or 1GB to the Java virtual machine (JVM). If you run out of memory (heap memory space) running your analyses you should increase the memory allocated to the JVM with the following switch '-XmxXXG' where XX is the number of gigabytes of ram you allow the JVM to utilize. For example to allocate 8 gigabytes of ram you would add -Xmx8G immediately after the java command. In this example, we'll use download the Retention.txt file, which is a dataset containing information on college graduation and used in the publication of \"What Do College Ranking Data Tell Us About Student Retention?\" by Drudzel and Glymour, 1994. Keep in mind that causal-cmd has different switches for different algorithms. To start, type the following command in your terminal: java -jar causal-cmd--jar-with-dependencies.jar ** Note: we are using causal-cmd--jar-with-dependencies.jar to indicate the actual executable jar of specific version number that is being used. ** And you'll see the following instructions: Missing required options: algorithm, data-type, dataset, delimiter usage: java -jar Causal-cmd Project-1.10.0.jar --algorithm [--comment-marker ] --data-type --dataset [--default] --delimiter [--experimental] [--help] [--help-algo-desc] [--help-all] [--help-score-desc] [--help-test-desc] [--json-graph] [--metadata ] [--no-header] [--out ] [--prefix ] [--quote-char ] [--skip-validation] [--version] --algorithm Algorithm: boss, bpc, ccd, cpc, cstar, dagma, direct-lingam, fas, fask, fask-pw, fci, fci-iod, fci-max, fges, fges-mb, fofc, ftfc, gfci, grasp, grasp-fci, ica-ling-d, ica-lingam, images, mgm, pag-sampling-rfci, pc, pc-mb, r-boss, r-skew, r3, rfci, skew, spfci, svar-fci, svar-gfci --comment-marker Comment marker. --data-type Data type: all, continuous, covariance, discrete, mixed --dataset Dataset. Multiple files are seperated by commas. --default Use Tetrad default parameter values. --delimiter Delimiter: colon, comma, pipe, semicolon, space, tab, whitespace --experimental Show experimental algorithms, tests, and scores. --help Show help. --help-algo-desc Show all the algorithms along with their descriptions. --help-all Show all options and descriptions. 
--help-score-desc Show all the scores along with their descriptions. --help-test-desc Show all the independence tests along with their descriptions. --json-graph Write out graph as json. --metadata Metadata file. Cannot apply to dataset without header. --no-header Indicates tabular dataset has no header. --out Output directory --prefix Replace the default output filename prefix in the format of _. --quote-char Single character denotes quote. --skip-validation Skip validation. --version Show version. Use --help for guidance list of options. Use --help-all to show all options. By specifying an algorithm using the --algorithm switch the program will indicate the additional required switches. The program reminds the user of required switches to run. In general most algorithms also require data-type, dataset, delimiter and score. The switch --help-all displays and extended list of switches for the algorithm. Example of listing all available options for an algorithm: $ java -jar causal-cmd-1.9.0-jar-with-dependencies.jar --algorithm fges --data-type continuous --dataset Retention.txt --delimiter tab --score sem-bic-score --help usage: java -jar Causal-cmd Project-1.10.0.jar --algorithm fges --data-type continuous --dataset Retention.txt --delimiter tab --score sem-bic-score [--addOriginalDataset] [--choose-dag-in-pattern] [--choose-mag-in-pag] [--comment-marker ] [--default] [--exclude-var ] [--experimental] [--external-graph ] [--extract-struct-model] [--faithfulnessAssumed] [--generate-complete-graph] [--genereate-pag-from-dag] [--genereate-pag-from-tsdag] [--genereate-pattern-from-dag] [--json-graph] [--knowledge ] [--make-all-edges-undirected] [--make-bidirected-undirected] [--make-undirected-bidirected] [--maxDegree ] [--meekVerbose] [--metadata ] [--missing-marker ] [--no-header] [--numberResampling ] [--out ] [--parallelized] [--penaltyDiscount ] [--percentResampleSize ] [--precomputeCovariances] [--prefix ] [--quote-char ] [--resamplingEnsemble ] [--resamplingWithReplacement] [--saveBootstrapGraphs] [--seed ] [--semBicRule ] [--semBicStructurePrior ] [--skip-validation] [--symmetricFirstStep] [--timeLag ] [--verbose] --addOriginalDataset Yes, if adding the original dataset as another bootstrapping --choose-dag-in-pattern Choose DAG in Pattern graph. --choose-mag-in-pag Choose MAG in PAG. --comment-marker Comment marker. --default Use Tetrad default parameter values. --exclude-var Variables to be excluded from run. --experimental Show experimental algorithms, tests, and scores. --external-graph External graph file. --extract-struct-model Extract sturct model. --faithfulnessAssumed Yes if (one edge) faithfulness should be assumed --generate-complete-graph Generate complete graph. --genereate-pag-from-dag Generate PAG from DAG. --genereate-pag-from-tsdag Generate PAG from TsDAG. --genereate-pattern-from-dag Generate pattern graph from PAG. --json-graph Write out graph as json. --knowledge Prior knowledge file. --make-all-edges-undirected Make all edges undirected. --make-bidirected-undirected Make bidirected edges undirected. --make-undirected-bidirected Make undirected edges bidirected. --maxDegree The maximum degree of the graph (min = -1) --meekVerbose Yes if verbose output for Meek rule applications should be printed or logged --metadata Metadata file. Cannot apply to dataset without header. --missing-marker Denotes missing value. --no-header Indicates tabular dataset has no header. 
--numberResampling The number of bootstraps/resampling iterations (min = 0) --out Output directory --parallelized Yes if the search should be parallelized --penaltyDiscount Penalty discount (min = 0.0) --percentResampleSize The percentage of resample size (min = 10%) --precomputeCovariances True if covariance matrix should be precomputed for tubular continuous data --prefix Replace the default output filename prefix in the format of _. --quote-char Single character denotes quote. --resamplingEnsemble Ensemble method: Preserved (1), Highest (2), Majority (3) --resamplingWithReplacement Yes, if sampling with replacement (bootstrapping) --saveBootstrapGraphs Yes if individual bootstrapping graphs should be saved --seed Seed for pseudorandom number generator (-1 = off) --semBicRule Lambda: 1 = Chickering, 2 = Nandy --semBicStructurePrior Structure Prior for SEM BIC (default 0) --skip-validation Skip validation. --symmetricFirstStep Yes if the first step step for FGES should do scoring for both X->Y and Y->X --timeLag A time lag for time series data, automatically applied (zero if none) --verbose Yes if verbose output should be printed or logged In this example, we'll be running the FGES algorith on the dataset Retention.txt . $ java -jar causal-cmd-1.10.0-jar-with-dependencies.jar --algorithm fges --data-type continuous --dataset Retention.txt --delimiter tab --score sem-bic-score This command will output by default one file fges_.txt which is a log and result of the algorithm's activity. '--json-graph' option will enable output fges__graph.json which is a json graph from the algorithm, which is equivalent to the exported json file from tetrad-gui. Example log output from causal-cmd: ================================================================================ FGES (Wed, October 04, 2023 01:42:43 PM) ================================================================================ Runtime Parameters -------------------------------------------------------------------------------- number of threads: 7 Dataset -------------------------------------------------------------------------------- file: Retention.txt header: yes delimiter: tab quote char: none missing marker: none comment marker: none Algorithm Run -------------------------------------------------------------------------------- algorithm: FGES score: Sem BIC Score Algorithm Parameters -------------------------------------------------------------------------------- addOriginalDataset: no faithfulnessAssumed: no maxDegree: 1000 meekVerbose: no numberResampling: 0 parallelized: no penaltyDiscount: 2.0 percentResampleSize: 100 precomputeCovariances: no resamplingEnsemble: 1 resamplingWithReplacement: no saveBootstrapGraphs: no seed: -1 semBicRule: 1 semBicStructurePrior: 0.0 symmetricFirstStep: no timeLag: 0 verbose: no Wed, October 04, 2023 01:42:45 PM: Start data validation on file Retention.txt. Wed, October 04, 2023 01:42:45 PM: End data validation on file Retention.txt. There are 170 cases and 8 variables. Wed, October 04, 2023 01:42:45 PM: Start reading in file Retention.txt. Wed, October 04, 2023 01:42:45 PM: Finished reading in file Retention.txt. Wed, October 04, 2023 01:42:45 PM: File Retention.txt contains 170 cases, 8 variables. 
Start search: Wed, October 04, 2023 01:42:45 PM End search: Wed, October 04, 2023 01:42:45 PM ================================================================================ Graph Nodes: spending_per_stdt;grad_rate;stdt_clss_stndng;rjct_rate;tst_scores;stdt_accept_rate;stdt_tchr_ratio;fac_salary Graph Edges: 1. spending_per_stdt --- fac_salary 2. spending_per_stdt --- rjct_rate 3. spending_per_stdt --- stdt_tchr_ratio 4. stdt_accept_rate --- fac_salary 5. stdt_clss_stndng --- rjct_rate 6. stdt_clss_stndng --- tst_scores 7. tst_scores --- fac_salary 8. tst_scores --- grad_rate 9. tst_scores --- rjct_rate 10. tst_scores --- spending_per_stdt Graph Attributes: Score: -5181.565079 Graph Node Attributes: Score: [spending_per_stdt: -1408.4382541909688;grad_rate: -416.7933531919986;stdt_clss_stndng: -451.79480827547627;rjct_rate: -439.8087229322177;tst_scores: -330.2039598576225;stdt_accept_rate: -429.64771587695884;stdt_tchr_ratio: -208.85274641239832;fac_salary: -1496.025518245214] Interpretation of graph output The end of the file contains the causal graph edgesfrom the search procedure. Here is a key to the edge types: A --- B - There is causal relationship between variable A and B, but we cannot determine the direction of the relationship A --> B - There is a causal relationship from variable A to B The GFCI algorithm has additional edge types: A <-> B - There is an unmeasured confounder of A and B A o-> B - Either A is a cause of B or there is an unmeasured confounder of A and B or both A o-o B - Either (1) A is a cause of B or B is a cause of A, or (2) there is an unmeasured confounder of A and B, or both 1 and 2 hold. A --> B dd nl - Definitely direct causal relationship and no latent confounder A --> B pd nl - Possibly direct and no latent confounder A --> B pd pl - Possibly direct and possibly latent confounder NNote: the generated result file name is based on the system clock. Sample Prior Knowledge File From the above useage guide, we see the option of --knowledge , with which we can specify the prior knowledge file. Below is the content of a sample prior knowledge file: /knowledge addtemporal 1 spending_per_stdt fac_salary stdt_tchr_ratio 2 rjct_rate stdt_accept_rate 3 tst_scores stdt_clss_stndng 4* grad_rate forbiddirect x3 x4 requiredirect x1 x2 The first line of the prior knowledge file must say /knowledge . And a prior knowledge file consists of three sections: addtemporal - tiers of variables where the first tier preceeds the last. Adding a asterisk next to the tier id prohibits edges between tier variables forbiddirect - forbidden directed edges indicated by a list of pairs of variables: from -> to direction requireddirect - required directed edges indicated by a list of pairs of variables: from -> to direction","title":"Causal Cmd"},{"location":"causal-cmd/#causal-cmd-v110x","text":"","title":"causal-cmd v1.10.x"},{"location":"causal-cmd/#introduction","text":"Causal-cmd is a Java application that provides a Command-Line Interface (CLI) tool for causal discovery algorithms produced by the Center for Causal Discovery . 
The application currently includes the following algorithms: boss, bpc, ccd, cpc, cstar, fas, fask, fask-pw, fci, fcimax, fges, fges-mb, fofc, ftfc, gfci, grasp, grasp-fci, ica-ling-d, ica-lingam, images, mgm, pag-sampling-rfci, pc, pc-mb, pcmax, r-skew, r3, rfci, skew, spfci, svar-fci, svar-gfci Causal discovery algorithms are a class of search algorithms that explore a space of graphical causal models, i.e., graphical models where directed edges imply causation, for a model (or models) that are a good fit for a dataset. We suggest that newcomers to the field review Causation, Prediction and Search by Spirtes, Glymour and Scheines for a primer on the subject. Causal discovery algorithms allow a user to uncover the causal relationships between variables in a dataset. These discovered causal relationships may be used further--understanding the underlying the processes of a system (e.g., the metabolic pathways of an organism), hypothesis generation (e.g., variables that best explain an outcome), guide experimentation (e.g., what gene knockout experiments should be performed) or prediction (e.g. parameterization of the causal graph using data and then using it as a classifier).","title":"Introduction"},{"location":"causal-cmd/#command-line-usage","text":"Java 8 or higher is the only prerequisite to run the software. Note that by default Java will allocate the smaller of 1/4 system memory or 1GB to the Java virtual machine (JVM). If you run out of memory (heap memory space) running your analyses you should increase the memory allocated to the JVM with the following switch '-XmxXXG' where XX is the number of gigabytes of ram you allow the JVM to utilize. For example to allocate 8 gigabytes of ram you would add -Xmx8G immediately after the java command. In this example, we'll use download the Retention.txt file, which is a dataset containing information on college graduation and used in the publication of \"What Do College Ranking Data Tell Us About Student Retention?\" by Drudzel and Glymour, 1994. Keep in mind that causal-cmd has different switches for different algorithms. To start, type the following command in your terminal: java -jar causal-cmd--jar-with-dependencies.jar ** Note: we are using causal-cmd--jar-with-dependencies.jar to indicate the actual executable jar of specific version number that is being used. ** And you'll see the following instructions: Missing required options: algorithm, data-type, dataset, delimiter usage: java -jar Causal-cmd Project-1.10.0.jar --algorithm [--comment-marker ] --data-type --dataset [--default] --delimiter [--experimental] [--help] [--help-algo-desc] [--help-all] [--help-score-desc] [--help-test-desc] [--json-graph] [--metadata ] [--no-header] [--out ] [--prefix ] [--quote-char ] [--skip-validation] [--version] --algorithm Algorithm: boss, bpc, ccd, cpc, cstar, dagma, direct-lingam, fas, fask, fask-pw, fci, fci-iod, fci-max, fges, fges-mb, fofc, ftfc, gfci, grasp, grasp-fci, ica-ling-d, ica-lingam, images, mgm, pag-sampling-rfci, pc, pc-mb, r-boss, r-skew, r3, rfci, skew, spfci, svar-fci, svar-gfci --comment-marker Comment marker. --data-type Data type: all, continuous, covariance, discrete, mixed --dataset Dataset. Multiple files are seperated by commas. --default Use Tetrad default parameter values. --delimiter Delimiter: colon, comma, pipe, semicolon, space, tab, whitespace --experimental Show experimental algorithms, tests, and scores. --help Show help. --help-algo-desc Show all the algorithms along with their descriptions. 
--help-all Show all options and descriptions. --help-score-desc Show all the scores along with their descriptions. --help-test-desc Show all the independence tests along with their descriptions. --json-graph Write out graph as json. --metadata Metadata file. Cannot apply to dataset without header. --no-header Indicates tabular dataset has no header. --out Output directory --prefix Replace the default output filename prefix in the format of _. --quote-char Single character denotes quote. --skip-validation Skip validation. --version Show version. Use --help for guidance list of options. Use --help-all to show all options. By specifying an algorithm using the --algorithm switch the program will indicate the additional required switches. The program reminds the user of required switches to run. In general most algorithms also require data-type, dataset, delimiter and score. The switch --help-all displays and extended list of switches for the algorithm. Example of listing all available options for an algorithm: $ java -jar causal-cmd-1.9.0-jar-with-dependencies.jar --algorithm fges --data-type continuous --dataset Retention.txt --delimiter tab --score sem-bic-score --help usage: java -jar Causal-cmd Project-1.10.0.jar --algorithm fges --data-type continuous --dataset Retention.txt --delimiter tab --score sem-bic-score [--addOriginalDataset] [--choose-dag-in-pattern] [--choose-mag-in-pag] [--comment-marker ] [--default] [--exclude-var ] [--experimental] [--external-graph ] [--extract-struct-model] [--faithfulnessAssumed] [--generate-complete-graph] [--genereate-pag-from-dag] [--genereate-pag-from-tsdag] [--genereate-pattern-from-dag] [--json-graph] [--knowledge ] [--make-all-edges-undirected] [--make-bidirected-undirected] [--make-undirected-bidirected] [--maxDegree ] [--meekVerbose] [--metadata ] [--missing-marker ] [--no-header] [--numberResampling ] [--out ] [--parallelized] [--penaltyDiscount ] [--percentResampleSize ] [--precomputeCovariances] [--prefix ] [--quote-char ] [--resamplingEnsemble ] [--resamplingWithReplacement] [--saveBootstrapGraphs] [--seed ] [--semBicRule ] [--semBicStructurePrior ] [--skip-validation] [--symmetricFirstStep] [--timeLag ] [--verbose] --addOriginalDataset Yes, if adding the original dataset as another bootstrapping --choose-dag-in-pattern Choose DAG in Pattern graph. --choose-mag-in-pag Choose MAG in PAG. --comment-marker Comment marker. --default Use Tetrad default parameter values. --exclude-var Variables to be excluded from run. --experimental Show experimental algorithms, tests, and scores. --external-graph External graph file. --extract-struct-model Extract sturct model. --faithfulnessAssumed Yes if (one edge) faithfulness should be assumed --generate-complete-graph Generate complete graph. --genereate-pag-from-dag Generate PAG from DAG. --genereate-pag-from-tsdag Generate PAG from TsDAG. --genereate-pattern-from-dag Generate pattern graph from PAG. --json-graph Write out graph as json. --knowledge Prior knowledge file. --make-all-edges-undirected Make all edges undirected. --make-bidirected-undirected Make bidirected edges undirected. --make-undirected-bidirected Make undirected edges bidirected. --maxDegree The maximum degree of the graph (min = -1) --meekVerbose Yes if verbose output for Meek rule applications should be printed or logged --metadata Metadata file. Cannot apply to dataset without header. --missing-marker Denotes missing value. --no-header Indicates tabular dataset has no header. 
--numberResampling The number of bootstraps/resampling iterations (min = 0) --out Output directory --parallelized Yes if the search should be parallelized --penaltyDiscount Penalty discount (min = 0.0) --percentResampleSize The percentage of resample size (min = 10%) --precomputeCovariances True if covariance matrix should be precomputed for tubular continuous data --prefix Replace the default output filename prefix in the format of _. --quote-char Single character denotes quote. --resamplingEnsemble Ensemble method: Preserved (1), Highest (2), Majority (3) --resamplingWithReplacement Yes, if sampling with replacement (bootstrapping) --saveBootstrapGraphs Yes if individual bootstrapping graphs should be saved --seed Seed for pseudorandom number generator (-1 = off) --semBicRule Lambda: 1 = Chickering, 2 = Nandy --semBicStructurePrior Structure Prior for SEM BIC (default 0) --skip-validation Skip validation. --symmetricFirstStep Yes if the first step step for FGES should do scoring for both X->Y and Y->X --timeLag A time lag for time series data, automatically applied (zero if none) --verbose Yes if verbose output should be printed or logged In this example, we'll be running the FGES algorith on the dataset Retention.txt . $ java -jar causal-cmd-1.10.0-jar-with-dependencies.jar --algorithm fges --data-type continuous --dataset Retention.txt --delimiter tab --score sem-bic-score This command will output by default one file fges_.txt which is a log and result of the algorithm's activity. '--json-graph' option will enable output fges__graph.json which is a json graph from the algorithm, which is equivalent to the exported json file from tetrad-gui. Example log output from causal-cmd: ================================================================================ FGES (Wed, October 04, 2023 01:42:43 PM) ================================================================================ Runtime Parameters -------------------------------------------------------------------------------- number of threads: 7 Dataset -------------------------------------------------------------------------------- file: Retention.txt header: yes delimiter: tab quote char: none missing marker: none comment marker: none Algorithm Run -------------------------------------------------------------------------------- algorithm: FGES score: Sem BIC Score Algorithm Parameters -------------------------------------------------------------------------------- addOriginalDataset: no faithfulnessAssumed: no maxDegree: 1000 meekVerbose: no numberResampling: 0 parallelized: no penaltyDiscount: 2.0 percentResampleSize: 100 precomputeCovariances: no resamplingEnsemble: 1 resamplingWithReplacement: no saveBootstrapGraphs: no seed: -1 semBicRule: 1 semBicStructurePrior: 0.0 symmetricFirstStep: no timeLag: 0 verbose: no Wed, October 04, 2023 01:42:45 PM: Start data validation on file Retention.txt. Wed, October 04, 2023 01:42:45 PM: End data validation on file Retention.txt. There are 170 cases and 8 variables. Wed, October 04, 2023 01:42:45 PM: Start reading in file Retention.txt. Wed, October 04, 2023 01:42:45 PM: Finished reading in file Retention.txt. Wed, October 04, 2023 01:42:45 PM: File Retention.txt contains 170 cases, 8 variables. 
Start search: Wed, October 04, 2023 01:42:45 PM End search: Wed, October 04, 2023 01:42:45 PM ================================================================================ Graph Nodes: spending_per_stdt;grad_rate;stdt_clss_stndng;rjct_rate;tst_scores;stdt_accept_rate;stdt_tchr_ratio;fac_salary Graph Edges: 1. spending_per_stdt --- fac_salary 2. spending_per_stdt --- rjct_rate 3. spending_per_stdt --- stdt_tchr_ratio 4. stdt_accept_rate --- fac_salary 5. stdt_clss_stndng --- rjct_rate 6. stdt_clss_stndng --- tst_scores 7. tst_scores --- fac_salary 8. tst_scores --- grad_rate 9. tst_scores --- rjct_rate 10. tst_scores --- spending_per_stdt Graph Attributes: Score: -5181.565079 Graph Node Attributes: Score: [spending_per_stdt: -1408.4382541909688;grad_rate: -416.7933531919986;stdt_clss_stndng: -451.79480827547627;rjct_rate: -439.8087229322177;tst_scores: -330.2039598576225;stdt_accept_rate: -429.64771587695884;stdt_tchr_ratio: -208.85274641239832;fac_salary: -1496.025518245214]","title":"Command Line Usage"},{"location":"causal-cmd/#interpretation-of-graph-output","text":"The end of the file contains the causal graph edgesfrom the search procedure. Here is a key to the edge types: A --- B - There is causal relationship between variable A and B, but we cannot determine the direction of the relationship A --> B - There is a causal relationship from variable A to B The GFCI algorithm has additional edge types: A <-> B - There is an unmeasured confounder of A and B A o-> B - Either A is a cause of B or there is an unmeasured confounder of A and B or both A o-o B - Either (1) A is a cause of B or B is a cause of A, or (2) there is an unmeasured confounder of A and B, or both 1 and 2 hold. A --> B dd nl - Definitely direct causal relationship and no latent confounder A --> B pd nl - Possibly direct and no latent confounder A --> B pd pl - Possibly direct and possibly latent confounder NNote: the generated result file name is based on the system clock.","title":"Interpretation of graph output"},{"location":"causal-cmd/#sample-prior-knowledge-file","text":"From the above useage guide, we see the option of --knowledge , with which we can specify the prior knowledge file. Below is the content of a sample prior knowledge file: /knowledge addtemporal 1 spending_per_stdt fac_salary stdt_tchr_ratio 2 rjct_rate stdt_accept_rate 3 tst_scores stdt_clss_stndng 4* grad_rate forbiddirect x3 x4 requiredirect x1 x2 The first line of the prior knowledge file must say /knowledge . And a prior knowledge file consists of three sections: addtemporal - tiers of variables where the first tier preceeds the last. Adding a asterisk next to the tier id prohibits edges between tier variables forbiddirect - forbidden directed edges indicated by a list of pairs of variables: from -> to direction requireddirect - required directed edges indicated by a list of pairs of variables: from -> to direction","title":"Sample Prior Knowledge File"},{"location":"causal-rest-api/","text":"Causal REST API v0.0.8 This RESTful API is designed for causal web. And it implements the JAX-RS specifications using Jersey. Table of Contents Installation Prerequisites Dependencies Configuration Start the API Server API Usage and Examples Getting JSON Web Token(JWT) 1. 
Data Management Upload small data file Resumable data file upload List all dataset files of a user Get the detail information of a dataset file based on ID Delete physical dataset file and all records from database for a given file ID Summarize dataset file List all prior knowledge files of a given user Get the detail information of a prior knowledge file based on ID Delete physical prior knowledge file and all records from database for a given file ID 2. Causal Discovery List all the available causal discovery algorithms Add a new job to run the desired algorithm on a given data file List all running jobs Check the job status for a given job ID Cancel a running job 3. Result Management List all result files generated by the algorithm Download a specific result file generated by the algorithm based on file name Compare algorithm result files List all the comparison files Download a specific comparison file based on file name Installation The following installation instructions are supposed to be used by the server admin who deploys this API server. API users can skip this section and just start reading from the API Usage and Examples section. Prerequisites You must have the following installed to build/install Causal REST API: Oracle Java SE Development Kit 8 Maven 3.x Dependencies If you want to run this API server and expose the API to your users, you'll first need to have the Causal Web Application installed and running. Your API users will use this web app to create their user accounts before they can consume the API. Note: currently new users can also be created using Auth0 login option, but the API doesn't work for these users. In order to build the API server, you'll need the released version of ccd-commons-0.3.1 by going to the repo and checkout this specific release version: git clone https://github.com/bd2kccd/ccd-commons.git cd ccd-commons git checkout tags/v0.3.1 mvn clean install You'll also need to download released ccd-db-0.6.3 : git clone https://github.com/bd2kccd/ccd-db.git cd ccd-db git checkout tags/v0.6.3 mvn clean install Then you can go get and install causal-rest-api : git clone https://github.com/bd2kccd/causal-rest-api.git cd causal-rest-api mvn clean package Configuration There are 4 configuration files to configure located at causal-rest-api/src/main/resources : - application-hsqldb.properties : HSQLDB database configurations (for testing only). - application-mysql.properties : MySQL database configurations - application-slurm.properties : Slurm setting for HPC - application.properties : Spring Boot application settings - causal.properties : Data file directory path and folder settings Befor editing the causal.properties file, you need to create a workspace for the application to work in. Create a directory called workspace, for an example /home/zhy19/ccd/workspace . Inside the workspace directory, create another folder called lib . Then build the jar file of Tetred using the latest development branch . After that, copy the jar file to the lib folder created earlier. Start the API Server Once you have all the settings configured, go to causal-rest-api/target and you will find the jar file named causal-rest-api.jar . Then simply run java -jar causal-rest-api.jar API Usage and Examples In the following sections, we'll demonstrate the API usage with examples using the API server that is running on Pittsburgh Super Computing. The API base URI is https:// >/ccd-api. This API requires user to be authenticated. 
Before using this API, the user creates an account in the Causal Web App. Getting JSON Web Token(JWT) After registration in Causal Web App, the email and password can be used to authenticate against the Causal REST API to get the access token (we use JWT) via HTTP Basic Auth . API Endpoint URI pattern: GET https:///ccd-api/jwt In basic auth, the user provides the username and password, which the HTTP client concatenates (username + \":\" + password), and base64 encodes it. This encoded string is then sent using a Authorization header with the \"Basic\" schema. For instance user email demo@pitt.edu whose password is 123 . POST /ccd-api/jwt HTTP/1.1 Host: Authorization: Basic ZGVtb0BwaXR0LmVkdToxMjM= Once the request is processed successfully, the user ID together with a JWT will be returned in the response for further API queries. { \"userId\": 22, \"jwt\": \"eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJpc3MiOiJodHRwczovL2Nsb3VkLmNjZC5waXR0LmVkdS8iLCJuYW1lIjoiemh5MTkiLCJleHAiOjE0NzU4NTA0Mjg1OTcsImlhdCI6MTQ3NTg0NjgyODU5N30.FcE7aEpg0u2c-gUVugIjJkzjhlDu5qav_XHtgLu3c6E\", \"issuedTime\": 1475846828597, \"lifetime\": 3600, \"expireTime\": 1475850428597, \"wallTime\": [ 1, 3, 6 ] } We'll need to use this userId in the URI path of all subsequent requests. And this jwt expires in 3600 seconds(1 hour), so the API consumer will need to request for another JWT otherwise the API query to other API endpoints will be denied. And this JWT will need to be sent via the HTTP Authorization header as well, but using the Bearer schema. The wallTime field is designed for users who want to specify the the maximum CPU time when Slurm handles the jobs on PSC. Normally, a job is expected to finish before the specified maximum walltime. After the walltime reaches the maximum, the job terminates regardless whether the job processes are still running or not. In this example, you can pick 1 hour, 3 or 6 hours as the wallTime. Note: querying the JWT endpoint again before the current JWT expires will generate a new JWT, which makes the old JWT expired automatically. And this newly generated JWT will be valid in another hour unless there's another new JWT being queried. Since this API is developed with Jersey, which supports WADL . So you can view the generated WADL by going to https:///ccd-api/application.wadl?detail=true and see all resource available in the application. Accessing to this endpoint doesn't require authentication. Basically, all the API usage examples are grouped into three categories: Data Management Causal Discovery Result Management And all the following examples will be issued by user 22 whose password is 123 . 1. Data Management Upload small data file At this point, you can upload two types of data files: tabular dataset file(either tab delimited or comma delimited) and prior knowledge file. API Endpoint URI pattern: POST https:///ccd-api/{userId}/dataset/upload This is a multipart file upload via an HTML form, and the client is required to use name=\"file\" to name their file upload field in their form. 
Generated HTTP request code example: POST /ccd-api/22/dataset/upload HTTP/1.1 Host: Authorization: Bearer eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJpc3MiOiJodHRwczovL2Nsb3VkLmNjZC5waXR0LmVkdS8iLCJuYW1lIjoiemh5MTkiLCJleHAiOjE0NzU4NTA2NzY4MDQsImlhdCI6MTQ3NTg0NzA3NjgwNH0.8azVEoNPfETczXb-vn7dfyDd98eRt7iiLBXehGpPGzY Content-Type: multipart/form-data; boundary=----WebKitFormBoundary7MA4YWxkTrZu0gW ----WebKitFormBoundary7MA4YWxkTrZu0gW Content-Disposition: form-data; name=\"file\"; filename=\"\" Content-Type: ----WebKitFormBoundary7MA4YWxkTrZu0gW If the Authorization header is not provided, the response will look like this: { \"timestamp\": 1465414501443, \"status\": 401, \"error\": \"Unauthorized\", \"message\": \"User credentials are required.\", \"path\": \"/22/dataset/upload\" } This POST request will upload the dataset file to the target server location and add corresponding records into database. And the response will contain the following pieces: { \"id\": 6, \"name\": \"Lung-tetrad_hv.txt\", \"creationTime\": 1466622267000, \"lastModifiedTime\": 1466622267000, \"fileSize\": 3309465, \"md5checkSum\": \"b1db7511ee293d297e3055d9a7b46c5e\", \"fileSummary\": { \"variableType\": null, \"fileDelimiter\": null, \"numOfRows\": null, \"numOfColumns\": null } } The prior knowledge file upload uses a similar API endpoint: POST https:///ccd-api/{userId}/priorknowledge/upload Due to there's no need to summarize a prior knowledge file, the response of a successful prior knowledge file upload will look like: { \"id\": 6, \"name\": \"Lung-tetrad_hv.txt\", \"creationTime\": 1466622267000, \"lastModifiedTime\": 1466622267000, \"fileSize\": 3309465, \"md5checkSum\": \"ugdb7511rt293d29ke3055d9a7b46c9k\" } Resumable data file upload In addition to the regular file upload described in Example 6, we also provide the option of stable and resumable large file upload. It requires the client side to have a resumable upload implementation. We currently support client integrated with Resumable.js , whihc provides multiple simultaneous, stable and resumable uploads via the HTML5 File API. You can also create your own client as long as al the following parameters are set correctly. API Endpoint URI pattern: GET https:///ccd-api/{userId}/chunkupload POST https:///ccd-api/{userId}/chunkupload In this example, the data file is splited into 3 chunks. The upload of each chunk consists of a GET request and a POST request. To handle the state of upload chunks, a number of extra parameters are sent along with all requests: resumableChunkNumber : The index of the chunk in the current upload. First chunk is 1 (no base-0 counting here). resumableChunkSize : The general chunk size. Using this value and resumableTotalSize you can calculate the total number of chunks. Please note that the size of the data received in the HTTP might be lower than resumableChunkSize of this for the last chunk for a file. resumableCurrentChunkSize : The size of the current resumable chuck. resumableTotalSize : The total file size. resumableType : The file type of the resumable chuck, e.e., \"text/plain\". resumableIdentifier : A unique identifier for the file contained in the request. resumableFilename : The original file name (since a bug in Firefox results in the file name not being transmitted in chunk multipart posts). resumableRelativePath : The file's relative path when selecting a directory (defaults to file name in all browsers except Chrome). resumableTotalChunks : The total number of chunks. 
Generated HTTP request code example: GET /ccd-api/22/chunkupload?resumableChunkNumber=2&resumableChunkSize=1048576&resumableCurrentChunkSize=1048576&resumableTotalSize=3309465&resumableType=text%2Fplain&resumableIdentifier=3309465-large-datatxt&resumableFilename=large-data.txt&resumableRelativePath=large-data.txt&resumableTotalChunks=3 HTTP/1.1 Host: Authorization: Bearer eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJpc3MiOiJodHRwczovL2Nsb3VkLmNjZC5waXR0LmVkdS8iLCJuYW1lIjoiemh5MTkiLCJleHAiOjE0NzU4NTA2NzY4MDQsImlhdCI6MTQ3NTg0NzA3NjgwNH0.8azVEoNPfETczXb-vn7dfyDd98eRt7iiLBXehGpPGzY This GET request checks if the data chunk is already on the server side. If the target file chunk is not found on the server, the client will issue a POST request to upload the actual data. Generated HTTP request code example: POST /ccd-api/22/chunkupload HTTP/1.1 Host: Authorization: Bearer eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJpc3MiOiJodHRwczovL2Nsb3VkLmNjZC5waXR0LmVkdS8iLCJuYW1lIjoiemh5MTkiLCJleHAiOjE0NzU4NTA2NzY4MDQsImlhdCI6MTQ3NTg0NzA3NjgwNH0.8azVEoNPfETczXb-vn7dfyDd98eRt7iiLBXehGpPGzY Content-Type: multipart/form-data; boundary=----WebKitFormBoundaryMFjgApg56XGyeTnZ ------WebKitFormBoundaryMFjgApg56XGyeTnZ Content-Disposition: form-data; name=\"resumableChunkNumber\" 2 ------WebKitFormBoundaryMFjgApg56XGyeTnZ Content-Disposition: form-data; name=\"resumableChunkSize\" 1048576 ------WebKitFormBoundaryMFjgApg56XGyeTnZ Content-Disposition: form-data; name=\"resumableCurrentChunkSize\" 1048576 ------WebKitFormBoundaryMFjgApg56XGyeTnZ Content-Disposition: form-data; name=\"resumableTotalSize\" 3309465 ------WebKitFormBoundaryMFjgApg56XGyeTnZ Content-Disposition: form-data; name=\"resumableType\" text/plain ------WebKitFormBoundaryMFjgApg56XGyeTnZ Content-Disposition: form-data; name=\"resumableIdentifier\" 3309465-large-datatxt ------WebKitFormBoundaryMFjgApg56XGyeTnZ Content-Disposition: form-data; name=\"resumableFilename\" large-data.txt ------WebKitFormBoundaryMFjgApg56XGyeTnZ Content-Disposition: form-data; name=\"resumableRelativePath\" large-data.txt ------WebKitFormBoundaryMFjgApg56XGyeTnZ Content-Disposition: form-data; name=\"resumableTotalChunks\" 3 ------WebKitFormBoundaryMFjgApg56XGyeTnZ Content-Disposition: form-data; name=\"file\"; filename=\"blob\" Content-Type: application/octet-stream ------WebKitFormBoundaryMFjgApg56XGyeTnZ-- Each chunk upload POST will get a 200 status code from response if everything works fine. And finally the md5checkSum string of the reassemabled file will be returned once the whole file has been uploaded successfully. In this example, the POST request that uploads the third chunk will response this: b1db7511ee293d297e3055d9a7b46c5e List all dataset files of a user API Endpoint URI pattern: GET https:///ccd-api/{userId}/dataset Generated HTTP request code example: GET /ccd-api/22/dataset HTTP/1.1 Host: Authorization: Bearer eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJpc3MiOiJodHRwczovL2Nsb3VkLmNjZC5waXR0LmVkdS8iLCJuYW1lIjoiemh5MTkiLCJleHAiOjE0NzU4NTA2NzY4MDQsImlhdCI6MTQ3NTg0NzA3NjgwNH0.8azVEoNPfETczXb-vn7dfyDd98eRt7iiLBXehGpPGzY Accept: application/json A JSON formatted list of all the input dataset files that are associated with user 22 will be returned. 
[ { \"id\": 8, \"name\": \"data_small.txt\", \"creationTime\": 1467132449000, \"lastModifiedTime\": 1467132449000, \"fileSize\": 278428, \"md5checkSum\": \"ed5f27a2cf94fe3735a5d9ed9191c382\", \"fileSummary\": { \"variableType\": \"continuous\", \"fileDelimiter\": \"tab\", \"numOfRows\": 302, \"numOfColumns\": 123 } }, { \"id\": 10, \"name\": \"large-data.txt\", \"creationTime\": 1467134048000, \"lastModifiedTime\": 1467134048000, \"fileSize\": 3309465, \"md5checkSum\": \"b1db7511ee293d297e3055d9a7b46c5e\", \"fileSummary\": { \"variableType\": null, \"fileDelimiter\": null, \"numOfRows\": null, \"numOfColumns\": null } }, { \"id\": 11, \"name\": \"Lung-tetrad_hv (copy).txt\", \"creationTime\": 1467140415000, \"lastModifiedTime\": 1467140415000, \"fileSize\": 3309465, \"md5checkSum\": \"b1db7511ee293d297e3055d9a7b46c5e\", \"fileSummary\": { \"variableType\": \"continuous\", \"fileDelimiter\": \"tab\", \"numOfRows\": 302, \"numOfColumns\": 608 } } ] You can also specify the response format as XML in your request Generated HTTP request code example: GET /ccd-api/22/dataset HTTP/1.1 Host: Authorization: Bearer eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJpc3MiOiJodHRwczovL2Nsb3VkLmNjZC5waXR0LmVkdS8iLCJuYW1lIjoiemh5MTkiLCJleHAiOjE0NzU4NTA2NzY4MDQsImlhdCI6MTQ3NTg0NzA3NjgwNH0.8azVEoNPfETczXb-vn7dfyDd98eRt7iiLBXehGpPGzY Accept: application/xml And the response will look like this: 8 data_small.txt 2016-06-28T12:47:29-04:00 2016-06-28T12:47:29-04:00 278428 ed5f27a2cf94fe3735a5d9ed9191c382 tab 123 302 continuous 10 large-data.txt 2016-06-28T13:14:08-04:00 2016-06-28T13:14:08-04:00 3309465 b1db7511ee293d297e3055d9a7b46c5e 11 Lung-tetrad_hv (copy).txt 2016-06-28T15:00:15-04:00 2016-06-28T15:00:15-04:00 3309465 b1db7511ee293d297e3055d9a7b46c5e tab 608 302 continuous Form the above output, we can also tell that data file with ID 10 doesn't have all the fileSummary field values set, we'll cover this in the dataset summarization section. Get the detail information of a dataset file based on ID API Endpoint URI pattern: GET https:///ccd-api/{userId}/dataset/{id} Generated HTTP request code example: GET /ccd-api/22/dataset/8 HTTP/1.1 Host: Authorization: Bearer eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJpc3MiOiJodHRwczovL2Nsb3VkLmNjZC5waXR0LmVkdS8iLCJuYW1lIjoiemh5MTkiLCJleHAiOjE0NzU4NTA2NzY4MDQsImlhdCI6MTQ3NTg0NzA3NjgwNH0.8azVEoNPfETczXb-vn7dfyDd98eRt7iiLBXehGpPGzY And the resulting response looks like this: { \"id\": 8, \"name\": \"data_small.txt\", \"creationTime\": 1467132449000, \"lastModifiedTime\": 1467132449000, \"fileSize\": 278428, \"fileSummary\": { \"md5checkSum\": \"ed5f27a2cf94fe3735a5d9ed9191c382\", \"variableType\": \"continuous\", \"fileDelimiter\": \"tab\", \"numOfRows\": 302, \"numOfColumns\": 123 } } Delete physical dataset file and all records from database for a given file ID API Endpoint URI pattern: DELETE https:///ccd-api/{userId}/dataset/{id} Generated HTTP request code example: DELETE /ccd-api/22/dataset/8 HTTP/1.1 Host: Authorization: Bearer eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJpc3MiOiJodHRwczovL2Nsb3VkLmNjZC5waXR0LmVkdS8iLCJuYW1lIjoiemh5MTkiLCJleHAiOjE0NzU4NTA2NzY4MDQsImlhdCI6MTQ3NTg0NzA3NjgwNH0.8azVEoNPfETczXb-vn7dfyDd98eRt7iiLBXehGpPGzY And this will result a HTTP 204 No Content status in response on success, which means the server successfully processed the deletion request but there's no content to response. 
Summarize dataset file So from the first example we can tell that the file with ID 10 doesn't have variableType , fileDelimiter , numOfRows , and numOfColumns specified under fileSummary . Among these attributes, variableType and fileDelimiter are the ones that users will need to provide during this summarization process. Before we can run the desired algorithm on the newly uploaded data file, we'll need to summarize the data by specifying the variable type and file delimiter. Required Fields Description id The data file ID variableType discrete or continuous fileDelimiter tab or comma API Endpoint URI pattern: POST https:///ccd-api/{userId}/dataset/summarize Generated HTTP request code example: POST /ccd-api/22/dataset/summarize HTTP/1.1 Host: Authorization: Bearer eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJpc3MiOiJodHRwczovL2Nsb3VkLmNjZC5waXR0LmVkdS8iLCJuYW1lIjoiemh5MTkiLCJleHAiOjE0NzU4NTA2NzY4MDQsImlhdCI6MTQ3NTg0NzA3NjgwNH0.8azVEoNPfETczXb-vn7dfyDd98eRt7iiLBXehGpPGzY Content-Type: application/json { \"id\": 1, \"variableType\": \"continuous\", \"fileDelimiter\": \"comma\" } This POST request will summarize the dataset file and generate a response (JSON or XML) like below: { \"id\": 10, \"name\": \"large-data.txt\", \"creationTime\": 1467134048000, \"lastModifiedTime\": 1467134048000, \"fileSize\": 3309465, \"md5checkSum\": \"b1db7511ee293d297e3055d9a7b46c5e\", \"fileSummary\": { \"variableType\": \"continuous\", \"fileDelimiter\": \"tab\", \"numOfRows\": 302, \"numOfColumns\": 608 } } List all prior knowledge files of a given user API Endpoint URI pattern: GET https:///ccd-api/{userId}/priorknowledge Generated HTTP request code example: GET /ccd-api/22/priorknowledge HTTP/1.1 Host: Authorization: Bearer eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJpc3MiOiJodHRwczovL2Nsb3VkLmNjZC5waXR0LmVkdS8iLCJuYW1lIjoiemh5MTkiLCJleHAiOjE0NzU4NTA2NzY4MDQsImlhdCI6MTQ3NTg0NzA3NjgwNH0.8azVEoNPfETczXb-vn7dfyDd98eRt7iiLBXehGpPGzY Accept: application/json A JSON formatted list of all the prior knowledge files that are associated with user 22 will be returned.
[ { \"id\": 9, \"name\": \"data_small.prior\", \"creationTime\": 1467132449000, \"lastModifiedTime\": 1467132449000, \"fileSize\": 278428, \"md5checkSum\": \"ed5f27a2cf94fe3735a5d9ed9191c382\" }, { \"id\": 12, \"name\": \"large-data.prior\", \"creationTime\": 1467134048000, \"lastModifiedTime\": 1467134048000, \"fileSize\": 3309465, \"md5checkSum\": \"b1db7511ee293d297e3055d9a7b46c5e\" } ] Get the detail information of a prior knowledge file based on ID API Endpoint URI pattern: GET https:///ccd-api/{userId}/priorknowledge/{id} Generated HTTP request code example: GET /ccd-api/22/priorknowledge/9 HTTP/1.1 Host: Authorization: Bearer eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJpc3MiOiJodHRwczovL2Nsb3VkLmNjZC5waXR0LmVkdS8iLCJuYW1lIjoiemh5MTkiLCJleHAiOjE0NzU4NTA2NzY4MDQsImlhdCI6MTQ3NTg0NzA3NjgwNH0.8azVEoNPfETczXb-vn7dfyDd98eRt7iiLBXehGpPGzY And the resulting response looks like this: { \"id\": 9, \"name\": \"data_small.prior\", \"creationTime\": 1467132449000, \"lastModifiedTime\": 1467132449000, \"fileSize\": 278428, \"md5checkSum\": \"ed5f27a2cf94fe3735a5d9ed9191c382\" } Delete physical prior knowledge file and all records from database for a given file ID API Endpoint URI pattern: DELETE https:///ccd-api/{userId}/priorknowledge/{id} Generated HTTP request code example: DELETE /ccd-api/22/priorknowledge/9 HTTP/1.1 Host: Authorization: Bearer eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJpc3MiOiJodHRwczovL2Nsb3VkLmNjZC5waXR0LmVkdS8iLCJuYW1lIjoiemh5MTkiLCJleHAiOjE0NzU4NTA2NzY4MDQsImlhdCI6MTQ3NTg0NzA3NjgwNH0.8azVEoNPfETczXb-vn7dfyDd98eRt7iiLBXehGpPGzY And this will result a HTTP 204 No Content status in response on success, which means the server successfully processed the deletion request but there's no content to response. 2. Causal Discovery Once the data file is uploaded and summaried, you can start running a Causal Discovery Algorithm on the uploaded data file. List all the available causal discovery algorithms API Endpoint URI pattern: GET https:///ccd-api/{userId}/algorithms Generated HTTP request code example: GET /ccd-api/22/algorithms HTTP/1.1 Host: Authorization: Bearer eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJpc3MiOiJodHRwczovL2Nsb3VkLmNjZC5waXR0LmVkdS8iLCJuYW1lIjoiemh5MTkiLCJleHAiOjE0NzU4NTA2NzY4MDQsImlhdCI6MTQ3NTg0NzA3NjgwNH0.8azVEoNPfETczXb-vn7dfyDd98eRt7iiLBXehGpPGzY [ { \"id\": 1, \"name\": \"FGESc\", \"description\": \"FGES continuous\" }, { \"id\": 2, \"name\": \"FGESd\", \"description\": \"FGES discrete\" }, { \"id\": 3, \"name\": \"GFCIc\", \"description\": \"GFCI continuous\" }, { \"id\": 4, \"name\": \"GFCId\", \"description\": \"GFCI discrete\" } ] Currently we support \"FGES continuous\", \"FGES discrete\", \"GFCI continuous\", and \"GFCI discrete\". They also share a common JSON structure as of their input, for example: Input JSON Fields Description datasetFileId The dataset file ID, integer priorKnowledgeFileId The optional prior knowledge file ID, integer dataValidation Algorithm specific input data validation flags, JSON object algorithmParameters Algorithm specific parameters, JSON object jvmOptions Advanced Options For Java Virtual Machine (JVM), JSON object. Currently only support maxHeapSize (Gigabyte, max value is 100) hpcParameters Parameters for High-Performance Computing, JSON array of key-value objects. Currently only support wallTime Below are the data validation flags and parameters that you can use for each algorithm. 
FGES continuous Data validation: Parameters Description Default Value skipNonzeroVariance Skip check for zero variance variables false skipUniqueVarName Skip check for unique variable names false Algorithm parameters: Parameters Description Default Value faithfulnessAssumed Yes if (one edge) faithfulness should be assumed true maxDegree The maximum degree of the output graph 100 penaltyDiscount Penalty discount 4.0 verbose Print additional information true FGES discrete Data validation: Parameters Description Default Value skipUniqueVarName Skip check for unique variable names false skipCategoryLimit Skip 'limit number of categories' check false Algorithm parameters: Parameters Description Default Value structurePrior Structure prior coefficient 1.0 samplePrior Sample prior 1.0 maxDegree The maximum degree of the output graph 100 faithfulnessAssumed Yes if (one edge) faithfulness should be assumed true verbose Print additional information true GFCI continuous Data validation: Parameters Description Default Value skipNonzeroVariance Skip check for zero variance variables false skipUniqueVarName Skip check for unique variable names false Algorithm parameters: Parameters Description Default Value alpha Cutoff for p values (alpha) 0.01 penaltyDiscount Penalty discount 4.0 maxDegree The maximum degree of the output graph 100 faithfulnessAssumed Yes if (one edge) faithfulness should be assumed false verbose Print additional information true GFCI discrete Data validation: Parameters Description Default Value skipUniqueVarName Skip check for unique variable names false skipCategoryLimit Skip 'limit number of categories' check false Algorithm parameters: Parameters Description Default Value alpha Cutoff for p values (alpha) 0.01 structurePrior Structure prior coefficient 1.0 samplePrior Sample prior 1.0 maxDegree The maximum degree of the output graph 100 faithfulnessAssumed Yes if (one edge) faithfulness should be assumed false verbose Print additional information true Add a new job to run the desired algorithm on a given data file This is a POST request and the algorithm details and data file id will need to be specified in the POST body as a JSON when you make the request. API Endpoint URI pattern: POST https:///ccd-api/{userId}/jobs/FGESc Generated HTTP request code example: POST /ccd-api/22/jobs/FGESc HTTP/1.1 Host: Authorization: Bearer eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJpc3MiOiJodHRwczovL2Nsb3VkLmNjZC5waXR0LmVkdS8iLCJuYW1lIjoiemh5MTkiLCJleHAiOjE0NzU4NTA2NzY4MDQsImlhdCI6MTQ3NTg0NzA3NjgwNH0.8azVEoNPfETczXb-vn7dfyDd98eRt7iiLBXehGpPGzY Content-Type: application/json { \"datasetFileId\": 8, \"priorKnowledgeFileId\": 9, \"dataValidation\": { \"skipNonzeroVariance\": true, \"skipUniqueVarName\": true }, \"algorithmParameters\": { \"penaltyDiscount\": 5.0, \"maxDegree\": 100 }, \"jvmOptions\": { \"maxHeapSize\": 100 }, \"hpcParameters\": [ { \"key\":\"wallTime\", \"value\":1 } ] } In this example, we are running the \"FGES continuous\" algorithm on the file of ID 8. We also set the wallTime as 1 hour. And this call will return the job info with a 201 Created response status code. { \"id\": 5, \"algorithmName\": \"FGESc\", \"status\": 0, \"addedTime\": 1472742564355, \"resultFileName\": \"FGESc_data_small.txt_1472742564353.txt\", \"errorResultFileName\": \"error_FGESc_data_small.txt_1472742564353.txt\" } From this response we can tell that the job ID is 5, and the result file name will be FGESc_data_small.txt_1472742564353.txt if everything goes well. 
If something goes wrong, an error result file with the name error_FGESc_data_small.txt_1472742564353.txt will be created. When you need to run \"FGES discrete\", just send the request to a different endpoint URI: API Endpoint URI pattern: POST https:///ccd-api/{userId}/jobs/FGESd Generated HTTP request code example: POST /ccd-api/22/jobs/FGESd HTTP/1.1 Host: Authorization: Bearer eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJpc3MiOiJodHRwczovL2Nsb3VkLmNjZC5waXR0LmVkdS8iLCJuYW1lIjoiemh5MTkiLCJleHAiOjE0NzU4NTA2NzY4MDQsImlhdCI6MTQ3NTg0NzA3NjgwNH0.8azVEoNPfETczXb-vn7dfyDd98eRt7iiLBXehGpPGzY Content-Type: application/json { \"datasetFileId\": 10, \"priorKnowledgeFileId\": 12, \"dataValidation\": { \"skipUniqueVarName\": true, \"skipCategoryLimit\": true }, \"algorithmParameters\": { \"structurePrior\": 1.0, \"samplePrior\": 1.0, \"maxDegree\": 102 }, \"jvmOptions\": { \"maxHeapSize\": 100 } } List all running jobs API Endpoint URI pattern: GET https:///ccd-api/{userId}/jobs Generated HTTP request code example: GET /ccd-api/22/jobs/ HTTP/1.1 Host: Authorization: Bearer eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJpc3MiOiJodHRwczovL2Nsb3VkLmNjZC5waXR0LmVkdS8iLCJuYW1lIjoiemh5MTkiLCJleHAiOjE0NzU4NTA2NzY4MDQsImlhdCI6MTQ3NTg0NzA3NjgwNH0.8azVEoNPfETczXb-vn7dfyDd98eRt7iiLBXehGpPGzY Content-Type: application/json Then you'll see the information of all jobs that are currently running: [ { \"id\": 32, \"algorithmName\": \"FGESc\", \"addedTime\": 1468436085000 }, { \"id\": 33, \"algorithmName\": \"FGESd\", \"addedTime\": 1468436087000 } ] Check the job status for a given job ID Once the new job is submitted, it takes time and resources to run the algorithm on the server. While waiting, you can check the status of a given job ID: API Endpoint URI pattern: GET https:///ccd-api/{userId}/jobs/{id} Generated HTTP request code example: GET /ccd-api/22/jobs/32 HTTP/1.1 Host: Authorization: Bearer eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJpc3MiOiJodHRwczovL2Nsb3VkLmNjZC5waXR0LmVkdS8iLCJuYW1lIjoiemh5MTkiLCJleHAiOjE0NzU4NTA2NzY4MDQsImlhdCI6MTQ3NTg0NzA3NjgwNH0.8azVEoNPfETczXb-vn7dfyDd98eRt7iiLBXehGpPGzY This will either return \"Pending\" or \"Completed\". Cancel a running job Sometimes you may want to cancel a submitted job. API Endpoint URI pattern: DELETE https:///ccd-api/{userId}/jobs/{id} Generated HTTP request code example: DELETE /ccd-api/22/jobs/8 HTTP/1.1 Host: Authorization: Bearer eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJpc3MiOiJodHRwczovL2Nsb3VkLmNjZC5waXR0LmVkdS8iLCJuYW1lIjoiemh5MTkiLCJleHAiOjE0NzU4NTA2NzY4MDQsImlhdCI6MTQ3NTg0NzA3NjgwNH0.8azVEoNPfETczXb-vn7dfyDd98eRt7iiLBXehGpPGzY This call will return either \"Job 8 has been canceled\" or \"Unable to cancel job 8\". It's not guaranteed that the system can always cancel a job successfully.
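Putting the job endpoints together, here is a hedged Python sketch (again using the third-party requests library, with placeholder host and JWT) that submits the FGES continuous job from the example above and then polls its status. It assumes the status endpoint returns the plain Pending/Completed strings described above.

import time
import requests

BASE = 'https://<host>/ccd-api'   # placeholder host, as before
USER_ID = 22
headers = {'Authorization': 'Bearer <jwt>'}  # placeholder token

# Job specification copied from the FGESc example above
job_spec = {
    'datasetFileId': 8,
    'priorKnowledgeFileId': 9,
    'dataValidation': {'skipNonzeroVariance': True, 'skipUniqueVarName': True},
    'algorithmParameters': {'penaltyDiscount': 5.0, 'maxDegree': 100},
    'jvmOptions': {'maxHeapSize': 100},
    'hpcParameters': [{'key': 'wallTime', 'value': 1}],
}

# Submit the job; the response carries the job ID and the expected result file name
job = requests.post(f'{BASE}/{USER_ID}/jobs/FGESc', json=job_spec, headers=headers).json()
job_id = job['id']

# Poll the status endpoint until the job no longer reports Pending
while 'Pending' in requests.get(f'{BASE}/{USER_ID}/jobs/{job_id}', headers=headers).text:
    time.sleep(30)
print('Expected result file:', job['resultFileName'])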
3. Result Management List all result files generated by the algorithm API Endpoint URI pattern: GET https:///ccd-api/{userId}/results Generated HTTP request code example: GET /ccd-api/22/results HTTP/1.1 Host: Authorization: Bearer eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJpc3MiOiJodHRwczovL2Nsb3VkLmNjZC5waXR0LmVkdS8iLCJuYW1lIjoiemh5MTkiLCJleHAiOjE0NzU4NTA2NzY4MDQsImlhdCI6MTQ3NTg0NzA3NjgwNH0.8azVEoNPfETczXb-vn7dfyDd98eRt7iiLBXehGpPGzY The response to this request will look like this: [ { \"name\": \"FGESc_sim_data_20vars_100cases.csv_1466171729046.txt\", \"creationTime\": 1466171732000, \"lastModifiedTime\": 1466171732000, \"fileSize\": 1660 }, { \"name\": \"FGESc_data_small.txt_1466172140585.txt\", \"creationTime\": 1466172145000, \"lastModifiedTime\": 1466172145000, \"fileSize\": 39559 } ] Download a specific result file generated by the algorithm based on file name API Endpoint URI pattern: GET https:///ccd-api/{userId}/results/{result_file_name} Generated HTTP request code example: GET /ccd-api/22/results/FGESc_data_small.txt_1466172140585.txt HTTP/1.1 Host: Authorization: Bearer eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJpc3MiOiJodHRwczovL2Nsb3VkLmNjZC5waXR0LmVkdS8iLCJuYW1lIjoiemh5MTkiLCJleHAiOjE0NzU4NTA2NzY4MDQsImlhdCI6MTQ3NTg0NzA3NjgwNH0.8azVEoNPfETczXb-vn7dfyDd98eRt7iiLBXehGpPGzY On success, you will get the result file back as text file content. If there's a typo in the file name or the file doesn't exist, you'll get either a JSON or XML message based on the Accept header in your request. The response will look like this: { \"timestamp\": 1467210996233, \"status\": 404, \"error\": \"Not Found\", \"message\": \"Resource not found.\", \"path\": \"/22/results/FGESc_data_small.txt_146172140585.txt\" } Compare algorithm result files Since we can list all the algorithm result files, we can also choose multiple files and run a comparison. API Endpoint URI pattern: POST https:///ccd-api/{userId}/results/compare The request body is a JSON that contains an array of result files to be compared. Generated HTTP request code example: POST /ccd-api/22/results/compare HTTP/1.1 Host: Authorization: Bearer eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJpc3MiOiJodHRwczovL2Nsb3VkLmNjZC5waXR0LmVkdS8iLCJuYW1lIjoiemh5MTkiLCJleHAiOjE0NzU4NTA2NzY4MDQsImlhdCI6MTQ3NTg0NzA3NjgwNH0.8azVEoNPfETczXb-vn7dfyDd98eRt7iiLBXehGpPGzY { \"resultFiles\": [ \"FGESc_sim_data_20vars_100cases.csv_1466171729046.txt\", \"FGESc_data_small.txt_1467305104859.txt\" ] } When you specify multiple file names, use the !! as a delimiter. This request will generate a result comparison file with the following content (shortened version): FGESc_sim_data_20vars_100cases.csv_1466171729046.txt FGESc_data_small.txt_1467305104859.txt Edges In All Same End Point NR4A2,FOS 0 0 X5,X17 0 0 MMP11,ASB5 0 0 X12,X8 0 0 hsa_miR_654_3p,hsa_miR_337_3p 0 0 RND1,FGA 0 0 HHLA2,UBXN10 0 0 HS6ST2,RND1 0 0 SCRG1,hsa_miR_377 0 0 CDH3,diag 0 0 SERPINI2,FGG 0 0 hsa_miR_451,hsa_miR_136_ 0 0 From this comparison, you can see if the two algorithm graphs have common edges and endpoints.
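The result-management endpoints can be scripted the same way. Below is a small Python sketch under the same assumptions (requests library, placeholder host and JWT); the file names come straight from the listing and comparison examples above.

import requests

BASE = 'https://<host>/ccd-api'   # placeholder host, as before
USER_ID = 22
headers = {'Authorization': 'Bearer <jwt>'}  # placeholder token

# Download one result file as plain text, using a name from the /results listing
name = 'FGESc_data_small.txt_1466172140585.txt'
result = requests.get(f'{BASE}/{USER_ID}/results/{name}', headers=headers)
result.raise_for_status()  # a typo in the name yields the 404 JSON/XML message shown above
print(result.text[:200])

# Ask the server to compare two result files
compare = {'resultFiles': ['FGESc_sim_data_20vars_100cases.csv_1466171729046.txt',
                           'FGESc_data_small.txt_1467305104859.txt']}
requests.post(f'{BASE}/{USER_ID}/results/compare', json=compare, headers=headers)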
List all the comparison files API Endpoint URI pattern: GET https:///ccd-api/{userId}/results/comparisons Generated HTTP request code example: GET /ccd-api/22/results/comparisons HTTP/1.1 Host: Authorization: Bearer eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJpc3MiOiJodHRwczovL2Nsb3VkLmNjZC5waXR0LmVkdS8iLCJuYW1lIjoiemh5MTkiLCJleHAiOjE0NzU4NTA2NzY4MDQsImlhdCI6MTQ3NTg0NzA3NjgwNH0.8azVEoNPfETczXb-vn7dfyDd98eRt7iiLBXehGpPGzY The response will show a list of comparison files: [ { \"name\": \"result_comparison_1467385923407.txt\", \"creationTime\": 1467385923000, \"lastModifiedTime\": 1467385923000, \"fileSize\": 7505 }, { \"name\": \"result_comparison_1467387034358.txt\", \"creationTime\": 1467387034000, \"lastModifiedTime\": 1467387034000, \"fileSize\": 7505 }, { \"name\": \"result_comparison_1467388042261.txt\", \"creationTime\": 1467388042000, \"lastModifiedTime\": 1467388042000, \"fileSize\": 7533 } ] Download a specific comparison file based on file name API Endpoint URI pattern: GET https:///ccd-api/{userId}/results/comparisons/{comparison_file_name} Generated HTTP request code example: GET /ccd-api/22/results/comparisons/result_comparison_1467388042261.txt HTTP/1.1 Host: Authorization: Bearer eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJpc3MiOiJodHRwczovL2Nsb3VkLmNjZC5waXR0LmVkdS8iLCJuYW1lIjoiemh5MTkiLCJleHAiOjE0NzU4NTA2NzY4MDQsImlhdCI6MTQ3NTg0NzA3NjgwNH0.8azVEoNPfETczXb-vn7dfyDd98eRt7iiLBXehGpPGzY Then it returns the content of that comparison file (shortened version): FGESc_sim_data_20vars_100cases.csv_1466171729046.txt FGESc_data_small.txt_1467305104859.txt Edges In All Same End Point NR4A2,FOS 0 0 X5,X17 0 0 MMP11,ASB5 0 0 X12,X8 0 0 hsa_miR_654_3p,hsa_miR_337_3p 0 0 RND1,FGA 0 0 HHLA2,UBXN10 0 0 HS6ST2,RND1 0 0 SCRG1,hsa_miR_377 0 0 CDH3,diag 0 0 SERPINI2,FGG 0 0","title":"Causal REST API"},{"location":"causal-rest-api/#causal-rest-api-v008","text":"This RESTful API is designed for Causal Web, and it implements the JAX-RS specification using Jersey. Table of Contents Installation Prerequisites Dependencies Configuration Start the API Server API Usage and Examples Getting JSON Web Token(JWT) 1. Data Management Upload small data file Resumable data file upload List all dataset files of a user Get the detail information of a dataset file based on ID Delete physical dataset file and all records from database for a given file ID Summarize dataset file List all prior knowledge files of a given user Get the detail information of a prior knowledge file based on ID Delete physical prior knowledge file and all records from database for a given file ID 2. Causal Discovery List all the available causal discovery algorithms Add a new job to run the desired algorithm on a given data file List all running jobs Check the job status for a given job ID Cancel a running job 3. Result Management List all result files generated by the algorithm Download a specific result file generated by the algorithm based on file name Compare algorithm result files List all the comparison files Download a specific comparison file based on file name","title":"Causal REST API v0.0.8"},{"location":"causal-rest-api/#installation","text":"The following installation instructions are intended for the server admin who deploys this API server.
API users can skip this section and just start reading from the API Usage and Examples section.","title":"Installation"},{"location":"causal-rest-api/#prerequisites","text":"You must have the following installed to build/install Causal REST API: Oracle Java SE Development Kit 8 Maven 3.x","title":"Prerequisites"},{"location":"causal-rest-api/#dependencies","text":"If you want to run this API server and expose the API to your users, you'll first need to have the Causal Web Application installed and running. Your API users will use this web app to create their user accounts before they can consume the API. Note: currently new users can also be created using the Auth0 login option, but the API doesn't work for these users. In order to build the API server, you'll need the released version of ccd-commons-0.3.1 , obtained by cloning the repo and checking out this specific release version: git clone https://github.com/bd2kccd/ccd-commons.git cd ccd-commons git checkout tags/v0.3.1 mvn clean install You'll also need to download the released ccd-db-0.6.3 : git clone https://github.com/bd2kccd/ccd-db.git cd ccd-db git checkout tags/v0.6.3 mvn clean install Then you can get and install causal-rest-api : git clone https://github.com/bd2kccd/causal-rest-api.git cd causal-rest-api mvn clean package","title":"Dependencies"},{"location":"causal-rest-api/#configuration","text":"There are 5 configuration files to configure located at causal-rest-api/src/main/resources : - application-hsqldb.properties : HSQLDB database configurations (for testing only). - application-mysql.properties : MySQL database configurations - application-slurm.properties : Slurm setting for HPC - application.properties : Spring Boot application settings - causal.properties : Data file directory path and folder settings Before editing the causal.properties file, you need to create a workspace for the application to work in. Create a directory called workspace, for example /home/zhy19/ccd/workspace . Inside the workspace directory, create another folder called lib . Then build the jar file of Tetrad using the latest development branch . After that, copy the jar file to the lib folder created earlier.","title":"Configuration"},{"location":"causal-rest-api/#start-the-api-server","text":"Once you have all the settings configured, go to causal-rest-api/target and you will find the jar file named causal-rest-api.jar . Then simply run java -jar causal-rest-api.jar","title":"Start the API Server"},{"location":"causal-rest-api/#api-usage-and-examples","text":"In the following sections, we'll demonstrate the API usage with examples using the API server that is running on Pittsburgh Super Computing. The API base URI is https:///ccd-api. This API requires users to be authenticated. Before using this API, the user creates an account in the Causal Web App.","title":"API Usage and Examples"},{"location":"causal-rest-api/#getting-json-web-tokenjwt","text":"After registration in the Causal Web App, the email and password can be used to authenticate against the Causal REST API to get the access token (we use JWT) via HTTP Basic Auth . API Endpoint URI pattern: GET https:///ccd-api/jwt In basic auth, the user provides the username and password, which the HTTP client concatenates (username + \":\" + password) and base64 encodes. This encoded string is then sent in an Authorization header with the \"Basic\" schema. For instance, take the user email demo@pitt.edu whose password is 123 ; the short sketch below shows how the header value is derived.
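As a quick sanity check, the header value can be reproduced with a few lines of Python (standard library only; a minimal sketch of the concatenate-then-base64 step just described):

import base64

# email:password exactly as in the example above
credentials = 'demo@pitt.edu:123'
token = base64.b64encode(credentials.encode('utf-8')).decode('ascii')
print(token)                            # ZGVtb0BwaXR0LmVkdToxMjM=
print('Authorization: Basic ' + token)  # the header value sent in the request below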
POST /ccd-api/jwt HTTP/1.1 Host: Authorization: Basic ZGVtb0BwaXR0LmVkdToxMjM= Once the request is processed successfully, the user ID together with a JWT will be returned in the response for further API queries. { \"userId\": 22, \"jwt\": \"eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJpc3MiOiJodHRwczovL2Nsb3VkLmNjZC5waXR0LmVkdS8iLCJuYW1lIjoiemh5MTkiLCJleHAiOjE0NzU4NTA0Mjg1OTcsImlhdCI6MTQ3NTg0NjgyODU5N30.FcE7aEpg0u2c-gUVugIjJkzjhlDu5qav_XHtgLu3c6E\", \"issuedTime\": 1475846828597, \"lifetime\": 3600, \"expireTime\": 1475850428597, \"wallTime\": [ 1, 3, 6 ] } We'll need to use this userId in the URI path of all subsequent requests. This jwt expires in 3600 seconds (1 hour), so the API consumer will need to request another JWT; otherwise, queries to the other API endpoints will be denied. This JWT will need to be sent via the HTTP Authorization header as well, but using the Bearer schema. The wallTime field is designed for users who want to specify the maximum CPU time when Slurm handles the jobs on PSC. Normally, a job is expected to finish before the specified maximum walltime. After the walltime reaches the maximum, the job terminates regardless of whether the job processes are still running or not. In this example, you can pick 1, 3, or 6 hours as the wallTime. Note: querying the JWT endpoint again before the current JWT expires will generate a new JWT, which automatically expires the old JWT. The newly generated JWT will be valid for another hour unless yet another JWT is requested. This API is developed with Jersey, which supports WADL , so you can view the generated WADL by going to https:///ccd-api/application.wadl?detail=true and see all resources available in the application. Accessing this endpoint doesn't require authentication. Basically, all the API usage examples are grouped into three categories: Data Management Causal Discovery Result Management All the following examples will be issued by user 22 whose password is 123 .","title":"Getting JSON Web Token(JWT)"},{"location":"causal-rest-api/#1-data-management","text":"","title":"1. Data Management"},{"location":"causal-rest-api/#upload-small-data-file","text":"At this point, you can upload two types of data files: a tabular dataset file (either tab-delimited or comma-delimited) and a prior knowledge file. API Endpoint URI pattern: POST https:///ccd-api/{userId}/dataset/upload This is a multipart file upload via an HTML form, and the client is required to use name=\"file\" to name their file upload field in their form. Generated HTTP request code example: POST /ccd-api/22/dataset/upload HTTP/1.1 Host: Authorization: Bearer eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJpc3MiOiJodHRwczovL2Nsb3VkLmNjZC5waXR0LmVkdS8iLCJuYW1lIjoiemh5MTkiLCJleHAiOjE0NzU4NTA2NzY4MDQsImlhdCI6MTQ3NTg0NzA3NjgwNH0.8azVEoNPfETczXb-vn7dfyDd98eRt7iiLBXehGpPGzY Content-Type: multipart/form-data; boundary=----WebKitFormBoundary7MA4YWxkTrZu0gW ----WebKitFormBoundary7MA4YWxkTrZu0gW Content-Disposition: form-data; name=\"file\"; filename=\"\" Content-Type: ----WebKitFormBoundary7MA4YWxkTrZu0gW If the Authorization header is not provided, the response will look like this: { \"timestamp\": 1465414501443, \"status\": 401, \"error\": \"Unauthorized\", \"message\": \"User credentials are required.\", \"path\": \"/22/dataset/upload\" } This POST request will upload the dataset file to the target server location and add corresponding records into the database.
And the response will contain the following pieces: { \"id\": 6, \"name\": \"Lung-tetrad_hv.txt\", \"creationTime\": 1466622267000, \"lastModifiedTime\": 1466622267000, \"fileSize\": 3309465, \"md5checkSum\": \"b1db7511ee293d297e3055d9a7b46c5e\", \"fileSummary\": { \"variableType\": null, \"fileDelimiter\": null, \"numOfRows\": null, \"numOfColumns\": null } } The prior knowledge file upload uses a similar API endpoint: POST https:///ccd-api/{userId}/priorknowledge/upload Since there's no need to summarize a prior knowledge file, the response of a successful prior knowledge file upload will look like: { \"id\": 6, \"name\": \"Lung-tetrad_hv.txt\", \"creationTime\": 1466622267000, \"lastModifiedTime\": 1466622267000, \"fileSize\": 3309465, \"md5checkSum\": \"ugdb7511rt293d29ke3055d9a7b46c9k\" }","title":"Upload small data file"},{"location":"causal-rest-api/#resumable-data-file-upload","text":"In addition to the regular file upload described in Example 6, we also provide the option of stable and resumable large file upload. It requires the client side to have a resumable upload implementation. We currently support clients integrated with Resumable.js , which provides multiple simultaneous, stable and resumable uploads via the HTML5 File API. You can also create your own client as long as all the following parameters are set correctly. API Endpoint URI pattern: GET https:///ccd-api/{userId}/chunkupload POST https:///ccd-api/{userId}/chunkupload In this example, the data file is split into 3 chunks. The upload of each chunk consists of a GET request and a POST request. To handle the state of upload chunks, a number of extra parameters are sent along with all requests: resumableChunkNumber : The index of the chunk in the current upload. First chunk is 1 (no base-0 counting here). resumableChunkSize : The general chunk size. Using this value and resumableTotalSize you can calculate the total number of chunks. Please note that the size of the data received in the HTTP request might be smaller than resumableChunkSize for the last chunk of a file. resumableCurrentChunkSize : The size of the current resumable chunk. resumableTotalSize : The total file size. resumableType : The file type of the resumable chunk, e.g., \"text/plain\". resumableIdentifier : A unique identifier for the file contained in the request. resumableFilename : The original file name (since a bug in Firefox results in the file name not being transmitted in chunk multipart posts). resumableRelativePath : The file's relative path when selecting a directory (defaults to file name in all browsers except Chrome). resumableTotalChunks : The total number of chunks. Generated HTTP request code example: GET /ccd-api/22/chunkupload?resumableChunkNumber=2&resumableChunkSize=1048576&resumableCurrentChunkSize=1048576&resumableTotalSize=3309465&resumableType=text%2Fplain&resumableIdentifier=3309465-large-datatxt&resumableFilename=large-data.txt&resumableRelativePath=large-data.txt&resumableTotalChunks=3 HTTP/1.1 Host: Authorization: Bearer eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJpc3MiOiJodHRwczovL2Nsb3VkLmNjZC5waXR0LmVkdS8iLCJuYW1lIjoiemh5MTkiLCJleHAiOjE0NzU4NTA2NzY4MDQsImlhdCI6MTQ3NTg0NzA3NjgwNH0.8azVEoNPfETczXb-vn7dfyDd98eRt7iiLBXehGpPGzY This GET request checks if the data chunk is already on the server side. If the target file chunk is not found on the server, the client will issue a POST request to upload the actual data.
Generated HTTP request code example: POST /ccd-api/22/chunkupload HTTP/1.1 Host: Authorization: Bearer eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJpc3MiOiJodHRwczovL2Nsb3VkLmNjZC5waXR0LmVkdS8iLCJuYW1lIjoiemh5MTkiLCJleHAiOjE0NzU4NTA2NzY4MDQsImlhdCI6MTQ3NTg0NzA3NjgwNH0.8azVEoNPfETczXb-vn7dfyDd98eRt7iiLBXehGpPGzY Content-Type: multipart/form-data; boundary=----WebKitFormBoundaryMFjgApg56XGyeTnZ ------WebKitFormBoundaryMFjgApg56XGyeTnZ Content-Disposition: form-data; name=\"resumableChunkNumber\" 2 ------WebKitFormBoundaryMFjgApg56XGyeTnZ Content-Disposition: form-data; name=\"resumableChunkSize\" 1048576 ------WebKitFormBoundaryMFjgApg56XGyeTnZ Content-Disposition: form-data; name=\"resumableCurrentChunkSize\" 1048576 ------WebKitFormBoundaryMFjgApg56XGyeTnZ Content-Disposition: form-data; name=\"resumableTotalSize\" 3309465 ------WebKitFormBoundaryMFjgApg56XGyeTnZ Content-Disposition: form-data; name=\"resumableType\" text/plain ------WebKitFormBoundaryMFjgApg56XGyeTnZ Content-Disposition: form-data; name=\"resumableIdentifier\" 3309465-large-datatxt ------WebKitFormBoundaryMFjgApg56XGyeTnZ Content-Disposition: form-data; name=\"resumableFilename\" large-data.txt ------WebKitFormBoundaryMFjgApg56XGyeTnZ Content-Disposition: form-data; name=\"resumableRelativePath\" large-data.txt ------WebKitFormBoundaryMFjgApg56XGyeTnZ Content-Disposition: form-data; name=\"resumableTotalChunks\" 3 ------WebKitFormBoundaryMFjgApg56XGyeTnZ Content-Disposition: form-data; name=\"file\"; filename=\"blob\" Content-Type: application/octet-stream ------WebKitFormBoundaryMFjgApg56XGyeTnZ-- Each chunk upload POST will get a 200 status code in the response if everything works fine. Finally, the md5checkSum string of the reassembled file will be returned once the whole file has been uploaded successfully. In this example, the POST request that uploads the third chunk will respond with this: b1db7511ee293d297e3055d9a7b46c5e","title":"Resumable data file upload"},{"location":"causal-rest-api/#list-all-dataset-files-of-a-user","text":"API Endpoint URI pattern: GET https:///ccd-api/{userId}/dataset Generated HTTP request code example: GET /ccd-api/22/dataset HTTP/1.1 Host: Authorization: Bearer eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJpc3MiOiJodHRwczovL2Nsb3VkLmNjZC5waXR0LmVkdS8iLCJuYW1lIjoiemh5MTkiLCJleHAiOjE0NzU4NTA2NzY4MDQsImlhdCI6MTQ3NTg0NzA3NjgwNH0.8azVEoNPfETczXb-vn7dfyDd98eRt7iiLBXehGpPGzY Accept: application/json A JSON formatted list of all the input dataset files that are associated with user 22 will be returned.
[ { \"id\": 8, \"name\": \"data_small.txt\", \"creationTime\": 1467132449000, \"lastModifiedTime\": 1467132449000, \"fileSize\": 278428, \"md5checkSum\": \"ed5f27a2cf94fe3735a5d9ed9191c382\", \"fileSummary\": { \"variableType\": \"continuous\", \"fileDelimiter\": \"tab\", \"numOfRows\": 302, \"numOfColumns\": 123 } }, { \"id\": 10, \"name\": \"large-data.txt\", \"creationTime\": 1467134048000, \"lastModifiedTime\": 1467134048000, \"fileSize\": 3309465, \"md5checkSum\": \"b1db7511ee293d297e3055d9a7b46c5e\", \"fileSummary\": { \"variableType\": null, \"fileDelimiter\": null, \"numOfRows\": null, \"numOfColumns\": null } }, { \"id\": 11, \"name\": \"Lung-tetrad_hv (copy).txt\", \"creationTime\": 1467140415000, \"lastModifiedTime\": 1467140415000, \"fileSize\": 3309465, \"md5checkSum\": \"b1db7511ee293d297e3055d9a7b46c5e\", \"fileSummary\": { \"variableType\": \"continuous\", \"fileDelimiter\": \"tab\", \"numOfRows\": 302, \"numOfColumns\": 608 } } ] You can also specify the response format as XML in your request Generated HTTP request code example: GET /ccd-api/22/dataset HTTP/1.1 Host: Authorization: Bearer eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJpc3MiOiJodHRwczovL2Nsb3VkLmNjZC5waXR0LmVkdS8iLCJuYW1lIjoiemh5MTkiLCJleHAiOjE0NzU4NTA2NzY4MDQsImlhdCI6MTQ3NTg0NzA3NjgwNH0.8azVEoNPfETczXb-vn7dfyDd98eRt7iiLBXehGpPGzY Accept: application/xml And the response will look like this: 8 data_small.txt 2016-06-28T12:47:29-04:00 2016-06-28T12:47:29-04:00 278428 ed5f27a2cf94fe3735a5d9ed9191c382 tab 123 302 continuous 10 large-data.txt 2016-06-28T13:14:08-04:00 2016-06-28T13:14:08-04:00 3309465 b1db7511ee293d297e3055d9a7b46c5e 11 Lung-tetrad_hv (copy).txt 2016-06-28T15:00:15-04:00 2016-06-28T15:00:15-04:00 3309465 b1db7511ee293d297e3055d9a7b46c5e tab 608 302 continuous Form the above output, we can also tell that data file with ID 10 doesn't have all the fileSummary field values set, we'll cover this in the dataset summarization section.","title":"List all dataset files of a user"},{"location":"causal-rest-api/#get-the-detail-information-of-a-dataset-file-based-on-id","text":"API Endpoint URI pattern: GET https:///ccd-api/{userId}/dataset/{id} Generated HTTP request code example: GET /ccd-api/22/dataset/8 HTTP/1.1 Host: Authorization: Bearer eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJpc3MiOiJodHRwczovL2Nsb3VkLmNjZC5waXR0LmVkdS8iLCJuYW1lIjoiemh5MTkiLCJleHAiOjE0NzU4NTA2NzY4MDQsImlhdCI6MTQ3NTg0NzA3NjgwNH0.8azVEoNPfETczXb-vn7dfyDd98eRt7iiLBXehGpPGzY And the resulting response looks like this: { \"id\": 8, \"name\": \"data_small.txt\", \"creationTime\": 1467132449000, \"lastModifiedTime\": 1467132449000, \"fileSize\": 278428, \"fileSummary\": { \"md5checkSum\": \"ed5f27a2cf94fe3735a5d9ed9191c382\", \"variableType\": \"continuous\", \"fileDelimiter\": \"tab\", \"numOfRows\": 302, \"numOfColumns\": 123 } }","title":"Get the detail information of a dataset file based on ID"},{"location":"causal-rest-api/#delete-physical-dataset-file-and-all-records-from-database-for-a-given-file-id","text":"API Endpoint URI pattern: DELETE https:///ccd-api/{userId}/dataset/{id} Generated HTTP request code example: DELETE /ccd-api/22/dataset/8 HTTP/1.1 Host: Authorization: Bearer eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJpc3MiOiJodHRwczovL2Nsb3VkLmNjZC5waXR0LmVkdS8iLCJuYW1lIjoiemh5MTkiLCJleHAiOjE0NzU4NTA2NzY4MDQsImlhdCI6MTQ3NTg0NzA3NjgwNH0.8azVEoNPfETczXb-vn7dfyDd98eRt7iiLBXehGpPGzY And this will result a HTTP 204 No Content status in response on success, which means the server successfully processed the deletion request but there's no 
","title":"Delete physical dataset file and all records from database for a given file ID"},{"location":"causal-rest-api/#summarize-dataset-file","text":"So from the first example we can tell that the file with ID 10 doesn't have variableType , fileDelimiter , numOfRows , and numOfColumns specified under fileSummary . Among these attributes, variableType and fileDelimiter are the ones that users will need to provide during this summarization process. Before we can run the desired algorithm on the newly uploaded data file, we'll need to summarize the data by specifying the variable type and file delimiter. Required Fields Description id The data file ID variableType discrete or continuous fileDelimiter tab or comma API Endpoint URI pattern: POST https:///ccd-api/{userId}/dataset/summarize Generated HTTP request code example: POST /ccd-api/22/dataset/summarize HTTP/1.1 Host: Authorization: Bearer eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJpc3MiOiJodHRwczovL2Nsb3VkLmNjZC5waXR0LmVkdS8iLCJuYW1lIjoiemh5MTkiLCJleHAiOjE0NzU4NTA2NzY4MDQsImlhdCI6MTQ3NTg0NzA3NjgwNH0.8azVEoNPfETczXb-vn7dfyDd98eRt7iiLBXehGpPGzY Content-Type: application/json { \"id\": 1, \"variableType\": \"continuous\", \"fileDelimiter\": \"comma\" } This POST request will summarize the dataset file and generate a response (JSON or XML) like below: { \"id\": 10, \"name\": \"large-data.txt\", \"creationTime\": 1467134048000, \"lastModifiedTime\": 1467134048000, \"fileSize\": 3309465, \"md5checkSum\": \"b1db7511ee293d297e3055d9a7b46c5e\", \"fileSummary\": { \"variableType\": \"continuous\", \"fileDelimiter\": \"tab\", \"numOfRows\": 302, \"numOfColumns\": 608 } }","title":"Summarize dataset file"},{"location":"causal-rest-api/#list-all-prior-knowledge-files-of-a-given-user","text":"API Endpoint URI pattern: GET https:///ccd-api/{userId}/priorknowledge Generated HTTP request code example: GET /ccd-api/22/priorknowledge HTTP/1.1 Host: Authorization: Bearer eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJpc3MiOiJodHRwczovL2Nsb3VkLmNjZC5waXR0LmVkdS8iLCJuYW1lIjoiemh5MTkiLCJleHAiOjE0NzU4NTA2NzY4MDQsImlhdCI6MTQ3NTg0NzA3NjgwNH0.8azVEoNPfETczXb-vn7dfyDd98eRt7iiLBXehGpPGzY Accept: application/json A JSON formatted list of all the prior knowledge files that are associated with user 22 will be returned.
[ { \"id\": 9, \"name\": \"data_small.prior\", \"creationTime\": 1467132449000, \"lastModifiedTime\": 1467132449000, \"fileSize\": 278428, \"md5checkSum\": \"ed5f27a2cf94fe3735a5d9ed9191c382\" }, { \"id\": 12, \"name\": \"large-data.prior\", \"creationTime\": 1467134048000, \"lastModifiedTime\": 1467134048000, \"fileSize\": 3309465, \"md5checkSum\": \"b1db7511ee293d297e3055d9a7b46c5e\" } ]","title":"List all prior knowledge files of a given user"},{"location":"causal-rest-api/#get-the-detail-information-of-a-prior-knowledge-file-based-on-id","text":"API Endpoint URI pattern: GET https:///ccd-api/{userId}/priorknowledge/{id} Generated HTTP request code example: GET /ccd-api/22/priorknowledge/9 HTTP/1.1 Host: Authorization: Bearer eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJpc3MiOiJodHRwczovL2Nsb3VkLmNjZC5waXR0LmVkdS8iLCJuYW1lIjoiemh5MTkiLCJleHAiOjE0NzU4NTA2NzY4MDQsImlhdCI6MTQ3NTg0NzA3NjgwNH0.8azVEoNPfETczXb-vn7dfyDd98eRt7iiLBXehGpPGzY And the resulting response looks like this: { \"id\": 9, \"name\": \"data_small.prior\", \"creationTime\": 1467132449000, \"lastModifiedTime\": 1467132449000, \"fileSize\": 278428, \"md5checkSum\": \"ed5f27a2cf94fe3735a5d9ed9191c382\" }","title":"Get the detail information of a prior knowledge file based on ID"},{"location":"causal-rest-api/#delete-physical-prior-knowledge-file-and-all-records-from-database-for-a-given-file-id","text":"API Endpoint URI pattern: DELETE https:///ccd-api/{userId}/priorknowledge/{id} Generated HTTP request code example: DELETE /ccd-api/22/priorknowledge/9 HTTP/1.1 Host: Authorization: Bearer eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJpc3MiOiJodHRwczovL2Nsb3VkLmNjZC5waXR0LmVkdS8iLCJuYW1lIjoiemh5MTkiLCJleHAiOjE0NzU4NTA2NzY4MDQsImlhdCI6MTQ3NTg0NzA3NjgwNH0.8azVEoNPfETczXb-vn7dfyDd98eRt7iiLBXehGpPGzY And this will result a HTTP 204 No Content status in response on success, which means the server successfully processed the deletion request but there's no content to response.","title":"Delete physical prior knowledge file and all records from database for a given file ID"},{"location":"causal-rest-api/#2-causal-discovery","text":"Once the data file is uploaded and summaried, you can start running a Causal Discovery Algorithm on the uploaded data file.","title":"2. Causal Discovery"},{"location":"causal-rest-api/#list-all-the-available-causal-discovery-algorithms","text":"API Endpoint URI pattern: GET https:///ccd-api/{userId}/algorithms Generated HTTP request code example: GET /ccd-api/22/algorithms HTTP/1.1 Host: Authorization: Bearer eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJpc3MiOiJodHRwczovL2Nsb3VkLmNjZC5waXR0LmVkdS8iLCJuYW1lIjoiemh5MTkiLCJleHAiOjE0NzU4NTA2NzY4MDQsImlhdCI6MTQ3NTg0NzA3NjgwNH0.8azVEoNPfETczXb-vn7dfyDd98eRt7iiLBXehGpPGzY [ { \"id\": 1, \"name\": \"FGESc\", \"description\": \"FGES continuous\" }, { \"id\": 2, \"name\": \"FGESd\", \"description\": \"FGES discrete\" }, { \"id\": 3, \"name\": \"GFCIc\", \"description\": \"GFCI continuous\" }, { \"id\": 4, \"name\": \"GFCId\", \"description\": \"GFCI discrete\" } ] Currently we support \"FGES continuous\", \"FGES discrete\", \"GFCI continuous\", and \"GFCI discrete\". They also share a common JSON structure as of their input, for example: Input JSON Fields Description datasetFileId The dataset file ID, integer priorKnowledgeFileId The optional prior knowledge file ID, integer dataValidation Algorithm specific input data validation flags, JSON object algorithmParameters Algorithm specific parameters, JSON object jvmOptions Advanced Options For Java Virtual Machine (JVM), JSON object. 
Currently only maxHeapSize is supported (Gigabyte, max value is 100) hpcParameters Parameters for High-Performance Computing, JSON array of key-value objects. Currently only wallTime is supported Below are the data validation flags and parameters that you can use for each algorithm. FGES continuous Data validation: Parameters Description Default Value skipNonzeroVariance Skip check for zero variance variables false skipUniqueVarName Skip check for unique variable names false Algorithm parameters: Parameters Description Default Value faithfulnessAssumed Yes if (one edge) faithfulness should be assumed true maxDegree The maximum degree of the output graph 100 penaltyDiscount Penalty discount 4.0 verbose Print additional information true FGES discrete Data validation: Parameters Description Default Value skipUniqueVarName Skip check for unique variable names false skipCategoryLimit Skip 'limit number of categories' check false Algorithm parameters: Parameters Description Default Value structurePrior Structure prior coefficient 1.0 samplePrior Sample prior 1.0 maxDegree The maximum degree of the output graph 100 faithfulnessAssumed Yes if (one edge) faithfulness should be assumed true verbose Print additional information true GFCI continuous Data validation: Parameters Description Default Value skipNonzeroVariance Skip check for zero variance variables false skipUniqueVarName Skip check for unique variable names false Algorithm parameters: Parameters Description Default Value alpha Cutoff for p values (alpha) 0.01 penaltyDiscount Penalty discount 4.0 maxDegree The maximum degree of the output graph 100 faithfulnessAssumed Yes if (one edge) faithfulness should be assumed false verbose Print additional information true GFCI discrete Data validation: Parameters Description Default Value skipUniqueVarName Skip check for unique variable names false skipCategoryLimit Skip 'limit number of categories' check false Algorithm parameters: Parameters Description Default Value alpha Cutoff for p values (alpha) 0.01 structurePrior Structure prior coefficient 1.0 samplePrior Sample prior 1.0 maxDegree The maximum degree of the output graph 100 faithfulnessAssumed Yes if (one edge) faithfulness should be assumed false verbose Print additional information true","title":"List all the available causal discovery algorithms"},{"location":"causal-rest-api/#add-a-new-job-to-run-the-desired-algorithm-on-a-given-data-file","text":"This is a POST request and the algorithm details and data file id will need to be specified in the POST body as a JSON when you make the request. API Endpoint URI pattern: POST https:///ccd-api/{userId}/jobs/FGESc Generated HTTP request code example: POST /ccd-api/22/jobs/FGESc HTTP/1.1 Host: Authorization: Bearer eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJpc3MiOiJodHRwczovL2Nsb3VkLmNjZC5waXR0LmVkdS8iLCJuYW1lIjoiemh5MTkiLCJleHAiOjE0NzU4NTA2NzY4MDQsImlhdCI6MTQ3NTg0NzA3NjgwNH0.8azVEoNPfETczXb-vn7dfyDd98eRt7iiLBXehGpPGzY Content-Type: application/json { \"datasetFileId\": 8, \"priorKnowledgeFileId\": 9, \"dataValidation\": { \"skipNonzeroVariance\": true, \"skipUniqueVarName\": true }, \"algorithmParameters\": { \"penaltyDiscount\": 5.0, \"maxDegree\": 100 }, \"jvmOptions\": { \"maxHeapSize\": 100 }, \"hpcParameters\": [ { \"key\":\"wallTime\", \"value\":1 } ] } In this example, we are running the \"FGES continuous\" algorithm on the file of ID 8. We also set the wallTime as 1 hour. And this call will return the job info with a 201 Created response status code.
{ \"id\": 5, \"algorithmName\": \"FGESc\", \"status\": 0, \"addedTime\": 1472742564355, \"resultFileName\": \"FGESc_data_small.txt_1472742564353.txt\", \"errorResultFileName\": \"error_FGESc_data_small.txt_1472742564353.txt\" } From this response we can tell that the job ID is 5, and the result file name will be FGESc_data_small.txt_1472742564353.txt if everything goes well. If something is wrong an error result file with name error_FGEsc_data_small.txt_1472742564353.txt will be created. When you need to run \"FGES discrete\", just send the request to a different endpont URI: API Endpoint URI pattern: POST https:///ccd-api/{userId}/jobs/FGESd Generated HTTP request code example: POST /ccd-api/22/jobs/FGESd HTTP/1.1 Host: Authorization: Bearer eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJpc3MiOiJodHRwczovL2Nsb3VkLmNjZC5waXR0LmVkdS8iLCJuYW1lIjoiemh5MTkiLCJleHAiOjE0NzU4NTA2NzY4MDQsImlhdCI6MTQ3NTg0NzA3NjgwNH0.8azVEoNPfETczXb-vn7dfyDd98eRt7iiLBXehGpPGzY Content-Type: application/json { \"datasetFileId\": 10, \"priorKnowledgeFileId\": 12, \"dataValidation\": { \"skipUniqueVarName\": true, \"skipCategoryLimit\": true }, \"algorithmParameters\": { \"structurePrior\": 1.0, \"samplePrior\": 1.0, \"maxDegree\": 102 }, \"jvmOptions\": { \"maxHeapSize\": 100 } }","title":"Add a new job to run the desired algorithm on a given data file"},{"location":"causal-rest-api/#list-all-running-jobs","text":"API Endpoint URI pattern: GET https:///ccd-api/{userId}/jobs Generated HTTP request code example: GET /ccd-api/22/jobs/ HTTP/1.1 Host: Authorization: Bearer eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJpc3MiOiJodHRwczovL2Nsb3VkLmNjZC5waXR0LmVkdS8iLCJuYW1lIjoiemh5MTkiLCJleHAiOjE0NzU4NTA2NzY4MDQsImlhdCI6MTQ3NTg0NzA3NjgwNH0.8azVEoNPfETczXb-vn7dfyDd98eRt7iiLBXehGpPGzY Content-Type: application/json Then you'll see the information of all jobs that are currently running: [ { \"id\": 32, \"algorithmName\": \"FGESc\", \"addedTime\": 1468436085000 }, { \"id\": 33, \"algorithmName\": \"FGESd\", \"addedTime\": 1468436087000 } ]","title":"List all running jobs"},{"location":"causal-rest-api/#check-the-job-status-for-a-given-job-id","text":"Once the new job is submitted, it takes time and resources to run the algorithm on the server. During the waiting, you can check the status of a given job ID: API Endpoint URI pattern: GET https:///ccd-api/{userId}/jobs/{id} Generated HTTP request code example: GET /ccd-api/22/jobs/32 HTTP/1.1 Host: Authorization: Bearer eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJpc3MiOiJodHRwczovL2Nsb3VkLmNjZC5waXR0LmVkdS8iLCJuYW1lIjoiemh5MTkiLCJleHAiOjE0NzU4NTA2NzY4MDQsImlhdCI6MTQ3NTg0NzA3NjgwNH0.8azVEoNPfETczXb-vn7dfyDd98eRt7iiLBXehGpPGzY This will either return \"Pending\" or \"Completed\".","title":"Check the job status for a given job ID"},{"location":"causal-rest-api/#cancel-a-running-job","text":"Sometimes you may want to cancel a submitted job. API Endpoint URI pattern: DELETE https:///ccd-api/{userId}/jobs/{id} Generated HTTP request code example: DELETE /ccd-api/22/jobs/8 HTTP/1.1 Host: Authorization: Bearer eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJpc3MiOiJodHRwczovL2Nsb3VkLmNjZC5waXR0LmVkdS8iLCJuYW1lIjoiemh5MTkiLCJleHAiOjE0NzU4NTA2NzY4MDQsImlhdCI6MTQ3NTg0NzA3NjgwNH0.8azVEoNPfETczXb-vn7dfyDd98eRt7iiLBXehGpPGzY This call will response either \"Job 8 has been canceled\" or \"Unable to cancel job 8\". It's not guranteed that the system can always cencal a job successfully.","title":"Cancel a running job"},{"location":"causal-rest-api/#3-result-management","text":"","title":"3. 
Result Management"},{"location":"causal-rest-api/#list-all-result-files-generated-by-the-algorithm","text":"API Endpoint URI pattern: GET https:///ccd-api/{userId}/results Generated HTTP request code example: GET /ccd-api/22/results HTTP/1.1 Host: Authorization: Bearer eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJpc3MiOiJodHRwczovL2Nsb3VkLmNjZC5waXR0LmVkdS8iLCJuYW1lIjoiemh5MTkiLCJleHAiOjE0NzU4NTA2NzY4MDQsImlhdCI6MTQ3NTg0NzA3NjgwNH0.8azVEoNPfETczXb-vn7dfyDd98eRt7iiLBXehGpPGzY The response to this request will look like this: [ { \"name\": \"FGESc_sim_data_20vars_100cases.csv_1466171729046.txt\", \"creationTime\": 1466171732000, \"lastModifiedTime\": 1466171732000, \"fileSize\": 1660 }, { \"name\": \"FGESc_data_small.txt_1466172140585.txt\", \"creationTime\": 1466172145000, \"lastModifiedTime\": 1466172145000, \"fileSize\": 39559 } ]","title":"List all result files generated by the algorithm"},{"location":"causal-rest-api/#download-a-specific-result-file-generated-by-the-algorithm-based-on-file-name","text":"API Endpoint URI pattern: GET https:///ccd-api/{userId}/results/{result_file_name} Generated HTTP request code example: GET /ccd-api/22/results/FGESc_data_small.txt_1466172140585.txt HTTP/1.1 Host: Authorization: Bearer eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJpc3MiOiJodHRwczovL2Nsb3VkLmNjZC5waXR0LmVkdS8iLCJuYW1lIjoiemh5MTkiLCJleHAiOjE0NzU4NTA2NzY4MDQsImlhdCI6MTQ3NTg0NzA3NjgwNH0.8azVEoNPfETczXb-vn7dfyDd98eRt7iiLBXehGpPGzY On success, you will get the result file back as text file content. If there's a typo in file name of the that file doesn't exist, you'll get either a JSON or XML message based on the accept header in your request: The response to this request will look like this: { \"timestamp\": 1467210996233, \"status\": 404, \"error\": \"Not Found\", \"message\": \"Resource not found.\", \"path\": \"/22/results/FGESc_data_small.txt_146172140585.txt\" }","title":"Download a specific result file generated by the algorithm based on file name"},{"location":"causal-rest-api/#compare-algorithm-result-files","text":"Since we can list all the algorithm result files, based on the results, we can also choose multiple files and run a comparison. API Endpoint URI pattern: POST https:///ccd-api/{userId}/results/compare The request body is a JSON that contains an array of result files to be compared. Generated HTTP request code example: POST /ccd-api/22/results/compare HTTP/1.1 Host: Authorization: Bearer eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJpc3MiOiJodHRwczovL2Nsb3VkLmNjZC5waXR0LmVkdS8iLCJuYW1lIjoiemh5MTkiLCJleHAiOjE0NzU4NTA2NzY4MDQsImlhdCI6MTQ3NTg0NzA3NjgwNH0.8azVEoNPfETczXb-vn7dfyDd98eRt7iiLBXehGpPGzY { \"resultFiles\": [ \"FGESc_sim_data_20vars_100cases.csv_1466171729046.txt\", \"FGESc_data_small.txt_1467305104859.txt\" ] } When you specify multiple file names, use the !! as a delimiter. 
This request will generate a result comparison file with the following content (shortened version): FGESc_sim_data_20vars_100cases.csv_1466171729046.txt FGESc_data_small.txt_1467305104859.txt Edges In All Same End Point NR4A2,FOS 0 0 X5,X17 0 0 MMP11,ASB5 0 0 X12,X8 0 0 hsa_miR_654_3p,hsa_miR_337_3p 0 0 RND1,FGA 0 0 HHLA2,UBXN10 0 0 HS6ST2,RND1 0 0 SCRG1,hsa_miR_377 0 0 CDH3,diag 0 0 SERPINI2,FGG 0 0 hsa_miR_451,hsa_miR_136_ 0 0 From this comparison, you can see if the two algorithm graphs have common edges and endpoints.","title":"Compare algorithm result files"},{"location":"causal-rest-api/#list-all-the-comparison-files","text":"API Endpoint URI pattern: GET https:///ccd-api/{userId}/results/comparisons Generated HTTP request code example: GET /ccd-api/22/results/comparisons HTTP/1.1 Host: Authorization: Bearer eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJpc3MiOiJodHRwczovL2Nsb3VkLmNjZC5waXR0LmVkdS8iLCJuYW1lIjoiemh5MTkiLCJleHAiOjE0NzU4NTA2NzY4MDQsImlhdCI6MTQ3NTg0NzA3NjgwNH0.8azVEoNPfETczXb-vn7dfyDd98eRt7iiLBXehGpPGzY The response will show a list of comparison files: [ { \"name\": \"result_comparison_1467385923407.txt\", \"creationTime\": 1467385923000, \"lastModifiedTime\": 1467385923000, \"fileSize\": 7505 }, { \"name\": \"result_comparison_1467387034358.txt\", \"creationTime\": 1467387034000, \"lastModifiedTime\": 1467387034000, \"fileSize\": 7505 }, { \"name\": \"result_comparison_1467388042261.txt\", \"creationTime\": 1467388042000, \"lastModifiedTime\": 1467388042000, \"fileSize\": 7533 } ]","title":"List all the comparison files"},{"location":"causal-rest-api/#download-a-specific-comparison-file-based-on-file-name","text":"API Endpoint URI pattern: GET https:///ccd-api/{userId}/results/comparisons/{comparison_file_name} Generated HTTP request code example: GET /ccd-api/22/results/comparisons/result_comparison_1467388042261.txt HTTP/1.1 Host: Authorization: Bearer eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJpc3MiOiJodHRwczovL2Nsb3VkLmNjZC5waXR0LmVkdS8iLCJuYW1lIjoiemh5MTkiLCJleHAiOjE0NzU4NTA2NzY4MDQsImlhdCI6MTQ3NTg0NzA3NjgwNH0.8azVEoNPfETczXb-vn7dfyDd98eRt7iiLBXehGpPGzY Then it returns the content of that comparison file (shortened version): FGESc_sim_data_20vars_100cases.csv_1466171729046.txt FGESc_data_small.txt_1467305104859.txt Edges In All Same End Point NR4A2,FOS 0 0 X5,X17 0 0 MMP11,ASB5 0 0 X12,X8 0 0 hsa_miR_654_3p,hsa_miR_337_3p 0 0 RND1,FGA 0 0 HHLA2,UBXN10 0 0 HS6ST2,RND1 0 0 SCRG1,hsa_miR_377 0 0 CDH3,diag 0 0 SERPINI2,FGG 0 0","title":"Download a specific comparison file based on file name"},{"location":"causal-web/","text":"Causal Web Application Quick Start and User Guide Causal Web is a Java web-based application that allows users to run causal modeling algorithms on their dataset. Creating Your Account You can create a new account by clicking the \"Create an account\" link on the login page. Fill in your information on the signup page. Make sure to read the Terms & Conditions agreement and check the agree box before clicking the signup button. Upon finishing registration, the system will send out an email with an activation link. Go to your email account and click on that link; the Causal Web application then shows a confirmation message. Login to Causal Web Application Input the email address and password that you used to register with the Causal Web system. Check the \"Remember Me\" checkbox if you would like the browser to automatically log you in next time you visit. Here we go! You are now in the Causal Web application.
Uploading Your Dataset Click on the Data Management link on the navigation bar on the left side. There is a sub menu that will appear. Click on the Import Data link. You can EITHER drag & drop dataset file(s) into the dash-surrounded canvas OR you can click on the Browse button and choose the dataset file(s) you would like to upload to the Causal Web application. For testing purposes download this dataset: Retention.txt and upload it. The Import Data panel shows the dataset upload progress as a percentage along with MD5 checksums (confirming that an uploaded file's contents are unchanged after upload) for each of the uploaded files. You can also pause the upload of files and resume later. In the case of a disrupted connection, you can resume the upload by repeating the previous steps. The Causal Web application will detect the unfinished upload and resume from the latest point of the last attempt. Once all your dataset file(s) are uploaded, the progress bar will show the (completed) sign. Summarizing Your Dataset Before any analysis can proceed, the datasets need to be summarized. Specifically, you must indicate the delimiter used in the data file (tab vs. comma), and the types of variables found in the file. Once this is done, the Causal Web application will determine the number of columns (features) and rows (records) in the dataset. Click on the Data Management menu on the navigation bar on the left side. The sub menu will slowly appear. Click on the Datasets menu. The dataset page shows a list of datasets and their attributes. In the Summarized column (second from the right), the yellow warning buttons indicate datasets that the system has not yet summarized. Click on the dataset's name link to see the dataset information. At this stage, the data summary information is missing: the dataset needs to be summarized before conducting causal analysis. From the dataset page, click on the yellow warning button to summarize a dataset. The data summarization page shows the dataset's basic information, plus additional information that will be determined after summarization (the number of rows and columns). The bottom panel has two radio boxes for you to choose variable type (continuous, discrete, or mixed), and delimiter (tab or comma). The Retention.txt dataset described above is tab-delimited and contains continuous variables. Once the dataset is summarized, the dataset page changes the warning sign to a green button. Click it to see the additional information of this summarized dataset. Click on the dataset's name link to see the additional information. Annotating Your Dataset On the Datasets main page, the blue icon is for viewing and entering annotations. Click the annotation icon, and you can add a new annotation; just click the \"New annotation\" button. The application will pop up the annotation form. You can also add another annotation on top of the existing annotation. Uploading the Prior Knowledge Click on the Data Management menu on the navigation bar on the left side. There is a sub menu that will appear. Click on the Import Data menu. You can EITHER drag & drop prior knowledge file(s) into the dash-surrounded canvas OR you can click on the Browse button and choose the prior knowledge file(s) you would like to upload to the CCD Web application. Note that the prior knowledge file needs to have a .prior file extension. Executing an Analysis on Your Dataset Click on the Causal Discovery menu on the navigation bar on the left side. The sub menu will slowly appear.
FGES and GFCI are the currently supported algorithms. The FGES algorithm can handle Continuous, Discrete, and Mixed data files. The GFCI algorithm can handle Continuous, Discrete, and Mixed data files as well. The Dataset drop-down box contains a list of datasets that you have uploaded. If a dataset has been uploaded but is not displayed in the dataset drop-down box, it means that the Data Summarization step must be completed before a causal analysis (e.g., FGES Continuous) can be executed. If a prior knowledge file needs to be included in the analysis, the Prior Knowledge File drop-down box contains a list of knowledge files. Before clicking the Next button, the data validation parameters need to be input. Here, the FGES Continuous algorithm page allows the user to modify its parameters. The first one is Penalty Discount and its default value is 2. The second one is Search Maximum Degree and its default value is 100. The third one is Faithfulness Assumed and its default value is checked. The fourth one is Verbose Output and its default value is checked. Click Next to proceed or click Advanced Options (JVM) for the JVM customization. Expert Mode : the JVM parameters allow users to customize JVM settings such as how much maximum memory (in gigabytes) the process should allocate (e.g., 4). This is the summary page before the FGES analysis job is put into the queue. Click on the number 1 (Select Dataset) or number 2 (Set Parameters) to go back and modify the parameters. Once everything is set, click the Run Algorithm! button. The application will redirect to the Job Queue page. The analysis job is added to the queue. The Queued status means that it waits for the scheduler to run it once an execution slot is available. However, the Job Queue page does not currently automatically update the jobs' status (at least in this development stage). Refresh the Job Queue page from time to time to see the latest jobs' status. Once a job slot is available, the queued job is then executed and its status changes to Running . When the job is finished, it is automatically removed from the Job Queue page. The result of the analysis is added to the Results page. In case the queued or running job needs to be killed or removed, click the Remove button in the first column from the right on the Job Queue page. The Remove Job confirmation page pops up. Click Yes to kill the job or No to cancel the kill operation. After the job cancellation is confirmed, the job's status changes to Kill Request . The scheduler will take care of removing the job from the queue or killing the job on the server. If the running job was killed or any error happened during the process, the error result will appear on the Results page. Its background is highlighted in red. If there is an error, you can see the details of the error by clicking on the error result link. Reviewing Your Results Click on the Results menu on the navigation bar on the left side. Click on the Algorithm Results menu. The Algorithm Results page shows a list of results, their creation time and their size. In the first column from the right, the green Save buttons provide the ability for users to download results to their local computers. Click on the result's name link to see a causal graph of the result. Check the result files' checkboxes to compare the results. Note: more than two result files can be compared at once. The results page details the graph, the original dataset, and its parameters. 
Click on the View Full Screen button to see the causal graph in more detail. Based on the nature of your data, sometimes you may see the generated graph (PAG) containing dashed links in addition to solid links. For example: If an edge is dashed that means there is no latent confounder. Otherwise, there is possibly a latent confounder. If an edge is green that means it is definitely direct. Otherwise, it is possibly direct. Comparing Your Results Click on the Results menu on the navigation bar on the left side. To compare two results, click on the Algorithm Results item on the left. Select at least two results (place a checkmark next to the results) and click on Compare. Now click on the Result Comparisons item on the left. The Result Comparisons page shows a list of results, their creation time and their size. In the first column from the right, the green Save buttons provide the ability for users to download results to their local computers. Click on the result's name link to see the details of the result comparisons. The Result Comparisons page shows the datasets compared, and the table of edges, their mutual appearance in all compared datasets, and their mutual endpoint types. Downloading Your Result And Comparison Result In the first column from the right of the Algorithm Results page, the green Save buttons provide the ability for users to download results to their local computers. In the first column from the right of the Result Comparisons page, the green Save buttons provide the ability for users to download result comparisons to their local computers. Submit Your Feedback Click the Feedback menu on the navigation menu bar on the left. The Feedback page shows the email (optional), and the text area for the user feedback (required). Once the feedback is filled in, click the Send Feedback button. The green Thank you for your feedback! banner shows that the feedback was submitted successfully.","title":"Causal Web"},{"location":"causal-web/#causal-web-application-quick-start-and-user-guide","text":"Causal Web is a Java web-based application that allows users to run causal modeling algorithms on their dataset.","title":"Causal Web Application Quick Start and User Guide"},{"location":"causal-web/#creating-your-account","text":"You can create a new account by clicking the \"Create an account\" link on the login page. Fill in your information on the signup page. Make sure to read the Terms & Conditions agreement and check the agree box before clicking the signup button. Upon finishing registration, the system will send out an email with an activation link. Go to your email account and click on that link; the Causal Web application then shows a confirmation message.","title":"Creating Your Account"},{"location":"causal-web/#login-to-causal-web-application","text":"Enter the email address and password that you used to register with the Causal Web system. Check the \"Remember Me\" checkbox if you would like the browser to log you in automatically the next time you visit. Here we go! You are now in the Causal Web application.","title":"Login to Causal Web Application"},{"location":"causal-web/#uploading-your-dataset","text":"Click on the Data Management link on the navigation bar on the left side. There is a sub menu that will appear. Click on the Import Data link. You can EITHER drag & drop dataset file(s) into the dash-surrounded canvas OR you can click on the Browse button and choose the dataset file(s) you would like to upload to the Causal Web application. 
For testing purposes, download this dataset: Retention.txt and upload it. The Import Data panel shows the dataset upload progress as a percentage along with MD5 checksums (confirming that an uploaded file's contents are unchanged after upload) for each of the uploaded files. You can also pause the upload of files and resume later. In the case of a disrupted connection, you can resume the upload by repeating the previous steps. The Causal Web application will detect the unfinished upload and resume from the latest point of the last attempt. Once all your dataset file(s) are uploaded, the progress bar will show the (completed) sign.","title":"Uploading Your Dataset"},{"location":"causal-web/#summarizing-your-dataset","text":"Before any analysis can proceed, the datasets need to be summarized. Specifically, you must indicate the delimiter used in the data file (tab vs. comma), and the types of variables found in the file. Once this is done, the Causal Web application will determine the number of columns (features) and rows (records) in the dataset. Click on the Data Management menu on the navigation bar on the left side. The sub menu will slowly appear. Click on the Datasets menu. The dataset page shows a list of datasets and their attributes. On the second Summarized column from the right, the yellow warning buttons indicate that the system has not yet summarized the dataset. Click on the dataset's name link to see the dataset information. At this stage, the data summary information is missing: the dataset needs to be summarized before conducting causal analysis. From the dataset page, click on the yellow warning button to summarize a dataset. The data summarization page shows information about the dataset: its basic information, and additional information that will be determined after summarization (the number of rows and columns). The bottom panel has two sets of radio buttons for you to choose the variable type (continuous, discrete, or mixed) and the delimiter (tab or comma). The Retention.txt dataset described above is tab-delimited and contains continuous variables. Once the dataset is summarized, the dataset page changes the warning sign to a green button. Click on the dataset's name link to see the additional information for this summarized dataset.","title":"Summarizing Your Dataset"},{"location":"causal-web/#annotating-your-dataset","text":"On the Datasets main page, the blue icon is for viewing and entering annotations. Click the annotation icon; to add a new annotation, just click the \"New annotation\" button. The application will pop up the annotation form. You can also add another annotation on top of the existing annotation.","title":"Annotating Your Dataset"},{"location":"causal-web/#uploading-the-prior-knowledge","text":"Click on the Data Management menu on the navigation bar on the left side. There is a sub menu that will appear. Click on the Import Data menu. You can EITHER drag & drop prior knowledge file(s) into the dash-surrounded canvas OR you can click on the Browse button and choose the prior knowledge file(s) you would like to upload to the CCD Web application. Note that the prior knowledge file needs to have the .prior file extension.","title":"Uploading the Prior Knowledge"},{"location":"causal-web/#executing-an-analysis-on-your-dataset","text":"Click on the Causal Discovery menu on the navigation bar on the left side. The sub menu will slowly appear. FGES and GFCI are the currently supported algorithms. 
The FGES algorithm can handle Continuous, Discrete, and Mixed data files. The GFCI algorithm can handle Continuous, Discrete, and Mixed data files as well. The Dataset drop-down box contains a list of datasets that you have uploaded. If a dataset has been uploaded but is not displayed in the dataset drop-down box, it means that the Data Summarization step must be completed before a causal analysis (e.g., FGES Continuous) can be executed. If a prior knowledge file needs to be included in the analysis, the Prior Knowledge File drop-down box contains a list of knowledge files. Before clicking the Next button, the data validation parameters need to be input. Here, the FGES Continuous algorithm page allows the user to modify its parameters. The first one is Penalty Discount and its default value is 2. The second one is Search Maximum Degree and its default value is 100. The third one is Faithfulness Assumed and its default value is checked. The fourth one is Verbose Output and its default value is checked. Click Next to proceed or click Advanced Options (JVM) for the JVM customization. Expert Mode : the JVM parameters allow users to customize JVM settings such as how much maximum memory (in gigabytes) the process should allocate (e.g., 4). This is the summary page before the FGES analysis job is put into the queue. Click on the number 1 (Select Dataset) or number 2 (Set Parameters) to go back and modify the parameters. Once everything is set, click the Run Algorithm! button. The application will redirect to the Job Queue page. The analysis job is added to the queue. The Queued status means that it waits for the scheduler to run it once an execution slot is available. However, the Job Queue page does not currently automatically update the jobs' status (at least in this development stage). Refresh the Job Queue page from time to time to see the latest jobs' status. Once a job slot is available, the queued job is then executed and its status changes to Running . When the job is finished, it is automatically removed from the Job Queue page. The result of the analysis is added to the Results page. In case the queued or running job needs to be killed or removed, click the Remove button in the first column from the right on the Job Queue page. The Remove Job confirmation page pops up. Click Yes to kill the job or No to cancel the kill operation. After the job cancellation is confirmed, the job's status changes to Kill Request . The scheduler will take care of removing the job from the queue or killing the job on the server. If the running job was killed or any error happened during the process, the error result will appear on the Results page. Its background is highlighted in red. If there is an error, you can see the details of the error by clicking on the error result link.","title":"Executing an Analysis on Your Dataset"},{"location":"causal-web/#reviewing-your-results","text":"Click on the Results menu on the navigation bar on the left side. Click on the Algorithm Results menu. The Algorithm Results page shows a list of results, their creation time and their size. In the first column from the right, the green Save buttons provide the ability for users to download results to their local computers. Click on the result's name link to see a causal graph of the result. Check the result files' checkboxes to compare the results. Note: more than two result files can be compared at once. The results page details the graph, the original dataset, and its parameters. 
Click on the View Full Screen button to see the causal graph in more detail. Based on the nature of your data, sometimes you may see the generated graph (PAG) containing dashed links in addition to solid links. For example: If an edge is dashed that means there is no latent confounder. Otherwise, there is possibly a latent confounder. If an edge is green that means it is definitely direct. Otherwise, it is possibly direct.","title":"Reviewing Your Results"},{"location":"causal-web/#comparing-your-results","text":"Click on the Results menu on the navigation bar on the left side. To compare two results, click on the Algorithm Results item on the left. Select at least two results (place a checkmark next to the results) and click on Compare. Now click on the Result Comparisons item on the left. The Result Comparisons page shows a list of results, their creation time and their size. In the first column from the right, the green Save buttons provide the ability for users to download results to their local computers. Click on the result's name link to see the details of the result comparisons. The Result Comparisons page shows the datasets compared, and the table of edges, their mutual appearance in all compared datasets, and their mutual endpoint types.","title":"Comparing Your Results"},{"location":"causal-web/#downloading-your-result-and-comparision-result","text":"In the first column from the right of the Algorithm Results page, the green Save buttons provide the ability for users to download results to their local computers. In the first column from the right of the Result Comparisons page, the green Save buttons provide the ability for users to download result comparisons to their local computers.","title":"Downloading Your Result And Comparison Result"},{"location":"causal-web/#submit-your-feedback","text":"Click the Feedback menu on the navigation menu bar on the left. The Feedback page shows the email (optional), and the text area for the user feedback (required). Once the feedback is filled in, click the Send Feedback button. The green Thank you for your feedback! banner shows that the feedback was submitted successfully.","title":"Submit Your Feedback"},{"location":"ccd-annotations-cytoscape/","text":"ccd-annotations-cytoscape Installation The source code for the plugin is available from the project site: (https://github.com/bd2kccd/ccd-annotations-cytoscape) To install the plugin, compile the source code or download a release, then start the Cytoscape application, click on apps->install from file, and select the jar file. 
Using the Plugin Create new annotations Search for existing annotations Set auto label placement","title":"CCD Annotations Cytoscape Plugin"},{"location":"ccd-annotations-cytoscape/#ccd-annotations-cytoscape","text":"","title":"ccd-annotations-cytoscape"},{"location":"ccd-annotations-cytoscape/#installation","text":"The source code for the plugin is available from the project site: (https://github.com/bd2kccd/ccd-annotations-cytoscape) To install the plugin, compile the source code or download a release, then start the Cytoscape application, click on apps->install from file, and select the jar file.","title":"Installation"},{"location":"ccd-annotations-cytoscape/#using-the-plugin","text":"Create new annotations Search for existing annotations Set auto label placement","title":"Using the Plugin"},{"location":"cytoscape-tetrad/","text":"cytoscape-tetrad Displaying Tetrad Networks in Cytoscape The Cytoscape application has significant power and flexibility to display networks. This webpage describes how to load a plugin into Cytoscape that will allow you to import and display a Tetrad network (graph) that you have saved from Tetrad. Cytoscape can be downloaded for free from http://www.cytoscape.org/ Download the latest version of the plugin from github To install the plugin, start the Cytoscape application and click on Apps --> App Manager --> Install Apps from the Cytoscape menu. Using the Plugin Put a graph box on the Tetrad workspace and select the graph type \u201cgraph\u201d. Double click on the graph box to display the graph in Tetrad. Within the graph display box, click on File --> Save JSON. In Cytoscape, select the File --> Import --> Network --> Tetrad option and select the file that you saved previously from Tetrad. Apply a layout in Cytoscape. By default Cytoscape doesn't apply a layout, so the initial rendering will look like a single node. Apply a layout by selecting Layout in the top menu and then choosing a layout to see your graph (e.g., Layouts --> Prefuse Force Directed Layout).","title":"Cytoscape Tetrad Plugin"},{"location":"cytoscape-tetrad/#cytoscape-tetrad","text":"","title":"cytoscape-tetrad"},{"location":"cytoscape-tetrad/#displaying-tetrad-networks-in-cytoscape","text":"The Cytoscape application has significant power and flexibility to display networks. This webpage describes how to load a plugin into Cytoscape that will allow you to import and display a Tetrad network (graph) that you have saved from Tetrad. Cytoscape can be downloaded for free from http://www.cytoscape.org/ Download the latest version of the plugin from github To install the plugin, start the Cytoscape application and click on Apps --> App Manager --> Install Apps from the Cytoscape menu.","title":"Displaying Tetrad Networks in Cytoscape"},{"location":"cytoscape-tetrad/#using-the-plugin","text":"Put a graph box on the Tetrad workspace and select the graph type \u201cgraph\u201d. Double click on the graph box to display the graph in Tetrad. Within the graph display box, click on File --> Save JSON. In Cytoscape, select the File --> Import --> Network --> Tetrad option and select the file that you saved previously from Tetrad. Apply a layout in Cytoscape. By default Cytoscape doesn't apply a layout, so the initial rendering will look like a single node. 
Apply a layout by selecting Layout in the top menu and then choosing a layout to see your graph (e.g., Layouts --> Prefuse Force Directed Layout).","title":"Using the Plugin"},{"location":"py-causal/","text":"py-causal Python APIs for causal modeling algorithms developed by the University of Pittsburgh/Carnegie Mellon University Center for Causal Discovery . Note: This project uses a very old version of Tetrad and a method of connecting Python to Java, Javabridge, that's proven sometimes buggy and hard to install on some platforms, and so we are no longer recommending it. Please consider using py-tetrad instead. Py-tetrad uses JPype to bridge Python and Java, which has already shown itself to be much easier to install and use cross-platform. Also, it allows one to use the most recent version of Tetrad, and it has been well-tested. This code is distributed under the LGPL 2.1 license. Requirements: Python 2.7 and 3.6 javabridge>=1.0.11 pandas numpy JDK 1.8 pydot (Optional) GraphViz (Optional) Docker Image A pre-installed py-causal Docker image is available at Docker Hub Installation overview: To install on an existing Python installation, we have found two approaches to be useful: * Direct python installation with pip, possibly including use of Jupyter . This approach is likely best for users who have Python installed and are familiar with installing Python modules. * Installation via Anaconda Directions for both approaches are given below... Installation with pip If you do not have pip installed already, try these instructions . Once pip is installed, execute these commands pip install -U numpy pip install -U pandas pip install -U javabridge pip install -U pydot pip install -U GraphViz Note: you also need to install the GraphViz engine by following these instructions . We have observed that on some OS X installations, pydot may provide the following response Couldn't import dot_parser, loading of dot files will not be possible. If you see this, try the following pip uninstall pydot pip install pyparsing==1.5.7 pip install pydot Then, from within the py-causal directory, run the following command: python setup.py install or use the pip command: pip install git+git://github.com/bd2kccd/py-causal After running this command, enter a python shell and attempt the following import import pandas as pd import pydot from tetrad import search as s Finally, try to run the python example python py-causal-fges-continuous-example.py Be sure to run this from within the py-causal directory. This program will create a file named tetrad.svg, which should be viewable in any SVG capable program. If you see a causal graph, everything is working correctly. Running Jupyter/IPython We have found Jupyter notebooks to be helpful. (Those who have run IPython in the past should know that Jupyter is simply a new name for IPython). To add Jupyter to your completed python install, simply run pip install -U jupyter followed by jupyter notebook , and then load one of the Jupyter notebooks found in this installation. Anaconda/Jupyter Installing Python with Anaconda and Jupyter may be easier for some users: Download and install Anaconda conda install python-javabridge For OS X, this default install does not seem to work well. 
Try the following instead: conda install --channel https://conda.anaconda.org/david_baddeley python-javabridge Then run the following to configure Anaconda conda install pandas conda install numpy conda install pydot conda install graphviz conda install -c https://conda.anaconda.org/chirayu pycausal jupyter notebook and then load one of the Jupyter notebooks.","title":"Py-causal"},{"location":"py-causal/#py-causal","text":"Python APIs for causal modeling algorithms developed by the University of Pittsburgh/Carnegie Mellon University Center for Causal Discovery . Note: This project uses a very old version of Tetrad and a method of connecting Python to Java, Javabridge, that's proven sometimes buggy and hard to install on some platforms, and so we are no longer recommending it. Please consider using py-tetrad instead. Py-tetrad uses JPype to bridge Python and Java, which has already shown itself to be much easier to install and use cross-platform. Also, it allows one to use the most recent version of Tetrad, and it has been well-tested. This code is distributed under the LGPL 2.1 license.","title":"py-causal"},{"location":"py-causal/#requirements","text":"Python 2.7 and 3.6 javabridge>=1.0.11 pandas numpy JDK 1.8 pydot (Optional) GraphViz (Optional)","title":"Requirements:"},{"location":"py-causal/#docker-image","text":"A pre-installed py-causal Docker image is available at Docker Hub","title":"Docker Image"},{"location":"py-causal/#installation-overview","text":"To install on an existing Python installation, we have found two approaches to be useful: * Direct python installation with pip, possibly including use of Jupyter . This approach is likely best for users who have Python installed and are familiar with installing Python modules. * Installation via Anaconda Directions for both approaches are given below...","title":"Installation overview:"},{"location":"py-causal/#installation-with-pip","text":"If you do not have pip installed already, try these instructions . Once pip is installed, execute these commands pip install -U numpy pip install -U pandas pip install -U javabridge pip install -U pydot pip install -U GraphViz Note: you also need to install the GraphViz engine by following these instructions . We have observed that on some OS X installations, pydot may provide the following response Couldn't import dot_parser, loading of dot files will not be possible. If you see this, try the following pip uninstall pydot pip install pyparsing==1.5.7 pip install pydot Then, from within the py-causal directory, run the following command: python setup.py install or use the pip command: pip install git+git://github.com/bd2kccd/py-causal After running this command, enter a python shell and attempt the following import import pandas as pd import pydot from tetrad import search as s Finally, try to run the python example python py-causal-fges-continuous-example.py Be sure to run this from within the py-causal directory. This program will create a file named tetrad.svg, which should be viewable in any SVG capable program. If you see a causal graph, everything is working correctly.","title":"Installation with pip"},{"location":"py-causal/#running-jupyteripython","text":"We have found Jupyter notebooks to be helpful. (Those who have run IPython in the past should know that Jupyter is simply a new name for IPython). 
To add Jupyter to your completed python install, simply run pip install -U jupyter followed by jupyter notebook , and then load one of the Jupyter notebooks found in this installation.","title":"Running Jupyter/IPython"},{"location":"py-causal/#anacondajupyter","text":"Installing Python with Anaconda and Jupyter may be easier for some users: Download and install Anaconda conda install python-javabridge For OS X, this default install does not seem to work well. Try the following instead: conda install --channel https://conda.anaconda.org/david_baddeley python-javabridge Then run the following to configure Anaconda conda install pandas conda install numpy conda install pydot conda install graphviz conda install -c https://conda.anaconda.org/chirayu pycausal jupyter notebook and then load one of the Jupyter notebooks.","title":"Anaconda/Jupyter"},{"location":"r-causal/","text":"r-causal R Wrapper for Tetrad Library Note 2023-03-06: This version of RCausal uses an older version of Tetrad from at least 5 years ago. However, we have updated our Python integration to a much better version--see https://github.com/cmu-phil/py-tetrad . Updating our R integration is one of the next projects we will take up. News 2023-04-05: We have put forward a proposal to replace the r-causal functionality using the py-tetrad functionality, here: https://github.com/cmu-phil/py-tetrad/tree/main/pytetrad/R . The installation procedure for this is still somewhat complicated, and we will try to simplify it. If you try it and have difficulties, please let us know. Once you have it installed, it is very easy and intuitive to use. By the way, rcausal has not been maintained for some time now, as the tireless maintainer has since moved on to different work :-)... but going back through some of the issues posted for r-causal gives some hints as to additional functionality that pytetrad/R should have. We'll try to get caught up. 
R Library Requirement R >= 3.3.0, stringr , rJava , Docker As an alternative to installing the library and getting rJava working with your installation (i.e., it does not work well on Mac), we have a Docker image Installation Install the R library requirements: install.packages(\"stringr\") install.packages(\"rJava\") Install r-causal from github: library(devtools) install_github(\"bd2kccd/r-causal\") Example Continuous Dataset library(rcausal) data(\"charity\") #Load the charity dataset tetradrunner.getAlgorithmDescription(algoId = 'fges') tetradrunner.getAlgorithmParameters(algoId = 'fges',scoreId = 'fisher-z') #Compute FGES search tetradrunner <- tetradrunner(algoId = 'fges',df = charity,scoreId = 'fisher-z', dataType = 'continuous',alpha=0.1,faithfulnessAssumed=TRUE,maxDegree=-1,verbose=TRUE) tetradrunner$nodes #Show the result's nodes tetradrunner$edges #Show the result's edges Discrete Dataset library(rcausal) data(\"audiology\") #Load the audiology dataset tetradrunner.getAlgorithmParameters(algoId = 'fges',scoreId = 'bdeu') #Compute FGES search tetradrunner <- tetradrunner(algoId = 'fges',df = audiology,scoreId = 'bdeu',dataType = 'discrete', alpha=0.1,faithfulnessAssumed=TRUE,maxDegree=-1,verbose=TRUE) tetradrunner$nodes #Show the result's nodes tetradrunner$edges #Show the result's edges Prior Knowledge Create PriorKnowledge Object forbid <- list(c('TangibilityCondition','Impact')) # List of forbidden directed edges require <- list(c('Sympathy','TangibilityCondition')) # List of required directed edges forbiddenWithin <- c('TangibilityCondition','Imaginability') class(forbiddenWithin) <- 'forbiddenWithin' # Make this tier forbidden within temporal <- list(forbiddenWithin, c('Sympathy','AmountDonated'),c('Impact')) # List of temporal node tiers prior <- priorKnowledge(forbiddirect = forbid, requiredirect = require, addtemporal = temporal) fgs <- fgs(df = charity, penaltydiscount = 2, depth = -1, ignoreLinearDependence = TRUE, heuristicSpeedup = TRUE, numOfThreads = 2, verbose = TRUE, priorKnowledge = prior) Load Knowledge File # knowledge file: audiology.prior # /knowledge # forbiddirect # class tymp # class age_gt_60 # class notch_at_4k # # requiredirect # history_noise class # # addtemporal # 0* bser late_wave_poor tymp notch_at_4k o_ar_c ar_c airBoneGap air bone o_ar_u airBoneGap # 1 history_noise history_dizziness history_buzzing history_roaring history_recruitment history_fluctuating history_heredity history_nausea # 2 class prior <- priorKnowledgeFromFile('audiology.prior') fgs.discrete <- fgs.discrete(df=audiology,structurePrior=1.0,samplePrior=1.0, depth = -1, heuristicSpeedup = TRUE, numOfThreads = 2,verbose = TRUE, priorKnowledge = prior) Plot a DOT graph library(DOT) graph_dot <- tetradrunner.tetradGraphToDot(tetradrunner$graph) dot(graph_dot) Useful rJava Trouble-shooting Installation in Mac OS X Links http://stackoverflow.com/questions/26948777/how-can-i-make-rjava-use-the-newer-version-of-java-on-osx/32544358#32544358","title":"R-causal"},{"location":"r-causal/#r-causal","text":"R Wrapper for Tetrad Library Note 2023-03-06: This version of RCausal uses an older version of Tetrad from at least 5 years ago. However, we have updated our Python integration to a much better version--see https://github.com/cmu-phil/py-tetrad . Updating our R integration is one of the next projects we will take up. 
News 2023-04-05: We have put forward a proposal to replace the r-causal functionality using the py-tetrad functionality, here: https://github.com/cmu-phil/py-tetrad/tree/main/pytetrad/R . The installation procedure for this is still somewhat complicated, and we will try to simplify it. If you try it and have difficulties, please let us know. Once you have it installed, it is very easy and intuitive to use. By the way, rcausal has not been maintained for some time now, as the tireless maintainer has since moved on to different work :-)... but going back through some of the issues posted for r-causal gives some hints as to additional functionality that pytetrad/R should have. We'll try to get caught up.","title":"r-causal"},{"location":"r-causal/#r-library-requirement","text":"R >= 3.3.0, stringr , rJava ,","title":"R Library Requirement"},{"location":"r-causal/#docker","text":"As an alternative to installing the library and getting rJava working with your installation (i.e., it does not work well on Mac), we have a Docker image","title":"Docker"},{"location":"r-causal/#installation","text":"Install the R library requirements: install.packages(\"stringr\") install.packages(\"rJava\") Install r-causal from github: library(devtools) install_github(\"bd2kccd/r-causal\")","title":"Installation"},{"location":"r-causal/#example","text":"","title":"Example"},{"location":"r-causal/#continuous-dataset","text":"library(rcausal) data(\"charity\") #Load the charity dataset tetradrunner.getAlgorithmDescription(algoId = 'fges') tetradrunner.getAlgorithmParameters(algoId = 'fges',scoreId = 'fisher-z') #Compute FGES search tetradrunner <- tetradrunner(algoId = 'fges',df = charity,scoreId = 'fisher-z', dataType = 'continuous',alpha=0.1,faithfulnessAssumed=TRUE,maxDegree=-1,verbose=TRUE) tetradrunner$nodes #Show the result's nodes tetradrunner$edges #Show the result's edges","title":"Continuous Dataset"},{"location":"r-causal/#discrete-dataset","text":"library(rcausal) data(\"audiology\") #Load the audiology dataset tetradrunner.getAlgorithmParameters(algoId = 'fges',scoreId = 'bdeu') #Compute FGES search tetradrunner <- tetradrunner(algoId = 'fges',df = audiology,scoreId = 'bdeu',dataType = 'discrete', alpha=0.1,faithfulnessAssumed=TRUE,maxDegree=-1,verbose=TRUE) tetradrunner$nodes #Show the result's nodes tetradrunner$edges #Show the result's edges","title":"Discrete Dataset"},{"location":"r-causal/#prior-knowledge","text":"","title":"Prior Knowledge"},{"location":"r-causal/#create-priorknowledge-object","text":"forbid <- list(c('TangibilityCondition','Impact')) # List of forbidden directed edges require <- list(c('Sympathy','TangibilityCondition')) # List of required directed edges forbiddenWithin <- c('TangibilityCondition','Imaginability') class(forbiddenWithin) <- 'forbiddenWithin' # Make this tier forbidden within temporal <- list(forbiddenWithin, c('Sympathy','AmountDonated'),c('Impact')) # List of temporal node tiers prior <- priorKnowledge(forbiddirect = forbid, requiredirect = require, addtemporal = temporal) fgs <- fgs(df = charity, penaltydiscount = 2, depth = -1, ignoreLinearDependence = TRUE, heuristicSpeedup = TRUE, numOfThreads = 2, verbose = TRUE, priorKnowledge = prior)","title":"Create PriorKnowledge Object"},{"location":"r-causal/#load-knowledge-file","text":"# knowledge file: audiology.prior # /knowledge # forbiddirect # class tymp # class age_gt_60 # class notch_at_4k # # requiredirect # history_noise class # # addtemporal # 0* bser late_wave_poor tymp notch_at_4k o_ar_c ar_c airBoneGap 
air bone o_ar_u airBoneGap # 1 history_noise history_dizziness history_buzzing history_roaring history_recruitment history_fluctuating history_heredity history_nausea # 2 class prior <- priorKnowledgeFromFile('audiology.prior') fgs.discrete <- fgs.discrete(df=audiology,structurePrior=1.0,samplePrior=1.0, depth = -1, heuristicSpeedup = TRUE, numOfThreads = 2,verbose = TRUE, priorKnowledge = prior)","title":"Load Knowledge File"},{"location":"r-causal/#plot-a-dot-graph","text":"library(DOT) graph_dot <- tetradrunner.tetradGraphToDot(tetradrunner$graph) dot(graph_dot)","title":"Plot a DOT graph"},{"location":"r-causal/#useful-rjava-trouble-shooting-installation-in-mac-os-x-links","text":"http://stackoverflow.com/questions/26948777/how-can-i-make-rjava-use-the-newer-version-of-java-on-osx/32544358#32544358","title":"Useful rJava Trouble-shooting Installation in Mac OS X Links"},{"location":"tetrad-express/","text":"Tetrad Express Description A Cytoscape application (plugin) for running a simple causal model search. Purpose Provide a basic user-friendly interface for running a simple search algorithm from Tetrad. Workflow Below are the workflows you can perform: Workflow 1: Simple Search This is the simplest workflow to run a simple search. Figure 1 shows the same workflow in Tetrad. Import data. Select algorithm. Set algorithm parameters. Run search. Display graph. Figure 1. Workflow 2: Add Knowledge This workflow is to add additional knowledge to the dataset. Figure 2 shows the same workflow in Tetrad. Import data. Select knowledge type. Set knowledge. Select algorithm. Set algorithm parameters. Run search. Display graph. Figure 2. Workflow 3: Apply Data Transformation This workflow is to apply data transformation to the dataset. Figure 3 shows the same workflow in Tetrad. Import data. Edit the data: Select a data transformation. Select algorithm. Set algorithm parameters. Run search. Display graph. Figure 3.","title":"Tetrad Express"},{"location":"tetrad-express/#tetrad-express","text":"","title":"Tetrad Express"},{"location":"tetrad-express/#description","text":"A Cytoscape application (plugin) for running a simple causal model search.","title":"Description"},{"location":"tetrad-express/#purpose","text":"Provide a basic user-friendly interface for running a simple search algorithm from Tetrad.","title":"Purpose"},{"location":"tetrad-express/#workflow","text":"Below are the workflows you can perform:","title":"Workflow"},{"location":"tetrad-express/#workflow-1-simple-search","text":"This is the simplest workflow to run a simple search. Figure 1 shows the same workflow in Tetrad. Import data. Select algorithm. Set algorithm parameters. Run search. Display graph. Figure 1.","title":"Workflow 1: Simple Search"},{"location":"tetrad-express/#workflow-2-add-knowledge","text":"This workflow is to add additional knowledge to the dataset. Figure 2 shows the same workflow in Tetrad. Import data. Select knowledge type. Set knowledge. Select algorithm. Set algorithm parameters. Run search. Display graph. Figure 2.","title":"Workflow 2: Add Knowledge"},{"location":"tetrad-express/#workflow-3-apply-data-transformation","text":"This workflow is to apply data transformation to the dataset. Figure 3 shows the same workflow in Tetrad. Import data. Edit the data: Select a data transformation. Select algorithm. Set algorithm parameters. Run search. Display graph. 
Figure 3.","title":"Workflow 3: Apply Data Transformation"},{"location":"tetrad/","text":"Tetrad Introduction Tetrad is a program which creates, simulates data from, estimates, tests, predicts with, and searches for causal and statistical models. The aim of the program is to provide sophisticated methods in a friendly interface requiring very little statistical sophistication of the user and no programming knowledge. It is not intended to replace flexible statistical programming systems such as Matlab, Splus or R. Tetrad is open-source, free software that performs many of the functions in commercial programs such as Netica, Hugin, LISREL, EQS and other programs, and many discovery functions these commercial programs do not perform. Tetrad User Manual The Tetrad User Manual is a comprehensive guide to get you started and become profecient on using these tools for causal inference. Tetrad Tutorial The Tetrad Tutorial describes the things you can do with Tetrad with a lot of examples.","title":"Tetrad"},{"location":"tetrad/#tetrad","text":"","title":"Tetrad"},{"location":"tetrad/#introduction","text":"Tetrad is a program which creates, simulates data from, estimates, tests, predicts with, and searches for causal and statistical models. The aim of the program is to provide sophisticated methods in a friendly interface requiring very little statistical sophistication of the user and no programming knowledge. It is not intended to replace flexible statistical programming systems such as Matlab, Splus or R. Tetrad is open-source, free software that performs many of the functions in commercial programs such as Netica, Hugin, LISREL, EQS and other programs, and many discovery functions these commercial programs do not perform.","title":"Introduction"},{"location":"tetrad/#tetrad-user-manual","text":"The Tetrad User Manual is a comprehensive guide to get you started and become profecient on using these tools for causal inference.","title":"Tetrad User Manual"},{"location":"tetrad/#tetrad-tutorial","text":"The Tetrad Tutorial describes the things you can do with Tetrad with a lot of examples.","title":"Tetrad Tutorial"}]} \ No newline at end of file +{"config":{"indexing":"full","lang":["en"],"min_search_length":3,"prebuild_index":false,"separator":"[\\s\\-]+"},"docs":[{"location":"","text":"Welcome to CCD Docs This site hosts documentation for the Center for Causal Discovery . Tools and Software causal-cmd - a Java API and command line implementation of algorithms for performing causal discovery on Big Data. Use this software if you are interested incorporating analysis via a shell script or in a Java-based program. The software currently includes Fast Greedy Search ( FGES ) for continuous or discrete variables \u2013 an optimized version of Greedy Equivalence Search ( GES ) tested with datasets that contain as many as 1 million continuous variables, and Greedy Fast Causal Inference ( GFCI ) for continuous or discrete variables. Download a release Report bugs or issues with the software Github project Tetrad - a Java API, and desktop environment for learning, performing analyses and experimenting with causal discovery algorithms. Download the Application Tetrad Project Website Causal Web App \u2013 (unsupported and no longer maintained) our user-friendly web-based graphical interface for performing causal discovery analysis on big data using large memory servers. Github project Causal REST API \u2013 (unsupported and no longer maintained) our RESTful API for Causal Web App. 
Once you create a new user account via Causal Web App, you can use this REST API to upload data files and run Causal Discovery Algorithms. Github project Cytoscape-tetrad - (unsupported and no longer maintained) a native cytoscape plugin that imports tetrad txt output files that contain the structure of a causal graph. It handles causal graphs and partial ancestral graphs. Github project Ccd-annotations-cytoscape - (unsupported and no longer maintained) a native cytoscape plugin that allows for annotating nodes and edges of any cytoscape graph. Github project Py-causal - (unsupported and no longer maintained) a python module that wraps algorithms for performing causal discovery on big data. The software currently includes Fast Greedy Search ( FGES ) for both continuous and discrete variables, and Greedy Fast Causal Inference ( GFCI ) for continuous and discrete variables. Note: This project uses a very old version of Tetrad and a method of connecting Python to Java, Javabridge, that's proven sometimes buggy and hard to install on some platforms, and so we are no longer recommending it. Please consider using py-tetrad instead. Py-tetrad uses JPype to bridge Python and Java, which has already shown itself to be much easier to install and use cross-platform. Also, it allows one to use the most recent version of Tetrad, and it has been well-tested. Github project Docker container of Jupyter Notebook with Py-causal configured R-causal - (unsupported and no longer maintained) an R module that wraps algorithms for performing causal discovery on big data. The software currently includes Fast Greedy Search ( FGES ) for both continuous and discrete variables, and Greedy Fast Causal Inference ( GFCI ) for continuous variables. Note 2023-03-06: This version of RCausal uses an older version of Tetrad from at least 5 years ago. However, we have updated our Python integration to a much better version--see https://github.com/cmu-phil/py-tetrad . Updating our R integration is one of the next projects we will take up. Github project Docker container of Jupyter Notebook with R-causal configured If you use our software in your research, please acknowledge the Center for Causal Discovery, supported by grant U54HG008540 , in any papers, presentations, or other dissemination of your work. All software is open-source and released under a dual licensing model. For non-profit institutions, the software is available under the GNU General Public License (GPL) v2 license. For-profit organizations that wish to commercialize enhanced or customized versions of the software will be able to purchase a commercial license on a case-by-case basis. The GPL license permits individuals to modify the source code and to share modifications with other colleagues/investigators. Specifically, it permits the dissemination and commercialization of enhanced or customized versions as well as incorporation of the software or its pieces into other license-compatible software packages, as long as modifications or enhancements are made open source. By using software provided by the Center for Causal Discovery, you agree that no warranties of any kind are made by Carnegie Mellon University or the University of Pittsburgh with respect to the data provided by the software or any use thereof, and the universities hereby disclaim the implied warranties of merchantability, fitness for a particular purpose, and non-infringement. 
The universities shall not be liable for any claims, losses, or damages of any kind arising from the data provided by the software or any use thereof.","title":"Welcome to CCD Docs"},{"location":"#welcome-to-ccd-docs","text":"This site hosts documentation for the Center for Causal Discovery .","title":"Welcome to CCD Docs"},{"location":"#tools-and-software","text":"causal-cmd - a Java API and command line implementation of algorithms for performing causal discovery on Big Data. Use this software if you are interested in incorporating analysis via a shell script or in a Java-based program. The software currently includes Fast Greedy Search ( FGES ) for continuous or discrete variables \u2013 an optimized version of Greedy Equivalence Search ( GES ) tested with datasets that contain as many as 1 million continuous variables, and Greedy Fast Causal Inference ( GFCI ) for continuous or discrete variables. Download a release Report bugs or issues with the software Github project Tetrad - a Java API, and desktop environment for learning, performing analyses and experimenting with causal discovery algorithms. Download the Application Tetrad Project Website Causal Web App \u2013 (unsupported and no longer maintained) our user-friendly web-based graphical interface for performing causal discovery analysis on big data using large memory servers. Github project Causal REST API \u2013 (unsupported and no longer maintained) our RESTful API for Causal Web App. Once you create a new user account via Causal Web App, you can use this REST API to upload data files and run Causal Discovery Algorithms. Github project Cytoscape-tetrad - (unsupported and no longer maintained) a native cytoscape plugin that imports tetrad txt output files that contain the structure of a causal graph. It handles causal graphs and partial ancestral graphs. Github project Ccd-annotations-cytoscape - (unsupported and no longer maintained) a native cytoscape plugin that allows for annotating nodes and edges of any cytoscape graph. Github project Py-causal - (unsupported and no longer maintained) a python module that wraps algorithms for performing causal discovery on big data. The software currently includes Fast Greedy Search ( FGES ) for both continuous and discrete variables, and Greedy Fast Causal Inference ( GFCI ) for continuous and discrete variables. Note: This project uses a very old version of Tetrad and a method of connecting Python to Java, Javabridge, that's proven sometimes buggy and hard to install on some platforms, and so we are no longer recommending it. Please consider using py-tetrad instead. Py-tetrad uses JPype to bridge Python and Java, which has already shown itself to be much easier to install and use cross-platform. Also, it allows one to use the most recent version of Tetrad, and it has been well-tested. Github project Docker container of Jupyter Notebook with Py-causal configured R-causal - (unsupported and no longer maintained) an R module that wraps algorithms for performing causal discovery on big data. The software currently includes Fast Greedy Search ( FGES ) for both continuous and discrete variables, and Greedy Fast Causal Inference ( GFCI ) for continuous variables. Note 2023-03-06: This version of RCausal uses an older version of Tetrad from at least 5 years ago. However, we have updated our Python integration to a much better version--see https://github.com/cmu-phil/py-tetrad . Updating our R integration is one of the next projects we will take up. 
Github project Docker container of Jupyter Notebook with R-causal configured If you use our software in your research, please acknowledge the Center for Causal Discovery, supported by grant U54HG008540 , in any papers, presentations, or other dissemination of your work. All software is open-source and released under a dual licensing model. For non-profit institutions, the software is available under the GNU General Public License (GPL) v2 license. For-profit organizations that wish to commercialize enhanced or customized versions of the software will be able to purchase a commercial license on a case-by-case basis. The GPL license permits individuals to modify the source code and to share modifications with other colleagues/investigators. Specifically, it permits the dissemination and commercialization of enhanced or customized versions as well as incorporation of the software or its pieces into other license-compatible software packages, as long as modifications or enhancements are made open source. By using software provided by the Center for Causal Discovery, you agree that no warranties of any kind are made by Carnegie Mellon University or the University of Pittsburgh with respect to the data provided by the software or any use thereof, and the universities hereby disclaim the implied warranties of merchantability, fitness for a particular purpose, and non-infringement. The universities shall not be liable for any claims, losses, or damages of any kind arising from the data provided by the software or any use thereof.","title":"Tools and Software"},{"location":"causal-cmd/","text":"causal-cmd v1.10.x Introduction Causal-cmd is a Java application that provides a Command-Line Interface (CLI) tool for causal discovery algorithms produced by the Center for Causal Discovery . The application currently includes the following algorithms: boss, bpc, ccd, cpc, cstar, fas, fask, fask-pw, fci, fcimax, fges, fges-mb, fofc, ftfc, gfci, grasp, grasp-fci, ica-ling-d, ica-lingam, images, mgm, pag-sampling-rfci, pc, pc-mb, pcmax, r-skew, r3, rfci, skew, spfci, svar-fci, svar-gfci Causal discovery algorithms are a class of search algorithms that explore a space of graphical causal models, i.e., graphical models where directed edges imply causation, for a model (or models) that are a good fit for a dataset. We suggest that newcomers to the field review Causation, Prediction and Search by Spirtes, Glymour and Scheines for a primer on the subject. Causal discovery algorithms allow a user to uncover the causal relationships between variables in a dataset. These discovered causal relationships may be used further--understanding the underlying processes of a system (e.g., the metabolic pathways of an organism), hypothesis generation (e.g., variables that best explain an outcome), guiding experimentation (e.g., what gene knockout experiments should be performed) or prediction (e.g., parameterization of the causal graph using data and then using it as a classifier). Command Line Usage Java 8 or higher is the only prerequisite to run the software. Note that by default Java will allocate the smaller of 1/4 system memory or 1GB to the Java virtual machine (JVM). If you run out of memory (heap memory space) running your analyses, you should increase the memory allocated to the JVM with the following switch '-XmxXXG' where XX is the number of gigabytes of RAM you allow the JVM to utilize. For example, to allocate 8 gigabytes of RAM, you would add -Xmx8G immediately after the java command. 
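For instance, a minimal sketch that combines the memory switch with the FGES run shown later in this guide (the 1.10.0 version number is an assumption; substitute the jar you actually downloaded): java -Xmx8G -jar causal-cmd-1.10.0-jar-with-dependencies.jar --algorithm fges --data-type continuous --dataset Retention.txt --delimiter tab --score sem-bic-score 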
In this example, we'll download the Retention.txt file. Keep in mind that causal-cmd has different switches for different algorithms. To start, type the following command in your terminal: java -jar causal-cmd-<version number>-jar-with-dependencies.jar ** Note: we are using causal-cmd-<version number>-jar-with-dependencies.jar to indicate the actual executable jar of the specific version number that is being used. ** And you'll see the following instructions: Missing required options: algorithm, data-type, dataset, delimiter usage: java -jar Causal-cmd Project-1.10.0.jar --algorithm [--comment-marker ] --data-type --dataset [--default] --delimiter [--experimental] [--help] [--help-algo-desc] [--help-all] [--help-score-desc] [--help-test-desc] [--json-graph] [--metadata ] [--no-header] [--out ] [--prefix ] [--quote-char ] [--skip-validation] [--version] --algorithm Algorithm: boss, bpc, ccd, cpc, cstar, dagma, direct-lingam, fas, fask, fask-pw, fci, fci-iod, fci-max, fges, fges-mb, fofc, ftfc, gfci, grasp, grasp-fci, ica-ling-d, ica-lingam, images, mgm, pag-sampling-rfci, pc, pc-mb, r-boss, r-skew, r3, rfci, skew, spfci, svar-fci, svar-gfci --comment-marker Comment marker. --data-type Data type: all, continuous, covariance, discrete, mixed --dataset Dataset. Multiple files are seperated by commas. --default Use Tetrad default parameter values. --delimiter Delimiter: colon, comma, pipe, semicolon, space, tab, whitespace --experimental Show experimental algorithms, tests, and scores. --help Show help. --help-algo-desc Show all the algorithms along with their descriptions. --help-all Show all options and descriptions. --help-score-desc Show all the scores along with their descriptions. --help-test-desc Show all the independence tests along with their descriptions. --json-graph Write out graph as json. --metadata Metadata file. Cannot apply to dataset without header. --no-header Indicates tabular dataset has no header. --out Output directory --prefix Replace the default output filename prefix in the format of _. --quote-char Single character denotes quote. --skip-validation Skip validation. --version Show version. Use --help for guidance list of options. Use --help-all to show all options. By specifying an algorithm using the --algorithm switch, the program will indicate the additional required switches. The program reminds the user of the switches required to run. In general, most algorithms also require data-type, dataset, delimiter, and score. The switch --help-all displays an extended list of switches for the algorithm. 
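For instance, a sketch of requesting that extended list for an FGES run (again assuming jar version 1.10.0): java -jar causal-cmd-1.10.0-jar-with-dependencies.jar --algorithm fges --data-type continuous --dataset Retention.txt --delimiter tab --score sem-bic-score --help-all 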
Example of listing all available options for an algorithm: $ java -jar causal-cmd-1.9.0-jar-with-dependencies.jar --algorithm fges --data-type continuous --dataset Retention.txt --delimiter tab --score sem-bic-score --help usage: java -jar Causal-cmd Project-1.10.0.jar --algorithm fges --data-type continuous --dataset Retention.txt --delimiter tab --score sem-bic-score [--addOriginalDataset] [--choose-dag-in-pattern] [--choose-mag-in-pag] [--comment-marker ] [--default] [--exclude-var ] [--experimental] [--external-graph ] [--extract-struct-model] [--faithfulnessAssumed] [--generate-complete-graph] [--genereate-pag-from-dag] [--genereate-pag-from-tsdag] [--genereate-pattern-from-dag] [--json-graph] [--knowledge ] [--make-all-edges-undirected] [--make-bidirected-undirected] [--make-undirected-bidirected] [--maxDegree ] [--meekVerbose] [--metadata ] [--missing-marker ] [--no-header] [--numberResampling ] [--out ] [--parallelized] [--penaltyDiscount ] [--percentResampleSize ] [--precomputeCovariances] [--prefix ] [--quote-char ] [--resamplingEnsemble ] [--resamplingWithReplacement] [--saveBootstrapGraphs] [--seed ] [--semBicRule ] [--semBicStructurePrior ] [--skip-validation] [--symmetricFirstStep] [--timeLag ] [--verbose] --addOriginalDataset Yes, if adding the original dataset as another bootstrapping --choose-dag-in-pattern Choose DAG in Pattern graph. --choose-mag-in-pag Choose MAG in PAG. --comment-marker Comment marker. --default Use Tetrad default parameter values. --exclude-var Variables to be excluded from run. --experimental Show experimental algorithms, tests, and scores. --external-graph External graph file. --extract-struct-model Extract sturct model. --faithfulnessAssumed Yes if (one edge) faithfulness should be assumed --generate-complete-graph Generate complete graph. --genereate-pag-from-dag Generate PAG from DAG. --genereate-pag-from-tsdag Generate PAG from TsDAG. --genereate-pattern-from-dag Generate pattern graph from PAG. --json-graph Write out graph as json. --knowledge Prior knowledge file. --make-all-edges-undirected Make all edges undirected. --make-bidirected-undirected Make bidirected edges undirected. --make-undirected-bidirected Make undirected edges bidirected. --maxDegree The maximum degree of the graph (min = -1) --meekVerbose Yes if verbose output for Meek rule applications should be printed or logged --metadata Metadata file. Cannot apply to dataset without header. --missing-marker Denotes missing value. --no-header Indicates tabular dataset has no header. --numberResampling The number of bootstraps/resampling iterations (min = 0) --out Output directory --parallelized Yes if the search should be parallelized --penaltyDiscount Penalty discount (min = 0.0) --percentResampleSize The percentage of resample size (min = 10%) --precomputeCovariances True if covariance matrix should be precomputed for tubular continuous data --prefix Replace the default output filename prefix in the format of _. --quote-char Single character denotes quote. --resamplingEnsemble Ensemble method: Preserved (1), Highest (2), Majority (3) --resamplingWithReplacement Yes, if sampling with replacement (bootstrapping) --saveBootstrapGraphs Yes if individual bootstrapping graphs should be saved --seed Seed for pseudorandom number generator (-1 = off) --semBicRule Lambda: 1 = Chickering, 2 = Nandy --semBicStructurePrior Structure Prior for SEM BIC (default 0) --skip-validation Skip validation. 
--symmetricFirstStep Yes if the first step for FGES should do scoring for both X->Y and Y->X --timeLag <integer> A time lag for time series data, automatically applied (zero if none) --verbose Yes if verbose output should be printed or logged In this example, we'll be running the FGES algorithm on the dataset Retention.txt. $ java -jar causal-cmd-1.10.0-jar-with-dependencies.jar --algorithm fges --data-type continuous --dataset Retention.txt --delimiter tab --score sem-bic-score This command will output by default one file, fges_<unix timestamp>.txt, which is a log and result of the algorithm's activity. The '--json-graph' option will additionally output fges_<unix timestamp>_graph.json, a JSON graph from the algorithm, equivalent to the JSON file exported from tetrad-gui. Example log output from causal-cmd: ================================================================================ FGES (Wed, October 04, 2023 01:42:43 PM) ================================================================================ Runtime Parameters -------------------------------------------------------------------------------- number of threads: 7 Dataset -------------------------------------------------------------------------------- file: Retention.txt header: yes delimiter: tab quote char: none missing marker: none comment marker: none Algorithm Run -------------------------------------------------------------------------------- algorithm: FGES score: Sem BIC Score Algorithm Parameters -------------------------------------------------------------------------------- addOriginalDataset: no faithfulnessAssumed: no maxDegree: 1000 meekVerbose: no numberResampling: 0 parallelized: no penaltyDiscount: 2.0 percentResampleSize: 100 precomputeCovariances: no resamplingEnsemble: 1 resamplingWithReplacement: no saveBootstrapGraphs: no seed: -1 semBicRule: 1 semBicStructurePrior: 0.0 symmetricFirstStep: no timeLag: 0 verbose: no Wed, October 04, 2023 01:42:45 PM: Start data validation on file Retention.txt. Wed, October 04, 2023 01:42:45 PM: End data validation on file Retention.txt. There are 170 cases and 8 variables. Wed, October 04, 2023 01:42:45 PM: Start reading in file Retention.txt. Wed, October 04, 2023 01:42:45 PM: Finished reading in file Retention.txt. Wed, October 04, 2023 01:42:45 PM: File Retention.txt contains 170 cases, 8 variables. Start search: Wed, October 04, 2023 01:42:45 PM End search: Wed, October 04, 2023 01:42:45 PM ================================================================================ Graph Nodes: spending_per_stdt;grad_rate;stdt_clss_stndng;rjct_rate;tst_scores;stdt_accept_rate;stdt_tchr_ratio;fac_salary Graph Edges: 1. spending_per_stdt --- fac_salary 2. spending_per_stdt --- rjct_rate 3. spending_per_stdt --- stdt_tchr_ratio 4. stdt_accept_rate --- fac_salary 5. stdt_clss_stndng --- rjct_rate 6. stdt_clss_stndng --- tst_scores 7. tst_scores --- fac_salary 8. tst_scores --- grad_rate 9. tst_scores --- rjct_rate 10. tst_scores --- spending_per_stdt Graph Attributes: Score: -5181.565079 Graph Node Attributes: Score: [spending_per_stdt: -1408.4382541909688;grad_rate: -416.7933531919986;stdt_clss_stndng: -451.79480827547627;rjct_rate: -439.8087229322177;tst_scores: -330.2039598576225;stdt_accept_rate: -429.64771587695884;stdt_tchr_ratio: -208.85274641239832;fac_salary: -1496.025518245214] Interpretation of graph output The end of the file contains the causal graph edges from the search procedure.
Here is a key to the edge types: A --- B - There is a causal relationship between variables A and B, but we cannot determine the direction of the relationship A --> B - There is a causal relationship from variable A to B The GFCI algorithm has additional edge types: A <-> B - There is an unmeasured confounder of A and B A o-> B - Either A is a cause of B, or there is an unmeasured confounder of A and B, or both A o-o B - Either (1) A is a cause of B or B is a cause of A, or (2) there is an unmeasured confounder of A and B, or both 1 and 2 hold. A --> B dd nl - Definitely direct causal relationship and no latent confounder A --> B pd nl - Possibly direct and no latent confounder A --> B pd pl - Possibly direct and possibly latent confounder Note: the generated result file name is based on the system clock. Sample Prior Knowledge File From the above usage guide, we see the option --knowledge <file>, with which we can specify the prior knowledge file. Below is the content of a sample prior knowledge file: /knowledge addtemporal 1 spending_per_stdt fac_salary stdt_tchr_ratio 2 rjct_rate stdt_accept_rate 3 tst_scores stdt_clss_stndng 4* grad_rate forbiddirect x3 x4 requiredirect x1 x2 The first line of the prior knowledge file must say /knowledge. A prior knowledge file consists of three sections: addtemporal - tiers of variables where the first tier precedes the last. Adding an asterisk next to the tier id prohibits edges between variables within that tier forbiddirect - forbidden directed edges indicated by a list of pairs of variables, in from -> to direction requiredirect - required directed edges indicated by a list of pairs of variables, in from -> to direction","title":"Causal Cmd"},{"location":"causal-cmd/#causal-cmd-v110x","text":"","title":"causal-cmd v1.10.x"},{"location":"causal-cmd/#introduction","text":"Causal-cmd is a Java application that provides a Command-Line Interface (CLI) tool for causal discovery algorithms produced by the Center for Causal Discovery. The application currently includes the following algorithms: boss, bpc, ccd, cpc, cstar, fas, fask, fask-pw, fci, fcimax, fges, fges-mb, fofc, ftfc, gfci, grasp, grasp-fci, ica-ling-d, ica-lingam, images, mgm, pag-sampling-rfci, pc, pc-mb, pcmax, r-skew, r3, rfci, skew, spfci, svar-fci, svar-gfci Causal discovery algorithms are a class of search algorithms that explore a space of graphical causal models, i.e., graphical models where directed edges imply causation, for a model (or models) that are a good fit for a dataset. We suggest that newcomers to the field review Causation, Prediction and Search by Spirtes, Glymour and Scheines for a primer on the subject. Causal discovery algorithms allow a user to uncover the causal relationships between variables in a dataset. These discovered causal relationships may be used further for understanding the underlying processes of a system (e.g., the metabolic pathways of an organism), hypothesis generation (e.g., variables that best explain an outcome), guiding experimentation (e.g., what gene knockout experiments should be performed), or prediction (e.g., parameterization of the causal graph using data and then using it as a classifier).","title":"Introduction"},{"location":"causal-cmd/#command-line-usage","text":"Java 8 or higher is the only prerequisite to run the software. Note that by default Java will allocate the smaller of 1/4 system memory or 1GB to the Java virtual machine (JVM).
If you run out of memory (heap memory space) running your analyses, you should increase the memory allocated to the JVM with the switch '-XmxXXG', where XX is the number of gigabytes of RAM you allow the JVM to utilize. For example, to allocate 8 gigabytes of RAM, you would add -Xmx8G immediately after the java command.","title":"Command Line Usage"},
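Putting the JVM sizing and prior knowledge pieces together, here is a hedged sketch of a full invocation (the 8-gigabyte heap and the prior knowledge file name Retention.prior are assumptions for illustration):

```
java -Xmx8G -jar causal-cmd-1.10.0-jar-with-dependencies.jar \
  --algorithm fges --data-type continuous --dataset Retention.txt \
  --delimiter tab --score sem-bic-score --knowledge Retention.prior
```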
{"location":"causal-rest-api/","text":"Causal REST API v0.0.8 This RESTful API is designed for Causal Web and implements the JAX-RS specification using Jersey. Table of Contents Installation Prerequisites Dependencies Configuration Start the API Server API Usage and Examples Getting JSON Web Token(JWT) 1. Data Management Upload small data file Resumable data file upload List all dataset files of a user Get the detail information of a dataset file based on ID Delete physical dataset file and all records from database for a given file ID Summarize dataset file List all prior knowledge files of a given user Get the detail information of a prior knowledge file based on ID Delete physical prior knowledge file and all records from database for a given file ID 2. Causal Discovery List all the available causal discovery algorithms Add a new job to run the desired algorithm on a given data file List all running jobs Check the job status for a given job ID Cancel a running job 3. Result Management List all result files generated by the algorithm Download a specific result file generated by the algorithm based on file name Compare algorithm result files List all the comparison files Download a specific comparison file based on file name Installation The following installation instructions are intended for the server admin who deploys this API server. API users can skip this section and just start reading from the API Usage and Examples section. Prerequisites You must have the following installed to build/install Causal REST API: Oracle Java SE Development Kit 8 Maven 3.x Dependencies If you want to run this API server and expose the API to your users, you'll first need to have the Causal Web Application installed and running.
Your API users will use this web app to create their user accounts before they can consume the API. Note: currently new users can also be created using the Auth0 login option, but the API doesn't work for these users. In order to build the API server, you'll need the released version of ccd-commons-0.3.1 by going to the repo and checking out this specific release version: git clone https://github.com/bd2kccd/ccd-commons.git cd ccd-commons git checkout tags/v0.3.1 mvn clean install You'll also need to download the released ccd-db-0.6.3 : git clone https://github.com/bd2kccd/ccd-db.git cd ccd-db git checkout tags/v0.6.3 mvn clean install Then you can get and install causal-rest-api : git clone https://github.com/bd2kccd/causal-rest-api.git cd causal-rest-api mvn clean package Configuration There are 5 configuration files to configure, located at causal-rest-api/src/main/resources : - application-hsqldb.properties : HSQLDB database configurations (for testing only). - application-mysql.properties : MySQL database configurations - application-slurm.properties : Slurm setting for HPC - application.properties : Spring Boot application settings - causal.properties : Data file directory path and folder settings Before editing the causal.properties file, you need to create a workspace for the application to work in. Create a directory called workspace, for example /home/zhy19/ccd/workspace . Inside the workspace directory, create another folder called lib . Then build the jar file of Tetrad using the latest development branch . After that, copy the jar file to the lib folder created earlier. Start the API Server Once you have all the settings configured, go to causal-rest-api/target and you will find the jar file named causal-rest-api.jar . Then simply run java -jar causal-rest-api.jar API Usage and Examples In the following sections, we'll demonstrate the API usage with examples using the API server that is running on Pittsburgh Super Computing. The API base URI is https://<hostname>/ccd-api . This API requires the user to be authenticated. Before using this API, the user creates an account in the Causal Web App. Getting JSON Web Token(JWT) After registration in the Causal Web App, the email and password can be used to authenticate against the Causal REST API to get the access token (we use JWT) via HTTP Basic Auth . API Endpoint URI pattern: GET https://<hostname>/ccd-api/jwt In basic auth, the user provides the username and password, which the HTTP client concatenates (username + \":\" + password) and base64 encodes. This encoded string is then sent in an Authorization header with the \"Basic\" schema. For instance, take user demo@pitt.edu whose password is 123 . POST /ccd-api/jwt HTTP/1.1 Host: <hostname> Authorization: Basic ZGVtb0BwaXR0LmVkdToxMjM= Once the request is processed successfully, the user ID together with a JWT will be returned in the response for further API queries. { \"userId\": 22, \"jwt\": \"eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJpc3MiOiJodHRwczovL2Nsb3VkLmNjZC5waXR0LmVkdS8iLCJuYW1lIjoiemh5MTkiLCJleHAiOjE0NzU4NTA0Mjg1OTcsImlhdCI6MTQ3NTg0NjgyODU5N30.FcE7aEpg0u2c-gUVugIjJkzjhlDu5qav_XHtgLu3c6E\", \"issuedTime\": 1475846828597, \"lifetime\": 3600, \"expireTime\": 1475850428597, \"wallTime\": [ 1, 3, 6 ] } We'll need to use this userId in the URI path of all subsequent requests. And this jwt expires in 3600 seconds (1 hour), so the API consumer will need to request another JWT, otherwise queries to the other API endpoints will be denied.
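With curl, the same token request can be issued in one line; this is a sketch, with <hostname> standing in for the actual server address (curl builds the Base64 Basic Auth header from the -u credentials):

```
curl -X POST -u demo@pitt.edu:123 https://<hostname>/ccd-api/jwt
```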
And this JWT will need to be sent via the HTTP Authorization header as well, but using the Bearer schema. The wallTime field is designed for users who want to specify the maximum CPU time when Slurm handles the jobs on PSC. Normally, a job is expected to finish before the specified maximum walltime. After the walltime reaches the maximum, the job terminates regardless of whether the job processes are still running or not. In this example, you can pick 1, 3, or 6 hours as the wallTime. Note: querying the JWT endpoint again before the current JWT expires will generate a new JWT, which automatically expires the old JWT. And this newly generated JWT will be valid for another hour unless another new JWT is queried. This API is developed with Jersey, which supports WADL, so you can view the generated WADL by going to https://<hostname>/ccd-api/application.wadl?detail=true and see all resources available in the application. Accessing this endpoint doesn't require authentication. Basically, all the API usage examples are grouped into three categories: Data Management Causal Discovery Result Management And all the following examples will be issued by user 22 whose password is 123 . 1. Data Management Upload small data file At this point, you can upload two types of data files: tabular dataset files (either tab delimited or comma delimited) and prior knowledge files. API Endpoint URI pattern: POST https://<hostname>/ccd-api/{userId}/dataset/upload This is a multipart file upload via an HTML form, and the client is required to use name=\"file\" to name their file upload field in their form. Generated HTTP request code example: POST /ccd-api/22/dataset/upload HTTP/1.1 Host: <hostname> Authorization: Bearer eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJpc3MiOiJodHRwczovL2Nsb3VkLmNjZC5waXR0LmVkdS8iLCJuYW1lIjoiemh5MTkiLCJleHAiOjE0NzU4NTA2NzY4MDQsImlhdCI6MTQ3NTg0NzA3NjgwNH0.8azVEoNPfETczXb-vn7dfyDd98eRt7iiLBXehGpPGzY Content-Type: multipart/form-data; boundary=----WebKitFormBoundary7MA4YWxkTrZu0gW ----WebKitFormBoundary7MA4YWxkTrZu0gW Content-Disposition: form-data; name=\"file\"; filename=\"\" Content-Type: ----WebKitFormBoundary7MA4YWxkTrZu0gW If the Authorization header is not provided, the response will look like this: { \"timestamp\": 1465414501443, \"status\": 401, \"error\": \"Unauthorized\", \"message\": \"User credentials are required.\", \"path\": \"/22/dataset/upload\" } This POST request will upload the dataset file to the target server location and add corresponding records into the database. And the response will contain the following pieces: { \"id\": 6, \"name\": \"Lung-tetrad_hv.txt\", \"creationTime\": 1466622267000, \"lastModifiedTime\": 1466622267000, \"fileSize\": 3309465, \"md5checkSum\": \"b1db7511ee293d297e3055d9a7b46c5e\", \"fileSummary\": { \"variableType\": null, \"fileDelimiter\": null, \"numOfRows\": null, \"numOfColumns\": null } } The prior knowledge file upload uses a similar API endpoint: POST https://<hostname>/ccd-api/{userId}/priorknowledge/upload Since there's no need to summarize a prior knowledge file, the response of a successful prior knowledge file upload will look like: { \"id\": 6, \"name\": \"Lung-tetrad_hv.txt\", \"creationTime\": 1466622267000, \"lastModifiedTime\": 1466622267000, \"fileSize\": 3309465, \"md5checkSum\": \"ugdb7511rt293d29ke3055d9a7b46c9k\" } Resumable data file upload In addition to the regular file upload described in Example 6, we also provide the option of stable and resumable large file upload.
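As a quick recap of the previous section, here is a hedged curl sketch of the small-file upload (<hostname>, <jwt>, and the file Retention.txt are placeholders; curl's -F switch sends a multipart form with the field named "file", as required):

```
curl -H "Authorization: Bearer <jwt>" \
  -F "file=@Retention.txt" \
  https://<hostname>/ccd-api/22/dataset/upload
```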
Resumable upload requires the client side to have a resumable upload implementation. We currently support clients integrated with Resumable.js , which provides multiple simultaneous, stable and resumable uploads via the HTML5 File API. You can also create your own client as long as all the following parameters are set correctly. API Endpoint URI pattern: GET https://<hostname>/ccd-api/{userId}/chunkupload POST https://<hostname>/ccd-api/{userId}/chunkupload In this example, the data file is split into 3 chunks. The upload of each chunk consists of a GET request and a POST request. To handle the state of upload chunks, a number of extra parameters are sent along with all requests: resumableChunkNumber : The index of the chunk in the current upload. First chunk is 1 (no base-0 counting here). resumableChunkSize : The general chunk size. Using this value and resumableTotalSize you can calculate the total number of chunks. Please note that the size of the data received in the HTTP request may be lower than resumableChunkSize for the last chunk of a file. resumableCurrentChunkSize : The size of the current resumable chunk. resumableTotalSize : The total file size. resumableType : The file type of the resumable chunk, e.g., \"text/plain\". resumableIdentifier : A unique identifier for the file contained in the request. resumableFilename : The original file name (since a bug in Firefox results in the file name not being transmitted in chunk multipart posts). resumableRelativePath : The file's relative path when selecting a directory (defaults to file name in all browsers except Chrome). resumableTotalChunks : The total number of chunks. Generated HTTP request code example: GET /ccd-api/22/chunkupload?resumableChunkNumber=2&resumableChunkSize=1048576&resumableCurrentChunkSize=1048576&resumableTotalSize=3309465&resumableType=text%2Fplain&resumableIdentifier=3309465-large-datatxt&resumableFilename=large-data.txt&resumableRelativePath=large-data.txt&resumableTotalChunks=3 HTTP/1.1 Host: <hostname> Authorization: Bearer eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJpc3MiOiJodHRwczovL2Nsb3VkLmNjZC5waXR0LmVkdS8iLCJuYW1lIjoiemh5MTkiLCJleHAiOjE0NzU4NTA2NzY4MDQsImlhdCI6MTQ3NTg0NzA3NjgwNH0.8azVEoNPfETczXb-vn7dfyDd98eRt7iiLBXehGpPGzY This GET request checks if the data chunk is already on the server side. If the target file chunk is not found on the server, the client will issue a POST request to upload the actual data.
Generated HTTP request code example: POST /ccd-api/22/chunkupload HTTP/1.1 Host: <hostname> Authorization: Bearer eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJpc3MiOiJodHRwczovL2Nsb3VkLmNjZC5waXR0LmVkdS8iLCJuYW1lIjoiemh5MTkiLCJleHAiOjE0NzU4NTA2NzY4MDQsImlhdCI6MTQ3NTg0NzA3NjgwNH0.8azVEoNPfETczXb-vn7dfyDd98eRt7iiLBXehGpPGzY Content-Type: multipart/form-data; boundary=----WebKitFormBoundaryMFjgApg56XGyeTnZ ------WebKitFormBoundaryMFjgApg56XGyeTnZ Content-Disposition: form-data; name=\"resumableChunkNumber\" 2 ------WebKitFormBoundaryMFjgApg56XGyeTnZ Content-Disposition: form-data; name=\"resumableChunkSize\" 1048576 ------WebKitFormBoundaryMFjgApg56XGyeTnZ Content-Disposition: form-data; name=\"resumableCurrentChunkSize\" 1048576 ------WebKitFormBoundaryMFjgApg56XGyeTnZ Content-Disposition: form-data; name=\"resumableTotalSize\" 3309465 ------WebKitFormBoundaryMFjgApg56XGyeTnZ Content-Disposition: form-data; name=\"resumableType\" text/plain ------WebKitFormBoundaryMFjgApg56XGyeTnZ Content-Disposition: form-data; name=\"resumableIdentifier\" 3309465-large-datatxt ------WebKitFormBoundaryMFjgApg56XGyeTnZ Content-Disposition: form-data; name=\"resumableFilename\" large-data.txt ------WebKitFormBoundaryMFjgApg56XGyeTnZ Content-Disposition: form-data; name=\"resumableRelativePath\" large-data.txt ------WebKitFormBoundaryMFjgApg56XGyeTnZ Content-Disposition: form-data; name=\"resumableTotalChunks\" 3 ------WebKitFormBoundaryMFjgApg56XGyeTnZ Content-Disposition: form-data; name=\"file\"; filename=\"blob\" Content-Type: application/octet-stream ------WebKitFormBoundaryMFjgApg56XGyeTnZ-- Each chunk upload POST will get a 200 status code in the response if everything works fine. And finally, the md5checkSum string of the reassembled file will be returned once the whole file has been uploaded successfully. In this example, the POST request that uploads the third chunk will receive this response: b1db7511ee293d297e3055d9a7b46c5e List all dataset files of a user API Endpoint URI pattern: GET https://<hostname>/ccd-api/{userId}/dataset Generated HTTP request code example: GET /ccd-api/22/dataset HTTP/1.1 Host: <hostname> Authorization: Bearer eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJpc3MiOiJodHRwczovL2Nsb3VkLmNjZC5waXR0LmVkdS8iLCJuYW1lIjoiemh5MTkiLCJleHAiOjE0NzU4NTA2NzY4MDQsImlhdCI6MTQ3NTg0NzA3NjgwNH0.8azVEoNPfETczXb-vn7dfyDd98eRt7iiLBXehGpPGzY Accept: application/json A JSON formatted list of all the input dataset files that are associated with user 22 will be returned.
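A hedged curl sketch of this listing request (<hostname> and <jwt> are placeholders):

```
curl -H "Authorization: Bearer <jwt>" \
  -H "Accept: application/json" \
  https://<hostname>/ccd-api/22/dataset
```

The returned list looks like this: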
[ { \"id\": 8, \"name\": \"data_small.txt\", \"creationTime\": 1467132449000, \"lastModifiedTime\": 1467132449000, \"fileSize\": 278428, \"md5checkSum\": \"ed5f27a2cf94fe3735a5d9ed9191c382\", \"fileSummary\": { \"variableType\": \"continuous\", \"fileDelimiter\": \"tab\", \"numOfRows\": 302, \"numOfColumns\": 123 } }, { \"id\": 10, \"name\": \"large-data.txt\", \"creationTime\": 1467134048000, \"lastModifiedTime\": 1467134048000, \"fileSize\": 3309465, \"md5checkSum\": \"b1db7511ee293d297e3055d9a7b46c5e\", \"fileSummary\": { \"variableType\": null, \"fileDelimiter\": null, \"numOfRows\": null, \"numOfColumns\": null } }, { \"id\": 11, \"name\": \"Lung-tetrad_hv (copy).txt\", \"creationTime\": 1467140415000, \"lastModifiedTime\": 1467140415000, \"fileSize\": 3309465, \"md5checkSum\": \"b1db7511ee293d297e3055d9a7b46c5e\", \"fileSummary\": { \"variableType\": \"continuous\", \"fileDelimiter\": \"tab\", \"numOfRows\": 302, \"numOfColumns\": 608 } } ] You can also specify the response format as XML in your request Generated HTTP request code example: GET /ccd-api/22/dataset HTTP/1.1 Host: Authorization: Bearer eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJpc3MiOiJodHRwczovL2Nsb3VkLmNjZC5waXR0LmVkdS8iLCJuYW1lIjoiemh5MTkiLCJleHAiOjE0NzU4NTA2NzY4MDQsImlhdCI6MTQ3NTg0NzA3NjgwNH0.8azVEoNPfETczXb-vn7dfyDd98eRt7iiLBXehGpPGzY Accept: application/xml And the response will look like this: 8 data_small.txt 2016-06-28T12:47:29-04:00 2016-06-28T12:47:29-04:00 278428 ed5f27a2cf94fe3735a5d9ed9191c382 tab 123 302 continuous 10 large-data.txt 2016-06-28T13:14:08-04:00 2016-06-28T13:14:08-04:00 3309465 b1db7511ee293d297e3055d9a7b46c5e 11 Lung-tetrad_hv (copy).txt 2016-06-28T15:00:15-04:00 2016-06-28T15:00:15-04:00 3309465 b1db7511ee293d297e3055d9a7b46c5e tab 608 302 continuous Form the above output, we can also tell that data file with ID 10 doesn't have all the fileSummary field values set, we'll cover this in the dataset summarization section. Get the detail information of a dataset file based on ID API Endpoint URI pattern: GET https:///ccd-api/{userId}/dataset/{id} Generated HTTP request code example: GET /ccd-api/22/dataset/8 HTTP/1.1 Host: Authorization: Bearer eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJpc3MiOiJodHRwczovL2Nsb3VkLmNjZC5waXR0LmVkdS8iLCJuYW1lIjoiemh5MTkiLCJleHAiOjE0NzU4NTA2NzY4MDQsImlhdCI6MTQ3NTg0NzA3NjgwNH0.8azVEoNPfETczXb-vn7dfyDd98eRt7iiLBXehGpPGzY And the resulting response looks like this: { \"id\": 8, \"name\": \"data_small.txt\", \"creationTime\": 1467132449000, \"lastModifiedTime\": 1467132449000, \"fileSize\": 278428, \"fileSummary\": { \"md5checkSum\": \"ed5f27a2cf94fe3735a5d9ed9191c382\", \"variableType\": \"continuous\", \"fileDelimiter\": \"tab\", \"numOfRows\": 302, \"numOfColumns\": 123 } } Delete physical dataset file and all records from database for a given file ID API Endpoint URI pattern: DELETE https:///ccd-api/{userId}/dataset/{id} Generated HTTP request code example: DELETE /ccd-api/22/dataset/8 HTTP/1.1 Host: Authorization: Bearer eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJpc3MiOiJodHRwczovL2Nsb3VkLmNjZC5waXR0LmVkdS8iLCJuYW1lIjoiemh5MTkiLCJleHAiOjE0NzU4NTA2NzY4MDQsImlhdCI6MTQ3NTg0NzA3NjgwNH0.8azVEoNPfETczXb-vn7dfyDd98eRt7iiLBXehGpPGzY And this will result a HTTP 204 No Content status in response on success, which means the server successfully processed the deletion request but there's no content to response. 
Summarize dataset file So from the first example, we can tell that the file with ID 10 doesn't have variableType , fileDelimiter , numOfRows , and numOfColumns specified under fileSummary . Among these attributes, variableType and fileDelimiter are the ones that users need to provide during this summarization process. Before we can go ahead and run the desired algorithm on the newly uploaded data file, we'll need to summarize the data by specifying the variable type and file delimiter.

| Required Fields | Description |
| --- | --- |
| id | The data file ID |
| variableType | discrete or continuous |
| fileDelimiter | tab or comma |

API Endpoint URI pattern: POST https://<hostname>/ccd-api/{userId}/dataset/summarize Generated HTTP request code example: POST /ccd-api/22/dataset/summarize HTTP/1.1 Host: <hostname> Authorization: Bearer eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJpc3MiOiJodHRwczovL2Nsb3VkLmNjZC5waXR0LmVkdS8iLCJuYW1lIjoiemh5MTkiLCJleHAiOjE0NzU4NTA2NzY4MDQsImlhdCI6MTQ3NTg0NzA3NjgwNH0.8azVEoNPfETczXb-vn7dfyDd98eRt7iiLBXehGpPGzY Content-Type: application/json { \"id\": 10, \"variableType\": \"continuous\", \"fileDelimiter\": \"tab\" } This POST request will summarize the dataset file and generate a response (JSON or XML) like below: { \"id\": 10, \"name\": \"large-data.txt\", \"creationTime\": 1467134048000, \"lastModifiedTime\": 1467134048000, \"fileSize\": 3309465, \"md5checkSum\": \"b1db7511ee293d297e3055d9a7b46c5e\", \"fileSummary\": { \"variableType\": \"continuous\", \"fileDelimiter\": \"tab\", \"numOfRows\": 302, \"numOfColumns\": 608 } } List all prior knowledge files of a given user API Endpoint URI pattern: GET https://<hostname>/ccd-api/{userId}/priorknowledge Generated HTTP request code example: GET /ccd-api/22/priorknowledge HTTP/1.1 Host: <hostname> Authorization: Bearer eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJpc3MiOiJodHRwczovL2Nsb3VkLmNjZC5waXR0LmVkdS8iLCJuYW1lIjoiemh5MTkiLCJleHAiOjE0NzU4NTA2NzY4MDQsImlhdCI6MTQ3NTg0NzA3NjgwNH0.8azVEoNPfETczXb-vn7dfyDd98eRt7iiLBXehGpPGzY Accept: application/json A JSON formatted list of all the prior knowledge files that are associated with user 22 will be returned.
[ { \"id\": 9, \"name\": \"data_small.prior\", \"creationTime\": 1467132449000, \"lastModifiedTime\": 1467132449000, \"fileSize\": 278428, \"md5checkSum\": \"ed5f27a2cf94fe3735a5d9ed9191c382\" }, { \"id\": 12, \"name\": \"large-data.prior\", \"creationTime\": 1467134048000, \"lastModifiedTime\": 1467134048000, \"fileSize\": 3309465, \"md5checkSum\": \"b1db7511ee293d297e3055d9a7b46c5e\" } ] Get the detail information of a prior knowledge file based on ID API Endpoint URI pattern: GET https:///ccd-api/{userId}/priorknowledge/{id} Generated HTTP request code example: GET /ccd-api/22/priorknowledge/9 HTTP/1.1 Host: Authorization: Bearer eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJpc3MiOiJodHRwczovL2Nsb3VkLmNjZC5waXR0LmVkdS8iLCJuYW1lIjoiemh5MTkiLCJleHAiOjE0NzU4NTA2NzY4MDQsImlhdCI6MTQ3NTg0NzA3NjgwNH0.8azVEoNPfETczXb-vn7dfyDd98eRt7iiLBXehGpPGzY And the resulting response looks like this: { \"id\": 9, \"name\": \"data_small.prior\", \"creationTime\": 1467132449000, \"lastModifiedTime\": 1467132449000, \"fileSize\": 278428, \"md5checkSum\": \"ed5f27a2cf94fe3735a5d9ed9191c382\" } Delete physical prior knowledge file and all records from database for a given file ID API Endpoint URI pattern: DELETE https:///ccd-api/{userId}/priorknowledge/{id} Generated HTTP request code example: DELETE /ccd-api/22/priorknowledge/9 HTTP/1.1 Host: Authorization: Bearer eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJpc3MiOiJodHRwczovL2Nsb3VkLmNjZC5waXR0LmVkdS8iLCJuYW1lIjoiemh5MTkiLCJleHAiOjE0NzU4NTA2NzY4MDQsImlhdCI6MTQ3NTg0NzA3NjgwNH0.8azVEoNPfETczXb-vn7dfyDd98eRt7iiLBXehGpPGzY And this will result a HTTP 204 No Content status in response on success, which means the server successfully processed the deletion request but there's no content to response. 2. Causal Discovery Once the data file is uploaded and summaried, you can start running a Causal Discovery Algorithm on the uploaded data file. List all the available causal discovery algorithms API Endpoint URI pattern: GET https:///ccd-api/{userId}/algorithms Generated HTTP request code example: GET /ccd-api/22/algorithms HTTP/1.1 Host: Authorization: Bearer eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJpc3MiOiJodHRwczovL2Nsb3VkLmNjZC5waXR0LmVkdS8iLCJuYW1lIjoiemh5MTkiLCJleHAiOjE0NzU4NTA2NzY4MDQsImlhdCI6MTQ3NTg0NzA3NjgwNH0.8azVEoNPfETczXb-vn7dfyDd98eRt7iiLBXehGpPGzY [ { \"id\": 1, \"name\": \"FGESc\", \"description\": \"FGES continuous\" }, { \"id\": 2, \"name\": \"FGESd\", \"description\": \"FGES discrete\" }, { \"id\": 3, \"name\": \"GFCIc\", \"description\": \"GFCI continuous\" }, { \"id\": 4, \"name\": \"GFCId\", \"description\": \"GFCI discrete\" } ] Currently we support \"FGES continuous\", \"FGES discrete\", \"GFCI continuous\", and \"GFCI discrete\". They also share a common JSON structure as of their input, for example: Input JSON Fields Description datasetFileId The dataset file ID, integer priorKnowledgeFileId The optional prior knowledge file ID, integer dataValidation Algorithm specific input data validation flags, JSON object algorithmParameters Algorithm specific parameters, JSON object jvmOptions Advanced Options For Java Virtual Machine (JVM), JSON object. Currently only support maxHeapSize (Gigabyte, max value is 100) hpcParameters Parameters for High-Performance Computing, JSON array of key-value objects. Currently only support wallTime Below are the data validation flags and parameters that you can use for each algorithm. 
FGES continuous Data validation:

| Parameters | Description | Default Value |
| --- | --- | --- |
| skipNonzeroVariance | Skip check for zero variance variables | false |
| skipUniqueVarName | Skip check for unique variable names | false |

Algorithm parameters:

| Parameters | Description | Default Value |
| --- | --- | --- |
| faithfulnessAssumed | Yes if (one edge) faithfulness should be assumed | true |
| maxDegree | The maximum degree of the output graph | 100 |
| penaltyDiscount | Penalty discount | 4.0 |
| verbose | Print additional information | true |

FGES discrete Data validation:

| Parameters | Description | Default Value |
| --- | --- | --- |
| skipUniqueVarName | Skip check for unique variable names | false |
| skipCategoryLimit | Skip 'limit number of categories' check | false |

Algorithm parameters:

| Parameters | Description | Default Value |
| --- | --- | --- |
| structurePrior | Structure prior coefficient | 1.0 |
| samplePrior | Sample prior | 1.0 |
| maxDegree | The maximum degree of the output graph | 100 |
| faithfulnessAssumed | Yes if (one edge) faithfulness should be assumed | true |
| verbose | Print additional information | true |

GFCI continuous Data validation:

| Parameters | Description | Default Value |
| --- | --- | --- |
| skipNonzeroVariance | Skip check for zero variance variables | false |
| skipUniqueVarName | Skip check for unique variable names | false |

Algorithm parameters:

| Parameters | Description | Default Value |
| --- | --- | --- |
| alpha | Cutoff for p values (alpha) | 0.01 |
| penaltyDiscount | Penalty discount | 4.0 |
| maxDegree | The maximum degree of the output graph | 100 |
| faithfulnessAssumed | Yes if (one edge) faithfulness should be assumed | false |
| verbose | Print additional information | true |

GFCI discrete Data validation:

| Parameters | Description | Default Value |
| --- | --- | --- |
| skipUniqueVarName | Skip check for unique variable names | false |
| skipCategoryLimit | Skip 'limit number of categories' check | false |

Algorithm parameters:

| Parameters | Description | Default Value |
| --- | --- | --- |
| alpha | Cutoff for p values (alpha) | 0.01 |
| structurePrior | Structure prior coefficient | 1.0 |
| samplePrior | Sample prior | 1.0 |
| maxDegree | The maximum degree of the output graph | 100 |
| faithfulnessAssumed | Yes if (one edge) faithfulness should be assumed | false |
| verbose | Print additional information | true |

Add a new job to run the desired algorithm on a given data file This is a POST request, and the algorithm details and data file ID will need to be specified in the POST body as JSON when you make the request. API Endpoint URI pattern: POST https://<hostname>/ccd-api/{userId}/jobs/FGESc Generated HTTP request code example: POST /ccd-api/22/jobs/FGESc HTTP/1.1 Host: <hostname> Authorization: Bearer eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJpc3MiOiJodHRwczovL2Nsb3VkLmNjZC5waXR0LmVkdS8iLCJuYW1lIjoiemh5MTkiLCJleHAiOjE0NzU4NTA2NzY4MDQsImlhdCI6MTQ3NTg0NzA3NjgwNH0.8azVEoNPfETczXb-vn7dfyDd98eRt7iiLBXehGpPGzY Content-Type: application/json { \"datasetFileId\": 8, \"priorKnowledgeFileId\": 9, \"dataValidation\": { \"skipNonzeroVariance\": true, \"skipUniqueVarName\": true }, \"algorithmParameters\": { \"penaltyDiscount\": 5.0, \"maxDegree\": 100 }, \"jvmOptions\": { \"maxHeapSize\": 100 }, \"hpcParameters\": [ { \"key\":\"wallTime\", \"value\":1 } ] } In this example, we are running the \"FGES continuous\" algorithm on the file with ID 8. We also set the wallTime to 1 hour. And this call will return the job info with a 201 Created response status code. { \"id\": 5, \"algorithmName\": \"FGESc\", \"status\": 0, \"addedTime\": 1472742564355, \"resultFileName\": \"FGESc_data_small.txt_1472742564353.txt\", \"errorResultFileName\": \"error_FGESc_data_small.txt_1472742564353.txt\" } From this response we can tell that the job ID is 5, and the result file name will be FGESc_data_small.txt_1472742564353.txt if everything goes well.
If something is wrong, an error result file with the name error_FGESc_data_small.txt_1472742564353.txt will be created. When you need to run \"FGES discrete\", just send the request to a different endpoint URI: API Endpoint URI pattern: POST https://<hostname>/ccd-api/{userId}/jobs/FGESd Generated HTTP request code example: POST /ccd-api/22/jobs/FGESd HTTP/1.1 Host: <hostname> Authorization: Bearer eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJpc3MiOiJodHRwczovL2Nsb3VkLmNjZC5waXR0LmVkdS8iLCJuYW1lIjoiemh5MTkiLCJleHAiOjE0NzU4NTA2NzY4MDQsImlhdCI6MTQ3NTg0NzA3NjgwNH0.8azVEoNPfETczXb-vn7dfyDd98eRt7iiLBXehGpPGzY Content-Type: application/json { \"datasetFileId\": 10, \"priorKnowledgeFileId\": 12, \"dataValidation\": { \"skipUniqueVarName\": true, \"skipCategoryLimit\": true }, \"algorithmParameters\": { \"structurePrior\": 1.0, \"samplePrior\": 1.0, \"maxDegree\": 102 }, \"jvmOptions\": { \"maxHeapSize\": 100 } } List all running jobs API Endpoint URI pattern: GET https://<hostname>/ccd-api/{userId}/jobs Generated HTTP request code example: GET /ccd-api/22/jobs/ HTTP/1.1 Host: <hostname> Authorization: Bearer eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJpc3MiOiJodHRwczovL2Nsb3VkLmNjZC5waXR0LmVkdS8iLCJuYW1lIjoiemh5MTkiLCJleHAiOjE0NzU4NTA2NzY4MDQsImlhdCI6MTQ3NTg0NzA3NjgwNH0.8azVEoNPfETczXb-vn7dfyDd98eRt7iiLBXehGpPGzY Content-Type: application/json Then you'll see the information of all jobs that are currently running: [ { \"id\": 32, \"algorithmName\": \"FGESc\", \"addedTime\": 1468436085000 }, { \"id\": 33, \"algorithmName\": \"FGESd\", \"addedTime\": 1468436087000 } ] Check the job status for a given job ID Once the new job is submitted, it takes time and resources to run the algorithm on the server. While waiting, you can check the status of a given job ID: API Endpoint URI pattern: GET https://<hostname>/ccd-api/{userId}/jobs/{id} Generated HTTP request code example: GET /ccd-api/22/jobs/32 HTTP/1.1 Host: <hostname> Authorization: Bearer eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJpc3MiOiJodHRwczovL2Nsb3VkLmNjZC5waXR0LmVkdS8iLCJuYW1lIjoiemh5MTkiLCJleHAiOjE0NzU4NTA2NzY4MDQsImlhdCI6MTQ3NTg0NzA3NjgwNH0.8azVEoNPfETczXb-vn7dfyDd98eRt7iiLBXehGpPGzY This will return either \"Pending\" or \"Completed\". Cancel a running job Sometimes you may want to cancel a submitted job. API Endpoint URI pattern: DELETE https://<hostname>/ccd-api/{userId}/jobs/{id} Generated HTTP request code example: DELETE /ccd-api/22/jobs/8 HTTP/1.1 Host: <hostname> Authorization: Bearer eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJpc3MiOiJodHRwczovL2Nsb3VkLmNjZC5waXR0LmVkdS8iLCJuYW1lIjoiemh5MTkiLCJleHAiOjE0NzU4NTA2NzY4MDQsImlhdCI6MTQ3NTg0NzA3NjgwNH0.8azVEoNPfETczXb-vn7dfyDd98eRt7iiLBXehGpPGzY This call will respond with either \"Job 8 has been canceled\" or \"Unable to cancel job 8\". It's not guaranteed that the system can always cancel a job successfully.
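Tying the job workflow together, here is a hedged curl sketch of submitting an "FGES continuous" job and then polling its status (<hostname>, <jwt>, the dataset file ID, and the returned job ID 5 are placeholders; the JSON body follows the input fields described above):

```
# Submit the job; the response carries the new job ID (e.g., 5)
curl -X POST \
  -H "Authorization: Bearer <jwt>" \
  -H "Content-Type: application/json" \
  -d '{"datasetFileId": 8, "algorithmParameters": {"penaltyDiscount": 5.0, "maxDegree": 100}}' \
  https://<hostname>/ccd-api/22/jobs/FGESc

# Poll until the status changes from "Pending" to "Completed"
curl -H "Authorization: Bearer <jwt>" https://<hostname>/ccd-api/22/jobs/5
```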
3. Result Management List all result files generated by the algorithm API Endpoint URI pattern: GET https://<hostname>/ccd-api/{userId}/results Generated HTTP request code example: GET /ccd-api/22/results HTTP/1.1 Host: <hostname> Authorization: Bearer eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJpc3MiOiJodHRwczovL2Nsb3VkLmNjZC5waXR0LmVkdS8iLCJuYW1lIjoiemh5MTkiLCJleHAiOjE0NzU4NTA2NzY4MDQsImlhdCI6MTQ3NTg0NzA3NjgwNH0.8azVEoNPfETczXb-vn7dfyDd98eRt7iiLBXehGpPGzY The response to this request will look like this: [ { \"name\": \"FGESc_sim_data_20vars_100cases.csv_1466171729046.txt\", \"creationTime\": 1466171732000, \"lastModifiedTime\": 1466171732000, \"fileSize\": 1660 }, { \"name\": \"FGESc_data_small.txt_1466172140585.txt\", \"creationTime\": 1466172145000, \"lastModifiedTime\": 1466172145000, \"fileSize\": 39559 } ] Download a specific result file generated by the algorithm based on file name API Endpoint URI pattern: GET https://<hostname>/ccd-api/{userId}/results/{result_file_name} Generated HTTP request code example: GET /ccd-api/22/results/FGESc_data_small.txt_1466172140585.txt HTTP/1.1 Host: <hostname> Authorization: Bearer eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJpc3MiOiJodHRwczovL2Nsb3VkLmNjZC5waXR0LmVkdS8iLCJuYW1lIjoiemh5MTkiLCJleHAiOjE0NzU4NTA2NzY4MDQsImlhdCI6MTQ3NTg0NzA3NjgwNH0.8azVEoNPfETczXb-vn7dfyDd98eRt7iiLBXehGpPGzY On success, you will get the result file back as text file content. If there's a typo in the file name or the file doesn't exist, you'll get either a JSON or XML message based on the Accept header in your request: { \"timestamp\": 1467210996233, \"status\": 404, \"error\": \"Not Found\", \"message\": \"Resource not found.\", \"path\": \"/22/results/FGESc_data_small.txt_146172140585.txt\" } Compare algorithm result files Since we can list all the algorithm result files, we can also choose multiple files and run a comparison. API Endpoint URI pattern: POST https://<hostname>/ccd-api/{userId}/results/compare The request body is a JSON that contains an array of result files to be compared. Generated HTTP request code example: POST /ccd-api/22/results/compare HTTP/1.1 Host: <hostname> Authorization: Bearer eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJpc3MiOiJodHRwczovL2Nsb3VkLmNjZC5waXR0LmVkdS8iLCJuYW1lIjoiemh5MTkiLCJleHAiOjE0NzU4NTA2NzY4MDQsImlhdCI6MTQ3NTg0NzA3NjgwNH0.8azVEoNPfETczXb-vn7dfyDd98eRt7iiLBXehGpPGzY { \"resultFiles\": [ \"FGESc_sim_data_20vars_100cases.csv_1466171729046.txt\", \"FGESc_data_small.txt_1467305104859.txt\" ] } When you specify multiple file names, use the !! as a delimiter. This request will generate a result comparison file with the following content (shortened version):

FGESc_sim_data_20vars_100cases.csv_1466171729046.txt FGESc_data_small.txt_1467305104859.txt
Edges    In All    Same End Point
NR4A2,FOS    0    0
X5,X17    0    0
MMP11,ASB5    0    0
X12,X8    0    0
hsa_miR_654_3p,hsa_miR_337_3p    0    0
RND1,FGA    0    0
HHLA2,UBXN10    0    0
HS6ST2,RND1    0    0
SCRG1,hsa_miR_377    0    0
CDH3,diag    0    0
SERPINI2,FGG    0    0
hsa_miR_451,hsa_miR_136_    0    0

From this comparison, you can see if the two algorithm graphs have common edges and endpoints.
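A hedged curl sketch of the comparison request (placeholders as before):

```
curl -X POST \
  -H "Authorization: Bearer <jwt>" \
  -H "Content-Type: application/json" \
  -d '{"resultFiles": ["FGESc_sim_data_20vars_100cases.csv_1466171729046.txt", "FGESc_data_small.txt_1467305104859.txt"]}' \
  https://<hostname>/ccd-api/22/results/compare
```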
List all the comparison files API Endpoint URI pattern: GET https://<hostname>/ccd-api/{userId}/results/comparisons Generated HTTP request code example: GET /ccd-api/22/results/comparisons HTTP/1.1 Host: <hostname> Authorization: Bearer eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJpc3MiOiJodHRwczovL2Nsb3VkLmNjZC5waXR0LmVkdS8iLCJuYW1lIjoiemh5MTkiLCJleHAiOjE0NzU4NTA2NzY4MDQsImlhdCI6MTQ3NTg0NzA3NjgwNH0.8azVEoNPfETczXb-vn7dfyDd98eRt7iiLBXehGpPGzY The response will show a list of comparison files: [ { \"name\": \"result_comparison_1467385923407.txt\", \"creationTime\": 1467385923000, \"lastModifiedTime\": 1467385923000, \"fileSize\": 7505 }, { \"name\": \"result_comparison_1467387034358.txt\", \"creationTime\": 1467387034000, \"lastModifiedTime\": 1467387034000, \"fileSize\": 7505 }, { \"name\": \"result_comparison_1467388042261.txt\", \"creationTime\": 1467388042000, \"lastModifiedTime\": 1467388042000, \"fileSize\": 7533 } ] Download a specific comparison file based on file name API Endpoint URI pattern: GET https://<hostname>/ccd-api/{userId}/results/comparisons/{comparison_file_name} Generated HTTP request code example: GET /ccd-api/22/results/comparisons/result_comparison_1467388042261.txt HTTP/1.1 Host: <hostname> Authorization: Bearer eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJpc3MiOiJodHRwczovL2Nsb3VkLmNjZC5waXR0LmVkdS8iLCJuYW1lIjoiemh5MTkiLCJleHAiOjE0NzU4NTA2NzY4MDQsImlhdCI6MTQ3NTg0NzA3NjgwNH0.8azVEoNPfETczXb-vn7dfyDd98eRt7iiLBXehGpPGzY Then it returns the content of that comparison file (shortened version):

FGESc_sim_data_20vars_100cases.csv_1466171729046.txt FGESc_data_small.txt_1467305104859.txt
Edges    In All    Same End Point
NR4A2,FOS    0    0
X5,X17    0    0
MMP11,ASB5    0    0
X12,X8    0    0
hsa_miR_654_3p,hsa_miR_337_3p    0    0
RND1,FGA    0    0
HHLA2,UBXN10    0    0
HS6ST2,RND1    0    0
SCRG1,hsa_miR_377    0    0
CDH3,diag    0    0
SERPINI2,FGG    0    0
","title":"Causal REST API"},
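Finally, a hedged curl sketch that saves a comparison file to disk (the -o flag writes the response body to a local file; placeholders as before):

```
curl -H "Authorization: Bearer <jwt>" \
  -o result_comparison_1467388042261.txt \
  https://<hostname>/ccd-api/22/results/comparisons/result_comparison_1467388042261.txt
```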
POST /ccd-api/jwt HTTP/1.1 Host: Authorization: Basic ZGVtb0BwaXR0LmVkdToxMjM= Once the request is processed successfully, the user ID together with a JWT will be returned in the response for further API queries. { \"userId\": 22, \"jwt\": \"eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJpc3MiOiJodHRwczovL2Nsb3VkLmNjZC5waXR0LmVkdS8iLCJuYW1lIjoiemh5MTkiLCJleHAiOjE0NzU4NTA0Mjg1OTcsImlhdCI6MTQ3NTg0NjgyODU5N30.FcE7aEpg0u2c-gUVugIjJkzjhlDu5qav_XHtgLu3c6E\", \"issuedTime\": 1475846828597, \"lifetime\": 3600, \"expireTime\": 1475850428597, \"wallTime\": [ 1, 3, 6 ] } We'll need to use this userId in the URI path of all subsequent requests. This jwt expires in 3600 seconds (1 hour), so the API consumer will need to request another JWT, otherwise queries to the other API endpoints will be denied. This JWT will need to be sent via the HTTP Authorization header as well, but using the Bearer schema. The wallTime field is designed for users who want to specify the maximum CPU time when Slurm handles the jobs on PSC. Normally, a job is expected to finish before the specified maximum walltime. After the walltime reaches the maximum, the job terminates regardless of whether the job processes are still running or not. In this example, you can pick 1 hour, 3 hours, or 6 hours as the wallTime. Note: querying the JWT endpoint again before the current JWT expires will generate a new JWT, which automatically expires the old one. This newly generated JWT will be valid for another hour unless yet another JWT is queried. This API is developed with Jersey, which supports WADL , so you can view the generated WADL by going to https:///ccd-api/application.wadl?detail=true and see all resources available in the application. Accessing this endpoint doesn't require authentication. Basically, all the API usage examples are grouped into three categories: Data Management Causal Discovery Result Management And all the following examples will be issued by user 22 whose password is 123 .","title":"Getting JSON Web Token(JWT)"},{"location":"causal-rest-api/#1-data-management","text":"","title":"1. Data Management"},{"location":"causal-rest-api/#upload-small-data-file","text":"At this point, you can upload two types of data files: tabular dataset files (either tab delimited or comma delimited) and prior knowledge files. API Endpoint URI pattern: POST https:///ccd-api/{userId}/dataset/upload This is a multipart file upload via an HTML form, and the client is required to use name=\"file\" to name the file upload field in their form. Generated HTTP request code example: POST /ccd-api/22/dataset/upload HTTP/1.1 Host: Authorization: Bearer eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJpc3MiOiJodHRwczovL2Nsb3VkLmNjZC5waXR0LmVkdS8iLCJuYW1lIjoiemh5MTkiLCJleHAiOjE0NzU4NTA2NzY4MDQsImlhdCI6MTQ3NTg0NzA3NjgwNH0.8azVEoNPfETczXb-vn7dfyDd98eRt7iiLBXehGpPGzY Content-Type: multipart/form-data; boundary=----WebKitFormBoundary7MA4YWxkTrZu0gW ----WebKitFormBoundary7MA4YWxkTrZu0gW Content-Disposition: form-data; name=\"file\"; filename=\"\" Content-Type: ----WebKitFormBoundary7MA4YWxkTrZu0gW If the Authorization header is not provided, the response will look like this: { \"timestamp\": 1465414501443, \"status\": 401, \"error\": \"Unauthorized\", \"message\": \"User credentials are required.\", \"path\": \"/22/dataset/upload\" } This POST request will upload the dataset file to the target server location and add corresponding records into the database.
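As a hedged sketch (assuming the requests library, with user_id and jwt obtained from the /jwt endpoint as sketched earlier; the file name is just an example), the upload can be scripted like this:
import requests

headers = {'Authorization': 'Bearer ' + jwt}
# 'file' is the required multipart form-field name.
with open('Lung-tetrad_hv.txt', 'rb') as f:
    resp = requests.post('https://<host>/ccd-api/%d/dataset/upload' % user_id,
                         headers=headers, files={'file': f})
resp.raise_for_status()
print(resp.json())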
And the response will contain the following pieces: { \"id\": 6, \"name\": \"Lung-tetrad_hv.txt\", \"creationTime\": 1466622267000, \"lastModifiedTime\": 1466622267000, \"fileSize\": 3309465, \"md5checkSum\": \"b1db7511ee293d297e3055d9a7b46c5e\", \"fileSummary\": { \"variableType\": null, \"fileDelimiter\": null, \"numOfRows\": null, \"numOfColumns\": null } } The prior knowledge file upload uses a similar API endpoint: POST https:///ccd-api/{userId}/priorknowledge/upload Since a prior knowledge file doesn't need to be summarized, the response of a successful prior knowledge file upload will look like: { \"id\": 6, \"name\": \"Lung-tetrad_hv.txt\", \"creationTime\": 1466622267000, \"lastModifiedTime\": 1466622267000, \"fileSize\": 3309465, \"md5checkSum\": \"ugdb7511rt293d29ke3055d9a7b46c9k\" }","title":"Upload small data file"},{"location":"causal-rest-api/#resumable-data-file-upload","text":"In addition to the regular file upload described in Example 6, we also provide the option of stable and resumable large file upload. It requires the client side to have a resumable upload implementation. We currently support clients integrated with Resumable.js , which provides multiple simultaneous, stable and resumable uploads via the HTML5 File API. You can also create your own client as long as all the following parameters are set correctly. API Endpoint URI pattern: GET https:///ccd-api/{userId}/chunkupload POST https:///ccd-api/{userId}/chunkupload In this example, the data file is split into 3 chunks. The upload of each chunk consists of a GET request and a POST request. To handle the state of upload chunks, a number of extra parameters are sent along with all requests: resumableChunkNumber : The index of the chunk in the current upload. First chunk is 1 (no base-0 counting here). resumableChunkSize : The general chunk size. Using this value and resumableTotalSize you can calculate the total number of chunks. Please note that the size of the data received in the HTTP request might be lower than resumableChunkSize for the last chunk of a file. resumableCurrentChunkSize : The size of the current resumable chunk. resumableTotalSize : The total file size. resumableType : The file type of the resumable chunk, e.g., \"text/plain\". resumableIdentifier : A unique identifier for the file contained in the request. resumableFilename : The original file name (since a bug in Firefox results in the file name not being transmitted in chunk multipart posts). resumableRelativePath : The file's relative path when selecting a directory (defaults to file name in all browsers except Chrome). resumableTotalChunks : The total number of chunks. Generated HTTP request code example: GET /ccd-api/22/chunkupload?resumableChunkNumber=2&resumableChunkSize=1048576&resumableCurrentChunkSize=1048576&resumableTotalSize=3309465&resumableType=text%2Fplain&resumableIdentifier=3309465-large-datatxt&resumableFilename=large-data.txt&resumableRelativePath=large-data.txt&resumableTotalChunks=3 HTTP/1.1 Host: Authorization: Bearer eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJpc3MiOiJodHRwczovL2Nsb3VkLmNjZC5waXR0LmVkdS8iLCJuYW1lIjoiemh5MTkiLCJleHAiOjE0NzU4NTA2NzY4MDQsImlhdCI6MTQ3NTg0NzA3NjgwNH0.8azVEoNPfETczXb-vn7dfyDd98eRt7iiLBXehGpPGzY This GET request checks if the data chunk is already on the server side. If the target file chunk is not found on the server, the client will issue a POST request to upload the actual data.
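For orientation only, here is a rough Python sketch of that per-chunk GET-then-POST loop (assuming requests plus the user_id/jwt values from earlier; the treatment of the GET status code is an assumption modeled on Resumable.js conventions, not a documented contract):
import math, os, requests

path = 'large-data.txt'              # hypothetical local file
chunk_size = 1048576                 # 1 MiB, as in the example
total_size = os.path.getsize(path)
total_chunks = max(1, math.ceil(total_size / chunk_size))
url = 'https://<host>/ccd-api/%d/chunkupload' % user_id
headers = {'Authorization': 'Bearer ' + jwt}

with open(path, 'rb') as f:
    for number in range(1, total_chunks + 1):   # chunks are 1-indexed
        blob = f.read(chunk_size)
        params = {
            'resumableChunkNumber': number,
            'resumableChunkSize': chunk_size,
            'resumableCurrentChunkSize': len(blob),
            'resumableTotalSize': total_size,
            'resumableType': 'text/plain',
            'resumableIdentifier': '3309465-large-datatxt',  # example value
            'resumableFilename': os.path.basename(path),
            'resumableRelativePath': os.path.basename(path),
            'resumableTotalChunks': total_chunks,
        }
        # The GET asks whether this chunk is already on the server; we
        # assume 200 means 'present' and anything else means 'send it'.
        if requests.get(url, params=params, headers=headers).status_code != 200:
            requests.post(url, data=params, files={'file': ('blob', blob)},
                          headers=headers).raise_for_status()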
Generated HTTP request code example: POST /ccd-api/22/chunkupload HTTP/1.1 Host: Authorization: Bearer eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJpc3MiOiJodHRwczovL2Nsb3VkLmNjZC5waXR0LmVkdS8iLCJuYW1lIjoiemh5MTkiLCJleHAiOjE0NzU4NTA2NzY4MDQsImlhdCI6MTQ3NTg0NzA3NjgwNH0.8azVEoNPfETczXb-vn7dfyDd98eRt7iiLBXehGpPGzY Content-Type: multipart/form-data; boundary=----WebKitFormBoundaryMFjgApg56XGyeTnZ ------WebKitFormBoundaryMFjgApg56XGyeTnZ Content-Disposition: form-data; name=\"resumableChunkNumber\" 2 ------WebKitFormBoundaryMFjgApg56XGyeTnZ Content-Disposition: form-data; name=\"resumableChunkSize\" 1048576 ------WebKitFormBoundaryMFjgApg56XGyeTnZ Content-Disposition: form-data; name=\"resumableCurrentChunkSize\" 1048576 ------WebKitFormBoundaryMFjgApg56XGyeTnZ Content-Disposition: form-data; name=\"resumableTotalSize\" 3309465 ------WebKitFormBoundaryMFjgApg56XGyeTnZ Content-Disposition: form-data; name=\"resumableType\" text/plain ------WebKitFormBoundaryMFjgApg56XGyeTnZ Content-Disposition: form-data; name=\"resumableIdentifier\" 3309465-large-datatxt ------WebKitFormBoundaryMFjgApg56XGyeTnZ Content-Disposition: form-data; name=\"resumableFilename\" large-data.txt ------WebKitFormBoundaryMFjgApg56XGyeTnZ Content-Disposition: form-data; name=\"resumableRelativePath\" large-data.txt ------WebKitFormBoundaryMFjgApg56XGyeTnZ Content-Disposition: form-data; name=\"resumableTotalChunks\" 3 ------WebKitFormBoundaryMFjgApg56XGyeTnZ Content-Disposition: form-data; name=\"file\"; filename=\"blob\" Content-Type: application/octet-stream ------WebKitFormBoundaryMFjgApg56XGyeTnZ-- Each chunk upload POST will get a 200 status code from the response if everything works fine. And finally the md5checkSum string of the reassembled file will be returned once the whole file has been uploaded successfully. In this example, the POST request that uploads the third chunk will return this: b1db7511ee293d297e3055d9a7b46c5e","title":"Resumable data file upload"},{"location":"causal-rest-api/#list-all-dataset-files-of-a-user","text":"API Endpoint URI pattern: GET https:///ccd-api/{userId}/dataset Generated HTTP request code example: GET /ccd-api/22/dataset HTTP/1.1 Host: Authorization: Bearer eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJpc3MiOiJodHRwczovL2Nsb3VkLmNjZC5waXR0LmVkdS8iLCJuYW1lIjoiemh5MTkiLCJleHAiOjE0NzU4NTA2NzY4MDQsImlhdCI6MTQ3NTg0NzA3NjgwNH0.8azVEoNPfETczXb-vn7dfyDd98eRt7iiLBXehGpPGzY Accept: application/json A JSON formatted list of all the input dataset files that are associated with user 22 will be returned.
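A minimal sketch of the same listing call (again assuming requests and the user_id/jwt values from earlier):
import requests

resp = requests.get('https://<host>/ccd-api/%d/dataset' % user_id,
                    headers={'Authorization': 'Bearer ' + jwt,
                             'Accept': 'application/json'})
resp.raise_for_status()
for ds in resp.json():
    print(ds['id'], ds['name'], ds['fileSize'])
The JSON payload returned looks like this: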
[ { \"id\": 8, \"name\": \"data_small.txt\", \"creationTime\": 1467132449000, \"lastModifiedTime\": 1467132449000, \"fileSize\": 278428, \"md5checkSum\": \"ed5f27a2cf94fe3735a5d9ed9191c382\", \"fileSummary\": { \"variableType\": \"continuous\", \"fileDelimiter\": \"tab\", \"numOfRows\": 302, \"numOfColumns\": 123 } }, { \"id\": 10, \"name\": \"large-data.txt\", \"creationTime\": 1467134048000, \"lastModifiedTime\": 1467134048000, \"fileSize\": 3309465, \"md5checkSum\": \"b1db7511ee293d297e3055d9a7b46c5e\", \"fileSummary\": { \"variableType\": null, \"fileDelimiter\": null, \"numOfRows\": null, \"numOfColumns\": null } }, { \"id\": 11, \"name\": \"Lung-tetrad_hv (copy).txt\", \"creationTime\": 1467140415000, \"lastModifiedTime\": 1467140415000, \"fileSize\": 3309465, \"md5checkSum\": \"b1db7511ee293d297e3055d9a7b46c5e\", \"fileSummary\": { \"variableType\": \"continuous\", \"fileDelimiter\": \"tab\", \"numOfRows\": 302, \"numOfColumns\": 608 } } ] You can also specify the response format as XML in your request Generated HTTP request code example: GET /ccd-api/22/dataset HTTP/1.1 Host: Authorization: Bearer eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJpc3MiOiJodHRwczovL2Nsb3VkLmNjZC5waXR0LmVkdS8iLCJuYW1lIjoiemh5MTkiLCJleHAiOjE0NzU4NTA2NzY4MDQsImlhdCI6MTQ3NTg0NzA3NjgwNH0.8azVEoNPfETczXb-vn7dfyDd98eRt7iiLBXehGpPGzY Accept: application/xml And the response will look like this: 8 data_small.txt 2016-06-28T12:47:29-04:00 2016-06-28T12:47:29-04:00 278428 ed5f27a2cf94fe3735a5d9ed9191c382 tab 123 302 continuous 10 large-data.txt 2016-06-28T13:14:08-04:00 2016-06-28T13:14:08-04:00 3309465 b1db7511ee293d297e3055d9a7b46c5e 11 Lung-tetrad_hv (copy).txt 2016-06-28T15:00:15-04:00 2016-06-28T15:00:15-04:00 3309465 b1db7511ee293d297e3055d9a7b46c5e tab 608 302 continuous Form the above output, we can also tell that data file with ID 10 doesn't have all the fileSummary field values set, we'll cover this in the dataset summarization section.","title":"List all dataset files of a user"},{"location":"causal-rest-api/#get-the-detail-information-of-a-dataset-file-based-on-id","text":"API Endpoint URI pattern: GET https:///ccd-api/{userId}/dataset/{id} Generated HTTP request code example: GET /ccd-api/22/dataset/8 HTTP/1.1 Host: Authorization: Bearer eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJpc3MiOiJodHRwczovL2Nsb3VkLmNjZC5waXR0LmVkdS8iLCJuYW1lIjoiemh5MTkiLCJleHAiOjE0NzU4NTA2NzY4MDQsImlhdCI6MTQ3NTg0NzA3NjgwNH0.8azVEoNPfETczXb-vn7dfyDd98eRt7iiLBXehGpPGzY And the resulting response looks like this: { \"id\": 8, \"name\": \"data_small.txt\", \"creationTime\": 1467132449000, \"lastModifiedTime\": 1467132449000, \"fileSize\": 278428, \"fileSummary\": { \"md5checkSum\": \"ed5f27a2cf94fe3735a5d9ed9191c382\", \"variableType\": \"continuous\", \"fileDelimiter\": \"tab\", \"numOfRows\": 302, \"numOfColumns\": 123 } }","title":"Get the detail information of a dataset file based on ID"},{"location":"causal-rest-api/#delete-physical-dataset-file-and-all-records-from-database-for-a-given-file-id","text":"API Endpoint URI pattern: DELETE https:///ccd-api/{userId}/dataset/{id} Generated HTTP request code example: DELETE /ccd-api/22/dataset/8 HTTP/1.1 Host: Authorization: Bearer eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJpc3MiOiJodHRwczovL2Nsb3VkLmNjZC5waXR0LmVkdS8iLCJuYW1lIjoiemh5MTkiLCJleHAiOjE0NzU4NTA2NzY4MDQsImlhdCI6MTQ3NTg0NzA3NjgwNH0.8azVEoNPfETczXb-vn7dfyDd98eRt7iiLBXehGpPGzY And this will result a HTTP 204 No Content status in response on success, which means the server successfully processed the deletion request but there's no 
content to return.","title":"Delete physical dataset file and all records from database for a given file ID"},{"location":"causal-rest-api/#summarize-dataset-file","text":"So from the first example we can tell that the file with ID 10 doesn't have variableType , fileDelimiter , numOfRows , and numOfColumns specified under fileSummary . Among these attributes, variableType and fileDelimiter are the ones that users will need to provide during this summarization process. Before we can run the desired algorithm on the newly uploaded data file, we'll need to summarize the data by specifying the variable type and file delimiter. Required Fields Description id The data file ID variableType discrete or continuous fileDelimiter tab or comma API Endpoint URI pattern: POST https:///ccd-api/{userId}/dataset/summarize Generated HTTP request code example: POST /ccd-api/22/dataset/summarize HTTP/1.1 Host: Authorization: Bearer eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJpc3MiOiJodHRwczovL2Nsb3VkLmNjZC5waXR0LmVkdS8iLCJuYW1lIjoiemh5MTkiLCJleHAiOjE0NzU4NTA2NzY4MDQsImlhdCI6MTQ3NTg0NzA3NjgwNH0.8azVEoNPfETczXb-vn7dfyDd98eRt7iiLBXehGpPGzY Content-Type: application/json { \"id\": 10, \"variableType\": \"continuous\", \"fileDelimiter\": \"tab\" } This POST request will summarize the dataset file and generate a response (JSON or XML) like below: { \"id\": 10, \"name\": \"large-data.txt\", \"creationTime\": 1467134048000, \"lastModifiedTime\": 1467134048000, \"fileSize\": 3309465, \"md5checkSum\": \"b1db7511ee293d297e3055d9a7b46c5e\", \"fileSummary\": { \"variableType\": \"continuous\", \"fileDelimiter\": \"tab\", \"numOfRows\": 302, \"numOfColumns\": 608 } }","title":"Summarize dataset file"},{"location":"causal-rest-api/#list-all-prior-knowledge-files-of-a-given-user","text":"API Endpoint URI pattern: GET https:///ccd-api/{userId}/priorknowledge Generated HTTP request code example: GET /ccd-api/22/priorknowledge HTTP/1.1 Host: Authorization: Bearer eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJpc3MiOiJodHRwczovL2Nsb3VkLmNjZC5waXR0LmVkdS8iLCJuYW1lIjoiemh5MTkiLCJleHAiOjE0NzU4NTA2NzY4MDQsImlhdCI6MTQ3NTg0NzA3NjgwNH0.8azVEoNPfETczXb-vn7dfyDd98eRt7iiLBXehGpPGzY Accept: application/json A JSON formatted list of all the prior knowledge files that are associated with user 22 will be returned.
[ { \"id\": 9, \"name\": \"data_small.prior\", \"creationTime\": 1467132449000, \"lastModifiedTime\": 1467132449000, \"fileSize\": 278428, \"md5checkSum\": \"ed5f27a2cf94fe3735a5d9ed9191c382\" }, { \"id\": 12, \"name\": \"large-data.prior\", \"creationTime\": 1467134048000, \"lastModifiedTime\": 1467134048000, \"fileSize\": 3309465, \"md5checkSum\": \"b1db7511ee293d297e3055d9a7b46c5e\" } ]","title":"List all prior knowledge files of a given user"},{"location":"causal-rest-api/#get-the-detail-information-of-a-prior-knowledge-file-based-on-id","text":"API Endpoint URI pattern: GET https:///ccd-api/{userId}/priorknowledge/{id} Generated HTTP request code example: GET /ccd-api/22/priorknowledge/9 HTTP/1.1 Host: Authorization: Bearer eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJpc3MiOiJodHRwczovL2Nsb3VkLmNjZC5waXR0LmVkdS8iLCJuYW1lIjoiemh5MTkiLCJleHAiOjE0NzU4NTA2NzY4MDQsImlhdCI6MTQ3NTg0NzA3NjgwNH0.8azVEoNPfETczXb-vn7dfyDd98eRt7iiLBXehGpPGzY And the resulting response looks like this: { \"id\": 9, \"name\": \"data_small.prior\", \"creationTime\": 1467132449000, \"lastModifiedTime\": 1467132449000, \"fileSize\": 278428, \"md5checkSum\": \"ed5f27a2cf94fe3735a5d9ed9191c382\" }","title":"Get the detail information of a prior knowledge file based on ID"},{"location":"causal-rest-api/#delete-physical-prior-knowledge-file-and-all-records-from-database-for-a-given-file-id","text":"API Endpoint URI pattern: DELETE https:///ccd-api/{userId}/priorknowledge/{id} Generated HTTP request code example: DELETE /ccd-api/22/priorknowledge/9 HTTP/1.1 Host: Authorization: Bearer eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJpc3MiOiJodHRwczovL2Nsb3VkLmNjZC5waXR0LmVkdS8iLCJuYW1lIjoiemh5MTkiLCJleHAiOjE0NzU4NTA2NzY4MDQsImlhdCI6MTQ3NTg0NzA3NjgwNH0.8azVEoNPfETczXb-vn7dfyDd98eRt7iiLBXehGpPGzY And this will result a HTTP 204 No Content status in response on success, which means the server successfully processed the deletion request but there's no content to response.","title":"Delete physical prior knowledge file and all records from database for a given file ID"},{"location":"causal-rest-api/#2-causal-discovery","text":"Once the data file is uploaded and summaried, you can start running a Causal Discovery Algorithm on the uploaded data file.","title":"2. Causal Discovery"},{"location":"causal-rest-api/#list-all-the-available-causal-discovery-algorithms","text":"API Endpoint URI pattern: GET https:///ccd-api/{userId}/algorithms Generated HTTP request code example: GET /ccd-api/22/algorithms HTTP/1.1 Host: Authorization: Bearer eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJpc3MiOiJodHRwczovL2Nsb3VkLmNjZC5waXR0LmVkdS8iLCJuYW1lIjoiemh5MTkiLCJleHAiOjE0NzU4NTA2NzY4MDQsImlhdCI6MTQ3NTg0NzA3NjgwNH0.8azVEoNPfETczXb-vn7dfyDd98eRt7iiLBXehGpPGzY [ { \"id\": 1, \"name\": \"FGESc\", \"description\": \"FGES continuous\" }, { \"id\": 2, \"name\": \"FGESd\", \"description\": \"FGES discrete\" }, { \"id\": 3, \"name\": \"GFCIc\", \"description\": \"GFCI continuous\" }, { \"id\": 4, \"name\": \"GFCId\", \"description\": \"GFCI discrete\" } ] Currently we support \"FGES continuous\", \"FGES discrete\", \"GFCI continuous\", and \"GFCI discrete\". They also share a common JSON structure as of their input, for example: Input JSON Fields Description datasetFileId The dataset file ID, integer priorKnowledgeFileId The optional prior knowledge file ID, integer dataValidation Algorithm specific input data validation flags, JSON object algorithmParameters Algorithm specific parameters, JSON object jvmOptions Advanced Options For Java Virtual Machine (JVM), JSON object. 
Currently only support maxHeapSize (Gigabyte, max value is 100) hpcParameters Parameters for High-Performance Computing, JSON array of key-value objects. Currently only support wallTime Below are the data validation flags and parameters that you can use for each algorithm. FGES continuous Data validation: Parameters Description Default Value skipNonzeroVariance Skip check for zero variance variables false skipUniqueVarName Skip check for unique variable names false Algorithm parameters: Parameters Description Default Value faithfulnessAssumed Yes if (one edge) faithfulness should be assumed true maxDegree The maximum degree of the output graph 100 penaltyDiscount Penalty discount 4.0 verbose Print additional information true FGES discrete Data validation: Parameters Description Default Value skipUniqueVarName Skip check for unique variable names false skipCategoryLimit Skip 'limit number of categories' check false Algorithm parameters: Parameters Description Default Value structurePrior Structure prior coefficient 1.0 samplePrior Sample prior 1.0 maxDegree The maximum degree of the output graph 100 faithfulnessAssumed Yes if (one edge) faithfulness should be assumed true verbose Print additional information true GFCI continuous Data validation: Parameters Description Default Value skipNonzeroVariance Skip check for zero variance variables false skipUniqueVarName Skip check for unique variable names false Algorithm parameters: Parameters Description Default Value alpha Cutoff for p values (alpha) 0.01 penaltyDiscount Penalty discount 4.0 maxDegree The maximum degree of the output graph 100 faithfulnessAssumed Yes if (one edge) faithfulness should be assumed false verbose Print additional information true GFCI discrete Data validation: Parameters Description Default Value skipUniqueVarName Skip check for unique variable names false skipCategoryLimit Skip 'limit number of categories' check false Algorithm parameters: Parameters Description Default Value alpha Cutoff for p values (alpha) 0.01 structurePrior Structure prior coefficient 1.0 samplePrior Sample prior 1.0 maxDegree The maximum degree of the output graph 100 faithfulnessAssumed Yes if (one edge) faithfulness should be assumed false verbose Print additional information true","title":"List all the available causal discovery algorithms"},{"location":"causal-rest-api/#add-a-new-job-to-run-the-desired-algorithm-on-a-given-data-file","text":"This is a POST request and the algorithm details and data file id will need to be specified in the POST body as a JSON when you make the request. API Endpoint URI pattern: POST https:///ccd-api/{userId}/jobs/FGESc Generated HTTP request code example: POST /ccd-api/22/jobs/FGESc HTTP/1.1 Host: Authorization: Bearer eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJpc3MiOiJodHRwczovL2Nsb3VkLmNjZC5waXR0LmVkdS8iLCJuYW1lIjoiemh5MTkiLCJleHAiOjE0NzU4NTA2NzY4MDQsImlhdCI6MTQ3NTg0NzA3NjgwNH0.8azVEoNPfETczXb-vn7dfyDd98eRt7iiLBXehGpPGzY Content-Type: application/json { \"datasetFileId\": 8, \"priorKnowledgeFileId\": 9, \"dataValidation\": { \"skipNonzeroVariance\": true, \"skipUniqueVarName\": true }, \"algorithmParameters\": { \"penaltyDiscount\": 5.0, \"maxDegree\": 100 }, \"jvmOptions\": { \"maxHeapSize\": 100 }, \"hpcParameters\": [ { \"key\":\"wallTime\", \"value\":1 } ] } In this example, we are running the \"FGES continuous\" algorithm on the file of ID 8. We also set the wallTime as 1 hour. And this call will return the job info with a 201 Created response status code. 
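A hedged Python sketch of submitting this job (assuming requests and the user_id/jwt values from earlier; the payload mirrors the example above):
import requests

job_spec = {
    'datasetFileId': 8,
    'priorKnowledgeFileId': 9,
    'dataValidation': {'skipNonzeroVariance': True, 'skipUniqueVarName': True},
    'algorithmParameters': {'penaltyDiscount': 5.0, 'maxDegree': 100},
    'jvmOptions': {'maxHeapSize': 100},
    'hpcParameters': [{'key': 'wallTime', 'value': 1}],
}
resp = requests.post('https://<host>/ccd-api/%d/jobs/FGESc' % user_id,
                     json=job_spec,
                     headers={'Authorization': 'Bearer ' + jwt})
assert resp.status_code == 201          # Created
print(resp.json())
The job info in that 201 response looks like this: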
{ \"id\": 5, \"algorithmName\": \"FGESc\", \"status\": 0, \"addedTime\": 1472742564355, \"resultFileName\": \"FGESc_data_small.txt_1472742564353.txt\", \"errorResultFileName\": \"error_FGESc_data_small.txt_1472742564353.txt\" } From this response we can tell that the job ID is 5, and the result file name will be FGESc_data_small.txt_1472742564353.txt if everything goes well. If something is wrong an error result file with name error_FGEsc_data_small.txt_1472742564353.txt will be created. When you need to run \"FGES discrete\", just send the request to a different endpont URI: API Endpoint URI pattern: POST https:///ccd-api/{userId}/jobs/FGESd Generated HTTP request code example: POST /ccd-api/22/jobs/FGESd HTTP/1.1 Host: Authorization: Bearer eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJpc3MiOiJodHRwczovL2Nsb3VkLmNjZC5waXR0LmVkdS8iLCJuYW1lIjoiemh5MTkiLCJleHAiOjE0NzU4NTA2NzY4MDQsImlhdCI6MTQ3NTg0NzA3NjgwNH0.8azVEoNPfETczXb-vn7dfyDd98eRt7iiLBXehGpPGzY Content-Type: application/json { \"datasetFileId\": 10, \"priorKnowledgeFileId\": 12, \"dataValidation\": { \"skipUniqueVarName\": true, \"skipCategoryLimit\": true }, \"algorithmParameters\": { \"structurePrior\": 1.0, \"samplePrior\": 1.0, \"maxDegree\": 102 }, \"jvmOptions\": { \"maxHeapSize\": 100 } }","title":"Add a new job to run the desired algorithm on a given data file"},{"location":"causal-rest-api/#list-all-running-jobs","text":"API Endpoint URI pattern: GET https:///ccd-api/{userId}/jobs Generated HTTP request code example: GET /ccd-api/22/jobs/ HTTP/1.1 Host: Authorization: Bearer eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJpc3MiOiJodHRwczovL2Nsb3VkLmNjZC5waXR0LmVkdS8iLCJuYW1lIjoiemh5MTkiLCJleHAiOjE0NzU4NTA2NzY4MDQsImlhdCI6MTQ3NTg0NzA3NjgwNH0.8azVEoNPfETczXb-vn7dfyDd98eRt7iiLBXehGpPGzY Content-Type: application/json Then you'll see the information of all jobs that are currently running: [ { \"id\": 32, \"algorithmName\": \"FGESc\", \"addedTime\": 1468436085000 }, { \"id\": 33, \"algorithmName\": \"FGESd\", \"addedTime\": 1468436087000 } ]","title":"List all running jobs"},{"location":"causal-rest-api/#check-the-job-status-for-a-given-job-id","text":"Once the new job is submitted, it takes time and resources to run the algorithm on the server. During the waiting, you can check the status of a given job ID: API Endpoint URI pattern: GET https:///ccd-api/{userId}/jobs/{id} Generated HTTP request code example: GET /ccd-api/22/jobs/32 HTTP/1.1 Host: Authorization: Bearer eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJpc3MiOiJodHRwczovL2Nsb3VkLmNjZC5waXR0LmVkdS8iLCJuYW1lIjoiemh5MTkiLCJleHAiOjE0NzU4NTA2NzY4MDQsImlhdCI6MTQ3NTg0NzA3NjgwNH0.8azVEoNPfETczXb-vn7dfyDd98eRt7iiLBXehGpPGzY This will either return \"Pending\" or \"Completed\".","title":"Check the job status for a given job ID"},{"location":"causal-rest-api/#cancel-a-running-job","text":"Sometimes you may want to cancel a submitted job. API Endpoint URI pattern: DELETE https:///ccd-api/{userId}/jobs/{id} Generated HTTP request code example: DELETE /ccd-api/22/jobs/8 HTTP/1.1 Host: Authorization: Bearer eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJpc3MiOiJodHRwczovL2Nsb3VkLmNjZC5waXR0LmVkdS8iLCJuYW1lIjoiemh5MTkiLCJleHAiOjE0NzU4NTA2NzY4MDQsImlhdCI6MTQ3NTg0NzA3NjgwNH0.8azVEoNPfETczXb-vn7dfyDd98eRt7iiLBXehGpPGzY This call will response either \"Job 8 has been canceled\" or \"Unable to cancel job 8\". It's not guranteed that the system can always cencal a job successfully.","title":"Cancel a running job"},{"location":"causal-rest-api/#3-result-management","text":"","title":"3. 
Result Management"},{"location":"causal-rest-api/#list-all-result-files-generated-by-the-algorithm","text":"API Endpoint URI pattern: GET https:///ccd-api/{userId}/results Generated HTTP request code example: GET /ccd-api/22/results HTTP/1.1 Host: Authorization: Bearer eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJpc3MiOiJodHRwczovL2Nsb3VkLmNjZC5waXR0LmVkdS8iLCJuYW1lIjoiemh5MTkiLCJleHAiOjE0NzU4NTA2NzY4MDQsImlhdCI6MTQ3NTg0NzA3NjgwNH0.8azVEoNPfETczXb-vn7dfyDd98eRt7iiLBXehGpPGzY The response to this request will look like this: [ { \"name\": \"FGESc_sim_data_20vars_100cases.csv_1466171729046.txt\", \"creationTime\": 1466171732000, \"lastModifiedTime\": 1466171732000, \"fileSize\": 1660 }, { \"name\": \"FGESc_data_small.txt_1466172140585.txt\", \"creationTime\": 1466172145000, \"lastModifiedTime\": 1466172145000, \"fileSize\": 39559 } ]","title":"List all result files generated by the algorithm"},{"location":"causal-rest-api/#download-a-specific-result-file-generated-by-the-algorithm-based-on-file-name","text":"API Endpoint URI pattern: GET https:///ccd-api/{userId}/results/{result_file_name} Generated HTTP request code example: GET /ccd-api/22/results/FGESc_data_small.txt_1466172140585.txt HTTP/1.1 Host: Authorization: Bearer eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJpc3MiOiJodHRwczovL2Nsb3VkLmNjZC5waXR0LmVkdS8iLCJuYW1lIjoiemh5MTkiLCJleHAiOjE0NzU4NTA2NzY4MDQsImlhdCI6MTQ3NTg0NzA3NjgwNH0.8azVEoNPfETczXb-vn7dfyDd98eRt7iiLBXehGpPGzY On success, you will get the result file back as text file content. If there's a typo in file name of the that file doesn't exist, you'll get either a JSON or XML message based on the accept header in your request: The response to this request will look like this: { \"timestamp\": 1467210996233, \"status\": 404, \"error\": \"Not Found\", \"message\": \"Resource not found.\", \"path\": \"/22/results/FGESc_data_small.txt_146172140585.txt\" }","title":"Download a specific result file generated by the algorithm based on file name"},{"location":"causal-rest-api/#compare-algorithm-result-files","text":"Since we can list all the algorithm result files, based on the results, we can also choose multiple files and run a comparison. API Endpoint URI pattern: POST https:///ccd-api/{userId}/results/compare The request body is a JSON that contains an array of result files to be compared. Generated HTTP request code example: POST /ccd-api/22/results/compare HTTP/1.1 Host: Authorization: Bearer eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJpc3MiOiJodHRwczovL2Nsb3VkLmNjZC5waXR0LmVkdS8iLCJuYW1lIjoiemh5MTkiLCJleHAiOjE0NzU4NTA2NzY4MDQsImlhdCI6MTQ3NTg0NzA3NjgwNH0.8azVEoNPfETczXb-vn7dfyDd98eRt7iiLBXehGpPGzY { \"resultFiles\": [ \"FGESc_sim_data_20vars_100cases.csv_1466171729046.txt\", \"FGESc_data_small.txt_1467305104859.txt\" ] } When you specify multiple file names, use the !! as a delimiter. 
This request will generate a result comparison file with the following content (shortened version): FGESc_sim_data_20vars_100cases.csv_1466171729046.txt FGESc_data_small.txt_1467305104859.txt Edges In All Same End Point NR4A2,FOS 0 0 X5,X17 0 0 MMP11,ASB5 0 0 X12,X8 0 0 hsa_miR_654_3p,hsa_miR_337_3p 0 0 RND1,FGA 0 0 HHLA2,UBXN10 0 0 HS6ST2,RND1 0 0 SCRG1,hsa_miR_377 0 0 CDH3,diag 0 0 SERPINI2,FGG 0 0 hsa_miR_451,hsa_miR_136_ 0 0 From this comparison, you can see if the two algorithm graphs have common edges and endpoints.","title":"Compare algorithm result files"},{"location":"causal-rest-api/#list-all-the-comparison-files","text":"API Endpoint URI pattern: GET https:///ccd-api/{userId}/results/comparisons Generated HTTP request code example: GET /ccd-api/22/results/comparisons HTTP/1.1 Host: Authorization: Bearer eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJpc3MiOiJodHRwczovL2Nsb3VkLmNjZC5waXR0LmVkdS8iLCJuYW1lIjoiemh5MTkiLCJleHAiOjE0NzU4NTA2NzY4MDQsImlhdCI6MTQ3NTg0NzA3NjgwNH0.8azVEoNPfETczXb-vn7dfyDd98eRt7iiLBXehGpPGzY The response will show a list of comparison files: [ { \"name\": \"result_comparison_1467385923407.txt\", \"creationTime\": 1467385923000, \"lastModifiedTime\": 1467385923000, \"fileSize\": 7505 }, { \"name\": \"result_comparison_1467387034358.txt\", \"creationTime\": 1467387034000, \"lastModifiedTime\": 1467387034000, \"fileSize\": 7505 }, { \"name\": \"result_comparison_1467388042261.txt\", \"creationTime\": 1467388042000, \"lastModifiedTime\": 1467388042000, \"fileSize\": 7533 } ]","title":"List all the comparison files"},{"location":"causal-rest-api/#download-a-specific-comparison-file-based-on-file-name","text":"API Endpoint URI pattern: GET https:///ccd-api/{userId}/results/comparisons/{comparison_file_name} Generated HTTP request code example: GET /ccd-api/22/results/comparisons/result_comparison_1467388042261.txt HTTP/1.1 Host: Authorization: Bearer eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJpc3MiOiJodHRwczovL2Nsb3VkLmNjZC5waXR0LmVkdS8iLCJuYW1lIjoiemh5MTkiLCJleHAiOjE0NzU4NTA2NzY4MDQsImlhdCI6MTQ3NTg0NzA3NjgwNH0.8azVEoNPfETczXb-vn7dfyDd98eRt7iiLBXehGpPGzY Then it returns the content of that comparison file (shortened version): FGESc_sim_data_20vars_100cases.csv_1466171729046.txt FGESc_data_small.txt_1467305104859.txt Edges In All Same End Point NR4A2,FOS 0 0 X5,X17 0 0 MMP11,ASB5 0 0 X12,X8 0 0 hsa_miR_654_3p,hsa_miR_337_3p 0 0 RND1,FGA 0 0 HHLA2,UBXN10 0 0 HS6ST2,RND1 0 0 SCRG1,hsa_miR_377 0 0 CDH3,diag 0 0 SERPINI2,FGG 0 0","title":"Download a specific comparison file based on file name"},{"location":"causal-web/","text":"Causal Web Application Quick Start and User Guide Causal Web is a Java web-based application that allows users to run causal modeling algorithms on their dataset. Creating Your Account You can create a new account by clicking the \"Create an account\" link on the login page. Fill in your information on the signup page. Make sure to read the Terms & Conditions agreement and check the agree box before clicking the signup button. Upon finishing registration, the system will send out an email with an activation link. Go to your email account and click on that link, then the Causal Web shows a confirmation message. Login to Causal Web Application Input the email address and password that you used to register with the Causal Web system. Check the \"Remember Me\" checkbox if you would like the browser to automatically log you in next time you visit. Here we go! You are now in the Causal Web application.
Uploading Your Dataset Click on the Data Management link on the navigation bar on the left side. There is a sub menu that will appear. Click on the Import Data link. You can EITHER drag & drop dataset file(s) into the dash-surrounded canvas OR you can click on the Browse button and choose the dataset file(s) you would like to upload to the Causal Web application. For testing purposes download this dataset: Retention.txt and upload it. The Import Data panel shows the dataset upload progress as a percentage along with MD5 checksums (confirms that an uploaded file's contents are unchanged after upload) for each of the uploaded files. You can also pause the upload of files and resume later. In the case of a disrupted connection, you can resume the upload by repeating the previous steps. The Causal Web application will detect the unfinished upload and resume from the latest point of the last attempt. Once all your dataset file(s) are uploaded, the progress bar will show the (completed) sign. Summarizing Your Dataset Before any analysis can proceed, the datasets need to be summarized. Specifically, you must indicate the delimiter used in the data file (tab vs. comma), and the types of variables found in the file. Once this is done, the Causal Web application will determine the number of columns (features) and rows (records) in the dataset. Click on the Data Management menu on the navigation bar on the left side. The sub menu will slowly appear. Click on the Datasets menu. The dataset page shows a list of datasets and their attributes. In the second Summarized column from the right, the yellow warning buttons indicate that the dataset has not yet been summarized. Click on the dataset's name link to see the dataset information. At this stage, the data summary information is missing: the dataset needs to be summarized before conducting causal analysis. From the dataset page, click on the yellow warning button to summarize a dataset. The data summarization page shows the dataset's basic information along with additional information that will be determined after summarization (the number of rows and columns). The bottom panel has two radio boxes for you to choose variable type (continuous, discrete, or mixed), and delimiter (tab or comma). The Retention.txt dataset described above is tab-delimited and contains continuous variables. Once the dataset is summarized, the dataset page changes its sign to be a green button. Click to see the additional information of this summarized dataset. Click on the dataset's name link to see the additional information. Annotating Your Dataset On the Datasets main page, the blue icon is for viewing and entering annotations. Click the annotation icon, and you can add a new annotation; just click the \"New annotation\" button. The application will pop up the annotation form. You can also add another annotation on top of the existing annotation. Uploading the Prior Knowledge Click on the Data Management menu on the navigation bar on the left side. There is a sub menu that will appear. Click on the Import Data menu. You can EITHER drag & drop prior knowledge file(s) into the dash-surrounded canvas OR you can click on the Browse button and choose the prior knowledge file(s) you would like to upload to the CCD Web application. Note that the prior knowledge file needs to have the .prior file extension. Executing an Analysis on Your Dataset Click on the Causal Discovery menu on the navigation bar on the left side. The sub menu will slowly appear.
FGES and GFCI are the currently supported algorithms. The FGES algorithm can handle Continuous, Discrete, and Mixed data files. The GFCI algorithm can handle Continuous, Discrete, and Mixed data files as well. The Dataset drop-down box contains a list of datasets that you have uploaded. If a dataset has been uploaded but is not displayed in the dataset drop-down box, the Data Summarization step still needs to be completed before a causal FGES (Continuous) analysis can be executed. If a prior knowledge file needs to be included in the analysis, the Prior Knowledge File drop-down box contains a list of knowledge files. Before clicking the Next button, the data validation parameters need to be input. Here, the FGES Continuous algorithm page allows the user to modify its parameters. The first one is Penalty Discount and its default value is 2. The second one is Search Maximum Degree and its default value is 100. The third one is Faithfulness Assumed and its default value is checked. The fourth one is Verbose output and its default value is checked. Click Next to proceed or click Advanced Options (JVM) for the JVM customization. Expert Mode : the JVM parameters allow users to customize JVM settings such as how much maximum memory (in gigabytes) the process may allocate (e.g. 4). This is the summary page before the FGES job analysis is put into the queue. Click on the number 1 (Select Dataset) or number 2 (Set Parameters) to go back and modify the parameters. Once everything is set, click the Run Algorithm! button. The application will redirect to the Job Queue page. The analysis job is added to the queue. The Queued status means that it waits for the scheduler to run it once an executing slot is available. However, the Job Queue page does not currently automatically update the jobs' status (at least in this development stage). Refresh the Job Queue page from time to time to see the latest jobs' status. Once a job slot is available, the queued job is then executed and its status changes to Running . When the job is finished, it is automatically removed from the Job Queue page. The result of the analysis is added to the Results page. In case the queued or running job needs to be killed or removed, click the Remove button in the first column from the right on the Job Queue page. The Remove Job confirmation page pops up. Click Yes to kill the job or No to cancel the kill operation. After the job cancellation is confirmed, the job's status changes to Kill Request . The scheduler will take care of removing the job from the queue or killing the job on the server. If the running job was killed or any error happened during the process, the error result will appear in the Results page. Its background is highlighted in red. If there is an error, you can see the details of the error by clicking on the error result link. Reviewing Your Results Click on the Results menu on the navigation bar on the left side. Click on the Algorithm Results menu. The Algorithm Results page shows a list of results, their creation time and their size. In the first column from the right, the green Save buttons provide the ability for users to download results to their local computers. Click on the result's name link to see a causal graph of the result. Check the result files' checkboxes to compare the results . Note : the number of compared datasets can be more than two files. The results page details the graph, the original dataset, and its parameters.
Click on the View Full Screen button to see the causal graph in more detail. Based on the nature of your data, sometimes you may see the generated graph (PAG) containing dashed links in addition to solid links. For example: If an edge is dashed that means there is no latent confounder. Otherwise, there is possibly a latent confounder. If an edge is green that means it is definitely direct. Otherwise, it is possibly direct. Comparing Your Results Click on the Results menu on the navigation bar on the left side. To compare two results click on the Algorithm Results item on the left. Select at least two results (place a checkmark next to the results) and click on Compare. Now click on the Result Comparisons item on the left. The Result Comparisons page shows a list of results, their creation time and their size. In the first column from the right, the green Save buttons provide the ability for users to download results to their local computers. Click on the result's name link to see the detail of the result comparisons. The Result Comparisons page shows the datasets compared, and the table of edges, their mutual appearance in all compared datasets, and their mutual endpoint types. Downloading Your Result And Comparison Result In the first column from the right of the Algorithm Results page, the green Save buttons provide the ability for users to download results to their local computers. In the first column from the right of the Result Comparisons page, the green Save buttons provide the ability for users to download result comparisons to their local computers. Submit Your Feedback Click the Feedback menu on the navigation menu bar on the left. The Feedback page shows the email (optional), and the text area for the user feedback (required). Once the feedback is filled in, click the Send Feedback button. The green Thank you for your feedback! banner shows that the feedback was submitted successfully.","title":"Causal Web"},{"location":"causal-web/#causal-web-application-quick-start-and-user-guide","text":"Causal Web is a Java web-based application that allows users to run causal modeling algorithms on their dataset.","title":"Causal Web Application Quick Start and User Guide"},{"location":"causal-web/#creating-your-account","text":"You can create a new account by clicking the \"Create an account\" link on the login page. Fill in your information on the signup page. Make sure to read the Terms & Conditions agreement and check the agree box before clicking the signup button. Upon finishing registration, the system will send out an email with an activation link. Go to your email account and click on that link, then the Causal Web shows a confirmation message.","title":"Creating Your Account"},{"location":"causal-web/#login-to-causal-web-application","text":"Input the email address and password that you used to register with the Causal Web system. Check the \"Remember Me\" checkbox if you would like the browser to automatically log you in next time you visit. Here we go! You are now in the Causal Web application.","title":"Login to Causal Web Application"},{"location":"causal-web/#uploading-your-dataset","text":"Click on the Data Management link on the navigation bar on the left side. There is a sub menu that will appear. Click on the Import Data link. You can EITHER drag & drop dataset file(s) into the dash-surrounded canvas OR you can click on the Browse button and choose the dataset file(s) you would like to upload to the Causal Web application.
For testing purposes download this dataset: Retention.txt and upload it. The Import Data panel shows the dataset upload progress as a percentage along with MD5 checksums (confirms that an uploaded file's contents are unchanged after upload) for each of uploaded files. You can also pause the upload of files and resume later. In the case of a disrupted connection, you can resume the upload by repeating the previous steps. The Causal Web application will detect the unfinished upload and resume from the latest point of the last attempt. Once all your dataset file(s) are all uploaded, the progress bar will show the (completed) sign.","title":"Uploading Your Dataset"},{"location":"causal-web/#summarizing-your-dataset","text":"Before any analysis can proceed, the datasets need to be summarized. Specifically, you must indicate the delimiter used in the data file (tab vs. comma), and the types of variables found in the file. Once this is done, the Causal Web application will determine the number of columns (features) and rows (records) in the dataset. Click on the Data Management menu on the navigation bar on the left side. The sub menu will slowly appear. Click on the Datasets menu. The dataset page shows a list of datasets and their attributes. On the second Summarized column from the right, the yellow warning buttons indicate that the system has not yet summarized. Click on the dataset's name's link to see the dataset information. From this stage, the data summary information is missing: the dataset needs to be summarized before conducting causal analysis. From the dataset page, click on the yellow warning button to summarize a dataset. The data summarization page shows information of the dataset, its basic information, and additional information that will be determined after summarization (a number of rows and columns). The bottom panel has two radio boxes for you to choose variable type (continuous, discrete, or mixed), and delimiter (tab or comma). The Retention.txt dataset described above is tab-delimited and contains continous variables. Once the dataset is summarized, the dataset page changes its sign to be a green button. Click to see the additional information of this summarized dataset. Click on the dataset's name's link to see the additional information.","title":"Summarizing Your Dataset"},{"location":"causal-web/#annotating-your-dataset","text":"On the Datasets main page, the blue icon is for viewing and entering annotations. Click the annotation icon, and you can add new annotation, just click the \"New annotation\" button. The application will pop up the annotation form. You can also add another annotation on top of the exisiting annotation.","title":"Annotating Your Dataset"},{"location":"causal-web/#uploading-the-prior-knowledge","text":"Click on the Data Management menu on the navigation bar on the left side. There is a sub menu that will appear. Click on the Import Data menu. You can EITHER drag & drop prior knowledge file(s) into the dash-surrounded canvas OR you can click on the Browse button and choose the prior knowledge file(s) you would like to upload to the CCD Web application. Note that the prior knowledge file needs to have .prior file extension.","title":"Uploading the Prior Knowledge"},{"location":"causal-web/#executing-an-analysis-on-your-dataset","text":"Click on the Causal Discovery menu on the navigation bar on the left side. The sub menu will slowly appear. FGES and GFCI are the currently supported algorithms. 
FGES algorithm can handle Continuous, Discrete, and Mixed data files. GFCI algorithm can handle Continuous, Discrete, and Mixed data files as well. The Dataset drop-down box contains a list of datasets that you have uploaded. If those datasets are already uploaded and they are not displayed in the dataset drop-down box, it means that the Data Summarization process to be reviewed in the first place prior to execute a causal FGES (Continuous) analysis. If a prior knowledge file needs to be included in the analysis, Prior Knowledge File drop-down box contains a list of knowledge files. Before clicking the Next button, the data validation parameters need to be input. Here, the FGES Continuous algorithm page allows user to modify its parameters. The first one is Penalty Discount and its default value is 2. The second one is Search Maximum Degree and its default value is 100. The third one is Faithfulness Assumed and its default value is checked. The fifth one is Verbose output and its default value is checked. Click Next to proceed or click Advanced Options (JVM) for the JVM customization. Expert Mode : the JVM parameters allow users to customize JVM parameters such how much maximum memory (in Gigabyte scale) the process would allocate (e.g. 4). This is the summary page before the FGES job analysis is put into the queue. Click on the number 1 (Select Dataset) or number 2 (Set Parameters) to go back to modify the parameters. Once, everything is set. Click the Run Algorithm! button. The application will redirect to the Job Queue page. The analysis job is added to the queue. The Queued status means that it waits for the scheduler to run it once the executing slot is available. However, the Job Queue page does not currently automatically update the jobs' status (at least in this development stage). Refresh the Job Queue page from time to time to see the latest jobs' status. Once the job slot is available, the queued job is then executed and its status changes to Running . When the job is finished, it is automatically removed from the Job Queue page. The result of the analysis is added to the Results page. In case the queued or running job needs to be killed or removed, click the Remove button on the first column on the Job Queue page from the right. The Remove Job confirmation page is popped up. Click Yes to kill the job or No to cancel the kill operation. After the job cancellation is confirmed, the job's status changes to Kill Request . The scheduler will take care of removing of the job from the queue or killing a job in the server. If the running job was killed or any error happened during the process, the error result will appear in the Results page. Its background is highlighted in red. If there is an error, you will see the details of the error by clicking on error result link.","title":"Executing an Analysis on Your Dataset"},{"location":"causal-web/#reviewing-your-results","text":"Click on the Results menu on the navigation bar on the left side. Click on the Algorithm Results menu. The Algorithm Results page shows a list of results, their creation time and their size. In the first column from the right, the green Save buttons provide the ability for users to download results to their local computers. Click on the result's name's link to see a causal graph of the result. Check the result files on their checkboxes to compare the results . Note : a number of comparing datasets can be more than two files. The results page details the graph, the original dataset, and its parameters. 
Click on the View Full Screen button to see the causal graph in more detail. Based on the nature of your data, sometimes you may see the generated graph (PAG) containing dashed links in addition to solid links. For example: If an edge is dashed that means there is no latent confounder. Otherwise, there is possibly a latent confounder. If an edge is green that means it is definitely direct. Otherwise, it is possibly direct.","title":"Reviewing Your Results"},{"location":"causal-web/#comparing-your-results","text":"Click on the Results menu on the navigation bar on the left side. To compare two results click on the Algorithm Results item on the left. Select at least two results (place a checkmark next to the results) and click on Compare. Now click on the Result Comparisions item on the left. The Result Comparisons page shows a list of results, their creation time and their size. On the first column from the right, the green Save buttons provide the ability for users to download results to their local computers. Click on the result's name's the link to see the detail of the result comparisons. The Result Comparisons page shows the datasets compared, and the table of edges, their mutual appearance in all comparing datasets, and their mutual endpoint types.","title":"Comparing Your Results"},{"location":"causal-web/#downloading-your-result-and-comparision-result","text":"On the first column from the right of the Algorithm Results page, the green Save buttons provide the ability for users to download results to their local computers. On the first column from the right of the Result Comparisions page, the green Save buttons provide the ability for users to download result comparisons to their local computers.","title":"Downloading Your Result And Comparision Result"},{"location":"causal-web/#submit-your-feedback","text":"Click the Feedback menu on the navigation menu bar on the left. The Feedback page shows the email (optional), and the text area for the user feedback (required). Once, the feedback is filled, click the Send Feedback button. The green Thank you for you feedback! banner shows that the feedback submitted successfully.","title":"Submit Your Feedback"},{"location":"ccd-annotations-cytoscape/","text":"ccd-annotations-cytoscape Installation The source code for the plugin is available from the project site: (https://github.com/bd2kccd/ccd-annotations-cytoscape) To install the plugin compile the source code or download a release then start the Cytoscape application and click on apps->install from file and select the jar file. 
Using the Plugin Create new annotations Search for existing annotations Set auto label placement","title":"CCD Annotations Cytoscape Plugin"},{"location":"ccd-annotations-cytoscape/#ccd-annotations-cytoscape","text":"","title":"ccd-annotations-cytoscape"},{"location":"ccd-annotations-cytoscape/#installation","text":"The source code for the plugin is available from the project site: (https://github.com/bd2kccd/ccd-annotations-cytoscape) To install the plugin, compile the source code or download a release, then start the Cytoscape application and click on apps->install from file and select the jar file.","title":"Installation"},{"location":"ccd-annotations-cytoscape/#using-the-plugin","text":"Create new annotations Search for existing annotations Set auto label placement","title":"Using the Plugin"},{"location":"cytoscape-tetrad/","text":"cytoscape-tetrad Displaying Tetrad Networks in Cytoscape The Cytoscape application has significant power and flexibility to display networks. This webpage describes how to load a plugin into Cytoscape that will allow you to import and display a Tetrad network (graph) that you have saved from Tetrad. Cytoscape can be downloaded for free from http://www.cytoscape.org/ Download the latest version of the plugin from github To install the plugin, start the Cytoscape application and click on Apps --> App Manager --> Install Apps from the Cytoscape menu. Using the Plugin Put a graph box on the Tetrad workspace and select the graph type \u201cgraph\u201d. Double click on the graph box to display the graph in Tetrad. Within the graph display box, click on File --> Save JSON. In Cytoscape, select the File --> Import --> Network --> Tetrad option and select the file that you saved previously from Tetrad. Apply a layout in Cytoscape. By default Cytoscape doesn't apply a layout so the initial rendering will look like a single node. Apply a layout by selecting Layout in the top menu and then choosing a layout to see your graph (e.g., Layouts --> Prefuse Force Directed Layout).","title":"Cytoscape Tetrad Plugin"},{"location":"cytoscape-tetrad/#cytoscape-tetrad","text":"","title":"cytoscape-tetrad"},{"location":"cytoscape-tetrad/#displaying-tetrad-networks-in-cytoscape","text":"The Cytoscape application has significant power and flexibility to display networks. This webpage describes how to load a plugin into Cytoscape that will allow you to import and display a Tetrad network (graph) that you have saved from Tetrad. Cytoscape can be downloaded for free from http://www.cytoscape.org/ Download the latest version of the plugin from github To install the plugin, start the Cytoscape application and click on Apps --> App Manager --> Install Apps from the Cytoscape menu.","title":"Displaying Tetrad Networks in Cytoscape"},{"location":"cytoscape-tetrad/#using-the-plugin","text":"Put a graph box on the Tetrad workspace and select the graph type \u201cgraph\u201d. Double click on the graph box to display the graph in Tetrad. Within the graph display box, click on File --> Save JSON. In Cytoscape, select the File --> Import --> Network --> Tetrad option and select the file that you saved previously from Tetrad. Apply a layout in Cytoscape. By default Cytoscape doesn't apply a layout so the initial rendering will look like a single node.
Apply a layout by selecting Layout in the top menu and then choosing a layout to see your graph (e.g., Layouts --> Prefuse Force Directed Layout).","title":"Using the Plugin"},{"location":"py-causal/","text":"py-causal Python APIs for causal modeling algorithms developed by the University of Pittsburgh/Carnegie Mellon University Center for Causal Discovery . Note: This project uses a very old version of Tetrad and a method of connecting Python to Java, Javabridge, that's proven sometimes buggy and hard to install on some platforms, and so we are no longer recommending it. Please consider using py-tetrad instead. Py-tetrad uses JPype to bridge Python and Java, which has already shown itself to be much easier to install and use cross-platform. Also, it allows one to use the most recent version of Tetrad, and it has been well-tested. This code is distributed under the LGPL 2.1 license. Requirements: Python 2.7 and 3.6 javabridge>=1.0.11 pandas numpy JDK 1.8 pydot (Optional) GraphViz (Optional) Docker Image A pre-installed py-causal Docker image is available at Docker Hub Installation overview: To install on an existing Python installation, we have found two approaches to be useful: * Direct python installation with pip, possibly including use of Jupyter . This approach is likely best for users who have Python installed and are familiar with installing Python modules. * Installation via Anaconda Directions for both approaches are given below... Installation with pip If you do not have pip installed already, try these instructions . Once pip is installed, execute these commands pip install -U numpy pip install -U pandas pip install -U javabridge pip install -U pydot pip install -U GraphViz Note: you also need to install the GraphViz engine by following these instructions . We have observed that on some OS X installations, pydot may provide the following response Couldn't import dot_parser, loading of dot files will not be possible. If you see this, try the following pip uninstall pydot pip install pyparsing==1.5.7 pip install pydot Then, from within the py-causal directory, run the following command: python setup.py install or use the pip command: pip install git+git://github.com/bd2kccd/py-causal After running this command, enter a python shell and attempt the following imports import pandas as pd import pydot from tetrad import search as s Finally, try to run the python example python py-causal-fges-continuous-example.py Be sure to run this from within the py-causal directory. This program will create a file named tetrad.svg, which should be viewable in any SVG capable program. If you see a causal graph, everything is working correctly. Running Jupyter/IPython We have found Jupyter notebooks to be helpful. (Those who have run IPython in the past should know that Jupyter is simply a new name for IPython). To add Jupyter to your completed python install, simply run pip install -U jupyter jupyter notebook and then load one of the Jupyter notebooks found in this installation. Anaconda/Jupyter Installing Python with Anaconda and Jupyter may be easier for some users: Download and install Anaconda conda install python-javabridge For OS X, this default install does not seem to work well.
Try the following instead: conda install --channel https://conda.anaconda.org/david_baddeley python-javabridge Then run the following to configure Anaconda conda install pandas conda install numpy conda install pydot conda install graphviz conda install -c https://conda.anaconda.org/chirayu pycausal jupyter notebook and then load one of the Jupyter notebooks.","title":"Py-causal"},{"location":"py-causal/#py-causal","text":"Python APIs for causal modeling algorithms developed by the University of Pittsburgh/Carnegie Mellon University Center for Causal Discovery. Note: This project uses a very old version of Tetrad and a method of connecting Python to Java, Javabridge, that has sometimes proven buggy and hard to install on some platforms, so we no longer recommend it. Please consider using py-tetrad instead. Py-tetrad uses JPype to bridge Python and Java, which has already shown itself to be much easier to install and use cross-platform. Also, it allows one to use the most recent version of Tetrad, and it has been well-tested. This code is distributed under the LGPL 2.1 license.","title":"py-causal"},{"location":"py-causal/#requirements","text":"Python 2.7 and 3.6 javabridge>=1.0.11 pandas numpy JDK 1.8 pydot (Optional) GraphViz (Optional)","title":"Requirements:"},{"location":"py-causal/#docker-image","text":"A pre-installed py-causal Docker image is available at Docker Hub","title":"Docker Image"},{"location":"py-causal/#installation-overview","text":"To install on an existing Python installation, we have found two approaches to be useful: * Direct Python installation with pip, possibly including use of Jupyter. This approach is likely best for users who have Python installed and are familiar with installing Python modules. * Installation via Anaconda Directions for both approaches are given below...","title":"Installation overview:"},{"location":"py-causal/#installation-with-pip","text":"If you do not have pip installed already, try these instructions. Once pip is installed, execute these commands pip install -U numpy pip install -U pandas pip install -U javabridge pip install -U pydot pip install -U GraphViz Note: you also need to install the GraphViz engine by following these instructions. We have observed that on some OS X installations, pydot may provide the following response Couldn't import dot_parser, loading of dot files will not be possible. If you see this, try the following pip uninstall pydot pip install pyparsing==1.5.7 pip install pydot Then, from within the py-causal directory, run the following command: python setup.py install or use the pip command: pip install git+git://github.com/bd2kccd/py-causal After running this command, enter a Python shell and attempt the following imports import pandas as pd import pydot from tetrad import search as s Finally, try to run the Python example python py-causal-fges-continuous-example.py Be sure to run this from within the py-causal directory. This program will create a file named tetrad.svg, which should be viewable in any SVG-capable program. If you see a causal graph, everything is working correctly.","title":"Installation with pip"},{"location":"py-causal/#running-jupyteripython","text":"We have found Jupyter notebooks to be helpful. (Those who have run IPython in the past should know that Jupyter is simply a new name for IPython). 
To add Jupyter to your completed Python install, simply run pip install -U jupyter jupyter notebook and then load one of the Jupyter notebooks found in this installation.","title":"Running Jupyter/IPython"},{"location":"py-causal/#anacondajupyter","text":"Installing Python with Anaconda and Jupyter may be easier for some users: Download and install Anaconda conda install python-javabridge For OS X, this default install does not seem to work well. Try the following instead: conda install --channel https://conda.anaconda.org/david_baddeley python-javabridge Then run the following to configure Anaconda conda install pandas conda install numpy conda install pydot conda install graphviz conda install -c https://conda.anaconda.org/chirayu pycausal jupyter notebook and then load one of the Jupyter notebooks.","title":"Anaconda/Jupyter"},{"location":"r-causal/","text":"r-causal R Wrapper for Tetrad Library Note 2023-03-06: This version of RCausal uses an older version of Tetrad from at least 5 years ago. However, we have updated our Python integration to a much better version--see https://github.com/cmu-phil/py-tetrad . Updating our R integration is one of the next projects we will take up. News 2023-04-05: We have put forward a proposal to replace the r-causal functionality using the py-tetrad functionality, here: https://github.com/cmu-phil/py-tetrad/tree/main/pytetrad/R . The installation procedure for this is still somewhat complicated, and we will try to simplify it. If you try it and have difficulties, please let us know. Once you have it installed, it is very easy and intuitive to use. By the way, rcausal has not been maintained for some time now, as the tireless maintainer has since moved on to different work :-)... but going back through some of the issues posted for r-causal gives some hints as to additional functionality that pytetrad/R should have. We'll try to get caught up. 
R Library Requirement R >= 3.3.0, stringr, rJava Docker As an alternative to installing the library and getting rJava working with your installation (e.g., it does not work well on macOS) we have a Docker image Installation Install the R library requirements: install.packages(\"stringr\") install.packages(\"rJava\") Install r-causal from GitHub: library(devtools) install_github(\"bd2kccd/r-causal\") Example Continuous Dataset library(rcausal) data(\"charity\") #Load the charity dataset tetradrunner.getAlgorithmDescription(algoId = 'fges') tetradrunner.getAlgorithmParameters(algoId = 'fges',scoreId = 'fisher-z') #Compute FGES search tetradrunner <- tetradrunner(algoId = 'fges',df = charity,scoreId = 'fisher-z', dataType = 'continuous',alpha=0.1,faithfulnessAssumed=TRUE,maxDegree=-1,verbose=TRUE) tetradrunner$nodes #Show the result's nodes tetradrunner$edges #Show the result's edges Discrete Dataset library(rcausal) data(\"audiology\") #Load the audiology dataset tetradrunner.getAlgorithmParameters(algoId = 'fges',scoreId = 'bdeu') #Compute FGES search tetradrunner <- tetradrunner(algoId = 'fges',df = audiology,scoreId = 'bdeu',dataType = 'discrete', alpha=0.1,faithfulnessAssumed=TRUE,maxDegree=-1,verbose=TRUE) tetradrunner$nodes #Show the result's nodes tetradrunner$edges #Show the result's edges Prior Knowledge Create PriorKnowledge Object forbid <- list(c('TangibilityCondition','Impact')) # List of forbidden directed edges require <- list(c('Sympathy','TangibilityCondition')) # List of required directed edges forbiddenWithin <- c('TangibilityCondition','Imaginability') class(forbiddenWithin) <- 'forbiddenWithin' # Make this tier forbidden within temporal <- list(forbiddenWithin, c('Sympathy','AmountDonated'),c('Impact')) # List of temporal node tiers prior <- priorKnowledge(forbiddirect = forbid, requiredirect = require, addtemporal = temporal) fgs <- fgs(df = charity, penaltydiscount = 2, depth = -1, ignoreLinearDependence = TRUE, heuristicSpeedup = TRUE, numOfThreads = 2, verbose = TRUE, priorKnowledge = prior) Load Knowledge File # knowledge file: audiology.prior # /knowledge # forbiddirect # class tymp # class age_gt_60 # class notch_at_4k # # requiredirect # history_noise class # # addtemporal # 0* bser late_wave_poor tymp notch_at_4k o_ar_c ar_c airBoneGap air bone o_ar_u airBoneGap # 1 history_noise history_dizziness history_buzzing history_roaring history_recruitment history_fluctuating history_heredity history_nausea # 2 class prior <- priorKnowledgeFromFile('audiology.prior') fgs.discrete <- fgs.discrete(df=audiology,structurePrior=1.0,samplePrior=1.0, depth = -1, heuristicSpeedup = TRUE, numOfThreads = 2,verbose = TRUE, priorKnowledge = prior) Plot a DOT graph library(DOT) graph_dot <- tetradrunner.tetradGraphToDot(tetradrunner$graph) dot(graph_dot) Useful rJava Trouble-shooting Installation in Mac OS X Links http://stackoverflow.com/questions/26948777/how-can-i-make-rjava-use-the-newer-version-of-java-on-osx/32544358#32544358","title":"R-causal"},{"location":"r-causal/#r-causal","text":"R Wrapper for Tetrad Library Note 2023-03-06: This version of RCausal uses an older version of Tetrad from at least 5 years ago. However, we have updated our Python integration to a much better version--see https://github.com/cmu-phil/py-tetrad . Updating our R integration is one of the next projects we will take up. 
News 2023-04-05: We have put forward a proposal to replace the r-causal functionality using the py-tetrad functionality, here: https://github.com/cmu-phil/py-tetrad/tree/main/pytetrad/R . The installation procedure for this is still somewhat complicated, and we will try to simplify it. If you try it and have difficulties, please let us know. Once you have it installed, it is very easy and intuitive to use. By the way, rcausal has not been maintained for some time now, as the tireless maintainer has since moved on to different work :-)... but going back through some of the issues posted for r-causal gives some hints as to additional functionality that pytetrad/R should have. We'll try to get caught up.","title":"r-causal"},{"location":"r-causal/#r-library-requirement","text":"R >= 3.3.0, stringr, rJava","title":"R Library Requirement"},{"location":"r-causal/#docker","text":"As an alternative to installing the library and getting rJava working with your installation (e.g., it does not work well on macOS) we have a Docker image","title":"Docker"},{"location":"r-causal/#installation","text":"Install the R library requirements: install.packages(\"stringr\") install.packages(\"rJava\") Install r-causal from GitHub: library(devtools) install_github(\"bd2kccd/r-causal\")","title":"Installation"},{"location":"r-causal/#example","text":"","title":"Example"},{"location":"r-causal/#continuous-dataset","text":"library(rcausal) data(\"charity\") #Load the charity dataset tetradrunner.getAlgorithmDescription(algoId = 'fges') tetradrunner.getAlgorithmParameters(algoId = 'fges',scoreId = 'fisher-z') #Compute FGES search tetradrunner <- tetradrunner(algoId = 'fges',df = charity,scoreId = 'fisher-z', dataType = 'continuous',alpha=0.1,faithfulnessAssumed=TRUE,maxDegree=-1,verbose=TRUE) tetradrunner$nodes #Show the result's nodes tetradrunner$edges #Show the result's edges","title":"Continuous Dataset"},{"location":"r-causal/#discrete-dataset","text":"library(rcausal) data(\"audiology\") #Load the audiology dataset tetradrunner.getAlgorithmParameters(algoId = 'fges',scoreId = 'bdeu') #Compute FGES search tetradrunner <- tetradrunner(algoId = 'fges',df = audiology,scoreId = 'bdeu',dataType = 'discrete', alpha=0.1,faithfulnessAssumed=TRUE,maxDegree=-1,verbose=TRUE) tetradrunner$nodes #Show the result's nodes tetradrunner$edges #Show the result's edges","title":"Discrete Dataset"},{"location":"r-causal/#prior-knowledge","text":"","title":"Prior Knowledge"},{"location":"r-causal/#create-priorknowledge-object","text":"forbid <- list(c('TangibilityCondition','Impact')) # List of forbidden directed edges require <- list(c('Sympathy','TangibilityCondition')) # List of required directed edges forbiddenWithin <- c('TangibilityCondition','Imaginability') class(forbiddenWithin) <- 'forbiddenWithin' # Make this tier forbidden within temporal <- list(forbiddenWithin, c('Sympathy','AmountDonated'),c('Impact')) # List of temporal node tiers prior <- priorKnowledge(forbiddirect = forbid, requiredirect = require, addtemporal = temporal) fgs <- fgs(df = charity, penaltydiscount = 2, depth = -1, ignoreLinearDependence = TRUE, heuristicSpeedup = TRUE, numOfThreads = 2, verbose = TRUE, priorKnowledge = prior)","title":"Create PriorKnowledge Object"},{"location":"r-causal/#load-knowledge-file","text":"# knowledge file: audiology.prior # /knowledge # forbiddirect # class tymp # class age_gt_60 # class notch_at_4k # # requiredirect # history_noise class # # addtemporal # 0* bser late_wave_poor tymp notch_at_4k o_ar_c ar_c airBoneGap 
air bone o_ar_u airBoneGap # 1 history_noise history_dizziness history_buzzing history_roaring history_recruitment history_fluctuating history_heredity history_nausea # 2 class prior <- priorKnowledgeFromFile('audiology.prior') fgs.discrete <- fgs.discrete(df=audiology,structurePrior=1.0,samplePrior=1.0, depth = -1, heuristicSpeedup = TRUE, numOfThreads = 2,verbose = TRUE, priorKnowledge = prior)","title":"Load Knowledge File"},{"location":"r-causal/#plot-a-dot-graph","text":"library(DOT) graph_dot <- tetradrunner.tetradGraphToDot(tetradrunner$graph) dot(graph_dot)","title":"Plot a DOT graph"},{"location":"r-causal/#useful-rjava-trouble-shooting-installation-in-mac-os-x-links","text":"http://stackoverflow.com/questions/26948777/how-can-i-make-rjava-use-the-newer-version-of-java-on-osx/32544358#32544358","title":"Useful rJava Trouble-shooting Installation in Mac OS X Links"},{"location":"tetrad-express/","text":"Tetrad Express Description A Cytoscape application (plugin) for running a simple causal model search. Purpose Provide a basic user-friendly interface for running a simple search algorithm from Tetrad. Workflow Below are the workflows you can perform: Workflow 1: Simple Search This is the simplest workflow to run a simple search. Figure 1 shows the same workflow in Tetrad. Import data. Select algorithm. Set algorithm parameters. Run search. Display graph. Figure 1. Workflow 2: Add Knowledge This workflow adds additional knowledge to the dataset. Figure 2 shows the same workflow in Tetrad. Import data. Select knowledge type. Set knowledge. Select algorithm. Set algorithm parameters. Run search. Display graph. Figure 2. Workflow 3: Apply Data Transformation This workflow applies a data transformation to the dataset. Figure 3 shows the same workflow in Tetrad. Import data. Edit the data: Select a data transformation. Select algorithm. Set algorithm parameters. Run search. Display graph. Figure 3.","title":"Tetrad Express"},{"location":"tetrad-express/#tetrad-express","text":"","title":"Tetrad Express"},{"location":"tetrad-express/#description","text":"A Cytoscape application (plugin) for running a simple causal model search.","title":"Description"},{"location":"tetrad-express/#purpose","text":"Provide a basic user-friendly interface for running a simple search algorithm from Tetrad.","title":"Purpose"},{"location":"tetrad-express/#workflow","text":"Below are the workflows you can perform:","title":"Workflow"},{"location":"tetrad-express/#workflow-1-simple-search","text":"This is the simplest workflow to run a simple search. Figure 1 shows the same workflow in Tetrad. Import data. Select algorithm. Set algorithm parameters. Run search. Display graph. Figure 1.","title":"Workflow 1: Simple Search"},{"location":"tetrad-express/#workflow-2-add-knowledge","text":"This workflow adds additional knowledge to the dataset. Figure 2 shows the same workflow in Tetrad. Import data. Select knowledge type. Set knowledge. Select algorithm. Set algorithm parameters. Run search. Display graph. Figure 2.","title":"Workflow 2: Add Knowledge"},{"location":"tetrad-express/#workflow-3-apply-data-transformation","text":"This workflow applies a data transformation to the dataset. Figure 3 shows the same workflow in Tetrad. Import data. Edit the data: Select a data transformation. Select algorithm. Set algorithm parameters. Run search. Display graph. 
Figure 3.","title":"Workflow 3: Apply Data Transformation"},{"location":"tetrad/","text":"Tetrad Introduction Tetrad is a program which creates, simulates data from, estimates, tests, predicts with, and searches for causal and statistical models. The aim of the program is to provide sophisticated methods in a friendly interface requiring very little statistical sophistication of the user and no programming knowledge. It is not intended to replace flexible statistical programming systems such as Matlab, Splus or R. Tetrad is open-source, free software that performs many of the functions in commercial programs such as Netica, Hugin, LISREL, EQS and other programs, and many discovery functions these commercial programs do not perform. Tetrad User Manual The Tetrad User Manual is a comprehensive guide to get you started and become profecient on using these tools for causal inference. Tetrad Tutorial The Tetrad Tutorial describes the things you can do with Tetrad with a lot of examples.","title":"Tetrad"},{"location":"tetrad/#tetrad","text":"","title":"Tetrad"},{"location":"tetrad/#introduction","text":"Tetrad is a program which creates, simulates data from, estimates, tests, predicts with, and searches for causal and statistical models. The aim of the program is to provide sophisticated methods in a friendly interface requiring very little statistical sophistication of the user and no programming knowledge. It is not intended to replace flexible statistical programming systems such as Matlab, Splus or R. Tetrad is open-source, free software that performs many of the functions in commercial programs such as Netica, Hugin, LISREL, EQS and other programs, and many discovery functions these commercial programs do not perform.","title":"Introduction"},{"location":"tetrad/#tetrad-user-manual","text":"The Tetrad User Manual is a comprehensive guide to get you started and become profecient on using these tools for causal inference.","title":"Tetrad User Manual"},{"location":"tetrad/#tetrad-tutorial","text":"The Tetrad Tutorial describes the things you can do with Tetrad with a lot of examples.","title":"Tetrad Tutorial"}]} \ No newline at end of file diff --git a/docs/sitemap.xml.gz b/docs/sitemap.xml.gz index 94bc7d49bb9f0a12a338ea7cbb2768606ef4cdec..7df97b5b1af8055d11b892fd79d83ddacd29dabe 100644 GIT binary patch delta 15 WcmZ3$w1A0CzMF%iLwO^c6e9p1fCGX6 delta 15 WcmZ3$w1A0CzMF%CS!pAi6e9o}T>|_7 diff --git a/docs_src/causal-cmd.md b/docs_src/causal-cmd.md index eecebb1..db55446 100644 --- a/docs_src/causal-cmd.md +++ b/docs_src/causal-cmd.md @@ -13,8 +13,7 @@ Causal discovery algorithms allow a user to uncover the causal relationships bet Java 8 or higher is the only prerequisite to run the software. Note that by default Java will allocate the smaller of 1/4 system memory or 1GB to the Java virtual machine (JVM). If you run out of memory (heap memory space) running your analyses you should increase the memory allocated to the JVM with the following switch '-XmxXXG' where XX is the number of gigabytes of ram you allow the JVM to utilize. For example to allocate 8 gigabytes of ram you would add -Xmx8G immediately after the java command. -In this example, we'll use download the [Retention.txt](http://www.ccd.pitt.edu/wp-content/uploads/files/Retention.txt) file, which is a dataset containing information on college graduation and used in the publication of "What Do College Ranking Data Tell Us About Student Retention?" by Drudzel and Glymour, 1994. 
-
+In this example, we'll download the [Retention.txt](https://raw.githubusercontent.com/bd2kccd/causal-cmd/development/dist/Retention.txt) file, a dataset on college graduation used in the publication "What Do College Ranking Data Tell Us About Student Retention?" by Druzdzel and Glymour, 1994.
 Keep in mind that causal-cmd has different switches for different algorithms. To start, type the following command in your terminal:
 
 ````
@@ -103,7 +102,7 @@ usage: java -jar Causal-cmd Project-1.10.0.jar --algorithm fges --data-type cont
 --verbose Yes if verbose output should be printed or logged
 ````
 
-In this example, we'll be running the FGES algorith on the dataset `Retention.txt`.
+In this example, we'll be running the FGES algorithm on the dataset `Retention.txt`.
 
 ````bash
 $ java -jar causal-cmd-1.10.0-jar-with-dependencies.jar --algorithm fges --data-type continuous --dataset Retention.txt --delimiter tab --score sem-bic-score
diff --git a/docs_src/py-causal.md b/docs_src/py-causal.md
index 293019f..b6349a2 100644
--- a/docs_src/py-causal.md
+++ b/docs_src/py-causal.md
@@ -95,7 +95,7 @@ Anaconda/Jupyter
 
 Installing Python with Anaconda and Jupyter may be easier for some users:
 
-* [Download and install Anaconda](https://www.continuum.io/downloads)
+* [Download and install Anaconda](https://www.anaconda.com/)
 * conda install python-javabridge
 For OS X, this default install does not seem to work well. try the following instead:
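
The py-causal pages patched above stop at installation and point to an example script without showing a search call. For orientation, here is a minimal sketch of an FGES run with py-causal. It is based on the usage conventions in the bd2kccd/py-causal README rather than on anything in this patch: the `pycausal.pycausal`/`pycausal.search` module paths, the `tetradrunner` API, and the `sem-bic` score id are assumptions that may differ between releases (the text above shows the older layout, `from tetrad import search as s`). The call mirrors the causal-cmd FGES invocation on `Retention.txt`.

````python
# Hypothetical py-causal FGES run -- module paths and parameter names follow
# the bd2kccd/py-causal README and may differ in older or newer releases.
import pandas as pd

from pycausal.pycausal import pycausal as pc
from pycausal import search as s

# javabridge hosts Tetrad inside a JVM, which must be started explicitly.
jvm = pc()
jvm.start_vm(java_max_heap_size='4096M')

# Tab-delimited continuous dataset, e.g. the Retention.txt file linked above.
df = pd.read_table('Retention.txt', sep='\t')

# FGES with the SEM BIC score, the same algorithm/score pair as the
# causal-cmd example in this patch.
tetrad = s.tetradrunner()
tetrad.run(algoId='fges', dfs=df, scoreId='sem-bic', dataType='continuous',
           maxDegree=-1, faithfulnessAssumed=True, verbose=True)

print(tetrad.getNodes())  # variables of the learned graph
print(tetrad.getEdges())  # edges of the CPDAG that FGES returns

jvm.stop_vm()
````

The explicit start/stop of the JVM is bookkeeping the javabridge design imposes; py-tetrad's JPype bridge avoids it, which is one reason the note above recommends migrating.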