Causal REST API v0.0.8

This RESTful API is designed for Causal Web. It implements the JAX-RS specification using Jersey.

Table of Contents

Installation
Prerequisites
Dependencies
Configuration
Start the API Server
API Usage and Examples
Getting JSON Web Token(JWT)
1. Data Management
2. Causal Discovery
3. Result Management

Installation

The following installation instructions are intended for the server admin who deploys this API server. API users can skip this section and start reading from the API Usage and Examples section.

Prerequisites

You must have the following installed to build/install Causal REST API:

Dependencies

If you want to run this API server and expose the API to your users, you'll first need to have the Causal Web Application installed and running. Your API users will use this web app to create their user accounts before they can consume the API.

Note: currently, new users can also be created using the Auth0 login option, but the API doesn't work for these users.

To build the API server, you'll need the released version of ccd-commons-0.3.1. Clone the repo and check out this specific release version:

git clone https://github.com/bd2kccd/ccd-commons.git
cd ccd-commons
git checkout tags/v0.3.1
mvn clean install

You'll also need the released version of ccd-db-0.6.3:

git clone https://github.com/bd2kccd/ccd-db.git
cd ccd-db
git checkout tags/v0.6.3
mvn clean install

Then you can get and build causal-rest-api:

git clone https://github.com/bd2kccd/causal-rest-api.git
cd causal-rest-api
mvn clean package

Configuration

There are 5 configuration files to configure, located at causal-rest-api/src/main/resources:

- application-hsqldb.properties: HSQLDB database configurations (for testing only)
- application-mysql.properties: MySQL database configurations
- application-slurm.properties: Slurm settings for HPC
- application.properties: Spring Boot application settings
- causal.properties: Data file directory path and folder settings

Before editing the causal.properties file, you need to create a workspace for the application to work in. Create a directory called workspace, for example /home/zhy19/ccd/workspace. Inside the workspace directory, create another folder called lib. Then build the jar file of Tetrad using the latest development branch and copy it to the lib folder created earlier.

Start the API Server

Once you have all the settings configured, go to causal-rest-api/target and you will find the jar file named causal-rest-api.jar. Then simply run:

java -jar causal-rest-api.jar

API Usage and Examples

In the following sections, we'll demonstrate the API usage with examples using the API server that is running at the Pittsburgh Supercomputing Center (PSC). The API base URI is https://<hostname>/ccd-api.

This API requires users to be authenticated. Before using this API, the user needs to create an account in the Causal Web App.

Getting JSON Web Token(JWT)

After registration in Causal Web App, the email and password can be used to authenticate against the Causal REST API to get the access token (we use JWT) via HTTP Basic Auth.

API Endpoint URI pattern:

GET https://<hostname>/ccd-api/jwt

In basic auth, the user provides the username and password, which the HTTP client concatenates (username + ":" + password) and base64-encodes. The encoded string is then sent in an Authorization header with the "Basic" scheme. For instance, for the user with email demo@pitt.edu and password 123:

GET /ccd-api/jwt HTTP/1.1
Host: <hostname>
Authorization: Basic ZGVtb0BwaXR0LmVkdToxMjM=
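As a quick check of the encoding described above, the following Python sketch builds the Authorization header value for the example credentials. The helper name basic_auth_header is ours for illustration and is not part of the API.

```python
import base64


def basic_auth_header(email: str, password: str) -> str:
    """Return the value of an HTTP Basic Authorization header."""
    # Concatenate as "email:password", then base64-encode the UTF-8 bytes.
    credentials = f"{email}:{password}".encode("utf-8")
    return "Basic " + base64.b64encode(credentials).decode("ascii")


# Prints: Basic ZGVtb0BwaXR0LmVkdToxMjM=
print(basic_auth_header("demo@pitt.edu", "123"))
```

The printed value matches the Authorization header in the request above.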

Once the request is processed successfully, the user ID together with a JWT will be returned in the response for further API queries.

{
  "userId": 22,
  "jwt": "eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJpc3MiOiJodHRwczovL2Nsb3VkLmNjZC5waXR0LmVkdS8iLCJuYW1lIjoiemh5MTkiLCJleHAiOjE0NzU4NTA0Mjg1OTcsImlhdCI6MTQ3NTg0NjgyODU5N30.FcE7aEpg0u2c-gUVugIjJkzjhlDu5qav_XHtgLu3c6E",
  "issuedTime": 1475846828597,
  "lifetime": 3600,
  "expireTime": 1475850428597,
  "wallTime": [
    1,
    3,
    6
  ]
}

We'll need to use this userId in the URI path of all subsequent requests. The JWT expires in 3600 seconds (1 hour), after which the API consumer must request another JWT; otherwise queries to other API endpoints will be denied. The JWT must also be sent via the HTTP Authorization header, but using the Bearer scheme.
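A minimal Python sketch of how a client might assemble such a request is shown here. It only builds the request object and does not send it; the function name build_api_request and the host api.example.org are placeholders we made up, while the /ccd-api/{userId}/ path layout and the Bearer header follow the description above.

```python
import urllib.request


def build_api_request(hostname: str, user_id: int, jwt: str, resource: str,
                      method: str = "GET") -> urllib.request.Request:
    """Build (but do not send) a request to a user-scoped endpoint."""
    # The userId from the JWT response becomes part of the URI path.
    url = f"https://{hostname}/ccd-api/{user_id}/{resource.lstrip('/')}"
    headers = {
        # The JWT travels in the Authorization header with the Bearer scheme.
        "Authorization": f"Bearer {jwt}",
        "Accept": "application/json",
    }
    return urllib.request.Request(url, headers=headers, method=method)


# Hypothetical host and token, for illustration only.
request = build_api_request("api.example.org", 22, "<jwt>", "dataset")
print(request.get_method(), request.full_url)
```

Passing the result to urllib.request.urlopen would actually send it; that step is left out so the sketch runs without a server.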

The wallTime field is designed for users who want to specify the maximum CPU time when Slurm handles the jobs on PSC. Normally, a job is expected to finish before the specified maximum walltime. Once the walltime reaches the maximum, the job terminates regardless of whether its processes are still running. In this example, you can pick 1, 3, or 6 hours as the wallTime.

Note: querying the JWT endpoint again before the current JWT expires will generate a new JWT, which automatically invalidates the old one. The newly generated JWT will be valid for another hour unless yet another JWT is requested.
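A client can use the timing fields of the JWT response to decide when to fetch a new token. The sketch below assumes, based on the sample response (1475846828597 + 3600 * 1000 = 1475850428597), that issuedTime and expireTime are Unix epoch milliseconds and lifetime is in seconds. The needs_refresh helper and its safety margin are our own illustration.

```python
import time
from typing import Optional

# Timing fields copied from the sample JWT response.
token_info = {
    "issuedTime": 1475846828597,
    "lifetime": 3600,
    "expireTime": 1475850428597,
}


def needs_refresh(info: dict, now_ms: Optional[int] = None,
                  margin_s: int = 60) -> bool:
    """Return True when the JWT has expired or will expire within margin_s."""
    if now_ms is None:
        now_ms = int(time.time() * 1000)
    return now_ms >= info["expireTime"] - margin_s * 1000


# Right after issuance the token is still usable.
print(needs_refresh(token_info, now_ms=token_info["issuedTime"]))
```

Because requesting a new JWT invalidates the previous one, a client that shares a token across threads or processes would want to refresh it in one place only.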

Since this API is developed with Jersey, which supports WADL, you can view the generated WADL at https://<hostname>/ccd-api/application.wadl?detail=true and see all resources available in the application. Accessing this endpoint doesn't require authentication.

All the API usage examples are grouped into three categories:

Data Management
Causal Discovery
Result Management

All the following examples are issued by user 22, whose password is 123.

1. Data Management

Upload small data file

At this point, you can upload two types of data files: a tabular dataset file (either tab-delimited or comma-delimited) and a prior knowledge file.

API Endpoint URI pattern:

POST https://<hostname>/ccd-api/{userId}/dataset/upload

This is a multipart file upload via an HTML form, and the client is required to use name="file" to name their file upload field in their form.

Generated HTTP request code example:

POST /ccd-api/22/dataset/upload HTTP/1.1
Host: <hostname>
Authorization: Bearer eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJpc3MiOiJodHRwczovL2Nsb3VkLmNjZC5waXR0LmVkdS8iLCJuYW1lIjoiemh5MTkiLCJleHAiOjE0NzU4NTA2NzY4MDQsImlhdCI6MTQ3NTg0NzA3NjgwNH0.8azVEoNPfETczXb-vn7dfyDd98eRt7iiLBXehGpPGzY
Content-Type: multipart/form-data; boundary=----WebKitFormBoundary7MA4YWxkTrZu0gW

------WebKitFormBoundary7MA4YWxkTrZu0gW
Content-Disposition: form-data; name="file"; filename=""
Content-Type: 


------WebKitFormBoundary7MA4YWxkTrZu0gW--
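For clients that are not HTML forms, the same multipart request can be assembled by hand. The following Python sketch uses only the standard library and builds the request without sending it. The function name, the host api.example.org, and the text/plain part type are assumptions for illustration; the only detail taken from this document is that the upload field must be named "file".

```python
import urllib.request
import uuid


def build_upload_request(hostname: str, user_id: int, jwt: str,
                         filename: str, content: bytes) -> urllib.request.Request:
    """Build (but do not send) a multipart/form-data dataset upload."""
    boundary = uuid.uuid4().hex
    # Each part starts with "--" + boundary; the body ends with "--" + boundary + "--".
    body = b"\r\n".join([
        f"--{boundary}".encode("ascii"),
        f'Content-Disposition: form-data; name="file"; filename="{filename}"'.encode("utf-8"),
        b"Content-Type: text/plain",  # assumed part type for a tabular text file
        b"",
        content,
        f"--{boundary}--".encode("ascii"),
        b"",
    ])
    headers = {
        "Authorization": f"Bearer {jwt}",
        "Content-Type": f"multipart/form-data; boundary={boundary}",
    }
    url = f"https://{hostname}/ccd-api/{user_id}/dataset/upload"
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


# Hypothetical host, token, and file content.
upload = build_upload_request("api.example.org", 22, "<jwt>",
                              "data_small.txt", b"X1\tX2\n1.0\t2.0\n")
print(upload.get_method(), upload.full_url, len(upload.data))
```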

If the Authorization header is not provided, the response will look like this:

{
  "timestamp": 1465414501443,
  "status": 401,
  "error": "Unauthorized",
  "message": "User credentials are required.",
  "path": "/22/dataset/upload"
}

This POST request will upload the dataset file to the target server location and add the corresponding records to the database. The response will look like this:

{
  "id": 6,
  "name": "Lung-tetrad_hv.txt",
  "creationTime": 1466622267000,
  "lastModifiedTime": 1466622267000,
  "fileSize": 3309465,
  "md5checkSum": "b1db7511ee293d297e3055d9a7b46c5e",
  "fileSummary": {
    "variableType": null,
    "fileDelimiter": null,
    "numOfRows": null,
    "numOfColumns": null
  }
}

The prior knowledge file upload uses a similar API endpoint:

POST https://<hostname>/ccd-api/{userId}/priorknowledge/upload

Because there's no need to summarize a prior knowledge file, the response of a successful prior knowledge file upload will look like this:

{
  "id": 6,
  "name": "Lung-tetrad_hv.txt",
  "creationTime": 1466622267000,
  "lastModifiedTime": 1466622267000,
  "fileSize": 3309465,
  "md5checkSum": "ugdb7511rt293d29ke3055d9a7b46c9k"
}

Resumable data file upload

In addition to the regular file upload described above, we also provide the option of stable and resumable large file upload. It requires the client side to have a resumable upload implementation. We currently support clients integrated with Resumable.js, which provides multiple simultaneous, stable, and resumable uploads via the HTML5 File API. You can also create your own client as long as all the following parameters are set correctly.

API Endpoint URI pattern:

GET https://<hostname>/ccd-api/{userId}/chunkupload

POST https://<hostname>/ccd-api/{userId}/chunkupload

In this example, the data file is split into 3 chunks. The upload of each chunk consists of a GET request and a POST request. To handle the state of upload chunks, a number of extra parameters are sent along with all requests:

resumableChunkNumber: The index of the chunk in the current upload. First chunk is 1 (no base-0 counting here).
resumableChunkSize: The general chunk size. Using this value and resumableTotalSize you can calculate the total number of chunks. Please note that the size of the data received in the HTTP request might be lower than resumableChunkSize for the last chunk of a file.
resumableCurrentChunkSize: The size of the current resumable chunk.
resumableTotalSize: The total file size.
resumableType: The file type of the resumable chunk, e.g., "text/plain".
resumableIdentifier: A unique identifier for the file contained in the request.
resumableFilename: The original file name (since a bug in Firefox results in the file name not being transmitted in chunk multipart posts).
resumableRelativePath: The file's relative path when selecting a directory (defaults to file name in all browsers except Chrome).
resumableTotalChunks: The total number of chunks.
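For a custom client, the parameters listed above can be derived from the file name and size. The Python sketch below rests on two assumptions that we chose because they reproduce the values used in this example (3 chunks and the identifier 3309465-large-datatxt): the chunk count is the floor of total size over chunk size, with the final chunk carrying the remainder, and the identifier is the file size plus the file name stripped of characters other than letters, digits, "_" and "-". The function name chunk_params is hypothetical.

```python
import re
from urllib.parse import urlencode


def chunk_params(filename: str, total_size: int, chunk_number: int,
                 chunk_size: int = 1048576, mime_type: str = "text/plain") -> dict:
    """Return the resumable* parameters for one chunk (numbered from 1)."""
    # Assumption: floor division, but never fewer than one chunk.
    total_chunks = max(total_size // chunk_size, 1)
    if not 1 <= chunk_number <= total_chunks:
        raise ValueError(f"chunk_number must be in 1..{total_chunks}")
    if chunk_number < total_chunks:
        current_size = chunk_size
    else:
        # Assumption: the final chunk carries whatever is left over.
        current_size = total_size - (total_chunks - 1) * chunk_size
    identifier = f"{total_size}-" + re.sub(r"[^0-9a-zA-Z_-]", "", filename)
    return {
        "resumableChunkNumber": chunk_number,
        "resumableChunkSize": chunk_size,
        "resumableCurrentChunkSize": current_size,
        "resumableTotalSize": total_size,
        "resumableType": mime_type,
        "resumableIdentifier": identifier,
        "resumableFilename": filename,
        "resumableRelativePath": filename,
        "resumableTotalChunks": total_chunks,
    }


# Query string for the second chunk of the example file.
query = urlencode(chunk_params("large-data.txt", 3309465, chunk_number=2))
print(query)
```

The same dictionary can be sent as form fields in the POST request that carries the chunk data.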

Generated HTTP request code example:

GET /ccd-api/22/chunkupload?resumableChunkNumber=2&resumableChunkSize=1048576&resumableCurrentChunkSize=1048576&resumableTotalSize=3309465&resumableType=text%2Fplain&resumableIdentifier=3309465-large-datatxt&resumableFilename=large-data.txt&resumableRelativePath=large-data.txt&resumableTotalChunks=3 HTTP/1.1
Host: <hostname>
Authorization: Bearer eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJpc3MiOiJodHRwczovL2Nsb3VkLmNjZC5waXR0LmVkdS8iLCJuYW1lIjoiemh5MTkiLCJleHAiOjE0NzU4NTA2NzY4MDQsImlhdCI6MTQ3NTg0NzA3NjgwNH0.8azVEoNPfETczXb-vn7dfyDd98eRt7iiLBXehGpPGzY

This GET request checks if the data chunk is already on the server side. If the target file chunk is not found on the server, the client will issue a POST request to upload the actual data.

Generated HTTP request code example:

POST /ccd-api/22/chunkupload HTTP/1.1
Host: <hostname>
Authorization: Bearer eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJpc3MiOiJodHRwczovL2Nsb3VkLmNjZC5waXR0LmVkdS8iLCJuYW1lIjoiemh5MTkiLCJleHAiOjE0NzU4NTA2NzY4MDQsImlhdCI6MTQ3NTg0NzA3NjgwNH0.8azVEoNPfETczXb-vn7dfyDd98eRt7iiLBXehGpPGzY
Content-Type: multipart/form-data; boundary=----WebKitFormBoundaryMFjgApg56XGyeTnZ

------WebKitFormBoundaryMFjgApg56XGyeTnZ
Content-Disposition: form-data; name="resumableChunkNumber"

2
------WebKitFormBoundaryMFjgApg56XGyeTnZ
Content-Disposition: form-data; name="resumableChunkSize"

1048576
------WebKitFormBoundaryMFjgApg56XGyeTnZ
Content-Disposition: form-data; name="resumableCurrentChunkSize"

1048576
------WebKitFormBoundaryMFjgApg56XGyeTnZ
Content-Disposition: form-data; name="resumableTotalSize"

3309465
------WebKitFormBoundaryMFjgApg56XGyeTnZ
Content-Disposition: form-data; name="resumableType"

text/plain
------WebKitFormBoundaryMFjgApg56XGyeTnZ
Content-Disposition: form-data; name="resumableIdentifier"

3309465-large-datatxt
------WebKitFormBoundaryMFjgApg56XGyeTnZ
Content-Disposition: form-data; name="resumableFilename"

large-data.txt
------WebKitFormBoundaryMFjgApg56XGyeTnZ
Content-Disposition: form-data; name="resumableRelativePath"

large-data.txt
------WebKitFormBoundaryMFjgApg56XGyeTnZ
Content-Disposition: form-data; name="resumableTotalChunks"

3
------WebKitFormBoundaryMFjgApg56XGyeTnZ
Content-Disposition: form-data; name="file"; filename="blob"
Content-Type: application/octet-stream


------WebKitFormBoundaryMFjgApg56XGyeTnZ--

Each chunk upload POST request returns a 200 status code if everything works fine.

Finally, the md5checkSum string of the reassembled file will be returned once the whole file has been uploaded successfully. In this example, the POST request that uploads the third chunk will respond with:

b1db7511ee293d297e3055d9a7b46c5e

List all dataset files of a user

API Endpoint URI pattern:

GET https://<hostname>/ccd-api/{userId}/dataset

Generated HTTP request code example:

GET /ccd-api/22/dataset HTTP/1.1
Host: <hostname>
Authorization: Bearer eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJpc3MiOiJodHRwczovL2Nsb3VkLmNjZC5waXR0LmVkdS8iLCJuYW1lIjoiemh5MTkiLCJleHAiOjE0NzU4NTA2NzY4MDQsImlhdCI6MTQ3NTg0NzA3NjgwNH0.8azVEoNPfETczXb-vn7dfyDd98eRt7iiLBXehGpPGzY
Accept: application/json

A JSON formatted list of all the input dataset files that are associated with user 22 will be returned.

[
  {
    "id": 8,
    "name": "data_small.txt",
    "creationTime": 1467132449000,
    "lastModifiedTime": 1467132449000,
    "fileSize": 278428,
    "md5checkSum": "ed5f27a2cf94fe3735a5d9ed9191c382",
    "fileSummary": {
      "variableType": "continuous",
      "fileDelimiter": "tab",
      "numOfRows": 302,
      "numOfColumns": 123
    }
  },
  {
    "id": 10,
    "name": "large-data.txt",
    "creationTime": 1467134048000,
    "lastModifiedTime": 1467134048000,
    "fileSize": 3309465,
    "md5checkSum": "b1db7511ee293d297e3055d9a7b46c5e",
    "fileSummary": {
      "variableType": null,
      "fileDelimiter": null,
      "numOfRows": null,
      "numOfColumns": null
    }
  },
  {
    "id": 11,
    "name": "Lung-tetrad_hv (copy).txt",
    "creationTime": 1467140415000,
    "lastModifiedTime": 1467140415000,
    "fileSize": 3309465,
    "md5checkSum": "b1db7511ee293d297e3055d9a7b46c5e",
    "fileSummary": {
      "variableType": "continuous",
      "fileDelimiter": "tab",
      "numOfRows": 302,
      "numOfColumns": 608
    }
  }
]

You can also specify XML as the response format in your request.

Generated HTTP request code example:

GET /ccd-api/22/dataset HTTP/1.1
Host: <hostname>
Authorization: Bearer eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJpc3MiOiJodHRwczovL2Nsb3VkLmNjZC5waXR0LmVkdS8iLCJuYW1lIjoiemh5MTkiLCJleHAiOjE0NzU4NTA2NzY4MDQsImlhdCI6MTQ3NTg0NzA3NjgwNH0.8azVEoNPfETczXb-vn7dfyDd98eRt7iiLBXehGpPGzY
Accept: application/xml

And the response will look like this:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<datasetFileDTOes>
    <datasetFile>
        <id>8</id>
        <name>data_small.txt</name>
        <creationTime>2016-06-28T12:47:29-04:00</creationTime>
        <lastModifiedTime>2016-06-28T12:47:29-04:00</lastModifiedTime>
        <fileSize>278428</fileSize>
        <md5checkSum>ed5f27a2cf94fe3735a5d9ed9191c382</md5checkSum>
        <fileSummary>
            <fileDelimiter>tab</fileDelimiter>
            <numOfColumns>123</numOfColumns>
            <numOfRows>302</numOfRows>
            <variableType>continuous</variableType>
        </fileSummary>
    </datasetFile>
    <datasetFile>
        <id>10</id>
        <name>large-data.txt</name>
        <creationTime>2016-06-28T13:14:08-04:00</creationTime>
        <lastModifiedTime>2016-06-28T13:14:08-04:00</lastModifiedTime>
        <fileSize>3309465</fileSize>
        <md5checkSum>b1db7511ee293d297e3055d9a7b46c5e</md5checkSum>
        <fileSummary>
            <variableType xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:nil="true"/>
            <fileDelimiter xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:nil="true"/>
            <numOfRows xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:nil="true"/>
            <numOfColumns xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:nil="true"/>
        </fileSummary>
    </datasetFile>
    <datasetFile>
        <id>11</id>
        <name>Lung-tetrad_hv (copy).txt</name>
        <creationTime>2016-06-28T15:00:15-04:00</creationTime>
        <lastModifiedTime>2016-06-28T15:00:15-04:00</lastModifiedTime>
        <fileSize>3309465</fileSize>
        <md5checkSum>b1db7511ee293d297e3055d9a7b46c5e</md5checkSum>
        <fileSummary>
            <fileDelimiter>tab</fileDelimiter>
            <numOfColumns>608</numOfColumns>
            <numOfRows>302</numOfRows>
            <variableType>continuous</variableType>
        </fileSummary>
    </datasetFile>
</datasetFileDTOes>

From the above output, we can also tell that the data file with ID 10 doesn't have all the fileSummary field values set. We'll cover this in the dataset summarization section.
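A client can detect such files programmatically. The short Python sketch below scans a decoded JSON listing for entries whose fileSummary still contains null values; the function name unsummarized_ids is ours, and the sample data is a trimmed copy of the listing above.

```python
SUMMARY_FIELDS = ("variableType", "fileDelimiter", "numOfRows", "numOfColumns")


def unsummarized_ids(datasets: list) -> list:
    """Return the IDs of dataset files whose fileSummary is incomplete."""
    ids = []
    for dataset in datasets:
        summary = dataset.get("fileSummary") or {}
        # JSON null is decoded to None by the json module.
        if any(summary.get(field) is None for field in SUMMARY_FIELDS):
            ids.append(dataset["id"])
    return ids


# Trimmed version of the JSON listing shown earlier.
listing = [
    {"id": 8, "fileSummary": {"variableType": "continuous", "fileDelimiter": "tab",
                              "numOfRows": 302, "numOfColumns": 123}},
    {"id": 10, "fileSummary": {"variableType": None, "fileDelimiter": None,
                               "numOfRows": None, "numOfColumns": None}},
    {"id": 11, "fileSummary": {"variableType": "continuous", "fileDelimiter": "tab",
                               "numOfRows": 302, "numOfColumns": 608}},
]
print(unsummarized_ids(listing))  # [10]
```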

Get the detail information of a dataset file based on ID

API Endpoint URI pattern:

GET https://<hostname>/ccd-api/{userId}/dataset/{id}

Generated HTTP request code example:

GET /ccd-api/22/dataset/8 HTTP/1.1
Host: <hostname>
Authorization: Bearer eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJpc3MiOiJodHRwczovL2Nsb3VkLmNjZC5waXR0LmVkdS8iLCJuYW1lIjoiemh5MTkiLCJleHAiOjE0NzU4NTA2NzY4MDQsImlhdCI6MTQ3NTg0NzA3NjgwNH0.8azVEoNPfETczXb-vn7dfyDd98eRt7iiLBXehGpPGzY

And the resulting response looks like this:

{
  "id": 8,
  "name": "data_small.txt",
  "creationTime": 1467132449000,
  "lastModifiedTime": 1467132449000,
  "fileSize": 278428,
  "md5checkSum": "ed5f27a2cf94fe3735a5d9ed9191c382",
  "fileSummary": {
    "variableType": "continuous",
    "fileDelimiter": "tab",
    "numOfRows": 302,
    "numOfColumns": 123
  }
}
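The creationTime and lastModifiedTime values in the JSON responses appear to be Unix epoch timestamps in milliseconds: the XML listing earlier renders the same file's creation time as 2016-06-28T12:47:29-04:00. Under that assumption, a Python client could convert them as follows (the helper name from_epoch_ms is ours).

```python
from datetime import datetime, timedelta, timezone


def from_epoch_ms(milliseconds: int) -> datetime:
    """Convert epoch milliseconds to a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(milliseconds / 1000, tz=timezone.utc)


created = from_epoch_ms(1467132449000)
print(created.isoformat())  # 2016-06-28T16:47:29+00:00

# The same instant at the UTC-04:00 offset used in the XML output.
print(created.astimezone(timezone(timedelta(hours=-4))).isoformat())
```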

Delete physical dataset file and all records from database for a given file ID

API Endpoint URI pattern:

DELETE https://<hostname>/ccd-api/{userId}/dataset/{id}

Generated HTTP request code example:

DELETE /ccd-api/22/dataset/8 HTTP/1.1
Host: <hostname>
Authorization: Bearer eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJpc3MiOiJodHRwczovL2Nsb3VkLmNjZC5waXR0LmVkdS8iLCJuYW1lIjoiemh5MTkiLCJleHAiOjE0NzU4NTA2NzY4MDQsImlhdCI6MTQ3NTg0NzA3NjgwNH0.8azVEoNPfETczXb-vn7dfyDd98eRt7iiLBXehGpPGzY

On success, this returns an HTTP 204 No Content status, which means the server successfully processed the deletion request and there is no content to return.

Summarize dataset file

From the dataset listing example, we can tell that the file with ID 10 doesn't have variableType, fileDelimiter, numOfRows, and numOfColumns specified under fileSummary. Among these attributes, variableType and fileDelimiter are the ones that users need to provide during the summarization process.

Before we can run the desired algorithm on the newly uploaded data file, we need to summarize the data by specifying the variable type and file delimiter.

Required Fields	Description
id	The data file ID
variableType	discrete or continuous
fileDelimiter	tab or comma

API Endpoint URI pattern:

POST https://<hostname>/ccd-api/{userId}/dataset/summarize

Generated HTTP request code example:

POST /ccd-api/22/dataset/summarize HTTP/1.1
Host: <hostname>
Authorization: Bearer eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJpc3MiOiJodHRwczovL2Nsb3VkLmNjZC5waXR0LmVkdS8iLCJuYW1lIjoiemh5MTkiLCJleHAiOjE0NzU4NTA2NzY4MDQsImlhdCI6MTQ3NTg0NzA3NjgwNH0.8azVEoNPfETczXb-vn7dfyDd98eRt7iiLBXehGpPGzY
Content-Type: application/json

{
    "id": 10,
    "variableType": "continuous",
    "fileDelimiter": "tab"
}

This POST request will summarize the dataset file and generate a response (JSON or XML) like below:

{
  "id": 10,
  "name": "large-data.txt",
  "creationTime": 1467134048000,
  "lastModifiedTime": 1467134048000,
  "fileSize": 3309465,
  "md5checkSum": "b1db7511ee293d297e3055d9a7b46c5e",
  "fileSummary": {
    "variableType": "continuous",
    "fileDelimiter": "tab",
    "numOfRows": 302,
    "numOfColumns": 608
  }
}
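Since variableType and fileDelimiter each accept only two values, a client may want to validate them before sending the request. The following Python sketch builds the JSON request body; the allowed values come from the required-fields table above, while the function name summarize_payload is hypothetical.

```python
import json

VARIABLE_TYPES = ("discrete", "continuous")
FILE_DELIMITERS = ("tab", "comma")


def summarize_payload(file_id: int, variable_type: str, file_delimiter: str) -> str:
    """Return the JSON body for POST /ccd-api/{userId}/dataset/summarize."""
    if variable_type not in VARIABLE_TYPES:
        raise ValueError(f"variableType must be one of {VARIABLE_TYPES}")
    if file_delimiter not in FILE_DELIMITERS:
        raise ValueError(f"fileDelimiter must be one of {FILE_DELIMITERS}")
    return json.dumps({
        "id": file_id,
        "variableType": variable_type,
        "fileDelimiter": file_delimiter,
    })


print(summarize_payload(10, "continuous", "tab"))
```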

List all prior knowledge files of a given user

API Endpoint URI pattern:

GET https://<hostname>/ccd-api/{userId}/priorknowledge

Generated HTTP request code example:

GET /ccd-api/22/priorknowledge HTTP/1.1
Host: <hostname>
Authorization: Bearer eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJpc3MiOiJodHRwczovL2Nsb3VkLmNjZC5waXR0LmVkdS8iLCJuYW1lIjoiemh5MTkiLCJleHAiOjE0NzU4NTA2NzY4MDQsImlhdCI6MTQ3NTg0NzA3NjgwNH0.8azVEoNPfETczXb-vn7dfyDd98eRt7iiLBXehGpPGzY
Accept: application/json

A JSON formatted list of all the prior knowledge files that are associated with user 22 will be returned.

[
  {
    "id": 9,
    "name": "data_small.prior",
    "creationTime": 1467132449000,
    "lastModifiedTime": 1467132449000,
    "fileSize": 278428,
    "md5checkSum": "ed5f27a2cf94fe3735a5d9ed9191c382"
  },
  {
    "id": 12,
    "name": "large-data.prior",
    "creationTime": 1467134048000,
    "lastModifiedTime": 1467134048000,
    "fileSize": 3309465,
    "md5checkSum": "b1db7511ee293d297e3055d9a7b46c5e"
  }
]

Get the detail information of a prior knowledge file based on ID

API Endpoint URI pattern:

GET https://<hostname>/ccd-api/{userId}/priorknowledge/{id}

Generated HTTP request code example:

GET /ccd-api/22/priorknowledge/9 HTTP/1.1
Host: <hostname>
Authorization: Bearer eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJpc3MiOiJodHRwczovL2Nsb3VkLmNjZC5waXR0LmVkdS8iLCJuYW1lIjoiemh5MTkiLCJleHAiOjE0NzU4NTA2NzY4MDQsImlhdCI6MTQ3NTg0NzA3NjgwNH0.8azVEoNPfETczXb-vn7dfyDd98eRt7iiLBXehGpPGzY

And the resulting response looks like this:

{
  "id": 9,
  "name": "data_small.prior",
  "creationTime": 1467132449000,
  "lastModifiedTime": 1467132449000,
  "fileSize": 278428,
  "md5checkSum": "ed5f27a2cf94fe3735a5d9ed9191c382"
}

Delete physical prior knowledge file and all records from database for a given file ID

API Endpoint URI pattern:

DELETE https://<hostname>/ccd-api/{userId}/priorknowledge/{id}

Generated HTTP request code example:

DELETE /ccd-api/22/priorknowledge/9 HTTP/1.1
Host: <hostname>
Authorization: Bearer eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJpc3MiOiJodHRwczovL2Nsb3VkLmNjZC5waXR0LmVkdS8iLCJuYW1lIjoiemh5MTkiLCJleHAiOjE0NzU4NTA2NzY4MDQsImlhdCI6MTQ3NTg0NzA3NjgwNH0.8azVEoNPfETczXb-vn7dfyDd98eRt7iiLBXehGpPGzY

On success, this returns an HTTP 204 No Content status, which means the server successfully processed the deletion request and there is no content to return.

2. Causal Discovery

Once the data file is uploaded and summarized, you can start running a causal discovery algorithm on the uploaded data file.

List all the available causal discovery algorithms

API Endpoint URI pattern:

GET https://<hostname>/ccd-api/{userId}/algorithms

Generated HTTP request code example:

GET /ccd-api/22/algorithms HTTP/1.1
Host: <hostname>
Authorization: Bearer eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJpc3MiOiJodHRwczovL2Nsb3VkLmNjZC5waXR0LmVkdS8iLCJuYW1lIjoiemh5MTkiLCJleHAiOjE0NzU4NTA2NzY4MDQsImlhdCI6MTQ3NTg0NzA3NjgwNH0.8azVEoNPfETczXb-vn7dfyDd98eRt7iiLBXehGpPGzY

[
  {
    "id": 1,
    "name": "FGESc",
    "description": "FGES continuous"
  },
  {
    "id": 2,
    "name": "FGESd",
    "description": "FGES discrete"
  },
  {
    "id": 3,
    "name": "GFCIc",
    "description": "GFCI continuous"
  },
  {
    "id": 4,
    "name": "GFCId",
    "description": "GFCI discrete"
  }
]

Currently we support "FGES continuous", "FGES discrete", "GFCI continuous", and "GFCI discrete". They share a common JSON structure for their input:

Input JSON Fields	Description
`datasetFileId`	The dataset file ID, integer
`priorKnowledgeFileId`	The optional prior knowledge file ID, integer
`dataValidation`	Algorithm specific input data validation flags, JSON object
`algorithmParameters`	Algorithm specific parameters, JSON object
`jvmOptions`	Advanced options for the Java Virtual Machine (JVM), JSON object. Currently only `maxHeapSize` is supported (in gigabytes, max value is 100)
`hpcParameters`	Parameters for High-Performance Computing, JSON array of key-value objects. Currently only `wallTime` is supported

Below are the data validation flags and parameters that you can use for each algorithm.

FGES continuous

Data validation:

Parameters	Description	Default Value
`skipNonzeroVariance`	Skip check for zero variance variables	false
`skipUniqueVarName`	Skip check for unique variable names	false

Algorithm parameters:

Parameters	Description	Default Value
`faithfulnessAssumed`	Yes if (one edge) faithfulness should be assumed	true
`maxDegree`	The maximum degree of the output graph	100
`penaltyDiscount`	Penalty discount	4.0
`verbose`	Print additional information	true

FGES discrete

Data validation:

Parameters	Description	Default Value
`skipUniqueVarName`	Skip check for unique variable names	false
`skipCategoryLimit`	Skip 'limit number of categories' check	false

Algorithm parameters:

Parameters	Description	Default Value
`structurePrior`	Structure prior coefficient	1.0
`samplePrior`	Sample prior	1.0
`maxDegree`	The maximum degree of the output graph	100
`faithfulnessAssumed`	Yes if (one edge) faithfulness should be assumed	true
`verbose`	Print additional information	true

GFCI continuous

Data validation:

Parameters	Description	Default Value
`skipNonzeroVariance`	Skip check for zero variance variables	false
`skipUniqueVarName`	Skip check for unique variable names	false

Algorithm parameters:

Parameters	Description	Default Value
`alpha`	Cutoff for p values (alpha)	0.01
`penaltyDiscount`	Penalty discount	4.0
`maxDegree`	The maximum degree of the output graph	100
`faithfulnessAssumed`	Yes if (one edge) faithfulness should be assumed	false
`verbose`	Print additional information	true

GFCI discrete

Data validation:

Parameters	Description	Default Value
`skipUniqueVarName`	Skip check for unique variable names	false
`skipCategoryLimit`	Skip 'limit number of categories' check	false

Algorithm parameters:

Parameters	Description	Default Value
`alpha`	Cutoff for p values (alpha)	0.01
`structurePrior`	Structure prior coefficient	1.0
`samplePrior`	Sample prior	1.0
`maxDegree`	The maximum degree of the output graph	100
`faithfulnessAssumed`	Yes if (one edge) faithfulness should be assumed	false
`verbose`	Print additional information	true
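The tables above can be turned into a small client-side check that rejects misspelled or unsupported fields before a job is submitted. The Python sketch below is one way to do that; the flag and parameter names are copied from the tables, whereas the function build_job_payload and its validation rules are our own illustration, not behavior of the server.

```python
# Field names copied from the tables above, keyed by algorithm name.
DATA_VALIDATION_FLAGS = {
    "FGESc": {"skipNonzeroVariance", "skipUniqueVarName"},
    "FGESd": {"skipUniqueVarName", "skipCategoryLimit"},
    "GFCIc": {"skipNonzeroVariance", "skipUniqueVarName"},
    "GFCId": {"skipUniqueVarName", "skipCategoryLimit"},
}
ALGORITHM_PARAMETERS = {
    "FGESc": {"faithfulnessAssumed", "maxDegree", "penaltyDiscount", "verbose"},
    "FGESd": {"structurePrior", "samplePrior", "maxDegree",
              "faithfulnessAssumed", "verbose"},
    "GFCIc": {"alpha", "penaltyDiscount", "maxDegree",
              "faithfulnessAssumed", "verbose"},
    "GFCId": {"alpha", "structurePrior", "samplePrior", "maxDegree",
              "faithfulnessAssumed", "verbose"},
}
MAX_HEAP_SIZE_GB = 100


def build_job_payload(algorithm, dataset_file_id, *, prior_knowledge_file_id=None,
                      data_validation=None, algorithm_parameters=None,
                      max_heap_size=None, wall_time=None) -> dict:
    """Assemble the common job input structure, rejecting unknown fields."""
    if algorithm not in ALGORITHM_PARAMETERS:
        raise ValueError(f"unknown algorithm: {algorithm}")
    data_validation = dict(data_validation or {})
    algorithm_parameters = dict(algorithm_parameters or {})
    unknown = set(data_validation) - DATA_VALIDATION_FLAGS[algorithm]
    unknown |= set(algorithm_parameters) - ALGORITHM_PARAMETERS[algorithm]
    if unknown:
        raise ValueError(f"unsupported fields for {algorithm}: {sorted(unknown)}")

    payload = {"datasetFileId": dataset_file_id}
    if prior_knowledge_file_id is not None:
        payload["priorKnowledgeFileId"] = prior_knowledge_file_id
    if data_validation:
        payload["dataValidation"] = data_validation
    if algorithm_parameters:
        payload["algorithmParameters"] = algorithm_parameters
    if max_heap_size is not None:
        if not 0 < max_heap_size <= MAX_HEAP_SIZE_GB:
            raise ValueError(f"maxHeapSize must be in 1..{MAX_HEAP_SIZE_GB} GB")
        payload["jvmOptions"] = {"maxHeapSize": max_heap_size}
    if wall_time is not None:
        # hpcParameters is an array of key-value objects.
        payload["hpcParameters"] = [{"key": "wallTime", "value": wall_time}]
    return payload


job = build_job_payload("FGESc", 8, prior_knowledge_file_id=9,
                        algorithm_parameters={"penaltyDiscount": 5.0, "maxDegree": 100},
                        max_heap_size=100, wall_time=1)
print(job)
```

Parameters that are omitted are simply left out of the payload, on the assumption that the server then applies the default values listed in the tables.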

Add a new job to run the desired algorithm on a given data file

This is a POST request, and the algorithm details and data file ID must be specified as JSON in the POST body.

API Endpoint URI pattern:

POST https://<hostname>/ccd-api/{userId}/jobs/FGESc

Generated HTTP request code example:

POST /ccd-api/22/jobs/FGESc HTTP/1.1
Host: <hostname>
Authorization: Bearer eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJpc3MiOiJodHRwczovL2Nsb3VkLmNjZC5waXR0LmVkdS8iLCJuYW1lIjoiemh5MTkiLCJleHAiOjE0NzU4NTA2NzY4MDQsImlhdCI6MTQ3NTg0NzA3NjgwNH0.8azVEoNPfETczXb-vn7dfyDd98eRt7iiLBXehGpPGzY
Content-Type: application/json

{
    "datasetFileId": 8,
    "priorKnowledgeFileId": 9,
    "dataValidation": {
      "skipNonzeroVariance": true,
      "skipUniqueVarName": true
    },
    "algorithmParameters": {
      "penaltyDiscount": 5.0,
      "maxDegree": 100
    },
    "jvmOptions": {
      "maxHeapSize": 100
    },
    "hpcParameters": [
       {
        "key":"wallTime",
        "value":1
       }
    ]
}

In this example, we are running the "FGES continuous" algorithm on the file with ID 8. We also set the wallTime to 1 hour. This call will return the job info with a 201 Created response status code.

{
  "id": 5,
  "algorithmName": "FGESc",
  "status": 0,
  "addedTime": 1472742564355,
  "resultFileName": "FGESc_data_small.txt_1472742564353.txt",
  "errorResultFileName": "error_FGESc_data_small.txt_1472742564353.txt"
}

From this response we can tell that the job ID is 5, and the result file name will be FGESc_data_small.txt_1472742564353.txt if everything goes well. If something goes wrong, an error result file named error_FGESc_data_small.txt_1472742564353.txt will be created.

When you need to run "FGES discrete", just send the request to a different endpoint URI:

API Endpoint URI pattern:

POST https://<hostname>/ccd-api/{userId}/jobs/FGESd

Generated HTTP request code example:

POST /ccd-api/22/jobs/FGESd HTTP/1.1
Host: <hostname>
Authorization: Bearer eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJpc3MiOiJodHRwczovL2Nsb3VkLmNjZC5waXR0LmVkdS8iLCJuYW1lIjoiemh5MTkiLCJleHAiOjE0NzU4NTA2NzY4MDQsImlhdCI6MTQ3NTg0NzA3NjgwNH0.8azVEoNPfETczXb-vn7dfyDd98eRt7iiLBXehGpPGzY
Content-Type: application/json

{
    "datasetFileId": 10,
    "priorKnowledgeFileId": 12,
    "dataValidation": {
      "skipUniqueVarName": true,
      "skipCategoryLimit": true
    },
    "algorithmParameters": {
      "structurePrior": 1.0,
      "samplePrior": 1.0,
      "maxDegree": 102
    },
    "jvmOptions": {
      "maxHeapSize": 100
    }
}

List all running jobs

API Endpoint URI pattern:

GET https://<hostname>/ccd-api/{userId}/jobs

Generated HTTP request code example:

GET /ccd-api/22/jobs/ HTTP/1.1
Host: <hostname>
Authorization: Bearer eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJpc3MiOiJodHRwczovL2Nsb3VkLmNjZC5waXR0LmVkdS8iLCJuYW1lIjoiemh5MTkiLCJleHAiOjE0NzU4NTA2NzY4MDQsImlhdCI6MTQ3NTg0NzA3NjgwNH0.8azVEoNPfETczXb-vn7dfyDd98eRt7iiLBXehGpPGzY
Content-Type: application/json

Then you'll see the information of all jobs that are currently running:

[
  {
    "id": 32,
    "algorithmName": "FGESc",
    "addedTime": 1468436085000
  },
  {
    "id": 33,
    "algorithmName": "FGESd",
    "addedTime": 1468436087000
  }
]

Check the job status for a given job ID

Once the new job is submitted, it takes time and resources to run the algorithm on the server. While waiting, you can check the status of a given job ID:

API Endpoint URI pattern:

GET https://<hostname>/ccd-api/{userId}/jobs/{id}

Generated HTTP request code example:

GET /ccd-api/22/jobs/32 HTTP/1.1
Host: <hostname>
Authorization: Bearer eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJpc3MiOiJodHRwczovL2Nsb3VkLmNjZC5waXR0LmVkdS8iLCJuYW1lIjoiemh5MTkiLCJleHAiOjE0NzU4NTA2NzY4MDQsImlhdCI6MTQ3NTg0NzA3NjgwNH0.8azVEoNPfETczXb-vn7dfyDd98eRt7iiLBXehGpPGzY

This will either return "Pending" or "Completed".
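Because the status is either "Pending" or "Completed", a client typically polls until the job finishes. The Python sketch below shows one possible polling loop. To keep it runnable without a server, the HTTP call is passed in as a function; wait_for_job, the polling interval, and the attempt limit are all assumptions for illustration.

```python
import time
from typing import Callable


def wait_for_job(fetch_status: Callable[[], str], interval_s: float = 30.0,
                 max_checks: int = 120,
                 sleep: Callable[[float], None] = time.sleep) -> bool:
    """Poll until the job reports "Completed"; return False if it never does."""
    for attempt in range(max_checks):
        # Tolerate surrounding whitespace or quotes in the response body.
        status = fetch_status().strip().strip('"')
        if status == "Completed":
            return True
        if attempt < max_checks - 1:
            sleep(interval_s)
    return False


# Simulated responses standing in for GET /ccd-api/{userId}/jobs/{id}.
fake_responses = iter(["Pending", "Pending", "Completed"])
finished = wait_for_job(lambda: next(fake_responses), sleep=lambda seconds: None)
print(finished)  # True
```

In a real client, fetch_status would issue the GET request shown above with the Bearer token and return the response body.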

Cancel a running job

Sometimes you may want to cancel a submitted job.

API Endpoint URI pattern:

DELETE https://<hostname>/ccd-api/{userId}/jobs/{id}

Generated HTTP request code example:

DELETE /ccd-api/22/jobs/8 HTTP/1.1
Host: <hostname>
Authorization: Bearer eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJpc3MiOiJodHRwczovL2Nsb3VkLmNjZC5waXR0LmVkdS8iLCJuYW1lIjoiemh5MTkiLCJleHAiOjE0NzU4NTA2NzY4MDQsImlhdCI6MTQ3NTg0NzA3NjgwNH0.8azVEoNPfETczXb-vn7dfyDd98eRt7iiLBXehGpPGzY

This call will respond with either "Job 8 has been canceled" or "Unable to cancel job 8". It's not guaranteed that the system can always cancel a job successfully.

3. Result Management

List all result files generated by the algorithm

API Endpoint URI pattern:

GET https://<hostname>/ccd-api/{userId}/results

Generated HTTP request code example:

GET /ccd-api/22/results HTTP/1.1
Host: <hostname>
Authorization: Bearer eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJpc3MiOiJodHRwczovL2Nsb3VkLmNjZC5waXR0LmVkdS8iLCJuYW1lIjoiemh5MTkiLCJleHAiOjE0NzU4NTA2NzY4MDQsImlhdCI6MTQ3NTg0NzA3NjgwNH0.8azVEoNPfETczXb-vn7dfyDd98eRt7iiLBXehGpPGzY

The response to this request will look like this:

[
  {
    "name": "FGESc_sim_data_20vars_100cases.csv_1466171729046.txt",
    "creationTime": 1466171732000,
    "lastModifiedTime": 1466171732000,
    "fileSize": 1660
  },
  {
    "name": "FGESc_data_small.txt_1466172140585.txt",
    "creationTime": 1466172145000,
    "lastModifiedTime": 1466172145000,
    "fileSize": 39559
  }
]

Download a specific result file generated by the algorithm based on file name

API Endpoint URI pattern:

GET https://<hostname>/ccd-api/{userId}/results/{result_file_name}

Generated HTTP request code example:

GET /ccd-api/22/results/FGESc_data_small.txt_1466172140585.txt HTTP/1.1
Host: <hostname>
Authorization: Bearer eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJpc3MiOiJodHRwczovL2Nsb3VkLmNjZC5waXR0LmVkdS8iLCJuYW1lIjoiemh5MTkiLCJleHAiOjE0NzU4NTA2NzY4MDQsImlhdCI6MTQ3NTg0NzA3NjgwNH0.8azVEoNPfETczXb-vn7dfyDd98eRt7iiLBXehGpPGzY

On success, you will get the result file back as text file content. If there's a typo in the file name or the file doesn't exist, you'll get either a JSON or XML message, depending on the Accept header in your request:

{
  "timestamp": 1467210996233,
  "status": 404,
  "error": "Not Found",
  "message": "Resource not found.",
  "path": "/22/results/FGESc_data_small.txt_146172140585.txt"
}
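Error bodies like the one above share the fields timestamp, status, error, message, and path, so a client can turn them into a readable message. The Python sketch below assumes the JSON form of the error (an XML error would need a different parser); the function name describe_error is ours.

```python
import json


def describe_error(body: str) -> str:
    """Summarize a JSON error body returned by the API."""
    error = json.loads(body)
    return f'{error["status"]} {error["error"]}: {error["message"]} ({error["path"]})'


# The 404 response shown above.
sample = """{
  "timestamp": 1467210996233,
  "status": 404,
  "error": "Not Found",
  "message": "Resource not found.",
  "path": "/22/results/FGESc_data_small.txt_146172140585.txt"
}"""
print(describe_error(sample))
```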

Compare algorithm result files

Since we can list all the algorithm result files, we can also choose multiple files from that list and run a comparison.

API Endpoint URI pattern:

POST https://<hostname>/ccd-api/{userId}/results/compare

The request body is a JSON that contains an array of result files to be compared.

Generated HTTP request code example:

POST /ccd-api/22/results/compare HTTP/1.1
Host: <hostname>
Authorization: Bearer eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJpc3MiOiJodHRwczovL2Nsb3VkLmNjZC5waXR0LmVkdS8iLCJuYW1lIjoiemh5MTkiLCJleHAiOjE0NzU4NTA2NzY4MDQsImlhdCI6MTQ3NTg0NzA3NjgwNH0.8azVEoNPfETczXb-vn7dfyDd98eRt7iiLBXehGpPGzY
Content-Type: application/json

{
  "resultFiles": [
    "FGESc_sim_data_20vars_100cases.csv_1466171729046.txt",
    "FGESc_data_small.txt_1467305104859.txt"
  ]
}

When you specify multiple file names, use the !! as a delimiter. This request will generate a result comparison file with the following content (shortened version):

FGESc_sim_data_20vars_100cases.csv_1466171729046.txt  FGESc_data_small.txt_1467305104859.txt
Edges In All  Same End Point
NR4A2,FOS 0 0
X5,X17  0 0
MMP11,ASB5  0 0
X12,X8  0 0
hsa_miR_654_3p,hsa_miR_337_3p 0 0
RND1,FGA  0 0
HHLA2,UBXN10  0 0
HS6ST2,RND1 0 0
SCRG1,hsa_miR_377 0 0
CDH3,diag 0 0
SERPINI2,FGG  0 0
hsa_miR_451,hsa_miR_136_  0 0

From this comparison, you can see if the two algorithm graphs have common edges and endpoints.
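To work with a comparison file programmatically, it can be parsed into rows. The Python sketch below is based only on the shortened sample above and assumes that the first line lists the compared file names, the second line is the column header, and each remaining line holds an edge (two node names joined by a comma) followed by two integer flags separated by whitespace. The function name parse_comparison and the dictionary keys are hypothetical.

```python
def parse_comparison(text: str) -> dict:
    """Parse a result comparison file into file names and per-edge flags."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    # Assumption: result file names contain no whitespace.
    file_names = lines[0].split()
    edges = []
    for line in lines[2:]:  # skip the file-name line and the header line
        edge, in_all, same_end_point = line.rsplit(None, 2)
        node_a, node_b = edge.split(",")
        edges.append({
            "nodes": (node_a, node_b),
            "inAll": int(in_all),
            "sameEndPoint": int(same_end_point),
        })
    return {"files": file_names, "edges": edges}


sample_comparison = """\
FGESc_sim_data_20vars_100cases.csv_1466171729046.txt  FGESc_data_small.txt_1467305104859.txt
Edges In All  Same End Point
NR4A2,FOS 0 0
X5,X17  0 0
"""
parsed = parse_comparison(sample_comparison)
print(len(parsed["files"]), parsed["edges"][0]["nodes"])
```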

List all the comparison files

API Endpoint URI pattern:

GET https://<hostname>/ccd-api/{userId}/results/comparisons

Generated HTTP request code example:

GET /ccd-api/22/results/comparisons HTTP/1.1
Host: <hostname>
Authorization: Bearer eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJpc3MiOiJodHRwczovL2Nsb3VkLmNjZC5waXR0LmVkdS8iLCJuYW1lIjoiemh5MTkiLCJleHAiOjE0NzU4NTA2NzY4MDQsImlhdCI6MTQ3NTg0NzA3NjgwNH0.8azVEoNPfETczXb-vn7dfyDd98eRt7iiLBXehGpPGzY

The response will show a list of comparison files:

[
  {
    "name": "result_comparison_1467385923407.txt",
    "creationTime": 1467385923000,
    "lastModifiedTime": 1467385923000,
    "fileSize": 7505
  },
  {
    "name": "result_comparison_1467387034358.txt",
    "creationTime": 1467387034000,
    "lastModifiedTime": 1467387034000,
    "fileSize": 7505
  },
  {
    "name": "result_comparison_1467388042261.txt",
    "creationTime": 1467388042000,
    "lastModifiedTime": 1467388042000,
    "fileSize": 7533
  }
]

Download a specific comparison file based on file name

API Endpoint URI pattern:

GET https://<hostname>/ccd-api/{userId}/results/comparisons/{comparison_file_name}

Generated HTTP request code example:

GET /ccd-api/22/results/comparisons/result_comparison_1467388042261.txt HTTP/1.1
Host: <hostname>
Authorization: Bearer eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJpc3MiOiJodHRwczovL2Nsb3VkLmNjZC5waXR0LmVkdS8iLCJuYW1lIjoiemh5MTkiLCJleHAiOjE0NzU4NTA2NzY4MDQsImlhdCI6MTQ3NTg0NzA3NjgwNH0.8azVEoNPfETczXb-vn7dfyDd98eRt7iiLBXehGpPGzY

Then it returns the content of that comparison file (shortened version):

FGESc_sim_data_20vars_100cases.csv_1466171729046.txt  FGESc_data_small.txt_1467305104859.txt
Edges In All  Same End Point
NR4A2,FOS 0 0
X5,X17  0 0
MMP11,ASB5  0 0
X12,X8  0 0
hsa_miR_654_3p,hsa_miR_337_3p 0 0
RND1,FGA  0 0
HHLA2,UBXN10  0 0
HS6ST2,RND1 0 0
SCRG1,hsa_miR_377 0 0
CDH3,diag 0 0
SERPINI2,FGG  0 0