---
title: "Building Apps"
---
Now that we've learned how to run apps using `dx run` and what we can do with JSON, we can tackle building our own apps on the platform.
:::{.callout}
## It's all about scale
There is a fine line between what we should do within an app and what we should do within a workflow.
It helps to think of the workflow as being composed of modular apps that can be reused.
If you have a complex workflow that can run on a single worker, it may make sense to build it as a single app.
Otherwise, it is worth learning more about WDL (Workflow Description Language), which will help you customize your workflow and specify app components at the same time, or learning about running Nextflow workflows on the platform.
:::
## What is an Applet/App?
Both applets and apps are executables that can be run on the DNAnexus platform. These executables might be well-known bioinformatics software (such as samtools or PLINK), or they can be web apps, such as a Plotly Dash app or a Shiny app. One example is the LocusZoom app, which takes a GWAS result file as input and makes an explorable visualization of the results.
## The Applet Build Process
The applet build process is shown below (@fig-build).
:::
![Applet Building Process.](images/applet-build-process.png){#fig-build}
:::
In short:
A. Build the applet skeleton using `dx-app-wizard`. Specify inputs and outputs.
B. Add more details to the applet skeleton, including the software environment and shell script.
C. Build the applet in your project using `dx build`.
## The Applet Specification
A minimal applet on the platform needs the following:
1. A binary executable, or a Docker image that contains the software we want to run.
2. A JSON document (`dxapp.json`) that contains the input/output specifications, the instance specification, and the source of the software.
3. A script (in either bash or Python) that executes the software on the inputs, and contains instructions for registering outputs.
Applets have a special structure, as you can see below. When you create them via the `dx-app-wizard` (a dx-toolkit utility), they will have the following directory structure (@fig-directory):
::: {#fig-directory}
```
.
└── my_app
    ├── Readme.developer.md
    ├── Readme.md
    ├── dxapp.json **
    ├── resources
    ├── src
    │   └── my_app.sh **
    └── test
```
Directory structure of an app, generated automatically when you run `dx-app-wizard`. The two starred files (`dxapp.json` and `src/my_app.sh`) are the bare minimum needed for an app.
:::
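As a quick sanity check before building, a sketch like the following could confirm that a directory contains the two starred files. The function name `check_applet_dir` is hypothetical and not part of dx-toolkit, and the check assumes a bash app whose script sits in `src/` with a `.sh` extension, as in the layout above.

```bash
# Hypothetical helper (not part of dx-toolkit): report whether a directory
# holds the two files the text marks as the bare minimum for an app.
check_applet_dir() {
    dir=$1
    if [ ! -f "$dir/dxapp.json" ]; then
        echo "missing dxapp.json"
        return 1
    fi
    # Assumption: a bash app whose script sits in src/ with a .sh extension.
    for script in "$dir"/src/*.sh; do
        if [ -f "$script" ]; then
            echo "ok"
            return 0
        fi
    done
    echo "missing src script"
    return 1
}
```

Calling `check_applet_dir my_app` prints `ok` only when both files are present. It does not check the contents of `dxapp.json`.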
## Visualizing the Pieces of an App
:::{}
![The multiple parts of an applet. Note the arrows connect the `inputSpec` and `outputSpec` specifications in `dxapp.json` with inputs and outputs used in `samtools_subset.sh`](images/app_structure.png){#fig-appstructure}
:::
In @fig-appstructure, we can see the multiple parts of the app. In short, we will need to decide on our inputs and outputs and their required data types.
That means when we specify our app we need to do the following things:
1. Specify a software environment by either including an executable or using a Docker image.
2. Specify both inputs and outputs and their data types in `dxapp.json`.
3. Process the inputs from `dxapp.json` in our shell script to generate outputs.
4. In our shell script, upload the output files and register them as outputs.
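The last two steps might look roughly like this in a bash app script. This is a sketch under stated assumptions rather than the chapter's final script: it assumes `samtools` is already available in the app's software environment, and the local file names (`input.bam`, `subset.bam`) are placeholders. The input and output names (`mappings_bam`, `mappings_bai`, `region`, `out_bam`) follow the example specification used in this chapter.

```bash
# Sketch of a bash app entry point. The platform calls main() when the job
# starts, so nothing runs when this file is merely sourced.
main() {
    # Step 3: fetch the inputs named in dxapp.json to local files.
    # Placing the index next to the BAM lets samtools find it.
    dx download "$mappings_bam" -o input.bam
    dx download "$mappings_bai" -o input.bam.bai

    # Run the software on the inputs (assumes samtools is installed).
    samtools view -b input.bam "$region" > subset.bam

    # Step 4: upload the result and register it under the output name.
    out_bam=$(dx upload subset.bam --brief)
    dx-jobutil-add-output out_bam "$out_bam" --class=file
}
```

Each variable read by the script corresponds to an input name in `dxapp.json`, and the name passed to `dx-jobutil-add-output` corresponds to an output name.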
## Part A: Jumpstarting our app using `dx-app-wizard`
When we call `dx-app-wizard`, we'll get an interactive wizard that will help us specify the basics of our app. Specifically, it will let us specify inputs, outputs, and options such as instance type.
We can run `dx-app-wizard` on the command line:
```bash
$ dx-app-wizard
DNAnexus App Wizard, API v1.0.0
[...]
The name of your app must be unique on the DNAnexus platform. After creating your app for the
first time, you will be able to publish new versions using the same app name. App names are
restricted to alphanumeric characters (a-z, A-Z, 0-9), and the characters ".", "_", and "-".
App Name: samtools_subset # <1>
The title, if provided, is what is shown as the name of your app on the website. It can be
any valid UTF-8 string.
Title []: Samtools Subset # <2>
The summary of your app is a short phrase or one-line description of what your app does. It
can be any UTF-8 human-readable string.
Summary []: Subsets a bam file. # <3>
You can publish multiple versions of your app, and the version of your app is a string with
which to tag a particular version. We encourage the use of Semantic Versioning for labeling your
apps (see http://semver.org/ for more details).
Version [0.0.1]: # <4>
```
1. Unique applet name.
2. Human-readable name.
3. Description of what the applet does.
4. Version number.
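The character rule quoted in the wizard prompt can be checked locally before running the wizard. The helper below is a hypothetical sketch (`is_valid_app_name` is not a dx-toolkit command). It tests only the allowed characters, not whether the name is unique on the platform.

```bash
# Hypothetical helper: check a candidate app name against the character rule
# in the wizard prompt (letters, digits, ".", "_", and "-").
is_valid_app_name() {
    case $1 in
        "") echo "invalid" ;;                    # empty name
        *[!A-Za-z0-9._-]*) echo "invalid" ;;     # contains a disallowed character
        *) echo "valid" ;;
    esac
}
```

For example, `is_valid_app_name samtools_subset` prints `valid`, while a name containing a space prints `invalid`.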
### Input Specification in `dx-app-wizard`
Here's a walkthrough of setting up inputs for an app:
```bash
Input Specification
You will now be prompted for each input parameter to your app. Each parameter should have a unique
name that uses only the underscore "_" and alphanumeric characters, and does not start with a
number.
1st input name (<ENTER> to finish): mappings_bam #<1>
Label (optional human-readable name) []: BAM file #<2>
Your input parameter must be of one of the following classes:
applet array:file array:record file int
array:applet array:float array:string float record
array:boolean array:int boolean hash string
Choose a class (<TAB> twice for choices): file #<3>
This is an optional parameter [y/n]: n #<4>
2nd input name (<ENTER> to finish): mappings_bai
Label (optional human-readable name) []: Bam Index file
Choose a class (<TAB> twice for choices): file
This is an optional parameter [y/n]: n
3rd input name (<ENTER> to finish): region
Label (optional human-readable name) []:
Choose a class (<TAB> twice for choices): string
This is an optional parameter [y/n]: n
4th input name (<ENTER> to finish): # <5>
```
1. Input name
2. Human readable name
3. Use the `file` class as the input type.
4. Is this parameter optional?
5. Press `<Enter>` when done with inputs.
::: {.callout}
## Datatypes used in `inputSpec` and `outputSpec`
The datatypes used in an app are listed below.
```
applet array:file array:record file int
array:applet array:float array:string float record
array:boolean array:int boolean hash string
```
Two data types are worth noting. The `record` type is used to refer to pheno Datasets.
The `array` types work as you might expect, but remember that when you specify an array of inputs, you'll need multiple input lines, one per element.
:::
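One reading of "multiple input lines" is that, when running with `dx run`, the `-i` option is repeated once per array element with the same input name. The sketch below only prints those arguments; `array_input_args` is a hypothetical helper and `bam_files` is a hypothetical `array:file` input that is not part of the example app.

```bash
# Hypothetical helper: print one -i<name>=<value> argument per array element,
# one per line, for use when assembling a dx run command.
array_input_args() {
    name=$1
    shift
    for value in "$@"; do
        printf '%s\n' "-i${name}=${value}"
    done
}
```

Calling `array_input_args bam_files a.bam b.bam` prints `-ibam_files=a.bam` and `-ibam_files=b.bam` on separate lines.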
### Output Specification
We can fill the output specification similarly:
``` bash
Output Specification
You will now be prompted for each output parameter of your app. Each parameter should have a unique
name that uses only the underscore "_" and alphanumeric characters, and does not start with a
number.
1st output name (<ENTER> to finish): out_bam #<1>
Label (optional human-readable name) []: Out BAM #<2>
Choose a class (<TAB> twice for choices): file #<3>
2nd output name (<ENTER> to finish): #<4>
```
1. Output name
2. Human readable name
3. Class of output (see above)
4. `<Enter>` when finished
### The Rest
```bash
Timeout Policy
Set a timeout policy for your app. Any single entry point of the app that runs longer than
the specified timeout will fail with a TimeoutExceeded error. Enter an int greater than 0 with a
single-letter suffix (m=minutes, h=hours, d=days) (e.g. "48h").
Timeout policy [48h]: #<1>
Template Options
You can write your app in any programming language, but we provide templates for the
following supported languages: Python, bash
Programming language:
Programming language: bash #<2>
Access Permissions
If you request these extra permissions for your app, users will see this fact when launching your
app, and certain other restrictions will apply. For more information, see
https://documentation.dnanexus.com/developer/apps/app-permissions.
Access to the Internet (other than accessing the DNAnexus API).
Will this app need access to the Internet? [y/N]: # <3>
Direct access to the parent project. This is not needed if your app specifies outputs,
which will be copied into the project after it's done running.
Will this app need access to the parent project? [y/N]: # <4>
```
1. Timeout before the app quits.
2. You have the option of `bash` or `python` here. All other languages (such as R) will need to be wrapped in a bash script.
3. Does your app need permission to access external internet? One example would be if your app accessed an external annotation server.
4. Usually you will not need this access, since the inputs/outputs are handled by the apps.
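To make the timeout format concrete, the sketch below converts a value such as `48h` into minutes, following the rule in the prompt (an integer greater than 0 plus a single-letter suffix). `timeout_to_minutes` is a hypothetical helper, not something the wizard provides, and it is stricter than the prompt in that it also rejects leading zeros.

```bash
# Hypothetical helper: convert a timeout such as "48h" to minutes, following
# the format in the wizard prompt (positive integer plus m, h, or d).
timeout_to_minutes() {
    value=${1%?}          # everything except the last character
    unit=${1#"$value"}    # the last character
    case $value in
        # Reject empty, zero or leading-zero, and non-numeric values.
        ""|0*|*[!0-9]*) echo "invalid"; return 1 ;;
    esac
    case $unit in
        m) echo "$value" ;;
        h) echo $((value * 60)) ;;
        d) echo $((value * 60 * 24)) ;;
        *) echo "invalid"; return 1 ;;
    esac
}
```

With this sketch, the default `48h` and a policy of `2d` both come out as 2880 minutes.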
```bash
Default instance type: The instance type you select here will apply to all entry points in
your app unless you override it. See
https://documentation.dnanexus.com/developer/api/running-analyses/instance-types for more
information.
Choose an instance type for your app [mem1_ssd1_v2_x4]: # <1>
*** Generating DNAnexus App Template... ***
Your app specification has been written to the dxapp.json file. You can specify more app options by
editing this file directly (see https://documentation.dnanexus.com/developer for complete
documentation).
Created files:
samtools-subset-test/Readme.developer.md
samtools-subset-test/Readme.md
samtools-subset-test/dxapp.json
samtools-subset-test/resources/
samtools-subset-test/src/
samtools-subset-test/src/samtools-subset-test.sh
samtools-subset-test/test/
App directory created! See https://documentation.dnanexus.com/developer for tutorials on how to
modify these files, or run "dx build samtools-subset-test" or "dx build --create-app
samtools-subset-test" while logged in with dx
```
1. This is the default instance type, used when the user does not set the `--instance-type` option. See the [instance types documentation](https://documentation.dnanexus.com/developer/api/running-analyses/instance-types) for much more detail.
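Instance type names such as `mem1_ssd1_v2_x4` encode the resources of the worker. As a small illustration, the sketch below reads the core count from the name, assuming the name ends in `_x<cores>` as the default shown above does. `instance_cores` is a hypothetical helper, and the linked instance types documentation remains the authoritative source for what each name means.

```bash
# Hypothetical helper: read the core count from an instance type name,
# assuming the name ends in _x<cores> as in mem1_ssd1_v2_x4.
instance_cores() {
    cores=${1##*_x}       # strip everything up to and including the last "_x"
    case $cores in
        ""|*[!0-9]*) echo "unknown" ;;   # no numeric suffix found
        *) echo "$cores" ;;
    esac
}
```

For the default above, `instance_cores mem1_ssd1_v2_x4` prints `4`.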
## Part B: Specifying our script and environment
Now that our skeleton is built, we can modify our script. If we look at the `inputSpec` portion of our `dxapp.json` file, we'll see this:
```bash
"inputSpec": [
  {
    "name": "mappings_bam", #<1>
    "label": "BAM file",
    "class": "file",
    "optional": false,
    "patterns": [
      "*"
    ],
    "help": ""
  },
  ...
]
```
1. The name of our BAM file input. We'll use this variable (`$mappings_bam`) directly in our shell script file.