-
Notifications
You must be signed in to change notification settings - Fork 5
/
06-JSON.qmd
408 lines (285 loc) · 11.5 KB
/
06-JSON.qmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
# Working with JSON on the DNAnexus Platform {#sec-json}
:::{.callout-note}
## Preparing for this Chapter
You will not need to login to the platform for this chapter.
You'll want to `cd` into the `JSON` folder in your project.
```
cd JSON/
```
You'll also need to [install `jq`](https://stedolan.github.io/jq/download/) if it's not yet on your system. If you're on Ubuntu/WSL, I recommend installing via `apt install`. If you're on Mac, I recommend installing via `brew install`.
You can check if `jq` is already installed by typing
`which jq`
:::
## Learning Objectives
By the end of this chapter, you should be able to:
- **Define** and **Explain** what JSON is and its elements and structures
- **Explain** how JSON is used on the DNAnexus platform
- **Explain** the basic structure of a JSON file
- **Generate** JSON output from `dx find data` and `dx find jobs`
- **Execute** simple `jq` commands to extract information from a JSON file
- **Execute** advanced `jq` filters using conditionals to process output from `dx find files` or `dx find jobs`.
## What is JSON?
JSON is short for **J**ava**S**cript **O**bject **N**otation. It is a format used for storing information on the web and for interacting with APIs.
## How is JSON used on the DNAnexus Platform?
JSON is used in multiple ways on the DNAnexus Platform, including:
- Submitting Jobs with complex parameters/inputs
- Specifying parameters of an app or workflow (`dxapp.json` and `dxworkflow.json`)
- Output of commands such as `dx find data` or `dx find jobs` with the `--json` flag
- Extracting environment variables from `dx env`
Underneath it all, all interactions with the DNAnexus API server are JSON submissions.
You can see that JSON is used in many places on the DNAnexus platforms, and for many purposes. So having basic knowledge of JSON can be really helpful.
## Elements of a JSON file
Here are the main elements of a JSON file:
- **Key:Value Pair**. Example: `"name": "Ted Laderas"`. In this example, our key is "name" and our value is "Ted Laderas"
- **List `[]`** - a collection of values. All values have to be the same data type. Example: `["mom", "dad"]`
- **Object** `{}` - A collection of key/value pairs, enclosed with curly brackets (`{}`)
Here's the example we're going to use. We'll do most of our processing of JSON on our own machine.
```{bash}
#| eval: false
#| filename: "json_data/example.json"
{
"report_html": {
"dnanexus_link": "file-G4x7GX80VBzQy64k4jzgjqgY"
},
"stats_txt": {
"dnanexus_link": "file-G4x7GXQ0VBzZxFxz4fqV120B"
},
"users": ["laderast", "ted", "tladeras"]
}
```
:::{.callout-note}
## Check Yourself
What does the `names` value contain in the following JSON? Is it a list, object or key:value pair?
```
{
"names": ["Ted", "Lisa", "George"]
}
```
:::
:::{.callout-note collapse="true"}
## Answer
It is a list. We know this because the value contains a `[]`.
```
{
"names": ["Ted", "Lisa", "George"]
}
```
:::
## Nestedness
JSON wouldn't be helpful if it were only limited to a single level or key:values. Values can be lists or objects as well. For example, in our example JSON, we can see that the value of `report_html` is a JSON object:
```
"report_html": {
"dnanexus_link": "file-G4x7GX80VBzQy64k4jzgjqgY"
}
```
The object is:
```
{
"dnanexus_link": "file-G4x7GX80VBzQy64k4jzgjqgY"
}
```
When we work with extracting information, we'll have to take this nested structure in mind.
## Outputting JSON with `dx find` commands
We already encountered the `dx find data` command, which we used in the batch processing chapter.
If we use the `--json` option, then the file information will be outputted in json format. This command will return a list of JSON file objects.
For example:
```{bash}
#| eval: false
#| filename: 05-JSON/dx-find-data-json.sh
dx find data --path ted_demo:data/ --json
```
The output will look like this:
```
[
{
"project": "project-GGyyqvj0yp6B82ZZ9y23Zf6q",
"id": "file-FvQGZb00bvyQXzG3250XGbgz",
"describe": {
"id": "file-FvQGZb00bvyQXzG3250XGbgz",
"project": "project-GGyyqvj0yp6B82ZZ9y23Zf6q",
"class": "file",
"name": "small-celegans-sample.fastq",
"state": "closed",
"folder": "/json_data",
"modified": 1665003035646,
"size": 16801690
}
},
{
"project": "project-GGyyqvj0yp6B82ZZ9y23Zf6q",
"id": "file-B5Q8z8V5g3bX5qQ9y9YQ006k",
"describe": {
"id": "file-B5Q8z8V5g3bX5qQ9y9YQ006k",
"project": "project-GGyyqvj0yp6B82ZZ9y23Zf6q",
"class": "file",
"name": "NC_001422.fasta",
"state": "closed",
"folder": "/json_data",
"modified": 1665003035645,
"size": 5539
}
}
]
```
:::{.callout-note}
### Test your knowledge
What is returned when we run this code? Is it a JSON object, or a list of JSON objects?
```{bash}
#| eval: false
#| filename: 05-JSON/dx-find-jobs-json.sh
dx find jobs --json
```
:::
:::{.callout-note collapse="true"}
## Answer
It's hard to tell at first, but We are returning a list of JSON objects, each of which corresponds to a single job run within our project.
:::
## Learning `jq` gradually
As you can see, JSON can be very complicated to process and extract information from, depending on how many levels you go deep in a JSON document. That's why `jq` exists
`jq` is a utility that is made to process JSON. All `jq` commands have this format:
```{bash}
#| eval: false
jq '<filter>' <JSON file>
```
Filters are the heart of processing data using `jq`. They let you extract JSON values or keys and process them with conditionals to filter data down. For example, you can do something like the following:
1. Select all elements where the job status is failed
2. For each of these elements, output the job-status id
You can see how `jq` can be extremely powerful.
You can also pipe JSON from standard output into `jq`. This will be really helpful for us when we start using pipes of data files from `dx find data`.
## Our simplest filter: `.`
One of the biggest uses for `jq` is for more readable formatting. Oftentimes, the JSON returned by an API call is really hard to read. It can be returned as a single line of text, and it is really hard for humans to see the actual structure of the JSON response.
If we run `jq .` on a JSON file, we'll see that it makes it much more readable.
```{bash}
#| eval: false
#| filename: JSON/jq-simple.sh
jq '.' json_data/example.json
```
## Getting the keys
We can extract the keys from the top level JSON by using `'keys'` as our filter.
```{bash}
#| eval: false
#| filename: JSON/jq-keys.sh
jq 'keys' json_data/example.json
```
## Extracting a value from a container: `jq .report_html`
So, say we want to extract the value from the `report_html` key in the above.
We can specify the key that we're interested in to extract the value from that key.
```{bash}
#| eval: false
#| filename: JSON/jq-report.sh
jq '.report_html' json_data/example.json
```
:::{.callout-note}
## Try it out
This is the JSON file we're going to be working with, in `json_data/example.json`.
```{bash}
#| eval: false
#| filename: "json_data/example.json"
{
"report_html": {
"dnanexus_link": "file-G4x7GX80VBzQy64k4jzgjqgY"
},
"stats_txt": {
"dnanexus_link": "file-G4x7GXQ0VBzZxFxz4fqV120B"
},
"users": ["laderast", "ted", "tladeras"]
}
```
In your terminal, try out:
```
jq '.stats_txt' json_data/example.json
```
What do you return?
:::
:::{.callout-note collapse="true"}
## Answer
```{bash}
#| eval: false
#| filename: JSON/jq-stats-txt.sh
jq '.stats_txt' json_data/example.json
```
We'll return the following JSON object, which contains a single key-value pair.
```
{
"dnanexus_link": "file-G4x7GXQ0VBzZxFxz4fqV120B"
}
```
:::
### Going one level deeper
We can extract the actual value associated with the `dnanexus_link` key within `report_html` by chaining onto our filter:
```{bash}
#| eval: false
#| filename: JSON/jq-nested.sh
jq '.report_html.dnanexus_link' json_data/example.json
```
:::{.callout-note}
## Try It Out
What is returned when you run this code?
```{bash}
#| eval: false
#| filename: JSON/jq-nested.sh
jq '.report_html.dnanexus_link' json_data/example.json
```
:::
:::{.callout-note collapse="true"}
## Answer
Running this command should return the value of `dnanexus-link` within `report_html`:
```
"file-G4x7GX80VBzQy64k4jzgjqgY"
```
:::
## Conditional Filters using `jq`
One natural use case for using `jq` on the DNAnexus platform is to rerun failed jobs.
Failed jobs can occur when using normal priority, which focuses on using spot instances. So, if we ran a series of jobs, we would want to restart these failed jobs.
This is a bit of code that would allow us to select those jobs that have failed.
```{bash}
#| eval: false
#| filename: JSON/dx-find-jobs-jq-clone.sh
dx find jobs --json |\
jq '.[] | select (.state | contains("failed")) | .id' |\
xargs -I% sh -c "dx run --clone %"
```
The second line contains the `jq` filter that does the magic. Remember, the filter is contained within the single quotes (`''`).
The last line contains `"dx run --clone %"`.
Let's take apart the different parts of the `jq` filter (@fig-jq-filter):
:::{#fig-jq-filter}
```{mermaid}
graph LR;
A[".[]"] --> B{"|"}
B --> C["select (.state | contains('failed'))"]
C --> D{"|"}
D --> E[".id"]
```
Taking apart the `jq` filter.
:::
Note that the pipes in this filter apply only to the `jq` filter, so don't mix them up with the other pipes in our overall Bash statement.
The first part of the filter, `.[]`, says that we want to process the list (remember, `dx find jobs` returns a list of objects).
The second part of the filter, `select (.state | contains('failed'))` will let us select objects in the list that have a `state` of `failed`. This list of objects is then passed on the next part of the filter.
The last part of the filter, `.id`, returns the the file ids for our failed jobs.
This is a basic pattern for selecting objects that meet a criteria, and can be really helpful when you want more control of your batch processing.
:::{.callout-note}
## Check Yourself
How would you modify the code below to terminate all jobs that had `state` `running` using `dx terminate`?
```{bash}
#| eval: false
dx find jobs --json |\
jq '.[] | select (.state | contains("failed")) | .id' |\
xargs -I% sh -c "dx run --clone %"
```
:::
:::{.callout-note collapse="true"}
## Answer
```{bash}
#| eval: false
dx find jobs --json | \
jq '.[] | select (.state | contains("running")) | .id' | \
xargs -I% sh -c "dx terminate %"
```
:::
## Using JSON as an Input
This section is made to help you in writing JSON files. If you build an app or a workflow, you will need to edit the `dxapp.json` or `dxworkflow.json` files to enable your executables to be runnable.
### Writing and modifying JSON
I know that JSON is supposed to be human readable. However, there are a lot of little quibbles that don't make it easily human writable.
I highly recommend using an editor such as [VS Code](https://code.visualstudio.com/), with the appropriate [JSON plugin](https://code.visualstudio.com/docs/languages/json). A JSON Visualizer such as the [JSON Crack Extension](https://marketplace.visualstudio.com/items?itemName=AykutSarac.jsoncrack-vscode) will be extremely helpful as well.
![JSON Visualizer Plugin](images/json_visualizer.png)
Using the visualizer plugin and this tutorial will help you write well formed JSON, and point out any issues you might have. It's easy to misplace a comma, or a bracket, and this tool helps you write well-formed JSON.