Airr2akc refactor #18

Merged Jan 11, 2025 (48 commits)
336dea5
initial refactoring of airr2akc script: split into functions, use yam…
LonnekeScheffer Oct 25, 2024
39caaf2
update ak_airr to not refer to separate slot and enum files
LonnekeScheffer Oct 25, 2024
27821b9
remove duplicate code
LonnekeScheffer Oct 25, 2024
b02bf86
bugfix annotations: identifier formatting
LonnekeScheffer Oct 25, 2024
4258697
range value 'number' (used for several numeric values, including age,…
LonnekeScheffer Oct 25, 2024
751bb73
v2 of airr2akc script: read in from airr.yaml file instead of airr Sc…
LonnekeScheffer Oct 31, 2024
5ae9767
Minor updates
LonnekeScheffer Nov 4, 2024
31b9db2
first fully working version for refactored airr2akc.py
LonnekeScheffer Nov 8, 2024
956cb05
add generated ak_airr file
LonnekeScheffer Nov 8, 2024
c614983
updated source files for checking
LonnekeScheffer Nov 12, 2024
39318f1
updates airr2akc: add 'class name' as prefix to slots, so 'required' …
LonnekeScheffer Nov 12, 2024
84a3ca2
Merge branch 'main' into airr2akc_refactor
LonnekeScheffer Nov 12, 2024
5dde064
use V1p4 instead of V1_V since the '_' is removed in downstream proce…
LonnekeScheffer Nov 12, 2024
6a5a95e
bugfix: SampleProcessing ('composition' keyword using allOf)
LonnekeScheffer Nov 12, 2024
36f3fb8
new results after bugfix
LonnekeScheffer Nov 21, 2024
5388455
remove obsolete comments
LonnekeScheffer Nov 21, 2024
dfa78e3
Merge branch 'main' into airr2akc_refactor
LonnekeScheffer Nov 21, 2024
b39b05d
rerun make all after merging in master
LonnekeScheffer Nov 21, 2024
e4d89fb
add version prefix also to slot names to allow multiple airr versions…
LonnekeScheffer Nov 21, 2024
d1f3ac8
bring in airr standards as submodules
schristley Nov 22, 2024
772a1f4
fix error
schristley Nov 22, 2024
6b63bce
- option to remove version prefix
LonnekeScheffer Nov 25, 2024
12e787f
- better logging of conflicts, including log file
LonnekeScheffer Nov 25, 2024
b75cdf4
- log conflicts with other akc schema files
LonnekeScheffer Nov 26, 2024
1d65ce5
rerun AIRR schema output, there are conflicts
LonnekeScheffer Nov 26, 2024
04e90e4
- used updated input airr schema
LonnekeScheffer Nov 26, 2024
c62239d
added v1.5 and v2.0 to makefile, running with version/class prefixes …
LonnekeScheffer Nov 26, 2024
7231170
update makefile parameter, and output datetime in airr2akc log
LonnekeScheffer Nov 26, 2024
156a251
remove all 'null' from enums
LonnekeScheffer Nov 26, 2024
4b6934e
omit class prefixes for slots, but keep version prefix
LonnekeScheffer Nov 26, 2024
3eaf365
Merge branch 'main' into airr2akc_refactor
schristley Dec 2, 2024
5a2f7d1
resolve conflicts between airr linkml and existing linkml
LonnekeScheffer Dec 5, 2024
c4f09d8
Remove 'required' from airr2akc; creates conflicts
LonnekeScheffer Dec 5, 2024
f74dc49
Remove 'required' from airr2akc; creates conflicts
LonnekeScheffer Dec 5, 2024
2775946
minor fix
schristley Dec 10, 2024
bf71735
change enum/ontology naming convention
schristley Dec 10, 2024
35c08b6
add 'Ontology' and 'Enum' suffixes to airr LinkML, and resolve relate…
LonnekeScheffer Dec 12, 2024
17dc179
fix jq install
schristley Dec 12, 2024
bb41828
change name to prevent conflict
schristley Dec 13, 2024
983fdaf
correct initialization
schristley Dec 14, 2024
8fc0191
reorganize LibraryPreparationProcessing, rename to AIRRSequencingAssay
schristley Dec 14, 2024
25fa2a5
change name to AIRRSequencingAssay
schristley Dec 14, 2024
3485ef3
extra targets
schristley Jan 11, 2025
028e890
mapping rules
schristley Jan 11, 2025
8fb2c28
clean target
schristley Jan 11, 2025
66b9108
hack for Reference
schristley Jan 11, 2025
c56425e
updates
schristley Jan 11, 2025
1953fb9
generated files
schristley Jan 11, 2025
6 changes: 6 additions & 0 deletions .gitmodules
@@ -0,0 +1,6 @@
+[submodule "src/airr_schema/airr-standards-v1.5"]
+	path = src/airr_schema/airr-standards-v1.5
+	url = https://github.com/airr-community/airr-standards.git
+[submodule "src/airr_schema/airr-standards-v2.0"]
+	path = src/airr_schema/airr-standards-v2.0
+	url = https://github.com/airr-community/airr-standards.git
3 changes: 1 addition & 2 deletions Dockerfile
@@ -2,8 +2,7 @@
 FROM python:3.9

 # Install jq so we can process JSON
-RUN apt-get update
-RUN apt-get install jq -y
+RUN apt-get update && apt-get install jq -y

 # https://stackoverflow.com/questions/53835198/integrating-python-poetry-with-docker
 ENV YOUR_ENV=${YOUR_ENV} \
110 changes: 11 additions & 99 deletions Makefile.AIRR
@@ -1,105 +1,17 @@
 AIRR_SCHEMA_DIR=src/ak_schema/schema/airr
+AIRR_SCHEMA_OUTPUT_DIR=src/ak_schema/schema
 SCRIPT_DIR=src/scripts/airr2akc
+AIRR_SCHEMA_V1.5_INPUT=src/airr_schema/airr-standards-v1.5/specs/airr-schema-openapi3.yaml
+AIRR_SCHEMA_V2.0_INPUT=src/airr_schema/airr-standards-v2.0/specs/airr-schema-openapi3.yaml
+LOGFILE_DIR=logs

-airr: repertoire rearrangement clone cell receptor germline
+all: airr_schema airr_schema_1.5 airr_schema_2.0

-repertoire: Study Subject Diagnosis Genotype MHCGenotype Sample PCRTarget CellProcessing NucleicAcidProcessing SequencingRun SequencingData DataProcessing Repertoire
+airr_schema:
+	python $(SCRIPT_DIR)/airr2akc.py -o $(AIRR_SCHEMA_OUTPUT_DIR)/ak_airr.yaml -a $(AIRR_SCHEMA_V1.5_INPUT) -f $(AIRR_SCHEMA_OUTPUT_DIR) -l $(LOGFILE_DIR)/airr2akc.log

-rearrangement: Rearrangement
+airr_schema_1.5:
+	python $(SCRIPT_DIR)/airr2akc.py -o $(AIRR_SCHEMA_OUTPUT_DIR)/ak_airr_v1.5.yaml -a $(AIRR_SCHEMA_V1.5_INPUT) -f $(AIRR_SCHEMA_OUTPUT_DIR) -l $(LOGFILE_DIR)/airr2akc_v1.5.log -v

-clone: Clone
-
-cell: Cell Expression Reactivity
-
-receptor: Receptor
-
-germline: GermlineSet AlleleDescription SequenceDelineationV RearrangedSequence UnrearrangedSequence
-
-Study:
-	python $(SCRIPT_DIR)/airr2akc.py Study > $(AIRR_SCHEMA_DIR)/ak_slot_Study.yaml
-	python $(SCRIPT_DIR)/airr2akc.py -e Study > $(AIRR_SCHEMA_DIR)/ak_enum_Study.yaml
-Subject:
-	python $(SCRIPT_DIR)/airr2akc.py Subject > $(AIRR_SCHEMA_DIR)/ak_slot_Subject.yaml
-	python $(SCRIPT_DIR)/airr2akc.py -e Subject > $(AIRR_SCHEMA_DIR)/ak_enum_Subject.yaml
-Diagnosis:
-	python $(SCRIPT_DIR)/airr2akc.py Diagnosis > $(AIRR_SCHEMA_DIR)/ak_slot_Diagnosis.yaml
-	python $(SCRIPT_DIR)/airr2akc.py -e Diagnosis > $(AIRR_SCHEMA_DIR)/ak_enum_Diagnosis.yaml
-Genotype:
-	python $(SCRIPT_DIR)/airr2akc.py Genotype > $(AIRR_SCHEMA_DIR)/ak_slot_Genotype.yaml
-	python $(SCRIPT_DIR)/airr2akc.py -e Genotype > $(AIRR_SCHEMA_DIR)/ak_enum_Genotype.yaml
-MHCGenotype:
-	python $(SCRIPT_DIR)/airr2akc.py MHCGenotype > $(AIRR_SCHEMA_DIR)/ak_slot_MHCGenotype.yaml
-	python $(SCRIPT_DIR)/airr2akc.py -e MHCGenotype > $(AIRR_SCHEMA_DIR)/ak_enum_MHCGenotype.yaml
-Sample:
-	python $(SCRIPT_DIR)/airr2akc.py Sample > $(AIRR_SCHEMA_DIR)/ak_slot_Sample.yaml
-	python $(SCRIPT_DIR)/airr2akc.py -e Sample > $(AIRR_SCHEMA_DIR)/ak_enum_Sample.yaml
-PCRTarget:
-	python $(SCRIPT_DIR)/airr2akc.py PCRTarget > $(AIRR_SCHEMA_DIR)/ak_slot_PCRTarget.yaml
-	python $(SCRIPT_DIR)/airr2akc.py -e PCRTarget > $(AIRR_SCHEMA_DIR)/ak_enum_PCRTarget.yaml
-CellProcessing:
-	python $(SCRIPT_DIR)/airr2akc.py CellProcessing > $(AIRR_SCHEMA_DIR)/ak_slot_CellProcessing.yaml
-	python $(SCRIPT_DIR)/airr2akc.py -e CellProcessing > $(AIRR_SCHEMA_DIR)/ak_enum_CellProcessing.yaml
-NucleicAcidProcessing:
-	python $(SCRIPT_DIR)/airr2akc.py NucleicAcidProcessing > $(AIRR_SCHEMA_DIR)/ak_slot_NucleicAcidProcessing.yaml
-	python $(SCRIPT_DIR)/airr2akc.py -e NucleicAcidProcessing > $(AIRR_SCHEMA_DIR)/ak_enum_NucleicAcidProcessing.yaml
-SequencingRun:
-	python $(SCRIPT_DIR)/airr2akc.py SequencingRun > $(AIRR_SCHEMA_DIR)/ak_slot_SequencingRun.yaml
-	python $(SCRIPT_DIR)/airr2akc.py -e SequencingRun > $(AIRR_SCHEMA_DIR)/ak_enum_SequencingRun.yaml
-SequencingData:
-	python $(SCRIPT_DIR)/airr2akc.py SequencingData > $(AIRR_SCHEMA_DIR)/ak_slot_SequencingData.yaml
-	python $(SCRIPT_DIR)/airr2akc.py -e SequencingData > $(AIRR_SCHEMA_DIR)/ak_enum_SequencingData.yaml
-DataProcessing:
-	python $(SCRIPT_DIR)/airr2akc.py DataProcessing > $(AIRR_SCHEMA_DIR)/ak_slot_DataProcessing.yaml
-	python $(SCRIPT_DIR)/airr2akc.py -e DataProcessing > $(AIRR_SCHEMA_DIR)/ak_enum_DataProcessing.yaml
-Repertoire:
-	python $(SCRIPT_DIR)/airr2akc.py Repertoire > $(AIRR_SCHEMA_DIR)/ak_slot_Repertoire.yaml
-	python $(SCRIPT_DIR)/airr2akc.py -e Repertoire > $(AIRR_SCHEMA_DIR)/ak_enum_Repertoire.yaml
-#
-# Rearrangement rules
-#
-Rearrangement:
-	python $(SCRIPT_DIR)/airr2akc.py Rearrangement > $(AIRR_SCHEMA_DIR)/ak_slot_Rearrangement.yaml
-	python $(SCRIPT_DIR)/airr2akc.py -e Rearrangement > $(AIRR_SCHEMA_DIR)/ak_enum_Rearrangement.yaml
-#
-# Clone rules
-#
-Clone:
-	python $(SCRIPT_DIR)/airr2akc.py Clone > $(AIRR_SCHEMA_DIR)/ak_slot_Clone.yaml
-	python $(SCRIPT_DIR)/airr2akc.py -e Clone > $(AIRR_SCHEMA_DIR)/ak_enum_Clone.yaml
-#
-# Cell rules
-#
-Cell:
-	python $(SCRIPT_DIR)/airr2akc.py Cell > $(AIRR_SCHEMA_DIR)/ak_slot_Cell.yaml
-	python $(SCRIPT_DIR)/airr2akc.py -e Cell > $(AIRR_SCHEMA_DIR)/ak_enum_Cell.yaml
-Expression:
-	python $(SCRIPT_DIR)/airr2akc.py CellExpression > $(AIRR_SCHEMA_DIR)/ak_slot_CellExpression.yaml
-	python $(SCRIPT_DIR)/airr2akc.py -e CellExpression > $(AIRR_SCHEMA_DIR)/ak_enum_CellExpression.yaml
-Reactivity:
-	python $(SCRIPT_DIR)/airr2akc.py ReceptorReactivity > $(AIRR_SCHEMA_DIR)/ak_slot_ReceptorReactivity.yaml
-	python $(SCRIPT_DIR)/airr2akc.py -e ReceptorReactivity > $(AIRR_SCHEMA_DIR)/ak_enum_ReceptorReactivity.yaml
-#
-# Rearrangement rules
-#
-Receptor:
-	python $(SCRIPT_DIR)/airr2akc.py Receptor > $(AIRR_SCHEMA_DIR)/ak_slot_Receptor.yaml
-	python $(SCRIPT_DIR)/airr2akc.py -e Receptor > $(AIRR_SCHEMA_DIR)/ak_enum_Receptor.yaml
-#
-# Germline rules
-#
-GermlineSet:
-	python $(SCRIPT_DIR)/airr2akc.py GermlineSet > $(AIRR_SCHEMA_DIR)/ak_slot_GermlineSet.yaml
-	python $(SCRIPT_DIR)/airr2akc.py -e GermlineSet > $(AIRR_SCHEMA_DIR)/ak_enum_GermlineSet.yaml
-AlleleDescription:
-	python $(SCRIPT_DIR)/airr2akc.py AlleleDescription > $(AIRR_SCHEMA_DIR)/ak_slot_AlleleDescription.yaml
-	python $(SCRIPT_DIR)/airr2akc.py -e AlleleDescription > $(AIRR_SCHEMA_DIR)/ak_enum_AlleleDescription.yaml
-RearrangedSequence:
-	python $(SCRIPT_DIR)/airr2akc.py RearrangedSequence > $(AIRR_SCHEMA_DIR)/ak_slot_RearrangedSequence.yaml
-	python $(SCRIPT_DIR)/airr2akc.py -e RearrangedSequence > $(AIRR_SCHEMA_DIR)/ak_enum_RearrangedSequence.yaml
-UnrearrangedSequence:
-	python $(SCRIPT_DIR)/airr2akc.py UnrearrangedSequence > $(AIRR_SCHEMA_DIR)/ak_slot_UnrearrangedSequence.yaml
-	python $(SCRIPT_DIR)/airr2akc.py -e UnrearrangedSequence > $(AIRR_SCHEMA_DIR)/ak_enum_UnrearrangedSequence.yaml
-SequenceDelineationV:
-	python $(SCRIPT_DIR)/airr2akc.py SequenceDelineationV > $(AIRR_SCHEMA_DIR)/ak_slot_SequenceDelineationV.yaml
-	python $(SCRIPT_DIR)/airr2akc.py -e SequenceDelineationV > $(AIRR_SCHEMA_DIR)/ak_enum_SequenceDelineationV.yaml
+airr_schema_2.0:
+	python $(SCRIPT_DIR)/airr2akc.py -o $(AIRR_SCHEMA_OUTPUT_DIR)/ak_airr_v2.0.yaml -a $(AIRR_SCHEMA_V2.0_INPUT) -f $(AIRR_SCHEMA_OUTPUT_DIR) -l $(LOGFILE_DIR)/airr2akc_v2.0.log -v
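
Note on the new targets: all three call airr2akc.py with the same short flags (`-o`, `-a`, `-f`, `-l`, and optionally `-v`). The script's argument parsing is not part of this diff, so the following is only a sketch of how such an interface could be declared with `argparse`. The long option names, the help texts, and the reading of `-v` as "add a version prefix" are assumptions inferred from the Makefile and the commit messages, not the script's actual definitions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # All long names and help texts are hypothetical; only the short flags
    # (-o, -a, -f, -l, -v) appear in Makefile.AIRR.
    parser = argparse.ArgumentParser(prog="airr2akc.py")
    parser.add_argument("-o", "--output", required=True,
                        help="LinkML YAML file to write")
    parser.add_argument("-a", "--airr-schema", required=True,
                        help="AIRR OpenAPI YAML file to read")
    parser.add_argument("-f", "--schema-dir",
                        help="directory of existing schema files to check for conflicts")
    parser.add_argument("-l", "--logfile",
                        help="file that receives the conflict log")
    parser.add_argument("-v", "--version-prefix", action="store_true",
                        help="prefix generated names with the AIRR version")
    return parser


# Mirrors the airr_schema_2.0 target with the Make variables expanded.
args = build_parser().parse_args([
    "-o", "src/ak_schema/schema/ak_airr_v2.0.yaml",
    "-a", "src/airr_schema/airr-standards-v2.0/specs/airr-schema-openapi3.yaml",
    "-f", "src/ak_schema/schema",
    "-l", "logs/airr2akc_v2.0.log",
    "-v",
])
print(args.output, args.version_prefix)
```

Under this reading, the plain `airr_schema` target omits `-v` and would therefore produce unprefixed names in `ak_airr.yaml`.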

9 changes: 9 additions & 0 deletions README.md
@@ -20,6 +20,15 @@ AIRR Knowledge Data Model

 ## Developer Documentation

+This repository now contains submodules. When doing a `git clone`, those submodules are
+not automatically populated, and an additional command is required.
+
+```
+git clone https://airr-knowledge.github.com/ak-schema
+cd ak-schema
+git submodule update --init --recursive
+```
+
 Use the docker container to have a consistent development environment.

 * `docker pull airrknowledge/ak-schema:tag`: pull published container for specific tagged version.
4 changes: 2 additions & 2 deletions examples/iedb/example.json
@@ -108,7 +108,6 @@
 "specimen": "example:specimen-3480642",
 "type": "TCellReceptorEpitopeBindingAssay",
 "assay_type": "OBI:1110179",
-"value": "Positive",
 "epitope": "iedb_epitope:20788",
 "tcell_receptors": [
 "iedb_receptor:20833",
@@ -800,7 +799,8 @@
 "iedb_receptor:24245",
 "iedb_receptor:24246",
 "iedb_receptor:24247"
-]
+],
+"value": "Positive"
 }
 },
 "datasets": {
2 changes: 1 addition & 1 deletion examples/iedb/example.yaml
@@ -89,7 +89,6 @@ assays:
 specimen: example:specimen-3480642
 type: TCellReceptorEpitopeBindingAssay
 assay_type: OBI:1110179
-value: Positive
 epitope: iedb_epitope:20788
 tcell_receptors:
 - iedb_receptor:20833
@@ -781,6 +780,7 @@ assays:
 - iedb_receptor:24245
 - iedb_receptor:24246
 - iedb_receptor:24247
+value: Positive
 datasets:
 example:dataset-3480642:
 akc_id: example:dataset-3480642
4 changes: 2 additions & 2 deletions examples/iedb/tsv/assays.tsv

Large diffs are not rendered by default.

1 change: 1 addition & 0 deletions logs/.gitignore
@@ -0,0 +1 @@
+*.log