Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cleaning up the custom Makefile #2973

Closed
gouttegd opened this issue Jul 13, 2023 · 5 comments · Fixed by #3028
Closed

Cleaning up the custom Makefile #2973

gouttegd opened this issue Jul 13, 2023 · 5 comments · Fixed by #3028
Assignees
Labels

Comments

@gouttegd
Copy link
Collaborator

The uberon.Makefile is currently a big mess.

Since burning it down to the ground and starting anew is not an option (not that I wouldn’t want to do it, but more reasonable people on this project would disagree, and that’s probably for the best), we need to improve it incrementally instead.

Purging commented out code

It’s easy to think that code that is commented out is harmless, but I argue that it is not the case. The amount of commented code in the Makefile is one of the things that make it very hard to read. I can understand the reluctance to throw away old code just in case it could be useful again, but we should remember that everything is under version control anyway: we’ll never definitely lose anything.

So I propose to mercilessly erase commented out code in cases like the following examples:

##MAKEOBO= owltools $< --remove-axiom-annotations -o -f obo [email protected] && grep -v ^property_value: [email protected] | grep -v ^owl-axioms: > [email protected] && obo2obo [email protected] -o $@
MAKEOBO=  $(OWLTOOLS) $< --add-obo-shorthand-to-properties  -o -f obo --no-check $@.tmp1 && grep -v ^property_value: $@.tmp1 | perl -npe 's@relationship: dc-@property_value: dc-@' | grep -v ^owl-axioms: > $@.tmp && mv $@.tmp  $@

[…]

# Not used anywhere so commenting out
#$(TMPDIR)/phenoscape_homology.owl:
#       wget --no-check-certificate https://raw.githubusercontent.com/phenoscape/phenoscape-ontologies/master/demo/phenoscape_homology.owl -O $@

[…]

# First pass at making base module
# Currently this will be missing the temporary reflexivity axioms.
# should this include downward-injected axioms, e.g on ZFA?
#uberon-base.owl: $(TMPDIR)/unreasoned.owl
#       $(OWLTOOLS) $< --remove-imports-declarations --remove-axioms -t Declaration --set-ontology-id -v $(RELEASE)/$@ $(URIBASE)/uberon/$@ -o [email protected] && mv [email protected] $@

etc.

Purging obsolete comments

In addition to commented out code, there are also plain text comments that are clearly outdated. Again, they are not harmless. One can lose quite a bit of time just trying to figure out whether a given comment is actually saying something useful that one should be aware of, or if it is something that is only of interest for Uberon archeologists.

For example, there’s an entire blocks of comments dedicated on phenoscape and how to merge it with Uberon – but phenoscape as a separate entity is no longer a thing since two years ago!

Same punition as above: we should delete such comments. Again, they will remain in the history of the project for eternity, if really someone needed to consult them one day.

Obsolete mirroring and import rules

There are lots of custom rules to deal with mirrors and imports. I suspect many if not most of them should no longer be necessary especially after the switch to base merging. We should review those rules to check:

  1. if they are used at all; if they are not, they should be deleted;
  2. if they are used, then are they doing something useful? Are they supposed to be used? If no, again they should be deleted.

Obsolete rules related to Travis

We are no longer using Travis, so there’s no reason to keep Travis-specific rules around. We should whether those rules are still used as part of the GitHub-based CI checks, and if they are not, delete them.

Use of OWLTOOLS when a ROBOT equivalent already exists

There are rules that make use of OWLTOOLS for tasks that ROBOT is completely able to perform. For example:

$(TMPDIR)/unreasoned.owl: $(OWLSRC) $(BRIDGEDIR)/uberon-bridge-to-bfo.owl
        $(OWLTOOLS) $^ --merge-support-ontologies -o -f functional $@

This is merely equivalent to

$(TMPDIR)/unreasoned.owl: $(OWLSRC) $(BRIDGEDIR)/uberon-bridge-to-bfo.owl
        $(ROBOT) merge -i $< -i $(BRIDGEDIR)/uberon-bridge-to-bfo.owl --collapse-import-closure false -o $@

Merging rules that no longer need to be kept separate

In the previous example, the $(TMPDIR)/unreasoned.owl target is only used once elsewhere in the Makefile, as a dependency to the following rule:

$(TMPDIR)/materialized.owl: $(TMPDIR)/unreasoned.owl $(TMP_REFL)
        $(ROBOT) merge -i $< -i $(DEVELOPS_FROM_CHAIN) --collapse-import-closure false \
                relax \
                materialize -T $(CONFIGDIR)/basic_properties.txt -r elk \
                reason -r elk --exclude-duplicate-axioms true --equivalent-classes-allowed asserted-only \
                unmerge -i $(TMP_REFL) \
                unmerge -i $(DEVELOPS_FROM_CHAIN) \
                annotate -O $(URIBASE)/uberon/materialized.owl -V  $(RELEASE)/materialized.owl -o $@ 2>&1 > [email protected]

If $(TMPDIR)/unreasoned.owl is now itself generated by a ROBOT command and is not used anywhere else, then there’s no need for two separate rules and even for two separate commands! It can all be done in a single rule with a single ROBOT pipeline:

$(TMPDIR)/materialized.owl: $(OWLSRC) $(BRIDGEDIR)/uberon-bridge-to-bfo.owl $(TMP_REFL)
        $(ROBOT) merge -i $< -i $(BRIDGEDIR)/uberon-bridge-to-bfo.owl -i $(DEVELOPS_FROM_CHAIN) --collapse-import-closure false \
                relax \
                materialize -T $(CONFIGDIR)/basic_properties.txt -r elk \
                reason -r elk --exclude-duplicate-axioms true --equivalent-classes-allowed asserted-only \
                unmerge -i $(TMP_REFL) \
                unmerge -i $(DEVELOPS_FROM_CHAIN) \
                annotate -O $(URIBASE)/uberon/materialized.owl -V  $(RELEASE)/materialized.owl -o $@ 2>&1 > [email protected]

Reducing the number of times we spawn a new OWLTOOLS or ROBOT process reduces the number of times the ontology needs to be parsed from disk. In the case of Uberon I am willing to bet the gain speed wouldn’t be negligeable.

Potentially obsolete workarounds

There are several instances where we work around problems coming from imported ontologies. Nothing inherently wrong with that (sometimes we don’t have any choice), but we should check whether those workarounds are still needed.

For example:

# TODO: We should fix that upstream
# They are using too strict relationships, which are here replaced by more general ones
$(TMPDIR)/fixed-zfa.obo: $(TMPDIR)/mirror-zfa.obo
        if [ $(MIR) = true ] && [ $(IMP) = true ]; then perl -npe 's@RO:0002488@RO:0002496@;s@RO:0002492@RO:0002497@' $< > $@; fi

mirror/zfa.owl: $(TMPDIR)/fixed-zfa.obo
        if [ $(MIR) = true ] && [ $(IMP) = true ]; then $(OWLTOOLS_NO_CAT) $< -o -f ofn $@; fi

$(TMPDIR)/fixed-ehdaa2.obo: $(TMPDIR)/mirror-ehdaa2.obo | $(SCRIPTSDIR)/obo-grep.pl $(SCRIPTSDIR)/fix-ehdaa2-stages.pl
        if [ $(MIR) = true ] && [ $(IMP) = true ]; then $(SCRIPTSDIR)/obo-grep.pl -r 'id: (EHDAA2|AEO)' $< | $(SCRIPTSDIR)/fix-ehdaa2-stages.pl | grep -v ^alt_id > $@; fi

Pointless targets

Consider the following rule:

# TODO: do we need this intermediate step? Used for subsets
tmp/merged.owl: ext.owl
        $(OWLTOOLS) $< --merge-import-closure --set-ontology-id -v $(RELEASE)/$@ $(URIBASE)/uberon/$@ -o $(TMPDIR)/$@ &&\
        $(ROBOT) merge -i $(TMPDIR)/$@ --collapse-import-closure false annotate --ontology-iri $(ONTBASE)/$@ $(ANNOTATE_ONTOLOGY_VERSION) -o $@

First, it is again using OWLTOOLS where ROBOT would do just fine. But most importantly, the only point of that rule is to merge the imports. OK, this may be “used for subsets”, except that… at this point in the pipeline the imports have already been merged! They have been merged since the early steps of the pipeline. tmp/merged.owl is actually the same thing as ext.owl (modulo the differences caused by serialisation, e.g. different IDs for blank nodes). This entire step is useless: two invocations of OWLTOOLS/ROBOT, two loads of the same ontology… for nothing.

Bottom line

Without re-writing the Makefile scratch, I think that, step-by-step, we can considerably improvide its readibility and therefore its maintainability, all the while making the pipeline faster (by removing needless steps), without introducing any changes in the pipeline’s outputs.

@gouttegd gouttegd self-assigned this Jul 13, 2023
@gouttegd gouttegd added the tech label Jul 13, 2023
@gouttegd
Copy link
Collaborator Author

@matentzn @anitacaron Any thoughts or objections before I start tackling this?

@matentzn
Copy link
Contributor

Purging commented out code/comments

Its all in version control, follow your instincts on what to delete! I will support it as a principle, and may object to specific things being removed if I see them in a PR.

Obsolete mirroring and import rules

I did a refactoring at some point, but yes, you can remove if you know for a fact they are redundant. However, i know that the ones called "local-%" are not 100% redundant, but I never had the time to investigate in what way.

Generally agreed

Travis rules

Misnowner, these are just tests that should be run on the normal gitub action QC CI. If the all tests that are labelled as travis_test are called by test as well, then yes, delete all mentions of travis.

Use of OWLTOOLS when a ROBOT equivalent already exists

Hard yes. ready for burn down.

Remove pointless targets

I would trust your judgement.

Merging rules that no longer need to be kept separate

I will trust you and support you on this!

Potentially obsolete workarounds

Lots of work to investigate, would have that low in list of priorities, but yes.

In summary, all your suggestions are sane and make sense! As for the evalution, I would like to make sure that three files are kept an eye on during major refactorings:

  1. uberon.obo
  2. uberon-base.owl
  3. subsets/human-view.owl

As long as all changes to these three files are under control, I am happy!

@gouttegd
Copy link
Collaborator Author

gouttegd commented Jul 15, 2023

Travis rules

Misnowner, these are just tests that should be run on the normal gitub action QC CI.

I’m talking about those rules:

# This OWLTools call is designed for running in travis; does not clog stdout
ELKRUN= $(OWLTOOLS) $< $(QELK) --run-reasoner -r elk -u > $@.tmp || (tail -1000 $@.tmp && exit -1) && (tail -1000 $@.tmp && mv $@.tmp $@)

travis_test: $(TMPDIR)/is_ok

ttest-uberon: uberon.owl $(REPORTDIR)/bfo-check.txt
	$(ELKRUN)

ttest-ext: ext.owl
	$(ELKRUN)

ttest-tax: $(TMPDIR)/uberon-edit-plus-tax-equivs.owl
	$(ELKRUN)

They are not called from anywhere else in the Makefile and they are not explicitly called from the CI workflow. As far as I can tell, it’s dead code.

Use of OWLTOOLS when a ROBOT equivalent already exists

Hard yes. ready for burn down.

Filling my flame thrower right now.

Lots of work to investigate, would have that low in list of priorities

Yes, checking which workarounds are still needed and which can be jettisoned could prove to be quite a bit of work, I agree. This would clearly come after easier bits such as removing comments.

(Overall I’ve listed the subtasks more or less in the order in which I plan to tackle them.)

@matentzn
Copy link
Contributor

IMO, all this travis stuff can go!

@gouttegd
Copy link
Collaborator Author

gouttegd commented Aug 7, 2023

Related issues that can be fixed as part of the Makefile cleanup:

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants