There are a number of relatively easy things that could and should be done next:

Extract document frequencies for all publication years (ca. 1800 to 2020) to provide complete data for TF-IDF and to allow time-matched TF-IDF normalization.
Move more hard-coded constants into function arguments with sensible defaults.
Accelerate the plotting by vectorizing loop constructs or refactoring part of the code in C++ using Rcpp.
Match KNIME workflow output to ChEBI plot input (additional filters).
Add an interactive HTML5 visualization with tooltips that reveal and link to the most abundant ChEBI entry in each (log P/mass) bin.
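
To make the idea of time-matched TF-IDF normalization from the list above concrete, here is a minimal sketch in Python. It is not taken from this project's code: the function name, the dictionary layout, and the add-one smoothing are all assumptions chosen for illustration. The only point it shows is that the inverse document frequency is computed from the document counts of the same publication year as the term counts.

```python
import math


def time_matched_tfidf(term_counts, doc_freq_by_year, n_docs_by_year, year):
    """Weight term counts by an IDF computed from a single publication year.

    All names and the data layout are hypothetical:
      term_counts      -- {term: count} for the documents of interest
      doc_freq_by_year -- {year: {term: number of documents mentioning term}}
      n_docs_by_year   -- {year: total number of documents published that year}
    """
    n_docs = n_docs_by_year[year]
    df_year = doc_freq_by_year[year]
    weights = {}
    for term, tf in term_counts.items():
        df = df_year.get(term, 0)
        # Add-one smoothing (an assumption) avoids division by zero
        # for terms that never occur in the chosen year.
        idf = math.log((1 + n_docs) / (1 + df))
        weights[term] = tf * idf
    return weights


# Toy numbers, invented for illustration only.
doc_freq = {1990: {"caffeine": 9}, 2020: {"caffeine": 999}}
n_docs = {1990: 99, 2020: 9999}
early = time_matched_tfidf({"caffeine": 2}, doc_freq, n_docs, 1990)
late = time_matched_tfidf({"caffeine": 2}, doc_freq, n_docs, 2020)
```

With these toy numbers the term is mentioned in roughly the same share of documents in both years, so the two weights come out identical even though the absolute document counts differ by two orders of magnitude. Normalizing against a single pooled count for all years would not have this property.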
