- Added type hints to all functions (e6721a1)
- Added CodeCov shield to track PyTest test code coverage (23d6456)
- Added more PyTest unit tests (e.g., 0c94995, 23d6456, 5a99d6b, f76535e, 94646ad, d5f5d4e, 918d18f, d1a8c6d, 194f31c)
- Added setuptools to required_installs to support pip installation beyond
pip 25.0
(94646ad) - Added pyproject.toml to support pip installation beyond
pip 25.0
(94646ad) - Added CITATION.bib to allow for even easier citation of glycowork (a64f694)
- Reworked user interface of the
glycoworkGUI
(77bbfa3)
- Bumped minimum supported Python version to 3.9 (3.8 is no longer supported, see https://devguide.python.org/versions/) (4960c5c)
- Switched docstring style to docments (https://nbdev.fast.ai/tutorials/best_practices.html#document-parameters-with-docments) (e6721a1)
- Removed
gdown
dependency; Will be handled by the standard library moduleurllib
for better retrieval of externally stored models/files (319981e, 35ed71a) - Switched pathing from
os
topathlib
(319981e)
- Added new named motifs to
motif_list
: DisialylLewisC,Sia(a2-3)Gal(b1-3)[Sia(a2-6)]GlcNAc
; RM2,Sia(a2-3)[GalNAc(b1-4)]Gal(b1-3)[Sia(a2-6)]GlcNAc
; DisialylLewisA,Sia(a2-3)Gal(b1-3)[Fuc(a1-4)][Sia(a2-6)]GlcNAc
(a64f694) - Added new curated glycomics dataset,
mouse_brain_GSL_PMID39375371
(b94744e)
- Changed
glycoproteomics_human_keratinocytes_PMID37956981
toglycoproteomics_human_keratinocytes_N_PMID37956981
(d5f5d4e) - Improved the description of blood group motifs in
motif_list
(including type 3 blood group antigens, ExtB, and parent motifs) (b94744e)
- Fixed the "Oglycan_core6" motif definition in
motif_list
to no longer overlap with core 2 structures (f394bda)
- Added
count_nested_brackets
helper function to monitor level of nesting in glycans (41bb1a1, d57b836) - Added dictionaries with lists of strings as values as a new supported data type for
DataFrameSerializer
(034b6ad) - Added
share_neighbor
helper function to check whether two nodes in a glycan graph share a neighbor (f394bda)
- Changed
resources.open_text
toresources.files
to preventDeprecationWarning
fromimportlib
(0c94995) lectin_specificity
now uses our customDataFrameSerializer
and is stored as a .json file rather than a .pkl file, to improve long-term stability across versions (034b6ad)
- Fixed DeprecationWarning in all data-loading functions that used
importlib.resources.open_text
or.content
(87ea2fc)
- Added the "random_state" keyword argument to
clr_transformation
to allow users to provide a reproducible RNG seed (b94744e) - Added the
JTKTest
class object (87ea2fc)
- For
replace_outliers_winsorization
, in small datasets, the 5% limit is dynamically changed to include at least one datapoint (23d6456) - Handled the edge case of strong differences in
cohen_d
with zero standard deviation; now outputting positive/negative infinity (23d6456) - Renamed
test_inter_vs_intra_group
tocompare_inter_vs_intra_group
, to avoid testing issues (23d6456) partial_corr
will now return a normal Spearman's correlation if no control features can be identified (241141b)
- Deprecated
hlm
,fast_two_sum
,two_sum
,expansion_sum
, andupdate_cf_for_m_n
, which will all be done in-line instead (e1afe33) - Deprecated
jtkdist
,jtkinit
,jtkstat
,jtkx
, which will all be done by the newJTKTest
(87ea2fc)
- Fixed DeprecationWarning in
calculate_permanova_stat
for calling nonzero on 0d arrays (23d6456) - Prevent possible division by zero in pseudo-F calculation in
calculate_permanova_stat
(23d6456) - Fixed DeprecationWarning in
jtkdist
for callingnp.sum(generator)
(23d6456) - Ensured that the input to
impute_and_normalize
are columns with floats, to avoid TypeWarnings during imputation (23d6456) - Fixed DeprecationWarning in
process_glm_results
to prevent DataFrameGroupBy.apply from operating on the grouping columns (23d6456) - Fixed RuntimeWarnings for JTK-related functions in case of imperfect input data (d5f5d4e)
- Ensured that
correct_multiple_testing
will return empty lists if the provided p-value list is also empty (ef3da9c)
- Added
get_random_glycan
to retrieve random glycan sequences (optionally of specific glycan type) (d1a8c6d) - Supported intramolecular modifications like lactonization in
glycan_to_composition
(8c69c2c)
- Changed
resources.open_text
toresources.files
to preventDeprecationWarning
fromimportlib
(0c94995) - The monosaccharide keys of the output dictionaries of
glycan_to_composition
are now alphabetically sorted (8c69c2c) - Modified
calculate_adduct_mass
to deal with a greater variety of adduct handling, such as "C2H4O2", "-H2O", "+Na" to add or subtract masses (8c69c2c) - Expanded
glycan_to_mass
andcomposition_to_mass
to deal with compositional building blocks that represent losses/gains in the molecule (like "-H2O") (8c69c2c) - Composition and mass functions now can correctly work with azide-modified monosaccharides such as Neu5Az (ef3da9c)
- In addition to chemical formulae, users can now also provide direct additional masses as floats with the same "adduct" keyword argument in
composition_to_mass
andglycan_to_mass
(d57b836) get_modification
will no longer return the 5Ac / 5Gc of Neu5Ac / Neu5Gc as part of the modification (0387d37)
- Fixed an edge case in
get_unique_topologies
, in which the absence of a universal replacer sometimes created an empty list that was attempted to be indexed (0c94995) - Made sure that
compositions_to_structures
always returns a DataFrame, even if no matches are found (0c94995) - Provided correct exact methyl masses in
mass_dict
(e3eeb32)
- Added "antennary_Fuc" as another inferred feature to
infer_features_from_composition
(a64f694) - Added "IdoA", "GalA", "Araf", "D-Fuc", "AllNAc", "Par", "Kdo", "GlcN", "Ido", "Col", "Tyv", "GalN", "QuiNAc", "Gul", and "Gal6S" to recognized WURCS2 tokens (52fc16e, f3cd8f0, 7551805, 35ed71a)
- Added the new "order_by" keyword argument to
choose_correct_isoform
to enforce strictly sorting branches by branch endings / linkages, if desired (918d18f) - Added "Col", "Ido", "Kdo", and "Gul" to supported GlycoCT monosaccharides (7551805, 35ed71a)
- GLYCAM is now another supported nomenclature in the Universal Input framework, enabled by the added
glycam_to_iupac
function, which is also integrated intocanonicalize_iupac
(2fb5dc6) - GlycoWorkBench (GlycanBuilder) is now another supported nomenclature in the Universal Input framework, enabled by the added
glycoworkbench_to_iupac
function, which is also integrated intocanonicalize_iupac
(ea1fdfc)
check_nomenclature
will now actually raise appropriate Exceptions, in case nomenclature is incompatible with glycowork, instead of print warnings (23d6456)- Supported triple-branch reordering in
find_isomorphs
andchoose_correct_isoform
(918d18f) - Improved
find_isomorphs
to swap neighboring branches with different levels of nesting (41bb1a1, 034b6ad) choose_correct_isoform
can now also be used with a single glycan sequence, in which case it internally callsfind_isomorphs
to generate material for choosing (918d18f)choose_correct_isoform
can now correctly handle more complex sequences than before (41bb1a1, 034b6ad, d1ff321)canonicalize_iupac
now can handle modifications such as Neu5,9Ac2 / Neu4,5Ac2 or multiple ones like in (6S)(4S)Gal, even if in the wrong order (034b6ad)canonicalize_iupac
now can handle even more typos (e.g., 'aa1-3' in specifying a linkage) (a64f694, 241141b)canonicalize_iupac
now can handle even more inconsistencies (e.g., mix of short-hand and expanded linkages)- Expanded
get_mono
to deal with some special WURCS2 tokens at the reducing end, of typeu2122h_2*NCC/3=O
(d57b836) canonicalize_iupac
will no longer convert things like "b1-3/4" into "b1-?", because narrow linkage ambiguities can now be properly handled (52fc16e)get_possible_linkages
andde_wildcard_glycoletter
now also support narrow linkage ambiguities like "b1-3/4" (52fc16e)canonicalize_iupac
will now no longer mess up branch formatting of the repeating unit in glycans of type "repeat" (9a94537)- Ensured that
canonicalize_iupac
works with lactonized glycans (i.e., containing something like "1,7lactone") (8c69c2c) find_matching_brackets_indices
has been renamed toget_matching_indices
and now takes multiple delimiter choices and returns a generator, including the level of nesting (basically what.draw.matches
used to do) (e1afe33)get_class
will now return "lipid/free" if glycans of type Neu5Ac(a2-3)Gal(b1-4)Glc are supplied (i.e., lacking 1Cer and -ol but still lactose-core based) (b99699c)expand_lib
now no longer modifies the input dictionary (65bd12c)get_possible_linkages
now returns a set instead of a list (a98461f)wurcs_to_iupac
now can also properly deal with ultra-narrow linkage wildcards (e.g., a2-3/6) (f3cd8f0)
- Fixed component inference in
parse_glycoform
in case of unexpected composition formats (0c94995) - Fixed an issue in
equal_repeats
, in which identical repeats sometimes were not returning True (0c94995)
- Natively support narrow linkage ambiguity in
categorical_node_match_wildcard
; that means you can use things like "Gal(b1-3/4)GlcNAc" withsubgraph_isomorphism
orcompare_glycans
(as well as all functions using these core functions) and it will only return True for "Gal(b1-3)GlcNAc", "Gal(b1-4)GlcNAc", and "Gal(b1-?)GlcNAc" (b94744e) - Added
build_wildcard_cache
for a central handling of wildcard mapping that can also be cached (a98461f) compare_glycans
now also has thereturn_matches
keyword argument that allows for a retrieval of the node mapping if the glycans are isomorphic (7c510c9)
- Ensured that
compare_glycans
is 100% order-specific, never matching something like ("Gal(b1-4)GlcNAc", "GlcNAc(b1-4)Gal") (5a99d6b) glycan_to_nxGraph
will now return an empty graph if the input is an empty string (4f1ccfa)get_possible_topologies
will now also produce a warning (and return the input) if an already defined topology is provided as a pre-calculated graph (3f22f14)- Negation in
subgraph_isomorphism
can now also be added for internal monosaccharides (e.g., "Neu5Ac(a2-3)!Gal(b1-4)GlcNAc") (7558d9b) - Functions with the
handle_negation
decorator can now be accessed without the decorator via.__wrapped__
(7558d9b)
- Fixed an edge case in which
subgraph_isomorphism
could erroneously return False if any of the matchings were in the wrong order, if "count = False" (f394bda) - Fixed an edge case in which negated motifs in
subgraph_isomorphism
sometimes wrongly returned False because the negated motif was present somewhere else in the glycan (but the intended motif was still there) (7558d9b)
- Added the "drawing" argument to
draw_hex
,hex_circumference
,add_bond
,add_sugar
, anddraw_bracket
to avoid having to operate on global variables (918d18f) - Added the option to provide your own existing glycan .pdb structures to
GlycoDraw
when usingdraw_method='chem3d'
with the new keyword argumentpdb_file
(9d082a6)
matches
can now also use[]
as delimiters (f76535e)- Support easy import of
GlycoDraw
, viafrom glycowork import GlycoDraw
(d5f5d4e) - Renamed
hex
todraw_hex
, to avoid overwriting the built-inhex
(918d18f) - Changed keyword argument "hex" to "hex_codes" in
add_colours_to_map
(838c708) get_highlight_attribute
now internally usesmotif.graph.subgraph_isomorphism
for pattern retrieval, ensuring up-to-date functionality (4f1ccfa)get_coordinates_and_labels
now internally usesmotif.processing.choose_correct_isoform
to reorder the glycan for drawing (41bb1a1)- Improved console drawing quality controlled by
display_svg_with_matplotlib
and image quality in Excel cells usingplot_glycans_excel
(a64f694) draw_chem2d
anddraw_chem3d
will now detect whether the user is in a Jupyter environment and, if not, plot to the Matplotlib console (c3a7f64)process_per_residue
now will re-order theper_residue
list in the same way as the glycan is re-ordered for drawing withGlycoDraw
(7c510c9)
- Deprecated
hex_circumference
, the functionality is now available withindraw_hex
with the new keyword argument "outline_only" (4f1ccfa) - Deprecated
multiple_branches
,multiple_branch_branches
,branch_order
, andreorder_for_drawing
accordingly (41bb1a1) - Deprecated
matches
, which will now be done by.processing.get_matching_indices
that has been reworked
- Made sure
scale_in_range
never divides by zero, if value range is zero (f76535e) - Made sure that monosaccharides that were never observed but are still SNFG-defined (like TalNAc vs 6dTalNAc) can still be drawn with
GlycoDraw
(ef24af4)
get_glycanova
will now raise a ValueError if fewer than three groups are provided in the input data (f76535e)- Improved console drawing quality controlled by
display_svg_with_matplotlib
and image quality in Excel cells usingplot_glycans_excel
(a64f694) - The "periods" argument in
get_jtk
is now a keyword argument and has a default value of [12, 24] (87ea2fc) specify_linkages
can now also handle super-narrow linkage wildcards like Galb3/4 (f394bda)get_SparCC
will now limit the number of eligible controls for "partial_correlations=True" to sample_size//5, capped at 5 (241141b)
- Fixed a FutureWarning in
get_lectin_array
by avoiding DataFrame.groupby with axis=1 (f76535e) - Fixed a RuntimeWarning in
get_biodiversity
by handling statistical tests of identical alpha diversity values between groups (f76535e) - Made sure that the TSNE perplexity fits the sample size in
plot_embeddings
(d5f5d4e) - Fixed an edge case in which user-provided embeddings as DataFrames were misformatted in
plot_embeddings
(d5f5d4e) - Supported the case where no labels are provided to
plot_embeddings
(d5f5d4e) - Fixed a potential format mismatch in
get_meta_analysis
if random-effects meta-analyses were performed (d5f5d4e) - Fixed an issue where variance-filtered rows could cause problems in
get_differential_expression
if "monte_carlo = True" (ef3da9c) - Fixed an issue in
get_differential_expression
if "sets = True" that caused indexing issues under certain conditions (ef3da9c) - Ensured that "effect_size_variance = True" in
get_differential_expression
always formats variances correctly (ef3da9c) - Ensured that the combination of "grouped_BH = True", "paired = False", and CLR/ALR in
get_differential_expression
works even when negative values are present (87ea2fc)
- Improved tracing in
try_matching
for complicated branching cases (f394bda) - Ensured that
format_retrieved_matches
outputs the identified motifs in the canonical IUPAC representation (7558d9b)
- Deprecated
process_pattern
; will be done in-line instead (f394bda) - Deprecated
expand_pattern
; will be handled byspecify_linkages
and improvements insubgraph_isomorphism
instead (f394bda) - Deprecated
filter_dealbreakers
; will be handled by improvements insubgraph_isomorphism
instead (65bd12c)
- Fixed an issue in
get_match_batch
, in which precompiled patterns caused issues inget_match
(194f31c)
- Added
get_size_branching_features
to create glycan size and branching level features for downstream analysis (d57b836) - Added the "size_branch" option in the "feature_set" keyword argument of
annotate_dataset
andquantify_motifs
, to analyze glycans by size or level of branching (d57b836)
- Fixed an issue in
clean_up_heatmap
in which, occasionally, duplicate strings were introduced in the output (e3eeb32)
- Added classification-AUROC, multilabel-accuracy, multilabel-MCC, regression-MAE, and regression-R2 as metrics to
train_model
(#66) - Added the "return_metrics" keyword argument to
train_model
that can additionally return all training and validation metrics (#66)
- Weigh metric calculation by batch-size (correctly handling the last batch) in
train_model
(#66) - Best performances in
train_model
are now taken from the overall best model (lowest loss), not from best-model-per-metric (#66)
- Fixed an indexing issue in
train_ml_model
if "additional_features_train" / "additional_features_test" were used (b94744e)
- Changed
resources.open_text
toresources.files
to preventDeprecationWarning
fromimportlib
(d1a8c6d)
- In
prep_model
, thehidden_dim
argument can now also be used to modify the protein embedding size of a newly defined LectinOracle model (d1ff321)
- Fixed DeprecationWarning in
distance_from_embeddings
to prevent DataFrameGroupBy.apply from operating on the grouping columns (94646ad) - Fixed an issue in
distance_from_metric
where networks were indexed incorrectly based on presented DataFrame order (d2f5d55)
- Made sure in
network_alignment
that only nodes that are virtual in all aligned networks stay virtual (918d18f) choose_leaves_to_extend
will now correctly return no leaf node glycan if the target composition cannot be reached from any of the leaf nodes in a network (918d18f)
- Fixed an issue in
find_shared_virtuals
in which no shared nodes were found because of graph comparisons (d2f5d55)