Cannot generate clusters with learnnrnoisy #15

fburdet · 2022-06-22T11:11:38Z

Hello,

We successfully ran SAFRAN applymax and explored the results with LinkExplorer. Thanks again for your help!

Now we would like to run the non-redundant algorithms.

First, we ran calcjacc with this config:
PATH_TRAINING = Adsicore_V04.tsv
PATH_TEST = DB05419.test.tsv
PATH_VALID = valid.txt

PATH_RULES = rules/alpha-1000

WORKER_THREADS = 30

VERBOSE = 1

PATH_JACCARD = jaccard.V02

This takes quite some time. Then we ran learnnrnoisy with this config:

PATH_TRAINING = Adsicore_V04.tsv
PATH_TEST = DB05419.test.tsv
PATH_VALID = valid.txt

PATH_JACCARD = jaccard.V02
PATH_RULES = rules/alpha-1000

PATH_OUTPUT = predictions.learnnrnoisy.V02

WORKER_THREADS = 15

PATH_CLUSTER = cluster.adsicore.V02.txt

This one runs pretty quickly. Unfortunately, cluster.adsicore.V02.txt is empty.

Note that the test set only contains triples about one molecule (54 lines), whereas Adsicore_V04.tsv contains 6,411,664 lines. Could it be that simply no clusters are found?
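One way to rule out a missing or empty input before a long run is a small pre-flight check. The sketch below is a hypothetical helper, not part of SAFRAN: it assumes the config is a plain KEY = VALUE properties file like the ones above, resolves paths relative to a base directory you pass in (an assumption about how SAFRAN resolves them), and only flags the PATH_TRAINING, PATH_TEST and PATH_VALID entries whose file is missing or zero bytes long. If valid.txt is empty, it reports PATH_VALID.

```python
from pathlib import Path

# Hypothetical pre-flight check, not part of SAFRAN.
# Assumes the config is a plain "KEY = VALUE" properties file.
DATA_KEYS = ("PATH_TRAINING", "PATH_TEST", "PATH_VALID")


def parse_config(text):
    """Parse 'KEY = VALUE' lines into a dict, skipping lines without '='."""
    config = {}
    for line in text.splitlines():
        if "=" not in line:
            continue
        key, value = line.split("=", 1)
        config[key.strip()] = value.strip()
    return config


def empty_inputs(config, base_dir=".", keys=DATA_KEYS):
    """Return the keys whose file is missing or has a size of zero bytes."""
    problems = []
    for key in keys:
        if key not in config:
            continue  # key not used in this config
        path = Path(base_dir) / config[key]
        if not path.is_file() or path.stat().st_size == 0:
            problems.append(key)
    return problems
```

Usage would be something like empty_inputs(parse_config(Path("learnnrnoisy.properties").read_text())), where the file name is hypothetical. PATH_JACCARD is deliberately not in the default keys, because it is an output of calcjacc but an input of learnnrnoisy; pass it via keys when checking the learnnrnoisy config.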

nomisto · 2022-06-22T11:19:33Z

Ah, I remember now: valid.txt is empty, right? The quality of the clustering is evaluated on a validation set (this is the hyperparameter tuning step). So you would have to split your training set into a training set and a validation set, e.g. holding out 10% of the triples.
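A minimal sketch of such a split in Python, assuming one triple per line (tab-separated, as in a .tsv file). The helper names and output file names are hypothetical; SAFRAN does not ship this script. It shuffles with a fixed seed so the split is reproducible and holds out int(fraction × N) triples.

```python
import random

# Hypothetical helper: hold out a random fraction of the training triples as a
# validation set. Assumes one triple per line (e.g. head<TAB>relation<TAB>tail).


def split_triples(lines, valid_fraction=0.1, seed=42):
    """Shuffle non-empty lines reproducibly and return (train, valid) lists."""
    triples = [line.rstrip("\n") for line in lines if line.strip()]
    rng = random.Random(seed)
    rng.shuffle(triples)
    n_valid = int(len(triples) * valid_fraction)
    return triples[n_valid:], triples[:n_valid]


def split_file(path_in, path_train, path_valid, valid_fraction=0.1, seed=42):
    """Write both splits to disk and return their sizes."""
    with open(path_in, encoding="utf-8") as f:
        train, valid = split_triples(f, valid_fraction, seed)
    for path, part in ((path_train, train), (path_valid, valid)):
        with open(path, "w", encoding="utf-8") as f:
            f.writelines(t + "\n" for t in part)
    return len(train), len(valid)
```

For example, split_file("Adsicore_V04.tsv", "train.split.tsv", "valid.txt") would then give the new PATH_TRAINING and PATH_VALID for learnnrnoisy. A purely random split can leave some validation triples with entities that no longer occur in the training part; whether that matters here is not discussed in the thread.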

If you don't mind the evaluation being unfair, you could use the split only for learnnrnoisy and use the whole training set again for applynrnoisy. However, if you do this, please be aware that the training set in applynrnoisy will then contain triples that were used for hyperparameter tuning in learnnrnoisy.
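To see how large that overlap is, a hypothetical check (not part of SAFRAN) can count the validation triples that also appear in the training file given to applynrnoisy. It assumes both files hold one tab-separated triple per line.

```python
# Hypothetical check (not part of SAFRAN): how many validation triples also
# appear in the training file that applynrnoisy will use? Assumes both files
# hold one tab-separated triple per line.


def parse_triples(lines):
    """Turn 'head<TAB>relation<TAB>tail' lines into a set of tuples."""
    return {tuple(line.rstrip("\n").split("\t")) for line in lines if line.strip()}


def leakage(apply_train_lines, valid_lines):
    """Return (number of overlapping triples, share of the validation set)."""
    train = parse_triples(apply_train_lines)
    valid = parse_triples(valid_lines)
    overlap = train & valid
    share = len(overlap) / len(valid) if valid else 0.0
    return len(overlap), share
```

With the shortcut described above (split for learnnrnoisy, full file for applynrnoisy), the share is 1.0 by construction, since every validation triple was taken from the full training file. With a clean split it should be 0.0 unless the original file contains duplicate lines.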

fburdet · 2022-06-24T14:28:10Z

Hello,

Thanks for the answer!

Indeed, specifying a non-empty valid.txt seems to launch many more calculations.

I'm actually afraid it's going to take days... Is the runtime proportional to the number of triples in valid.txt? Mine has 641,186, and it seems to take about 24 h per rule relation, of which there are 95. Any idea how to speed this up?
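A back-of-envelope estimate makes the scale clearer. This sketch assumes relations are processed one after another at the observed ~24 h each. The optional rescaling additionally assumes runtime grows linearly with the number of validation triples, which is exactly the open question in the post and is not confirmed in the thread; the reduced size of 64,119 triples is a hypothetical value.

```python
# Back-of-envelope runtime estimate. Assumes relations run sequentially.
# The optional rescaling ASSUMES runtime grows linearly with the number of
# validation triples; the thread does not confirm this.


def estimated_hours(hours_per_relation, n_relations, valid_triples=None, target_triples=None):
    """Sequential runtime estimate, optionally rescaled to a smaller validation set."""
    hours = hours_per_relation * n_relations
    if valid_triples and target_triples:
        hours *= target_triples / valid_triples
    return hours


full = estimated_hours(24, 95)
print(f"{full:.0f} h = {full / 24:.1f} days")  # 2280 h = 95.0 days

# Hypothetical: keep only ~10% of the 641,186 validation triples.
reduced = estimated_hours(24, 95, 641186, 64119)
print(f"{reduced:.0f} h = {reduced / 24:.1f} days")  # 228 h = 9.5 days
```

If the linear assumption does not hold, the second figure says nothing; timing one relation with a smaller valid.txt would be the way to check it.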
