Ah, I remember now: valid.txt is empty, right? The quality of the clustering is evaluated on a validation set (for hyperparameter tuning), so you would have to split your train set into a train set and a validation set, e.g. by holding out 10% of the triples.
If you don't mind the evaluation being unfair, you could do this split just for learnnrnoisy and use the whole train set again for applynrnoisy. Be aware, however, that the training set in applynrnoisy then contains triples that were used for hyperparameter tuning in learnnrnoisy.
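The 10% hold-out described above could be sketched as follows. This is only a minimal illustration, not part of SAFRAN: the function names `split_triples` and `write_split`, the fixed seed, and the output file name `train.tsv` are assumptions made for the example, and it assumes one triple per line.

```python
import random


def split_triples(lines, valid_fraction=0.1, seed=42):
    """Shuffle triples and return (train, valid) lists without overlap."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    shuffled = list(lines)
    rng.shuffle(shuffled)
    n_valid = int(len(shuffled) * valid_fraction)
    return shuffled[n_valid:], shuffled[:n_valid]


def write_split(src_path, train_path, valid_path, valid_fraction=0.1, seed=42):
    """Read one triple per line from src_path and write the two splits."""
    with open(src_path, encoding="utf-8") as f:
        # Skip blank lines and normalise line endings so that shuffling
        # cannot glue a final line without a newline onto its neighbour.
        lines = [line.rstrip("\n") + "\n" for line in f if line.strip()]
    train, valid = split_triples(lines, valid_fraction, seed)
    with open(train_path, "w", encoding="utf-8") as f:
        f.writelines(train)
    with open(valid_path, "w", encoding="utf-8") as f:
        f.writelines(valid)
    return len(train), len(valid)


# Hypothetical usage ("train.tsv" is an assumed output name):
# write_split("Adsicore_V04.tsv", "train.tsv", "valid.txt")
```

PATH_TRAINING and PATH_VALID would then point at the two output files for learnnrnoisy. A purely random split like this may leave some entities or relations out of one of the two files, so it is worth checking the result before launching a long run.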
Indeed, specifying a non-empty valid.txt seems to launch many more calculations.
I'm even afraid it is going to take days. Is the runtime proportional to the number of triples in valid.txt? There are 641186 in mine, it seems to take about 24 h per rule relation, and there are 95 of them. Any idea how to speed that up?
Hello,
We could successfully run SAFRAN applymax and explore the results with LinkExplorer, thanks again for your help!
Now we would like to run the non-redundant algorithms.
What we did was run calcjacc with this config:
PATH_TRAINING = Adsicore_V04.tsv
PATH_TEST = DB05419.test.tsv
PATH_VALID = valid.txt
PATH_RULES = rules/alpha-1000
WORKER_THREADS = 30
VERBOSE = 1
PATH_JACCARD = jaccard.V02
This takes quite some time. We then ran learnnrnoisy with this config:
PATH_TRAINING = Adsicore_V04.tsv
PATH_TEST = DB05419.test.tsv
PATH_VALID = valid.txt
PATH_JACCARD = jaccard.V02
PATH_RULES = rules/alpha-1000
PATH_OUTPUT = predictions.learnnrnoisy.V02
WORKER_THREADS = 15
PATH_CLUSTER = cluster.adsicore.V02.txt
This runs pretty quickly. Unfortunately, cluster.adsicore.V02.txt is empty.
Note that the test set only contains triples about one molecule (54 lines). Could it be that no clusters were found at all? (Adsicore_V04.tsv contains 6411664 lines.)