Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cosine_dist calculates similarity #23

Open
buerki opened this issue May 23, 2020 · 4 comments
Open

cosine_dist calculates similarity #23

buerki opened this issue May 23, 2020 · 4 comments

Comments

@buerki
Copy link

buerki commented May 23, 2020

Hello,
Thanks for putting together this package. Your cosine_dist function in fact calculates similarity rather than distance. This can be confusing, especially since the documentation also says it computes a distance. Perhaps renaming it cosine_sim or making it return 1-cosine similarity might be more transparent.

@HajkD
Copy link
Member

HajkD commented May 24, 2020

Hi Andy,

Thank you for contacting me and for your feedback.

Actually, in the documentation I do differentiate between distance (d) and similarity (s) measures (also for the cosine similarity metric): https://drostlab.github.io/philentropy/articles/Introduction.html#inner-product-family

Would it be possible to point me to the documentation where you read distance rather than similarity for the cosine metric?

Thank you very much for your feedback and help with this.

Best wishes,
Hajk

@HajkD
Copy link
Member

HajkD commented May 24, 2020

Do you by any chance mean that the internal low-level function name philentropy::cosine_dist() is misleading?

Here, I agree with you, but since I actually intended those functions to be internal rather than externally used, because they were all supposed to be wrapped by the distance() function, my logic was to name all internal functions ought to be wrapped by the high-level distance() function *_dist().

That is why I made it clear in the documentation and vignettes that the distance() function computes both: distance and similarity measures analogous to the definition in http://users.uom.gr/~kouiruki/sung.pdf. I see how this may be misleading when wanting to rely only on the low-level function without reading the vignette documentation.

I can have a look at this and see how to address this consistently.

Many thanks for making me aware of this.

@buerki
Copy link
Author

buerki commented May 24, 2020

Hello Hajk,

Ah, thanks, I missed the d and s tags. That does make it clearer. I was indeed referring to the description of the low-level function, both at https://drostlab.github.io/philentropy/reference/index.html and also when requesting help on it from within R. In both places it is referred to as a distance function and is named cosine_dist. But of course you are quite right that users should be using low-level functions via distance() rather than access them directly. Thanks again for this useful package.

Regards,
Andy

@HajkD
Copy link
Member

HajkD commented May 24, 2020

Hi Andy,

Brilliant. Then I know what to work on.

I think you are right. It should be more practical, so I may rename the functions in the new version (although I am hesitant, because I am not aware of all the tool/scripts/etc that may build upon this notation).

Best wishes,
Hajk

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants