Fix function cos_similarity #90
base: master
Could you add an axis argument? I think this is one of the functions that we translated from MATLAB and never changed to numpy style. Default could be
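A minimal sketch of what a numpy-style `cos_similarity` with an `axis` argument could look like. It assumes the real-valued definition (no `abs`, no `conj`), and the default `axis=-1` is an assumption, since the proposed default is cut off above. Choosing the reduction axis covers (1, N) and (N, 1) inputs without special-casing them.

```python
import numpy as np


def cos_similarity(a, b, axis=-1):
    """Real-valued cosine similarity along ``axis`` (no abs, no conj).

    Hypothetical numpy-style signature; the default ``axis=-1`` is an assumption.
    """
    a = np.asarray(a)
    b = np.asarray(b)
    assert a.shape == b.shape, (
        f'Both vectors must have the same dimension. {a.shape} {b.shape}'
    )
    # Inner product and squared norms, each reduced only along `axis`
    dot = np.sum(a * b, axis=axis)
    norm_sq_a = np.sum(a * a, axis=axis)
    norm_sq_b = np.sum(b * b, axis=axis)
    return dot / np.sqrt(norm_sq_a * norm_sq_b)


# Batch of two 2-D vectors: one similarity per row
x = np.array([[1.0, 0.0], [0.0, 1.0]])
y = np.array([[1.0, 0.0], [1.0, 0.0]])
per_row = cos_similarity(x, y)  # array([1., 0.])

# A column vector of shape (N, 1) is handled by reducing over axis 0
col = np.array([[1.0], [0.0]])
per_col = cos_similarity(col, -col, axis=0)  # array([-1.])
```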
paderbox/math/vector.py
```diff
@@ -25,7 +25,10 @@ def cos_distance(a, b):
     :param b: vector b (1xN or Nx1 numpy array)
     :return: distance (scalar)
     """
-    return 0.5 * (1 - sum(a * b) / np.sqrt(sum(a ** 2) * sum(b ** 2)))
+    assert a.shape == b.shape, 'Both vectors must have the same dimension'
```
```python
assert a.shape == b.shape, f'Both vectors must have the same dimension. {a.shape} {b.shape}'
```
Use cos similarity to calculate the cos distance
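Written out, assuming the real-valued definition used by the old `cos_distance` line, the relation between the two functions and their value ranges is as follows (the bounds on $s$ follow from the Cauchy-Schwarz inequality):

$$
s(\mathbf{a}, \mathbf{b}) = \frac{\sum_i a_i b_i}{\sqrt{\sum_i a_i^2 \, \sum_i b_i^2}} \in [-1, 1],
\qquad
d(\mathbf{a}, \mathbf{b}) = \tfrac{1}{2}\bigl(1 - s(\mathbf{a}, \mathbf{b})\bigr) \in [0, 1].
$$

If $s$ is replaced by $|s| \in [0, 1]$, the distance range shrinks to $d \in [0, \tfrac{1}{2}]$, so opposite vectors no longer reach distance 1.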
Codecov Report: Patch and project coverage have no change.
```diff
@@           Coverage Diff           @@
##           master      #90   +/-   ##
=======================================
  Coverage   65.44%   65.44%
=======================================
  Files          80       80
  Lines        5394     5394
=======================================
  Hits         3530     3530
  Misses       1864     1864
```
In the cos_similarity function, the absolute value is used in the denominator. So for the vectors [1, 0] and [-1, 0], the similarity would be 1 instead of -1. Doesn't that conflict with the definition of cosine similarity?
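A small worked example of the [1, 0] / [-1, 0] case. Since the norms in the denominator are never negative, the sign can only be lost when the absolute value is also applied to the dot product in the numerator; the sketch below shows both variants side by side.

```python
import numpy as np

a = np.array([1.0, 0.0])
b = np.array([-1.0, 0.0])

dot = np.dot(a, b)                            # -1.0
norms = np.sqrt(np.dot(a, a) * np.dot(b, b))  # 1.0, never negative

without_abs = dot / norms       # textbook (real-valued) cosine similarity
with_abs = np.abs(dot) / norms  # abs on the dot product drops the sign

print(without_abs, with_abs)  # -1.0 1.0
```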
There are different definitions of the cosine similarity and the cosine distance. When you search for the definition of cosine similarity, you find the definition for real-valued numbers, without conj and without abs, so the old implementation of "cos_distance" was correct according to this definition. When you work with complex numbers, you have to fix the denominator, but the result would still be complex. Since you changed the implementation to use abs, could you give a motivation or use case for abs? @TCord What do you usually use? While I am fine with a breaking change that removes support for column vectors (it shouldn't cause any trouble), we should be careful with a breaking change that introduces an abs.
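To illustrate the complex-valued case, here is a sketch assuming the usual Hermitian inner product `sum(a * conj(b))` (conjugating the other argument is an equally common convention and yields the complex conjugate). It shows why the real-valued denominator breaks for complex input and that the fixed version still returns a complex number.

```python
import numpy as np

a = np.array([1.0, 1j])
b = np.array([0.0, 1.0])

# Real-valued formula applied to complex input: a ** 2 == [1, -1] sums to 0,
# so the "squared norm" of a non-zero vector vanishes.
naive_norm_sq = np.sum(a ** 2)

# Complex-aware version: conjugate one argument of the inner product
# and use |x|^2 for the squared norms.
inner = np.sum(a * np.conj(b))
norms = np.sqrt(np.sum(np.abs(a) ** 2) * np.sum(np.abs(b) ** 2))
similarity = inner / norms  # 1j / sqrt(2): still complex

# Taking the magnitude gives a real value in [0, 1] but discards sign and phase.
magnitude = np.abs(similarity)
```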
I would vote against making a breaking change here. We should keep the behavior of the function identical (value range between 0 and 1, no absolute value) and instead add an axis argument and an assertion that the output is valid.
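A minimal sketch of the non-breaking variant proposed above: the original real-valued formula, an added `axis` argument and an assertion on the output range. The parameter name, its default and the rounding tolerance are assumptions, not the existing paderbox API.

```python
import numpy as np


def cos_distance(a, b, axis=-1):
    """0.5 * (1 - cosine similarity) along ``axis``, real-valued, no abs.

    Hypothetical sketch: the ``axis`` parameter, its default and the
    tolerance below are assumptions, not the paderbox API.
    """
    a = np.asarray(a)
    b = np.asarray(b)
    assert a.shape == b.shape, (
        f'Both vectors must have the same dimension. {a.shape} {b.shape}'
    )
    similarity = np.sum(a * b, axis=axis) / np.sqrt(
        np.sum(a * a, axis=axis) * np.sum(b * b, axis=axis)
    )
    distance = 0.5 * (1 - similarity)
    # Validity check: the distance must lie in [0, 1] up to rounding.
    # NaN (e.g. from a zero vector) fails both comparisons and trips it too.
    tol = 1e-12
    assert np.all((distance >= -tol) & (distance <= 1 + tol)), distance
    return distance
```

Opposite vectors still give a distance of 1, identical vectors give 0, and a (1, N) row vector returns an array of shape (1,) with the default axis.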
If the input has shape (1, N) or (N, 1), cos_similarity was not calculated.
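For illustration, one way the old `cos_distance` line in the diff goes wrong for a (1, N) input: Python's built-in `sum` iterates over the first axis, so it returns the single row instead of a scalar. Below it is a minimal sketch of one possible fix that flattens both inputs first. The helper `as_vector` is hypothetical, and because the shape check runs after flattening, a row and a column vector of the same length are accepted; whether that is desirable is a design choice (the alternative discussed above is an `axis` argument).

```python
import numpy as np


def old_cos_distance(a, b):
    # The old line, unchanged: it relies on Python's built-in sum
    return 0.5 * (1 - sum(a * b) / np.sqrt(sum(a ** 2) * sum(b ** 2)))


def as_vector(x):
    # Hypothetical helper: accept shape (N,), (1, N) or (N, 1) and return (N,)
    x = np.asarray(x)
    assert x.ndim == 1 or (x.ndim == 2 and 1 in x.shape), (
        f'Expected a vector, got shape {x.shape}'
    )
    return x.ravel()


def cos_similarity(a, b):
    # Real-valued cosine similarity (no abs) after flattening both inputs
    a, b = as_vector(a), as_vector(b)
    assert a.shape == b.shape, (
        f'Both vectors must have the same dimension. {a.shape} {b.shape}'
    )
    return np.dot(a, b) / np.sqrt(np.dot(a, a) * np.dot(b, b))


row = np.array([[1.0, 2.0, 3.0]])  # shape (1, 3)
other = np.array([[3.0, 2.0, 1.0]])

# Built-in sum over a (1, N) array returns its only row, so the
# "distance" is computed element-wise and has shape (3,).
broken = old_cos_distance(row, other)

fixed = cos_similarity(row, other)  # scalar, 10 / 14
```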