Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix function cos_similarity #90

Open
wants to merge 2 commits into
base: master
Choose a base branch
from
Open

Conversation

RautenbergFrederik
Copy link

If dimension of input is (1,N) or (N,1), cos_similarity was not calculated.

@boeddeker
Copy link
Member

boeddeker commented Jul 20, 2023

Could you add an axis argument? I think this is one of the functions that we translated from MATLAB and never changed to numpy style.

Default could be -1. I think no one uses col or row vectors, or at least no one should use col or row vectors, in numpy we use just vectors.

@@ -25,7 +25,10 @@ def cos_distance(a, b):
:param b: vector b (1xN or Nx1 numpy array)
:return: distance (scalar)
"""
return 0.5 * (1 - sum(a * b) / np.sqrt(sum(a ** 2) * sum(b ** 2)))
assert a.shape == b.shape, 'Both vectors must have the same dimension'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

assert a.shape == b.shape, f'Both vectors must have the same dimension. {a.shape} {b.shape}'

Use cos similarity to calculate the cos distance
@codecov-commenter
Copy link

codecov-commenter commented Jul 20, 2023

Codecov Report

Patch and project coverage have no change.

Comparison is base (42e0c62) 65.44% compared to head (a2ef5af) 65.44%.

❗ Your organization is not using the GitHub App Integration. As a result you may experience degraded service beginning May 15th. Please install the Github App Integration for your organization. Read more.

Additional details and impacted files
@@           Coverage Diff           @@
##           master      #90   +/-   ##
=======================================
  Coverage   65.44%   65.44%           
=======================================
  Files          80       80           
  Lines        5394     5394           
=======================================
  Hits         3530     3530           
  Misses       1864     1864           
Impacted Files Coverage Δ
paderbox/math/vector.py 69.23% <0.00%> (ø)

☔ View full report in Codecov by Sentry.
📢 Do you have feedback about the report comment? Let us know in this issue.

@RautenbergFrederik
Copy link
Author

In the cos similarity function, the absolute value is used in the denominator. So for the case [1,0] [-1,0] the similarity would not be -1 but 1, which conflicts with the definition of cos similarity?

@boeddeker
Copy link
Member

There are different definitions for the cosine similarity and the cosine distance.
Depending, what you do, you use a different definition.
When we wrote those two function, we discussed, which implementation we should use and
for some reason, we decided to use the current split. (I forgot the arguments)

When you search for the definition of cosine similarity, you find the definition for real valued numbers without conj and without abs inside, so the old implementation of "cos_distance" was correct, according to this definition.

When you work with complex number, you have to fix the denominator, but the number would still be complex.
As far as I know, the abs is usually used, when you work with complex numbers and cosine dist or similarity.

Since you changed the implementation to use abs, could you give a motivation/use case for abs?

@TCord What do you usually use?

While I am fine with a breaking change to remove support for column vectors (shouldn't cause any trouble), we should be careful with a breaking change that introduces an abs.

@TCord
Copy link
Member

TCord commented Jul 21, 2023

I would vote against making a breaking change, here. We should keep the behavior of the function (value range between 0 and 1, but no usage of an absolute value) identical, and instead add an axis argument and an assertion that the output will be valid.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants