Add Distribution-based rank fusion in JoinDocuments #7914

Closed
nickprock opened this issue Jun 23, 2024 · 0 comments · Fixed by #7915 or #7972
Labels
2.x Related to Haystack v2.0 community-triage topic:retriever type:feature New feature or request

Comments

Contributor

nickprock commented Jun 23, 2024

Is your feature request related to a problem? Please describe.
Add Distribution-based rank fusion in JoinDocuments

Describe the solution you'd like

def _distribution_based_rank_fusion(self, document_lists):
        """
        Merge multiple lists of Documents and assign scores based on Distribution-Based Score Fusion.
        (https://medium.com/plain-simple-software/distribution-based-score-fusion-dbsf-a-new-approach-to-vector-search-ranking-f87c37488b18)

        If a Document is returned by more than one retriever, the one with the highest score is used.
        """
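        for documents in document_lists:
            if not documents:
                continue  # an empty retriever result has no distribution to normalize

            # Treat a missing score as 0 so the statistics below are always defined.
            scores_list = [doc.score if doc.score is not None else 0.0 for doc in documents]

            mean_score = sum(scores_list) / len(scores_list)
            std_dev = (
                sum((x - mean_score) ** 2 for x in scores_list) / len(scores_list)
            ) ** 0.5
            # Bound the distribution at three standard deviations around the mean.
            min_score = mean_score - 3 * std_dev
            max_score = mean_score + 3 * std_dev
            delta_score = max_score - min_score

            for doc, score in zip(documents, scores_list):
                # If all scores are identical, delta_score is 0 and the list is uninformative.
                doc.score = (score - min_score) / delta_score if delta_score != 0.0 else 0.0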

        output = self._concatenate(document_lists=document_lists)

        return output
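
To make the normalization easier to check outside of Haystack, here is a minimal standalone sketch of the same idea. `Doc`, `dbsf_normalize` and `dbsf_join` are hypothetical names used only for illustration, not part of the Haystack API. The duplicate handling follows the docstring above (keep the highest score), and the final sort by score is my assumption about what the caller would want.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Doc:
    """Hypothetical stand-in for a Haystack Document (only id and score)."""
    id: str
    score: Optional[float] = None


def dbsf_normalize(scores: List[float]) -> List[float]:
    """Map scores so that mean - 3*std becomes 0 and mean + 3*std becomes 1."""
    if not scores:
        return []
    mean = sum(scores) / len(scores)
    std = (sum((s - mean) ** 2 for s in scores) / len(scores)) ** 0.5
    low, high = mean - 3 * std, mean + 3 * std
    if high == low:
        # All scores are identical, so the list carries no ranking signal.
        return [0.0] * len(scores)
    return [(s - low) / (high - low) for s in scores]


def dbsf_join(document_lists: List[List[Doc]]) -> List[Doc]:
    """Normalize each list on its own, then keep the best score per document id."""
    best: Dict[str, Doc] = {}
    for documents in document_lists:
        raw = [d.score if d.score is not None else 0.0 for d in documents]
        for doc, score in zip(documents, dbsf_normalize(raw)):
            if doc.id not in best or score > best[doc.id].score:
                best[doc.id] = Doc(id=doc.id, score=score)
    # Assumption: callers want the fused list ordered by descending score.
    return sorted(best.values(), key=lambda d: d.score, reverse=True)


# Two retrievers with very different score scales (made-up numbers).
bm25 = [Doc("a", 12.0), Doc("b", 8.0), Doc("c", 4.0)]
dense = [Doc("b", 0.9), Doc("d", 0.7), Doc("a", 0.5)]
fused = dbsf_join([bm25, dense])
```

With these made-up inputs, the top document of each list ends up at about 0.70 and the middle one at 0.5, regardless of the original scale, which is the point of normalizing per retriever before merging.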