Fix BM25 scores exceeding 1, which caused incorrect memory similarity calculations #350
Actionable comments posted: 0
🧹 Nitpick comments (1)
packages/long-memory/src/similarity.ts (1)
219-219: Split the expression for improved readability.
Although the updated IDF formula helps avoid negative or undefined values, consider breaking the expression into multiple lines to make the logic clearer and aid maintainability. A possible change:
    -const idf = Math.log((2 - docFreq + epsilon) / (docFreq + epsilon) + 1)
    +const idf = Math.log(
    +    (2 - docFreq + epsilon) / (docFreq + epsilon) + 1
    +)
🧰 Tools
🪛 eslint
[error] 219-219: Replace (2·-·docFreq·+·epsilon)·/·(docFreq·+·epsilon)·+·1 with ⏎················(2·-·docFreq·+·epsilon)·/·(docFreq·+·epsilon)·+·1⏎············ (prettier/prettier)
🪛 GitHub Check: CodeFactor
[warning] 219-219: packages/long-memory/src/similarity.ts#L219 Replace (2·-·docFreq·+·epsilon)·/·(docFreq·+·epsilon)·+·1 with ⏎················(2·-·docFreq·+·epsilon)·/·(docFreq·+·epsilon)·+·1⏎············ (prettier/prettier)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
packages/long-memory/src/similarity.ts
(1 hunks)
🔇 Additional comments (2)
packages/long-memory/src/similarity.ts (2)
224-224: Confirm BM25 parameter choices.
The denominator follows the standard BM25 formula with length normalization. However, confirm that these parameters (k1, b, and avgDocLength) produce the intended effect for your use case, as any mismatch can skew the results.
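To make the role of each parameter concrete, here is a minimal sketch of the standard BM25 term-frequency component. The function name `bm25TermWeight` is hypothetical and is not taken from the PR; the defaults k1 = 1.5 and b = 0.75 are the values this review quotes for the implementation.

```typescript
// Illustrative sketch only: the standard BM25 term-frequency component.
// bm25TermWeight is a hypothetical name, not a function from the PR.
function bm25TermWeight(
    tf: number,
    docLength: number,
    avgDocLength: number,
    k1: number = 1.5,
    b: number = 0.75
): number {
    // b controls how strongly document length scales the denominator
    const lengthNorm = 1 - b + b * (docLength / avgDocLength)
    // k1 controls how quickly repeated occurrences saturate (upper bound: k1 + 1)
    return (tf * (k1 + 1)) / (tf + k1 * lengthNorm)
}

// Average-length document, one occurrence
console.log(bm25TermWeight(1, 10, 10))
// Same term in a document twice the average length scores lower
console.log(bm25TermWeight(1, 20, 10))
// Many occurrences approach, but never reach, k1 + 1
console.log(bm25TermWeight(1000, 10, 10))
```

A larger k1 lets repeated terms keep adding weight for longer, while a b closer to 1 penalizes long documents more heavily. If avgDocLength is computed over only the two documents being compared, it shifts with every pair, which may be worth checking.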
229-230: Validate the new normalization factor's impact on scores.
Switching from a log-based normalization to doc1Length + doc2Length likely prevents scores from exceeding 1 but may compress differences in longer texts. Ensure it aligns with your objective of bounding scores without overly diminishing them for large inputs.
Actionable comments posted: 0
🧹 Nitpick comments (1)
packages/long-memory/src/similarity.ts (1)
220-220: Fix formatting: Add line break after opening parenthesis
The static analysis tools indicate a formatting issue. Add a line break after the opening parenthesis for consistency with the codebase style.
    - const idf = Math.log((2 - docFreq + epsilon) / (docFreq + epsilon) + 1)
    + const idf = Math.log(
    +     (2 - docFreq + epsilon) / (docFreq + epsilon) + 1
    + )
🧰 Tools
🪛 eslint
[error] 220-220: Insert ⏎············ (prettier/prettier)
🪛 GitHub Check: CodeFactor
[warning] 220-220: packages/long-memory/src/similarity.ts#L220 Insert ⏎············ (prettier/prettier)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
packages/long-memory/src/similarity.ts
(1 hunks)
🔇 Additional comments (2)
packages/long-memory/src/similarity.ts (2)
230-231: Approve normalization factor change
The new normalization using doc1Length + doc2Length is more appropriate than the previous Math.log(2) * doc1Length because:
- It considers both documents' lengths, making the score more balanced
- It helps prevent scores from exceeding 1, which was the original issue
- It's a standard approach in BM25 variants when comparing document pairs
219-220: Verify the IDF calculation modification
The addition of + 1 inside the logarithm ensures the IDF score remains positive and properly weighted. However, let's verify this change:
- When docFreq = 0 (term appears in no other docs):
  - New IDF = log((2 - 0 + 0.25)/(0 + 0.25) + 1) = log(10) ≈ 2.3
  - This is reasonable for unique terms
- When docFreq = 1 (term appears in other doc):
  - New IDF = log((2 - 1 + 0.25)/(1 + 0.25) + 1) = log(2) ≈ 0.7
  - This appropriately reduces the weight
✅ Verification successful
IDF calculation modification is mathematically sound and properly implemented
The IDF modification in the BM25 implementation is correct and well-justified:
- The formula log((2 - docFreq + epsilon) / (docFreq + epsilon) + 1) ensures:
  - Maximum IDF when the term is unique (docFreq = 0)
  - Decreasing IDF as the term appears in both documents (docFreq = 1)
  - Smoothing via epsilon (0.25) prevents division by zero
  - Adding 1 inside the log ensures positive IDF values
- The implementation properly integrates with:
  - Standard BM25 parameters (k1 = 1.5, b = 0.75)
  - Length normalization using average document length
  - Final score normalization using combined document lengths
The modification achieves its goal of keeping scores properly bounded while maintaining the relative importance of terms.
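The two IDF cases discussed in this comment can be checked numerically with a short sketch. The function name `modifiedIdf` is hypothetical; the expression and epsilon = 0.25 come from the review, and docFreq is assumed to be either 0 or 1, as in the two-document comparison.

```typescript
// Hypothetical helper reproducing the IDF expression quoted in the review.
// Assumes docFreq is 0 or 1 (term absent from, or present in, the other document).
const EPSILON = 0.25 // smoothing factor quoted in the review

function modifiedIdf(docFreq: 0 | 1, epsilon: number = EPSILON): number {
    return Math.log((2 - docFreq + epsilon) / (docFreq + epsilon) + 1)
}

// docFreq = 0: (2.25 / 0.25) + 1 = 10, so the IDF is ln(10), about 2.30
console.log(modifiedIdf(0).toFixed(2))
// docFreq = 1: (1.25 / 1.25) + 1 = 2, so the IDF is ln(2), about 0.69
console.log(modifiedIdf(1).toFixed(2))
```

Both values are positive, and the term unique to one document receives roughly 3.3 times the weight of a shared term.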
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Search for other BM25 implementations in the codebase for comparison
rg -l "BM25|bm25" --type ts
Length of output: 68
Script:
#!/bin/bash
# Let's examine the entire BM25 implementation to understand the context better
rg -A 20 -B 10 "calculateBM25Similarity" packages/long-memory/src/similarity.ts
Length of output: 2091
Script:
#!/bin/bash
# Let's see the rest of the BM25 implementation to understand the complete calculation
rg -A 40 "tokens2.forEach" packages/long-memory/src/similarity.ts
Length of output: 1482
…d handle empty token cases. The scoring now accounts for both directions (doc1 to doc2 and doc2 to doc1) and returns a normalized average score. Additionally, added a check to return 0 for empty token arrays.
Actionable comments posted: 0
🔭 Outside diff range comments (1)
packages/long-memory/src/similarity.ts (1)
Add test coverage for BM25 similarity calculation
The codebase lacks test coverage for the BM25 similarity calculation implementation. To ensure the reliability of the changes, you should add test cases that cover:
- Empty documents comparison (tokens1.length === 0 || tokens2.length === 0)
- Documents with no common terms
- Documents with all terms in common
- Documents with significantly different lengths
- Edge cases with special characters
- Verification that scores are always normalized between 0 and 1
Example test structure:
    describe('calculateSimilarity', () => {
        it('should return 0 for empty documents', () => {
            expect(calculateSimilarity('', 'test')).toBe(0);
            expect(calculateSimilarity('test', '')).toBe(0);
        });
        it('should return 0 for documents with no common terms', () => {
            expect(calculateSimilarity('abc def', 'xyz pqr')).toBe(0);
        });
        it('should return high similarity for identical documents', () => {
            expect(calculateSimilarity('test document', 'test document')).toBeCloseTo(1);
        });
        it('should handle documents with different lengths', () => {
            const score = calculateSimilarity('short text', 'short text with more words');
            expect(score).toBeGreaterThan(0);
            expect(score).toBeLessThan(1);
        });
    });
🔗 Analysis chain
Line range hint 196-257: Verify the fix with comprehensive test cases
While the implementation looks correct, please ensure it's tested with various edge cases:
- Documents with no common terms
- Documents with all common terms
- Documents with significantly different lengths
- Documents with special characters or non-standard tokens
Let me analyze the test coverage and implementation details to ensure we have proper test cases for the BM25 similarity calculation.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Search for existing test cases in the codebase
echo "Searching for BM25 test cases..."
rg -l "calculateBM25Similarity|SimilarityCalculator.*test" --type ts
# Look for actual usage patterns
echo "Analyzing usage patterns..."
rg "calculateSimilarity\(|SimilarityCalculator\.calculate\(" --type ts -A 2
Length of output: 931
Script:
#!/bin/bash
# Look for test files
echo "Searching for test files..."
fd "test.*\.(ts|js)" --type f
# Check content of test files if they exist
echo "Checking test file contents..."
rg "describe|it\(" --type ts packages/long-memory/src/__tests__ -A 3 || true
# Look for the actual implementation to understand test coverage needs
echo "Checking implementation details..."
ast-grep --pattern 'export function calculateSimilarity($_) { $$$ }'
Length of output: 470
🧰 Tools
🪛 eslint
[error] 227-227: Replace (2·-·docFreq1·+·epsilon)·/·(docFreq1·+·epsilon)·+·1 with ⏎····················(2·-·docFreq1·+·epsilon)·/·(docFreq1·+·epsilon)·+·1⏎················ (prettier/prettier)
[error] 229-229: Insert ⏎··················· (prettier/prettier)
[error] 234-234: Insert ⏎··················· (prettier/prettier)
[error] 242-242: Replace (2·-·docFreq2·+·epsilon)·/·(docFreq2·+·epsilon)·+·1 with ⏎····················(2·-·docFreq2·+·epsilon)·/·(docFreq2·+·epsilon)·+·1⏎················ (prettier/prettier)
[error] 244-244: Insert ⏎··················· (prettier/prettier)
[error] 249-249: Insert ⏎··················· (prettier/prettier)
🪛 GitHub Check: CodeFactor
[warning] 242-242: packages/long-memory/src/similarity.ts#L242 Replace (2·-·docFreq2·+·epsilon)·/·(docFreq2·+·epsilon)·+·1 with ⏎····················(2·-·docFreq2·+·epsilon)·/·(docFreq2·+·epsilon)·+·1⏎················ (prettier/prettier)
[warning] 244-244: packages/long-memory/src/similarity.ts#L244 Insert ⏎··················· (prettier/prettier)
[warning] 249-249: packages/long-memory/src/similarity.ts#L249 Insert ⏎··················· (prettier/prettier)
[warning] 234-234: packages/long-memory/src/similarity.ts#L234 Insert ⏎··················· (prettier/prettier)
[warning] 229-229: packages/long-memory/src/similarity.ts#L229 Insert ⏎··················· (prettier/prettier)
[warning] 227-227: packages/long-memory/src/similarity.ts#L227 Replace (2·-·docFreq1·+·epsilon)·/·(docFreq1·+·epsilon)·+·1 with ⏎····················(2·-·docFreq1·+·epsilon)·/·(docFreq1·+·epsilon)·+·1⏎················ (prettier/prettier)
🧹 Nitpick comments (2)
packages/long-memory/src/similarity.ts (2)
223-236: Consider adding documentation for the IDF formula modification
The implementation is correct, but it would be helpful to add a comment explaining why the IDF formula was modified with + 1 and how the max score calculation helps prevent scores from exceeding 1.
     // Compute the doc1 -> doc2 direction
     const tf1 = termFreqDoc1.get(term) || 0
     const docFreq1 = (termFreqDoc2.get(term) || 0) > 0 ? 1 : 0
     if (tf1 > 0) {
    +    // Add 1 to IDF calculation to ensure non-negative scores and proper normalization
         const idf1 = Math.log((2 - docFreq1 + epsilon) / (docFreq1 + epsilon) + 1)
238-250: Consider extracting the direction calculation into a helper function
The calculation logic for both directions is identical. Consider refactoring into a helper function to improve maintainability and reduce code duplication.
    + private static calculateDirectionalScore(
    +     sourceFreq: Map<string, number>,
    +     targetFreq: Map<string, number>,
    +     sourceLength: number,
    +     avgLength: number,
    +     k1: number,
    +     b: number,
    +     epsilon: number
    + ): { score: number; maxScore: number } {
    +     let score = 0;
    +     let maxScore = 0;
    +     const terms = new Set([...sourceFreq.keys(), ...targetFreq.keys()]);
    +
    +     for (const term of terms) {
    +         const tf = sourceFreq.get(term) || 0;
    +         const docFreq = (targetFreq.get(term) || 0) > 0 ? 1 : 0;
    +         if (tf > 0) {
    +             const idf = Math.log((2 - docFreq + epsilon) / (docFreq + epsilon) + 1);
    +             const numerator = tf * (k1 + 1);
    +             const denominator = tf + k1 * (1 - b + b * (sourceLength / avgLength));
    +             score += idf * (numerator / denominator);
    +
    +             const maxTf = Math.max(tf, targetFreq.get(term) || 0);
    +             const maxNumerator = maxTf * (k1 + 1);
    +             const maxDenominator = maxTf + k1 * (1 - b + b * (sourceLength / avgLength));
    +             maxScore += idf * (maxNumerator / maxDenominator);
    +         }
    +     }
    +     return { score, maxScore };
    + }
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
packages/long-memory/src/similarity.ts
(2 hunks)
🔇 Additional comments (3)
packages/long-memory/src/similarity.ts (3)
196-198: LGTM: Good defensive programming practice
The early return for empty token arrays prevents potential issues in subsequent calculations and correctly handles edge cases.
216-220: LGTM: Improved bidirectional scoring approach
The introduction of bidirectional scoring with max score tracking is a good solution to handle asymmetric document lengths and ensure proper score normalization.
254-257: LGTM: Proper score normalization
The normalization and averaging of bidirectional scores effectively addresses the original issue of BM25 scores exceeding 1 while maintaining a symmetric similarity measure.
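The idea of normalizing each direction by its maximum attainable score and then averaging can be sketched in isolation. This is not the PR's implementation: the whitespace tokenizer, all function names, and the choice to omit the IDF factor and count only terms shared with the other document are simplifying assumptions made for illustration.

```typescript
// Simplified sketch of "score / maxScore per direction, then average".
// All names are hypothetical; IDF is deliberately omitted to keep the bound easy to see.
const K1 = 1.5
const B = 0.75

function tokenize(text: string): string[] {
    // Assumption: lowercase, whitespace-separated tokens
    return text
        .toLowerCase()
        .split(/\s+/)
        .filter((token) => token.length > 0)
}

function termFrequencies(tokens: string[]): Map<string, number> {
    const freq = new Map<string, number>()
    for (const token of tokens) {
        freq.set(token, (freq.get(token) ?? 0) + 1)
    }
    return freq
}

function saturatedTf(tf: number, docLength: number, avgLength: number): number {
    return (tf * (K1 + 1)) / (tf + K1 * (1 - B + B * (docLength / avgLength)))
}

function directionalScore(
    source: Map<string, number>,
    target: Map<string, number>,
    sourceLength: number,
    avgLength: number
): { score: number; maxScore: number } {
    let score = 0
    let maxScore = 0
    source.forEach((tf, term) => {
        const weight = saturatedTf(tf, sourceLength, avgLength)
        maxScore += weight // attained only if every source term is shared
        if (target.has(term)) score += weight
    })
    return { score, maxScore }
}

function boundedSimilarity(s1: string, s2: string): number {
    const tokens1 = tokenize(s1)
    const tokens2 = tokenize(s2)
    // Early return mirrors the empty-token check noted in the review
    if (tokens1.length === 0 || tokens2.length === 0) return 0
    const avgLength = (tokens1.length + tokens2.length) / 2
    const freq1 = termFrequencies(tokens1)
    const freq2 = termFrequencies(tokens2)
    const forward = directionalScore(freq1, freq2, tokens1.length, avgLength)
    const backward = directionalScore(freq2, freq1, tokens2.length, avgLength)
    // Each ratio lies in [0, 1], so their average does too
    return (forward.score / forward.maxScore + backward.score / backward.maxScore) / 2
}

console.log(boundedSimilarity('short text', 'short text with more words'))
```

Because each direction's score is a partial sum of the same positive weights that make up its maximum, each ratio cannot exceed 1, and swapping the arguments gives the same result.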
Actionable comments posted: 0
🧹 Nitpick comments (1)
packages/long-memory/src/similarity.ts (1)
216-257: LGTM: Improved BM25 scoring with bidirectional calculation
The implementation effectively addresses the score overflow issue through:
- Bidirectional scoring that accounts for asymmetric document lengths
- Proper score normalization in both directions
- Modified IDF calculation that bounds the scores
Consider these readability improvements:
- Add documentation explaining the mathematical reasoning:
     private static calculateBM25Similarity(s1: string, s2: string): number {
    +    // BM25 parameters
         const k1 = 1.5 // term-frequency saturation parameter
         const b = 0.75 // document length normalization parameter
         const epsilon = 0.25 // smoothing factor

    +    // Bidirectional BM25 scoring:
    +    // 1. Calculate scores in both directions (doc1->doc2 and doc2->doc1)
    +    // 2. Normalize each score using its maximum possible score
    +    // 3. Average the normalized scores for final similarity
    +    // This approach ensures:
    +    // - Scores are bounded between 0 and 1
    +    // - Asymmetric document lengths are properly handled
- Extract common calculations into helper functions:
    private static calculateDirectionalBM25Score(
        sourceFreq: Map<string, number>,
        targetFreq: Map<string, number>,
        sourceLength: number,
        avgLength: number,
        k1: number,
        b: number,
        epsilon: number
    ): { score: number; maxScore: number } {
        let score = 0;
        let maxScore = 0;
        for (const [term, tf] of sourceFreq.entries()) {
            const docFreq = (targetFreq.get(term) || 0) > 0 ? 1 : 0;
            const idf = Math.log((2 - docFreq + epsilon) / (docFreq + epsilon) + 1);
            const numerator = tf * (k1 + 1);
            const denominator = tf + k1 * (1 - b + b * (sourceLength / avgLength));
            score += idf * (numerator / denominator);
            const maxTf = Math.max(tf, targetFreq.get(term) || 0);
            const maxNumerator = maxTf * (k1 + 1);
            const maxDenominator = maxTf + k1 * (1 - b + b * (sourceLength / avgLength));
            maxScore += idf * (maxNumerator / maxDenominator);
        }
        return { score, maxScore };
    }
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
packages/long-memory/src/similarity.ts
(3 hunks)
🔇 Additional comments (2)
packages/long-memory/src/similarity.ts (2)
8-11: LGTM: Weight adjustments align with PR objective
The increased weights for cosine (0.35) and BM25 (0.35) similarities, balanced by reduced weights for Levenshtein (0.15) and Jaccard (0.15), appropriately emphasize the more reliable metrics for memory similarity calculation.
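How the four weights are combined is not shown in this review. Assuming a plain weighted sum, the sketch below illustrates why every component, BM25 included, needs to stay within [0, 1] for the combined similarity to stay there too. The `SimilarityScores` interface and `combineSimilarities` function are hypothetical names; only the four weight values come from the comment above.

```typescript
// Hypothetical weighted combination; only the weight values come from the review.
interface SimilarityScores {
    cosine: number
    bm25: number
    levenshtein: number
    jaccard: number
}

const WEIGHTS: SimilarityScores = {
    cosine: 0.35,
    bm25: 0.35,
    levenshtein: 0.15,
    jaccard: 0.15
}

function combineSimilarities(scores: SimilarityScores): number {
    return (
        WEIGHTS.cosine * scores.cosine +
        WEIGHTS.bm25 * scores.bm25 +
        WEIGHTS.levenshtein * scores.levenshtein +
        WEIGHTS.jaccard * scores.jaccard
    )
}

// All components bounded: the combined score cannot exceed 1 (up to rounding)
console.log(combineSimilarities({ cosine: 1, bm25: 1, levenshtein: 1, jaccard: 1 }))
// An unbounded BM25 score (here 3) pushes the combined score past 1
console.log(combineSimilarities({ cosine: 1, bm25: 3, levenshtein: 1, jaccard: 1 }))
```

Since the weights sum to 1, the combined value is a convex combination of its inputs, so it is bounded exactly when each input is.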
196-198: LGTM: Added defensive check for empty tokens
Good addition of an early return for empty token arrays, preventing potential issues in subsequent calculations and aligning with the mathematical intuition that similarity with empty text should be 0.
Summary by CodeRabbit