-
Notifications
You must be signed in to change notification settings - Fork 1.1k
FIM strategy context improvements #2863
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
…ecent operations.
… for FIM strategy
|
@beatlevic changed the base so it is easier to review |
} | ||
|
||
// Analyze and track global operations if we have enough history | ||
if (item.history.length >= 2) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why?
* @param operation The operation to add | ||
* @param filepath The file where the operation occurred | ||
*/ | ||
private addGlobalOperation(operation: UserAction, filepath: string): void { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this needs to respect kilocodeignore/gitignore, see #2852
* Formula: |A ∩ B| / |A ∪ B| | ||
* Where A and B are sets of symbols from each string | ||
*/ | ||
export function jaccardSimilarity(a: string, b: string): number { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this feels like it could be slow for large files
let intersection = 0 | ||
for (const symbol of aSet) { | ||
if (bSet.has(symbol)) { | ||
intersection++ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we can't use these?
https://viljams.medium.com/new-javascript-set-methods-are-here-b21e9a37bc4b
export function jaccardSimilarity(a: string, b: string): number { | ||
const aSet = getSymbolsForSnippet(a) | ||
const bSet = getSymbolsForSnippet(b) | ||
const union = new Set([...aSet, ...bSet]).size |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if we know the size of the intersection, we know the size of the union right? So this isn't necessary (and probably slow, especially since you're converting to an array in between)
|
||
// Get window around cursor for similarity comparison | ||
const position = context.range.start | ||
const windowSize = 500 // characters before and after cursor |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in the context of an extension/site windowSize seems like a visual thing, maybe characterLookAroundSize or something?
/** | ||
* Deduplicate snippets from the same file by merging overlapping content | ||
*/ | ||
export function deduplicateSnippets(snippets: RankedSnippet[]): RankedSnippet[] { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why are these and following methods unused? Especially I think the constraining of amount of syntax is especially important as it is easy to overwhelm small models. We might even need to compress / slice the current file a bit if it is too large
*/ | ||
export function fillPromptWithSnippets( | ||
snippets: RankedSnippet[], | ||
maxTokens: number, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we probably need to use the current file as input for maxTokens
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like this direction, but we have to be more careful of the context windows size, otherwise this will be a regression (because we'll always go over)
Adds the following to the FIM strategy context: