beatlevic
Collaborator

Adds the following to the FIM strategy context:

  • Recent operations (both from the current file and global, cross-file)
  • ContextRanking (Jaccard similarity) for smart context selection

changeset-bot bot commented Oct 8, 2025

⚠️ No Changeset found

Latest commit: 2569fef

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types

@beatlevic beatlevic requested a review from markijbema October 8, 2025 22:35
@markijbema markijbema changed the base branch from main to beatlevic/test-llm-strategies October 9, 2025 07:59
@markijbema
Contributor

@beatlevic I changed the base branch so this is easier to review.

}

// Analyze and track global operations if we have enough history
if (item.history.length >= 2) {
Contributor

why?

* @param operation The operation to add
* @param filepath The file where the operation occurred
*/
private addGlobalOperation(operation: UserAction, filepath: string): void {
Contributor

this needs to respect kilocodeignore/gitignore, see #2852
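
One way to honor this, sketched under assumptions: the tracker receives an `isIgnored` predicate and drops operations for ignored paths before they can reach the prompt. The `GlobalOperationTracker` class, the shape of `UserAction`, and the predicate are all hypothetical; the real check would come from whatever ignore handling the extension already has.

```typescript
// Hypothetical shape; the PR's UserAction type is not shown in this thread.
type UserAction = { type: string; timestamp: number }

class GlobalOperationTracker {
  private readonly operations: { operation: UserAction; filepath: string }[] = []
  private readonly isIgnored: (filepath: string) => boolean

  // isIgnored is an injected, assumed predicate backed by the ignore files.
  constructor(isIgnored: (filepath: string) => boolean) {
    this.isIgnored = isIgnored
  }

  // Returns false when the operation was skipped because the file is ignored.
  addGlobalOperation(operation: UserAction, filepath: string): boolean {
    if (this.isIgnored(filepath)) {
      return false
    }
    this.operations.push({ operation, filepath })
    return true
  }

  get size(): number {
    return this.operations.length
  }
}
```

Filtering at insertion time, rather than when the prompt is built, means ignored content is never held in the history at all.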

* Formula: |A ∩ B| / |A ∪ B|
* Where A and B are sets of symbols from each string
*/
export function jaccardSimilarity(a: string, b: string): number {
Contributor

this feels like it could be slow for large files

let intersection = 0
for (const symbol of aSet) {
  if (bSet.has(symbol)) {
    intersection++
Contributor

export function jaccardSimilarity(a: string, b: string): number {
  const aSet = getSymbolsForSnippet(a)
  const bSet = getSymbolsForSnippet(b)
  const union = new Set([...aSet, ...bSet]).size
Contributor

if we know the size of the intersection, we also know the size of the union (|A ∪ B| = |A| + |B| - |A ∩ B|), right? So building this set isn't necessary (and it is probably slow, especially since you're spreading both sets into an array in between)
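
A minimal sketch of that suggestion, assuming `getSymbolsForSnippet` returns a `Set<string>`. The tokenizer below is a hypothetical stand-in, not the PR's implementation. The union size comes from inclusion-exclusion, so no merged set or intermediate array is allocated, and the loop walks the smaller of the two sets:

```typescript
// Hypothetical stand-in for the PR's getSymbolsForSnippet; the real tokenizer may differ.
function getSymbolsForSnippet(snippet: string): Set<string> {
  return new Set(snippet.split(/[^A-Za-z0-9_]+/).filter((s) => s.length > 0))
}

function jaccardSimilarity(a: string, b: string): number {
  const aSet = getSymbolsForSnippet(a)
  const bSet = getSymbolsForSnippet(b)

  // Iterate the smaller set so the loop runs min(|A|, |B|) times.
  const small = aSet.size <= bSet.size ? aSet : bSet
  const large = small === aSet ? bSet : aSet

  let intersection = 0
  for (const symbol of small) {
    if (large.has(symbol)) {
      intersection++
    }
  }

  // |A ∪ B| = |A| + |B| - |A ∩ B|
  const union = aSet.size + bSet.size - intersection
  return union === 0 ? 0 : intersection / union
}
```

This does not remove the cost of tokenizing large inputs; if that dominates, caching the symbol set per snippet would be the next thing to try.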


// Get window around cursor for similarity comparison
const position = context.range.start
const windowSize = 500 // characters before and after cursor
Contributor

in the context of an extension/site windowSize seems like a visual thing, maybe characterLookAroundSize or something?
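
For illustration, the rename could look like the sketch below. It works on a flat character offset rather than the `context.range.start` position used in the PR, and `getTextAroundOffset` is a hypothetical helper, not code from this change:

```typescript
// Number of characters taken before and after the cursor (name follows the review suggestion).
const CHARACTER_LOOK_AROUND_SIZE = 500

// Hypothetical helper: returns the text within lookAround characters of cursorOffset.
function getTextAroundOffset(
  text: string,
  cursorOffset: number,
  lookAround: number = CHARACTER_LOOK_AROUND_SIZE,
): string {
  // Clamp the offset so out-of-range cursors do not produce negative indices.
  const clamped = Math.min(Math.max(cursorOffset, 0), text.length)
  const start = Math.max(0, clamped - lookAround)
  const end = Math.min(text.length, clamped + lookAround)
  return text.slice(start, end)
}
```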

/**
* Deduplicate snippets from the same file by merging overlapping content
*/
export function deduplicateSnippets(snippets: RankedSnippet[]): RankedSnippet[] {
Contributor

why are these and the following methods unused? I think constraining the amount of syntax is especially important, as it is easy to overwhelm small models. We might even need to compress or slice the current file a bit if it is too large
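
To make the discussion concrete, here is a rough sketch of what merging overlapping snippets from the same file could look like. The `RankedSnippet` shape (inclusive line range plus the lines themselves) is an assumption for this example; the PR's actual type and merge logic are not shown in this thread.

```typescript
// Assumed shape for illustration only.
interface RankedSnippet {
  filepath: string
  startLine: number // inclusive
  endLine: number // inclusive
  lines: string[] // lines.length === endLine - startLine + 1
  score: number
}

function deduplicateSnippets(snippets: RankedSnippet[]): RankedSnippet[] {
  // Sort by file, then by start line, so overlapping ranges become neighbors.
  const sorted = [...snippets].sort(
    (x, y) => x.filepath.localeCompare(y.filepath) || x.startLine - y.startLine,
  )
  const merged: RankedSnippet[] = []
  for (const snippet of sorted) {
    const last = merged[merged.length - 1]
    if (last && last.filepath === snippet.filepath && snippet.startLine <= last.endLine) {
      if (snippet.endLine > last.endLine) {
        // Append only the lines the previous snippet does not already cover.
        const alreadyCovered = last.endLine - snippet.startLine + 1
        last.lines = last.lines.concat(snippet.lines.slice(alreadyCovered))
        last.endLine = snippet.endLine
      }
      // Keep the best score of the merged parts.
      last.score = Math.max(last.score, snippet.score)
    } else {
      // Copy so the caller's snippets are not mutated by later merges.
      merged.push({ ...snippet, lines: [...snippet.lines] })
    }
  }
  return merged
}
```

The result is ordered by file and line, so it would need re-sorting by score before any budget-based selection.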

*/
export function fillPromptWithSnippets(
snippets: RankedSnippet[],
maxTokens: number,
Contributor

we probably need to use the current file as input for maxTokens
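
A sketch of how the current file could feed into `maxTokens`, under stated assumptions: `estimateTokens` is a crude characters-per-token heuristic rather than a real tokenizer, `snippetBudget` and its parameters are hypothetical, and `RankedSnippet` is reduced to the two fields used here.

```typescript
// Reduced, assumed shape for illustration only.
interface RankedSnippet {
  content: string
  score: number
}

// Rough heuristic (about 4 characters per token); an assumption, not a real tokenizer.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4)
}

// Tokens left for snippets after the current file and the completion reserve are accounted for.
function snippetBudget(contextWindow: number, currentFile: string, reservedForCompletion: number): number {
  return Math.max(0, contextWindow - reservedForCompletion - estimateTokens(currentFile))
}

// Greedily keep the highest-scoring snippets that still fit in maxTokens.
function fillPromptWithSnippets(snippets: RankedSnippet[], maxTokens: number): RankedSnippet[] {
  const selected: RankedSnippet[] = []
  let used = 0
  for (const snippet of [...snippets].sort((x, y) => y.score - x.score)) {
    const cost = estimateTokens(snippet.content)
    if (used + cost > maxTokens) {
      continue
    }
    selected.push(snippet)
    used += cost
  }
  return selected
}
```

With this shape, a large current file shrinks the snippet budget toward zero instead of pushing the prompt past the context window.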

Contributor

@markijbema markijbema left a comment

I like this direction, but we have to be more careful about the context window size; otherwise this will be a regression (because we'll always go over).

Base automatically changed from beatlevic/test-llm-strategies to main October 9, 2025 12:24
@markijbema markijbema marked this pull request as draft October 10, 2025 09:58
2 participants