Using git diff for autocomplete results in completions including diff lines and -/+ prefixes #3094

Open
3 tasks done
tienshiao opened this issue Nov 26, 2024 · 3 comments
Labels
area:autocomplete Relates to the auto complete feature kind:bug Indicates an unexpected problem or unintended behavior priority:high Indicates high priority

Comments

@tienshiao

Before submitting your bug report

Relevant environment info

- OS: macOS
- Continue version: v0.9.?
- IDE version:
- Model: Qwen2.5-Coder-7B-Instruct
- config.json:
  
  "tabAutocompleteModel": {
    "title": "Qwen2.5-Coder-7B-Instruct",
    "provider": "openai",
    "model": "Qwen2.5-Coder-7B-Instruct",
    "apiBase": "http://localhost:8081/"
  },
  "tabAutocompleteOptions": {
    // "maxDiffPercentage": 0, // <- setting to 0 disables git diff
    "maxPromptTokens": 2048,
    "template": "<|fim_prefix|>{{{ prefix }}}<|fim_suffix|>{{{ suffix }}}<|fim_middle|>"
  },

Description

While I think using the git diff to provide additional context is useful (#2999), sometimes it results in the LLM returning completions that include the diff preambles and line prefixes -/+.

For example, suggestions include:

diff --git a/internal/a/a.go b/internal/a/a.go

- change 1
+ change 2

I can disable it by setting tabAutocompleteOptions.maxDiffPercentage to 0, but longer term it would be nice to keep that additional context in the prompt.

I think competitors include the additional context in a comment block in the prefix. I assume the main difficulty in that is determining how to wrap multiline comments for whatever programming language is in use.
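One way around the multiline-comment wrapping problem might be to prefix every diff line with the language's single-line comment marker instead of wrapping the whole diff in a block comment. Below is a minimal sketch of that idea. The names `LINE_COMMENT_BY_LANGUAGE`, `commentOutDiff`, and `buildPrefix` are hypothetical and do not come from the Continue codebase, and the fallback to `//` for unknown languages is an assumption.

```typescript
// Hypothetical lookup table; a real one would need to cover many more languages.
const LINE_COMMENT_BY_LANGUAGE: Record<string, string> = {
  go: "//",
  typescript: "//",
  python: "#",
  sql: "--",
};

// Prefix every line of the diff with a single-line comment marker.
function commentOutDiff(diff: string, language: string): string {
  // Assumption for this sketch: fall back to "//" when the language is unknown.
  const marker = LINE_COMMENT_BY_LANGUAGE[language] ?? "//";
  return diff
    .split("\n")
    .map((line) => (line.length > 0 ? `${marker} ${line}` : marker))
    .join("\n");
}

// Place the commented-out diff ahead of the real code prefix.
function buildPrefix(diff: string, codePrefix: string, language: string): string {
  return `${commentOutDiff(diff, language)}\n${codePrefix}`;
}

const example = buildPrefix(
  "diff --git a/a.go b/a.go\n- change 1\n+ change 2",
  "package a\n",
  "go",
);
console.log(example);
```

Per-line comments avoid having to worry about a block-comment terminator (such as `*/`) appearing inside the diff itself. Whether a given model then stops echoing diff lines is something that would still need testing.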

To reproduce

No response

Log output

No response

@dosubot dosubot bot added area:autocomplete Relates to the auto complete feature kind:bug Indicates an unexpected problem or unintended behavior labels Nov 26, 2024
@tomasz-stefaniak
Collaborator

> I think competitors include the additional context in a comment block in the prefix. I assume the main difficulty in that is determining how to wrap multiline comments for whatever programming language is in use.

@tienshiao we actually have that commenting feature, but it's not currently enabled for git diff: https://github.com/continuedev/continue/blob/main/core/autocomplete/templating/formatting.ts#L91

We honestly need to test it a bit more and consider all edge cases. It works well with Codestral, which seems to have a very good grasp of how to work with git diff, but I've also seen Qwen give incorrect completions.

I'll test it locally to see if wrapping the diff in a comment would help or if we might need to do something else. For example, we could designate diff --git as a stop token for certain models.

@tomasz-stefaniak tomasz-stefaniak added the priority:high Indicates high priority label Nov 27, 2024
@tienshiao
Author

Yeah, I had also tried injecting \ndiff --git, \n-, and \n+ as stop tokens and that seemed to work as well.
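For illustration, here is a rough client-side equivalent of those stop tokens: cut the completion at the earliest occurrence of any stop sequence. This is only a sketch. `DIFF_STOP_SEQUENCES` and `truncateAtStop` are hypothetical names, and in practice the sequences would more likely be passed to the model server as stop strings rather than applied after the fact.

```typescript
// The three stop sequences mentioned above.
const DIFF_STOP_SEQUENCES: string[] = ["\ndiff --git", "\n-", "\n+"];

// Return the completion truncated at the earliest stop sequence, if any.
function truncateAtStop(
  completion: string,
  stops: string[] = DIFF_STOP_SEQUENCES,
): string {
  let cut = completion.length;
  for (const stop of stops) {
    const index = completion.indexOf(stop);
    if (index !== -1 && index < cut) {
      cut = index;
    }
  }
  return completion.slice(0, cut);
}

console.log(truncateAtStop("return x\ndiff --git a/a.go b/a.go\n- change 1"));
console.log(truncateAtStop("items:\n- a\n- b"));
```

The second call shows the downside: `\n-` and `\n+` also match legitimate code that starts a line with `-` or `+` (YAML or Markdown lists, continued arithmetic expressions), so those completions get cut short. A completion that begins directly with a diff line, with no preceding newline, is also not caught by these sequences.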

I suspect a point of confusion for the model is that the prefix starts with the diff preamble, potentially contains multiple diffs, and then continues with the actual code prefix for the file being edited, but the only boundary between the diffs and the code is a comment with the filename, // dir/a.go.

Qwen has additional tokens like <|file_sep|>: https://github.com/QwenLM/Qwen2.5-Coder?tab=readme-ov-file#4-repository-level-code-completion
But I'm not sure if it works with FIM or would be appropriate here with diffs.
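To make the idea concrete, here is a sketch of what a prompt using those separators could look like. The token layout (`<|repo_name|>`, then one `<|file_sep|>` per file, with the FIM tokens in the last file) is my reading of the linked README and is an assumption, not something verified against the model. `ContextFile` and `buildQwenRepoFimPrompt` are hypothetical names, and whether a diff can reasonably be passed as one of the context "files" is exactly the open question.

```typescript
// Hypothetical shape for a piece of extra context (another file, or possibly a diff).
interface ContextFile {
  path: string;
  content: string;
}

// Assemble a repository-style prompt whose final file carries the FIM tokens.
function buildQwenRepoFimPrompt(
  repoName: string,
  contextFiles: ContextFile[],
  currentPath: string,
  prefix: string,
  suffix: string,
): string {
  const parts: string[] = [`<|repo_name|>${repoName}`];
  for (const file of contextFiles) {
    parts.push(`<|file_sep|>${file.path}\n${file.content}`);
  }
  // Assumed layout: the file being edited comes last and carries the FIM tokens.
  parts.push(
    `<|file_sep|>${currentPath}\n<|fim_prefix|>${prefix}<|fim_suffix|>${suffix}<|fim_middle|>`,
  );
  return parts.join("\n");
}

console.log(
  buildQwenRepoFimPrompt(
    "demo",
    [{ path: "internal/a/a.go", content: "package a" }],
    "internal/b/b.go",
    "package b\n\nfunc ",
    "\n}",
  ),
);
```

The potential benefit over the current prefix is an explicit token boundary between the extra context and the file being edited, instead of only a filename comment.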

@tomasz-stefaniak
Collaborator

tomasz-stefaniak commented Nov 27, 2024

> Qwen has additional tokens like <|file_sep|>: https://github.com/QwenLM/Qwen2.5-Coder?tab=readme-ov-file#4-repository-level-code-completion

I checked out the docs and we definitely should be doing this. We already do something similar for Codestral with the +++++ separator: https://github.com/continuedev/continue/blob/main/core/autocomplete/templating/AutocompleteTemplate.ts#L84

I'll keep this at the back of my head and will experiment with it when I have a bit of time, unless you feel like experimenting and creating a Qwen template similar to the one we use for Codestral.

In the meantime, I commented out the diff, and that seems to fix the issue of git artifacts showing up in completions, at least based on my testing so far: #3111


2 participants