Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(codeCompletion): normalize indentation handling for lines with tabs #3782

Merged
merged 3 commits into from
Feb 19, 2025

Conversation

Sma1lboy
Copy link
Collaborator

@Sma1lboy Sma1lboy commented Feb 4, 2025

@Sma1lboy Sma1lboy marked this pull request as ready for review February 5, 2025 03:36
@wsxiaoys wsxiaoys requested a review from icycodes February 5, 2025 04:38
@sboys3
Copy link

sboys3 commented Feb 5, 2025

Thank you for the fix involving tabs. I have previously patched the bundles on my installs to get around the problem, but I suspect this has been affecting many others.

However if my understanding is correct that context.currentLinePrefix is the existing value of the line the cursor is on, I still see some possible problems.

First, if starting from zero indentation the check would not succeed and possibly replace tabs with spaces like it did before. This situation can commonly occur when creating new functions or top level variables. My suggestion would be to also check the beginning of lines in the completion before treating them as spaces.

Secondly, if more than 1 tab is recommended it appears that only a single tab will be removed likely resulting in the completion being off by one tab. Personally, I have had great success with my model of choice, qwen2.5-coder:14b, choosing the correct tab indentation. In many cases the completion's preceding indent is correct although that could be different with smaller models. Adding the additional tab manually requires canceling the completion because tab will accept the completion. Also, the rest of the completion remains unaltered so it will be inconsistent with the first line.

I have not tried on very small models but I suspect that tabs should be easier for models to do reliably, because there is never an in-between, they are distinct characters, and there are less of them. Doing no modification to tabs might be an OK solution.

I do also worry that this makes working on a 3 spaced or other oddly indented things problematic. Although odd indentations somewhat rare they do exists. For example bzip2 is 3 space indented.

@Sma1lboy
Copy link
Collaborator Author

Sma1lboy commented Feb 6, 2025

Hi @sboys3, thank you for your valuable input on this issue. I now realize I overlooked some aspects of tab definition. Through testing, I've found that tabs indeed perform better than spaces in code completion tasks. The introduction of the postprocess method was primarily motivated by the limitations of smaller models (like 3B) in code completion capabilities. To verify this issue, I tested multiple models ranging from 1B to 14B and confirmed the tab-related problems before implementing fixes.

Regarding your second point, I completely agree with your observation. After testing, I will remove the code that deletes the first tab character. I also agree that no modifications should be made when currentPrefixLine is 0.

As for the third point, the current postprocess method is indeed a trade-off for smaller models. We are exploring better ways to handle all indentation cases. For now, adding certain constraints helps smaller models achieve performance more comparable to their more powerful counterparts. My current approach is to fix the existing tab indent issues while maintaining the original version with spaces.

Additionally, I'm considering whether we can completely remove the postprocessing step. I plan to evaluate metrics to assess the actual value of postprocessing. If it proves to have limited value, I may attempt to remove this processing step entirely. Thank you for your help!

…early returns and enhance indentation checks
@sboys3
Copy link

sboys3 commented Feb 6, 2025

This looks good now. Tabby should be back to tabbing correctly when this is released.

@icycodes icycodes merged commit a3e8f6a into main Feb 19, 2025
5 checks passed
@icycodes icycodes deleted the fix-normalize-indention-when-using-tab branch February 19, 2025 10:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants