Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add the --caret-diagnostic flag #3256

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

perillo
Copy link
Contributor

@perillo perillo commented Dec 18, 2023

When the --caret-diagnostic flag is set, the diagnostic message contains 3 lines:

  • the first line contains the filename, line number and column number, followed by wrong word, right word and reason
  • the second line contains the content of the offending line
  • the third line contains the caret, showing the offending word

This is the format used by modern compilers when reporting a diagnostic message. The color of the caret is bold cyan.

This new format should improve the user experience, compared to the context format.

@perillo
Copy link
Contributor Author

perillo commented Dec 18, 2023

TODO

Add a test and update the README.rst file.

Screenshot

codespell --caret-diagnostic codespell_lib/tests/test_basic.py
screenshot

Copy link
Collaborator

@peternewman peternewman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great, a nice feature addition!

You've sort of covered it a bit with caret versus context, but it might be better to future-proof us with a format option, given they're going to be mutually exclusive (well actually you could have caret with context before/after couldn't you?

It would also be good to add this to our action/problem matcher:
https://github.com/actions/toolkit/blob/main/docs/problem-matchers.md#single-line-matchers
https://github.com/codespell-project/actions-codespell
https://github.com/codespell-project/codespell-problem-matcher

@perillo
Copy link
Contributor Author

perillo commented Dec 18, 2023

Looks great, a nice feature addition!

You've sort of covered it a bit with caret versus context, but it might be better to future-proof us with a format option, given they're going to be mutually exclusive (well actually you could have caret with context before/after couldn't you?

It would also be good to add this to our action/problem matcher: https://github.com/actions/toolkit/blob/main/docs/problem-matchers.md#single-line-matchers https://github.com/codespell-project/actions-codespell https://github.com/codespell-project/codespell-problem-matcher

Thanks!

I named the new flag from https://clang.llvm.org/docs/UsersManual.html#opt-fcaret-diagnostics (and I suspect clang invented this format).

About adding the caret to the context format, I'm not sure it will be a good user experience. Additionally, this can be considered as a breaking change in the user interface. And what about adding the column number?

From using codespell on the Zig project, I think that adding additional context lines may not help. The proposed format always allowed me to check if the mispelling was not a false positive. And this format is the one that I always see when compiling code.

@perillo
Copy link
Contributor Author

perillo commented Dec 18, 2023

A relatively simple solution to improve the context format is to display only one line with filename, line, column and offending line. Then you mark inline the wrong word and right word like it is done with the del and ins tag in HTML.

When -C/-A/-B is selected, additional lines can be added, still making the context readable (this is, IMHO, not true when using a caret).

@peternewman
Copy link
Collaborator

peternewman commented Dec 18, 2023

I named the new flag from https://clang.llvm.org/docs/UsersManual.html#opt-fcaret-diagnostics (and I suspect clang invented this format).

Yeah sorry, I wasn't meaning the name as such, but say I invented an output format which say shows before and after in diff style (or the bit I've actually seen elsewhere but I can't now find, where the tool outputs in the default GitHub Actions annotation format natively, possibly this https://www.npmjs.com/package/eslint-formatter-github ), then it can't be both caret and that. Whereas if we had say --format caret-diagnostics then we're more future-proofed, as I can add --format diff. If they're mutually exclusive, it would make sense if they aren't dedicated options. We could try and make --context legacy.

About adding the caret to the context format, I'm not sure it will be a good user experience. Additionally, this can be considered as a breaking change in the user interface. And what about adding the column number?

From using codespell on the Zig project, I think that adding additional context lines may not help. The proposed format always allowed me to check if the mispelling was not a false positive. And this format is the one that I always see when compiling code.

Although it's called codespell, it works equally well with standard prose too. Or for a long error message wrapped across lines, having some context of the following/leading words, might help work out which you meant.

You could render the caret with context like so:

foo.txt:2:20: lne==>line, lone
the first line contains the filename, line number and column number, followed by wrong word, right word and reason
the second lne contains the content of the offending line
           ^
the third line contains the caret, showing the offending word

I'm not expecting you to implement this, just pointing out they aren't necessarily mutually exclusive, or more accurately the -A/-B options shouldn't necessarily be, I suspect context/caret ought to be maybe.
Does it work in interactive mode too currently, or should that also throw an error?

A relatively simple solution to improve the context format is to display only one line with filename, line, column and offending line. Then you mark inline the wrong word and right word like it is done with the del and ins tag in HTML.

Terminal strikethrough support doesn't seem great, but I get your point. Also how do you manage it with multiple suggestions. Again, I'm not suggesting you have to solve this, more that maybe the argument usage should be tweaked slightly.

@perillo
Copy link
Contributor Author

perillo commented Dec 19, 2023

@peternewman After some thinking I now agree with you that adding a --format flag may be the best choice.
Currently in the code there are different formats and it is very confusing:

                if (not context_shown) and (context is not None):
                    print_context(lines, i, context)
                if filename != "-":
                    if options.caret_diagnostic:
                        print(
                            f"{cfilename}:{cline}:{ccolumn}: {cwrongword} "
                            f"==> {crightword}{creason}"
                        )
                        print(f"{line}", end="")
                        print("{:>{width}}{}".format("", ccaret, width=column))
                    else:
                        print(
                            f"{cfilename}:{cline}: {cwrongword} "
                            f"==> {crightword}{creason}"
                        )
                elif options.stdin_single_line:
                    print(f"{cline}: {cwrongword} ==> {crightword}{creason}")
                else:
                    print(
                        f"{cline}: {line.strip()}\n\t{cwrongword} "
                        f"==> {crightword}{creason}"
                    )

These are some issues I found:

Compatibility

Adding the --format option means that this PR MUST implement all the necessary changes so that all the current flags works correctly, without breaking the compatibility.

I'm willing to try, but this probably needs a official proposal/RFC, since it is not a trivial change.

stdin-single-line

The new stdin-single-line flag should probably be removed before the new release.

Don't use strip when printing the context line

Stripping the context line with strip will broke the caret position.
In this PR I decided to not rstrip the line, assuming that there is always a newline at the end of the file, but using rstrip should be safe.

@perillo
Copy link
Contributor Author

perillo commented Dec 20, 2023

There is a bug in the current code, since when tabs are used, the caret position is incorrect.

There are two solutions that I tested:

Expand tabs in all lines if --caret-diagnostic is set

     for i, line in enumerate(lines):
         if line.rstrip() in exclude_lines:
             continue
+        if options.caret_diagnostic:
+            # Expand tabs in order to show the caret correctly.
+            line = line.replace("\t", "    ")

Expand tags only when necessary

                 if filename != "-":
                     if options.caret_diagnostic:
+                        ntabs = line[:column].count("\t")
+                        line = line.replace("\t", "    ")
+                        column = column + ntabs * 3

The latter should have better performances.

When the --caret-diagnostic flag is set, the diagnostic message contains
3 lines:
  - the first line contains the filename, line number and column number,
    followed by wrong word, right word and reason
  - the second line contains the content of the offending line
  - the third line contains the caret, showing the offending word

This is the format used by modern compilers when reporting a diagnostic
message.  The color of the caret is bold cyan.

This new format should improve the user experience, compared to the
context format.
@perillo
Copy link
Contributor Author

perillo commented Dec 21, 2023

Fixed a bug when using tabs instead of spaces, where the caret pointed at the wrong location.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants