Add the --caret-diagnostic flag #3256

perillo · 2023-12-18T16:26:55Z

When the --caret-diagnostic flag is set, the diagnostic message contains three lines:

- the first line contains the filename, line number and column number, followed by the wrong word, the right word and the reason
- the second line contains the content of the offending line
- the third line contains a caret pointing at the offending word

This is the format used by modern compilers when reporting a diagnostic message. The color of the caret is bold cyan.

This new format should improve the user experience compared to the context format.
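A minimal sketch of the three-line layout described above, assuming `column` is a 0-based offset into the line that the header reports 1-based, and that the line contains no tabs. The function name and signature are hypothetical, not codespell's code, and the reason suffix is left out.

```python
# Hypothetical sketch of a caret diagnostic; not codespell's implementation.
BOLD_CYAN = "\033[1;36m"  # ANSI SGR codes: 1 = bold, 36 = cyan foreground
RESET = "\033[0m"


def format_caret_diagnostic(
    filename: str,
    lineno: int,
    column: int,
    line: str,
    wrong: str,
    right: str,
    color: bool = False,
) -> str:
    """Return the header, the offending line and the caret line."""
    # Assumption: `column` is 0-based internally and shown 1-based.
    header = f"{filename}:{lineno}:{column + 1}: {wrong} ==> {right}"
    caret = f"{BOLD_CYAN}^{RESET}" if color else "^"
    # Pad with `column` spaces so the caret sits under the first letter.
    caret_line = " " * column + caret
    return "\n".join([header, line.rstrip("\n"), caret_line])


if __name__ == "__main__":
    text = "the second lne contains the content of the offending line\n"
    col = text.index("lne")
    print(format_caret_diagnostic("foo.txt", 2, col, text, "lne", "line, lone"))
```

Run as a script, this prints `foo.txt:2:12: lne ==> line, lone`, the offending line, and a caret under the `l` of `lne`.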

perillo · 2023-12-18T16:28:13Z

TODO

Add a test and update the README.rst file.

Screenshot

codespell --caret-diagnostic codespell_lib/tests/test_basic.py

peternewman · 2023-12-18

Looks great, a nice feature addition!

You've sort of covered it a bit with caret versus context, but it might be better to future-proof us with a format option, given they're going to be mutually exclusive (well, actually you could have caret with context before/after, couldn't you?)

It would also be good to add this to our action/problem matcher:
https://github.com/actions/toolkit/blob/main/docs/problem-matchers.md#single-line-matchers
https://github.com/codespell-project/actions-codespell
https://github.com/codespell-project/codespell-problem-matcher
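Because the first line of the caret format carries file, line and column, it fits the single-line matcher shape from the problem-matcher docs linked above. A minimal sketch, built in Python and dumped as JSON; the `owner` id is made up, and the regex assumes filenames contain no `:` (Windows drive letters would break it).

```python
import json
import re

# Matches the caret header, e.g. "foo.txt:2:12: lne ==> line, lone".
# Assumption: the filename contains no ":".
CARET_HEADER = r"^(.+?):(\d+):(\d+): (.+)$"

matcher = {
    "problemMatcher": [
        {
            "owner": "codespell-caret",  # hypothetical owner id
            "pattern": [
                {
                    "regexp": CARET_HEADER,
                    "file": 1,  # capture group numbers
                    "line": 2,
                    "column": 3,
                    "message": 4,
                }
            ],
        }
    ]
}

if __name__ == "__main__":
    print(json.dumps(matcher, indent=2))
    print(re.match(CARET_HEADER, "foo.txt:2:12: lne ==> line, lone").groups())
```

The second and third lines of the diagnostic (source line and caret) would normally not match this pattern, so only the header produces an annotation.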

perillo · 2023-12-18T20:40:26Z

> Looks great, a nice feature addition!
>
> You've sort of covered it a bit with caret versus context, but it might be better to future-proof us with a format option, given they're going to be mutually exclusive (well, actually you could have caret with context before/after, couldn't you?)
>
> It would also be good to add this to our action/problem matcher: https://github.com/actions/toolkit/blob/main/docs/problem-matchers.md#single-line-matchers https://github.com/codespell-project/actions-codespell https://github.com/codespell-project/codespell-problem-matcher

Thanks!

I named the new flag after https://clang.llvm.org/docs/UsersManual.html#opt-fcaret-diagnostics (and I suspect clang invented this format).

About adding the caret to the context format, I'm not sure it would be a good user experience. Additionally, it could be considered a breaking change in the user interface. And what about adding the column number?

From using codespell on the Zig project, I think that adding additional context lines may not help. The proposed format always allowed me to check whether a reported misspelling was a false positive. And this is the format I always see when compiling code.

perillo · 2023-12-18T21:00:37Z

A relatively simple way to improve the context format is to display a single line with the filename, line, column and offending line, and then mark the wrong word and the right word inline, as is done with the del and ins tags in HTML.

When -C/-A/-B is selected, additional lines can be added while keeping the context readable (which, IMHO, is not the case when using a caret).
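A minimal sketch of the inline idea above, using the `[-removed-]{+added+}` markers that `git diff --word-diff=plain` prints in place of HTML del/ins tags, so it works without terminal strikethrough. The function is hypothetical, and joining several suggestions with `|` is an arbitrary choice for this sketch.

```python
from __future__ import annotations


def format_inline(filename: str, lineno: int, column: int, line: str,
                  wrong: str, suggestions: list[str]) -> str:
    """One-line diagnostic with the fix marked inline (hypothetical).

    `column` is the 0-based offset of `wrong` in `line`.
    """
    text = line.rstrip("\n")
    end = column + len(wrong)
    # "{{+" and "+}}" are escaped braces that render as "{+" and "+}".
    marked = (
        text[:column]
        + f"[-{wrong}-]{{+{'|'.join(suggestions)}+}}"
        + text[end:]
    )
    return f"{filename}:{lineno}:{column + 1}: {marked}"


if __name__ == "__main__":
    print(format_inline("foo.txt", 2, 11, "the second lne contains\n",
                        "lne", ["line", "lone"]))
```

This prints `foo.txt:2:12: the second [-lne-]{+line|lone+} contains`. Extra -A/-B lines could then be printed unmodified around it.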

peternewman · 2023-12-18T22:53:42Z

> I named the new flag after https://clang.llvm.org/docs/UsersManual.html#opt-fcaret-diagnostics (and I suspect clang invented this format).

Yeah, sorry, I didn't mean the name as such. Say I invented an output format that shows before and after in diff style (or the one I've actually seen elsewhere but can't find now, where the tool natively outputs the default GitHub Actions annotation format, possibly https://www.npmjs.com/package/eslint-formatter-github); then the output can't be both caret and that. Whereas if we had, say, --format caret-diagnostics, we'd be more future-proofed, as I could add --format diff. If the formats are mutually exclusive, it makes sense for them not to be dedicated options. We could try to make --context legacy.
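A minimal argparse sketch of the single --format option proposed above. The choice names come from this discussion (`caret-diagnostics`, `diff`) plus a placeholder `default`; none of them are options codespell currently provides, and the -A/-B definitions are simplified stand-ins rather than codespell's own.

```python
import argparse

# Hypothetical format names; `choices` makes them mutually exclusive for free.
FORMATS = ("default", "caret-diagnostics", "diff")


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="codespell")
    parser.add_argument(
        "--format",
        choices=FORMATS,
        default="default",
        help="diagnostic output format (default: %(default)s)",
    )
    # Context lines stay independent of the format, so a caret format
    # could still be combined with before/after context.
    parser.add_argument("-A", "--after-context", type=int, default=0)
    parser.add_argument("-B", "--before-context", type=int, default=0)
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args(["--format", "caret-diagnostics", "-B", "1"])
    print(args.format, args.before_context)
```

An unknown value such as `--format bogus` is rejected by argparse with a usage error, so adding a new format later only means extending `FORMATS`.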

> About adding the caret to the context format, I'm not sure it would be a good user experience. Additionally, it could be considered a breaking change in the user interface. And what about adding the column number?
>
> From using codespell on the Zig project, I think that adding additional context lines may not help. The proposed format always allowed me to check whether a reported misspelling was a false positive. And this is the format I always see when compiling code.

Although it's called codespell, it works equally well with standard prose too. Or, for a long error message wrapped across lines, some context from the leading/following words might help work out which word was meant.

You could render the caret with context like so:

    foo.txt:2:12: lne ==> line, lone
    the first line contains the filename, line number and column number, followed by wrong word, right word and reason
    the second lne contains the content of the offending line
               ^
    the third line contains the caret, showing the offending word

I'm not expecting you to implement this; I'm just pointing out that they aren't necessarily mutually exclusive. More accurately, the -A/-B options shouldn't necessarily be exclusive with the caret, although I suspect context and caret probably ought to be.
Does it currently work in interactive mode too, or should that also throw an error?
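A minimal sketch of the caret-with-context rendering shown in the example above: the offending line is followed by the caret, and -B/-A style before/after lines are printed around them. The function and its parameters are hypothetical; `index` and `column` are assumed to be 0-based and the lines to contain no tabs.

```python
from __future__ import annotations


def format_caret_with_context(filename: str, lines: list[str], index: int,
                              column: int, wrong: str, right: str,
                              before: int = 0, after: int = 0) -> str:
    """Caret diagnostic with optional surrounding lines (hypothetical)."""
    out = [f"{filename}:{index + 1}:{column + 1}: {wrong} ==> {right}"]
    # Leading context plus the offending line itself, clipped at file start.
    for i in range(max(0, index - before), index + 1):
        out.append(lines[i].rstrip("\n"))
    # The caret goes directly under the offending line, before any
    # trailing context, so it cannot be confused with a later line.
    out.append(" " * column + "^")
    for i in range(index + 1, min(len(lines), index + 1 + after)):
        out.append(lines[i].rstrip("\n"))
    return "\n".join(out)


if __name__ == "__main__":
    src = ["the first line\n", "the second lne contains\n", "the third line\n"]
    print(format_caret_with_context("foo.txt", src, 1, 11, "lne", "line, lone",
                                    before=1, after=1))
```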

> A relatively simple way to improve the context format is to display a single line with the filename, line, column and offending line, and then mark the wrong word and the right word inline, as is done with the del and ins tags in HTML.

Terminal strikethrough support doesn't seem great, but I get your point. Also, how would you handle multiple suggestions? Again, I'm not suggesting you have to solve this; rather, maybe the argument usage should be tweaked slightly.

perillo · 2023-12-19T09:17:33Z

@peternewman After some thought, I now agree with you that adding a --format flag may be the best choice.
Currently the code has several different output formats, and it is very confusing:

                if (not context_shown) and (context is not None):
                    print_context(lines, i, context)
                if filename != "-":
                    if options.caret_diagnostic:
                        print(
                            f"{cfilename}:{cline}:{ccolumn}: {cwrongword} "
                            f"==> {crightword}{creason}"
                        )
                        print(f"{line}", end="")
                        print("{:>{width}}{}".format("", ccaret, width=column))
                    else:
                        print(
                            f"{cfilename}:{cline}: {cwrongword} "
                            f"==> {crightword}{creason}"
                        )
                elif options.stdin_single_line:
                    print(f"{cline}: {cwrongword} ==> {crightword}{creason}")
                else:
                    print(
                        f"{cline}: {line.strip()}\n\t{cwrongword} "
                        f"==> {crightword}{creason}"
                    )
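One way the if/elif chain above could be flattened is a table of formatter functions keyed by format name, so each format lives in one place. A minimal sketch; `Misspelling`, the formatter names and the format keys are hypothetical, only three of the branches are mirrored, and the reason suffix is omitted.

```python
from typing import Callable, Dict, NamedTuple


class Misspelling(NamedTuple):
    """Hypothetical record of one finding (not a codespell class)."""

    filename: str
    lineno: int
    column: int  # 0-based offset of the misspelling in `line`
    line: str
    wrong: str
    right: str


def fmt_default(m: Misspelling) -> str:
    return f"{m.filename}:{m.lineno}: {m.wrong} ==> {m.right}"


def fmt_caret(m: Misspelling) -> str:
    header = f"{m.filename}:{m.lineno}:{m.column + 1}: {m.wrong} ==> {m.right}"
    # rstrip() keeps leading whitespace, so the caret offset stays valid.
    return f"{header}\n{m.line.rstrip()}\n{' ' * m.column}^"


def fmt_stdin_single_line(m: Misspelling) -> str:
    return f"{m.lineno}: {m.wrong} ==> {m.right}"


# One entry per output format; a new format is one new function.
FORMATTERS: Dict[str, Callable[[Misspelling], str]] = {
    "default": fmt_default,
    "caret-diagnostics": fmt_caret,
    "stdin-single-line": fmt_stdin_single_line,
}


def render(m: Misspelling, fmt: str) -> str:
    formatter = FORMATTERS.get(fmt)
    if formatter is None:
        raise ValueError(f"unknown format: {fmt!r}")
    return formatter(m)
```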

These are some issues I found:

Compatibility

Adding the --format option means that this PR MUST implement all the necessary changes so that all the current flags keep working correctly, without breaking compatibility.

I'm willing to try, but this probably needs an official proposal/RFC, since it is not a trivial change.

stdin-single-line

The new --stdin-single-line flag should probably be removed before the next release.

Don't use `strip` when printing the context line

Stripping the context line with `strip` will break the caret position, because it also removes leading whitespace and shifts the text left of the reported column.
In this PR I decided not to rstrip the line, assuming that the file always ends with a newline, but using `rstrip` should be safe.
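A small illustration of why `strip` breaks the caret while `rstrip` does not: `column` is an offset into the raw line, and only removing leading characters changes that offset. The helper name is hypothetical.

```python
def render_caret(line: str, column: int) -> str:
    """Offending line plus caret line; `column` is a 0-based offset."""
    # rstrip() removes only trailing whitespace (including the newline), so
    # every offset inside the text is unchanged and `column` remains valid.
    return line.rstrip() + "\n" + " " * column + "^"


if __name__ == "__main__":
    line = "    return lne  # indented code\n"
    column = line.index("lne")
    print(render_caret(line, column))
    # strip() also drops the four leading spaces, so the word moves left
    # while the caret would stay at `column`.
    print(column, line.strip().index("lne"))
```

The last line prints `11 7`: after `strip`, the word starts four characters earlier than the caret position.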

perillo · 2023-12-20T12:13:28Z

There is a bug in the current code: when tabs are used, the caret position is incorrect.

There are two solutions that I tested:

Expand tabs in all lines if `--caret-diagnostic` is set

     for i, line in enumerate(lines):
         if line.rstrip() in exclude_lines:
             continue
+        if options.caret_diagnostic:
+            # Expand tabs in order to show the caret correctly.
+            line = line.replace("\t", "    ")

Expand tabs only when necessary

                 if filename != "-":
                     if options.caret_diagnostic:
+                        ntabs = line[:column].count("\t")
+                        line = line.replace("\t", "    ")
+                        column = column + ntabs * 3

The latter should perform better, since tabs are expanded only on lines that are actually reported.
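Both approaches above keep the caret consistent because the line and the column are adjusted the same way. A possible alternative, sketched here as an assumption rather than what the PR does, is Python's `str.expandtabs`, which pads to tab stops instead of inserting a fixed four spaces, so the printed line keeps the alignment an editor with the same tab size would show. Since tab stops depend only on the preceding characters, expanding the prefix up to `column` gives the caret column directly.

```python
def caret_for_tabbed_line(line: str, column: int, tabsize: int = 4):
    """Return (display_line, display_column) with tabs expanded to tab stops.

    `column` is the 0-based offset of the misspelling in the raw `line`.
    The function name is hypothetical.
    """
    display_line = line.expandtabs(tabsize)
    # The expanded prefix is exactly the part of display_line before the word.
    display_column = len(line[:column].expandtabs(tabsize))
    return display_line, display_column


if __name__ == "__main__":
    shown, col = caret_for_tabbed_line("a\tlne\n", 2)
    print(shown, end="")
    print(" " * col + "^")
```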

perillo · 2023-12-21T08:06:22Z

Fixed a bug when using tabs instead of spaces, where the caret pointed at the wrong location.

perillo requested review from larsoner and peternewman as code owners December 18, 2023 16:26

peternewman reviewed Dec 18, 2023

DimitriPapadopoulos added the enhancement label Dec 18, 2023

perillo mentioned this pull request Dec 20, 2023

RFC: improve diagnostic formatting #3258 (open)

perillo force-pushed the add-caret-diagnostic branch from 7433d7b to 2ec7728 December 21, 2023 08:02

perillo force-pushed the add-caret-diagnostic branch from 2ec7728 to 3097ec3 December 21, 2023 08:05
