Diagnostics interface #2

mwkmwkmwk · 2021-08-31T20:14:05Z

General requirements:

diagnostics are strongly-typed objects that can be inspected, not just string messages
every pass gets, as part of the general context, a "diagnostics sink" virtual object that diagnostics can be emitted into
- in normal operation, this will print the diagnostic to stderr
- in test operation, an alternate sink is used that stores all emitted diagnostics for later comparison to "gold" diagnostic output
- for IDE integration, the diagnostics would be serialized and sent over the language server protocol

Types:

every emitted diagnostic has a named type (eg. verilog_parser.redefined_name, verilog_elaboration.mismatched_types)
diagnostic types are defined at compilation time
every component can provide its own diagnostic types (and thus component name is encoded in diagnostic type name)
every component needs to provide a list of all its diagnostic types, and the driver gathers the lists from all components in use
the diagnostic type list will be used for:
- dumping the list in --help (as part of -W option)
- validating the param to -W
- the --explain help option (displays explanation for a given diagnostic type)

Severity:

every diagnostic has a severity: ignored, info, warning, error, fatal
- error marks the output as broken, but still allows further processing for more diagnostics
- fatal means "abort everything NOW", and should be used very sparingly (classic C language example is "include file not found" — this is usually bad enough to result in an error cascade)
- info (aka remark) is rarely seen in compiler world, but has its use cases
  - explanatory opt-in messages for the user on optimization / mapping decisions (eg. "I mapped this particular memory via RAMB36E1")
  - needed to implement $info in any case
every diagnostic type has a default severity (from the above list)
the default severity of a diagnostic type can be overriden by the user (think -W option to cc)
- ... but most errors and fatals are marked as non-downgradeable
- ... but perhaps not all? one could make a case for a -permissive option that allows some obviously-fixable broken input like wire vs reg mismatch
the severity overrides would be handled by a "diagnostics filter" which sits between the pass and the actual diagnostics sink, constructed by the driver from the -W options
the existence of fatal severity plus the possibility of overriding diagnostic severity means that emitting a diagnostic (through the diagnostics filter) should return a bool saying if we should continue work or not

Type groups:

when enabling/disabling warnings via -W, it often makes sense to operate on whole groups of diagnostic types
- groups of the same diagnostic, but coming from various components (eg. "undefined id" error from the parser vs "undefined id" from the elaborator) — or do we want a common diagnostic type defined in a shared component for these cases?
- groups of similar diagnostics: "mixed positional and named port assignments in module instantiation" and "mixed positional and named parameter assignments in task call" may have merit as separate diagnostic types, but we want a single -Wno-mixed-positional-named option that can affect both
- all the "this is an extension from the standard" diagnostics should belong to one group, so they can be enabled wholesale when one wants to write portable code
- same for future compatibility warnings — "this is perfectly valid Verilog-1995 (which was chosen as the language standard on command line), but means something different in Verilog-2001"
- extreme example from C is -Wall (all reasonable warnings), but this is a can of worms
open question: can a diagnostic type belong to several groups? can this happen in arbitrary configurations, or is the only case belonging to a group and its "subgroup"? if yes, how do we handle conflicting -W overrides for these groups?

Diagnostic contents:

in addition to type and severity, a diagnostic has:
- the diagnostic message (what went wrong, as string, eg. "undefined identifier foo")
- zero or more code spans highlighting the relevant locations in source
- an optional help message (suggestion on how to fix this problem, eg "please write to this reg from only one process")
- for trivially-fixable errors or warnings: a list of machine-applicable changes to the source code that would fix the diagnostic ("eg. remove bytes 123-126 of the source (which happen to contain wire) and insert bytes reg there") — these can be printed to stderr for manual application, or sent to IDE via LSP
code spans included in the diagnostic can be marked as "primary" or "secondary" — a primary span points to the (presumed) location of the problem, secondary spans (if any) point to other areas of code that are of interest (eg. calling a function with wrong argument types will highlight the call site as primary span, highlight the function definition as secondary span)
there should always be at least one primary span showing the error location; the only exception is for errors that are really, truly not tied to any position in the source code itself (eg. "the requested top module foo is not defined")
usually there will be exactly one primary span, but that is not always the case — eg. for "redefined identifier" we highlight all definitions as primary spans because none of them is any more incorrect than the others
every span comes with a descriptive textual message for why we are showing it ("the function f (which was just called with wrong arguments) is defined here")
open question: do we want to be able to attach some diagnostic-dependent extra attributes as well (eg. the relevant identifier for "undefined id" errors)?
- if so, what payload types should we support? can we store a "hierarchical Verilog id" as first-class type in the diagnostic, or should we eg. limit it to strings, or to JSON-serializable objects?

Spans:

a span is, conceptually, a range of bytes in the parser input
this is complicated by three things that make the input non-linear:
- include files
- macro expansion
- `line directives
the include files mean that the parser input is not a file, but a tree of files, where every non-top file conceptually replaces a span of some parent file (containing the `include)
since the preprocessor is independent of actual parsing, this means that a span (of eg. an expression) can begin in some include file, run over its end, continue in the parent file, then continue inside a new include file, then end in the middle of that file — this is, of course, a pain
macro expansion works basically the same as `include, except that it results in artificial "macro expansion" source chunks that are not backed by an actual file — thus we cannot really refer the user to the exact source location (but we can still print the expanded code and highlight the error there)
`line directives can be used to override the presumed line number and file name, usually for the case of pre-processed files; however, they cannot override individual token locations nor the source text that we have available for error reporting, so they by necessity result in a mismatch of the source code that user wrote with what we're highlighting in diagnostics
to handle all of the above, we post-process the spans when printing the diagnostic:
- if a span's beginning and end are in different files (macro expansions count as files here), split it up into several sub-spans by going upwards through the include/macro stack until they meet (this is a pathological case); this will result in one canonical "top file" that generated the problematic span
- ... but don't enter includes or macro expansions that are wholly contained within the span (so if we're highlighting foo(`MIN(a, b)), don't bother entering into MIN expansion)
- unfortunately this needs some care and special-casing for when eg. span begin aligns with macro expansion begin, so we don't needlessly enter into expansions
- after emitting the actual span (or sub-spans) containing the diagnostic, go up the include/macro stack from the "top file" of this diagnostic, and append additional "in a file included from here", "in a macro expansion invoked from here", "expanded from a macro defined here" spans, so that the user has a full backtrace
- if a `line directive was involved, print both the actual filename + line number that we parsed and the "virtual" one from the override

Printing a diagnostic:

... now that's a bikeshedworthy area
what information we want to print is more or less well-defined; how we want to print it is much more complex
clang does the simple thing (a little too simple): prints exactly one line of source per diagnostic, selected by the beginning of the primary span
- span longer than one line? too bad, it's cut off
- a secondary span in a different line than the primary? too bad, it's not visible
- ... but this only applies for errors where the spans are usually expected to be on one line (eg. operator and its arguments); for cases where the extra spans are expected to be elsewhere, clang emits several chained diagnostics of the special "note" severity
long spans (dozens of lines of code) should definitely not be printed in their entirety, no matter what (parser should also avoid highlighting long spans in the first place, but they can be unavoidable with eg. long enough expressions)
there are two ways to approach printing multi-span diagnostics where the spans are nearby:
- just print the spans independently one by one, in the order they were attached to the diagnostic (the simple way, but often results in printing a single line of code several times with different highlights)
- attempt to print a single code snippet with multiple embedded highlights and messages; the spans are effectively printed in source order by necessity (see rustc for an example) — the pretty way, but a lot of work
investigate outsourcing the actual printing to a library (miette?)

Important implementation details:

what exactly is a diagnostic type, at the code level?
- an actual type with fields and virtual methods? sounds like an overkill
- just a piece of static data with name + default severity + explanation? still needs an "emit this particular diagnostic" function to not duplicate the message formatting code at every use
what does a diagnostic type definition look like? how do we make it as simple as possible?
what does emitting a diagnostic look like? how to we make it as simple as possible?
how to easily gather the list of all diagnostic types in a component without writing a TableGen? define them all in a big proc_macro?

The text was updated successfully, but these errors were encountered:

mwkmwkmwk · 2021-09-13T21:15:27Z

First stage done. The following parts have not been implemented yet and are TODOs:

severity override functionality
type groups
automatic fixups
a diagnostic printer (there is a simple WIP one)
better infrastructure for spans across files

And the remaining open question items:

diagnostic extra attributes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Diagnostics interface #2

Diagnostics interface #2

mwkmwkmwk commented Aug 31, 2021

mwkmwkmwk commented Sep 13, 2021

Diagnostics interface #2

Diagnostics interface #2

Comments

mwkmwkmwk commented Aug 31, 2021

mwkmwkmwk commented Sep 13, 2021