Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Clang] Match MSVC handling of duplicate header search paths in Microsoft compatibility modes. #105738

Draft
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

tahonermann
Copy link
Contributor

@tahonermann tahonermann commented Aug 22, 2024

Clang has historically matched GCC's behavior for header search path order and pruning of duplicate paths. That traditional behavior is to order user search paths before system search paths, to ignore user search paths that duplicate a (later) system search path, and to ignore search paths that duplicate an earlier search path of the same user/system kind. This differs from MSVC and can result in inconsistent header file resolution for #include directives.

MSVC orders header search paths as follows:

  1. Paths specified by the /I and /external:I options are processed in the order that they appear. Paths specified by /I that duplicate a path specified by /external:I are ignored regardless of the order of the options. Paths specified by /I that duplicate a path from a prior /I option are ignored. Paths specified by /external:I that duplicate a path from a later /external:I option are ignored.

  2. Paths specified by /external:env are processed in the order that they appear. Paths that duplicate a path from a /I or /external:I option are ignored regardless of the order of the options. Paths that duplicate a path from a prior /external:env option or an earlier path from the same /external:env option are ignored.

  3. Paths specified by the INCLUDE environment variable are processed in the order they appear. Paths that duplicate a path from a /I, /external:I, or /external:env option are ignored. Paths that duplicate an earlier path in the INCLUDE environment variable are ignored.

  4. Paths specified by the EXTERNAL_INCLUDE environment variable are processed in the order they appear. Paths that duplicate a path from a /I, /external:I, or /external:env option are ignored. Paths that duplicate a path from the INCLUDE environment variable are ignored. Paths that duplicate an earlier path in the `EXTERNAL_INCLUDE environment variable are ignored.

Prior to this change, Clang handled the /external:I and /external:env options and the paths present in the INCLUDE and EXTERNAL_INCLUDE environment variables as though they were specified with the -isystem option. The GCC behavior described above then lead to a command line such as /external:I dir1 /Idir2 having a header search order of dir2 followed by dir1; contrary to MSVC behavior.

This change adds support for the MSVC external path concept for both the clang and clang-cl drivers with the following option syntax. These options match the MSVC behavior described above for both drivers.

clang                    clang-cl
--------------------     -------------------
-iexternal <dir>         /external:I <dir>
-iexternal-env=<ENV>     /external:env:<ENV>

Paths specified by these options are still treated as system paths. That is, whether warnings are issued in header files found via these paths remains subject to use of the -Wsystem-headers and -Wno-system-headers options. In the future, it would make sense to add a separate option that matches the MSVC /external:Wn option to control such warnings.

The MSVC behavior described above implies that (system) paths present in the INCLUDE and EXTERNAL_INCLUDE environment variables do not suppress matching user paths specified via /I. This contrasts with GCC's behavior of suppressing user paths that match a system path regardless of how each is specified. Since the clang-cl driver maps paths from the INCLUDE and EXTERNAL_INCLUDE environment variable to -internal-isystem, matching MSVC behavior requires suppressing that aspect of the GCC behavior. With this change, system paths will no longer suppress user paths when the -fms-compatibility option is explicitly or implicitly enabled. This will affect header search path ordering for options like -isystem when duplicate user paths are present. Should motivation arise for preserving such suppression of user paths when compiling with -fms-compatibility enabled, it would make sense to introduce a new option for the clang-cl driver to map paths in these environment variables to that would match MSVC behavior without impacting other system path options.

@llvmbot llvmbot added clang Clang issues not falling into any other category clang:driver 'clang' and 'clang++' user-facing binaries. Not 'clang-cl' clang:frontend Language frontend issues, e.g. anything involving "Sema" labels Aug 22, 2024
@llvmbot
Copy link
Member

llvmbot commented Aug 22, 2024

@llvm/pr-subscribers-clang-modules
@llvm/pr-subscribers-clang-driver

@llvm/pr-subscribers-clang

Author: Tom Honermann (tahonermann)

Changes

Clang has historically mimicked gcc behavior for header file searching in which user search paths are ordered before system search paths, user search paths that duplicate a (later) system search path are ignored, and search paths that duplicate an earlier search path of the same user/system kind are ignored.

MSVC behavior differs in that user search paths are ordered before system search paths (like gcc), and search paths that duplicate an earlier search path are ignored regardless of user/system kind (similar to gcc, but without the preference for system search paths over duplicate user search paths).

The gcc and MSVC differences are observable for driver invocations that pass, e.g., -Idir1 -isystem dir2 -isystem dir1. The gcc behavior will result in dir2 being searched before dir1 while the MSVC behavior will result in dir1 being searched before dir2.

This patch modifies Clang to match the MSVC behavior for handling of duplicate header search paths when running in Microsoft compatibility mode (e.g., when invoked with -fms-compatibility or with the clang-cl driver).


Full diff: https://github.com/llvm/llvm-project/pull/105738.diff

4 Files Affected:

  • (modified) clang/docs/ReleaseNotes.rst (+14)
  • (modified) clang/lib/Lex/InitHeaderSearch.cpp (+12-9)
  • (added) clang/test/Driver/header-search-duplicates.c (+74)
  • (added) clang/test/Driver/microsoft-header-search-duplicates.c (+90)
diff --git a/clang/docs/ReleaseNotes.rst b/clang/docs/ReleaseNotes.rst
index 5c156a9c073a9c..d9c96d8902efa9 100644
--- a/clang/docs/ReleaseNotes.rst
+++ b/clang/docs/ReleaseNotes.rst
@@ -363,6 +363,20 @@ Windows Support
   When `-fms-compatibility-version=18.00` or prior is set on the command line this Microsoft extension is still
   allowed as VS2013 and prior allow it.
 
+- Clang now matches MSVC behavior for handling of duplicate header search paths
+  when running in Microsoft compatibility mode. Historically, Clang has
+  mimicked gcc behavior in which user search paths are ordered before
+  system search paths, user search paths that duplicate a (later) system search
+  path are ignored, and search paths that duplicate an earlier search path of
+  the same user/system kind are ignored. The MSVC behavior is that user search
+  paths are ordered before system search paths (like gcc), and search paths that
+  duplicate an earlier search path are ignored regardless of user/system kind
+  (similar to gcc, but without the preference for system search paths over
+  duplicate user search paths). These differences are observable for driver
+  invocations that pass, e.g., `-Idir1 -isystem dir2 -isystem dir1`. The gcc
+  behavior will search `dir2` before `dir1` and the MSVC behavior will search
+  `dir1` before `dir2`.
+
 LoongArch Support
 ^^^^^^^^^^^^^^^^^
 
diff --git a/clang/lib/Lex/InitHeaderSearch.cpp b/clang/lib/Lex/InitHeaderSearch.cpp
index 2218db15013d92..a4cba469cd7e31 100644
--- a/clang/lib/Lex/InitHeaderSearch.cpp
+++ b/clang/lib/Lex/InitHeaderSearch.cpp
@@ -368,7 +368,8 @@ void InitHeaderSearch::AddDefaultIncludePaths(
 /// If there are duplicate directory entries in the specified search list,
 /// remove the later (dead) ones.  Returns the number of non-system headers
 /// removed, which is used to update NumAngled.
-static unsigned RemoveDuplicates(std::vector<DirectoryLookupInfo> &SearchList,
+static unsigned RemoveDuplicates(const LangOptions &Lang,
+                                 std::vector<DirectoryLookupInfo> &SearchList,
                                  unsigned First, bool Verbose) {
   llvm::SmallPtrSet<const DirectoryEntry *, 8> SeenDirs;
   llvm::SmallPtrSet<const DirectoryEntry *, 8> SeenFrameworkDirs;
@@ -394,14 +395,15 @@ static unsigned RemoveDuplicates(std::vector<DirectoryLookupInfo> &SearchList,
         continue;
     }
 
-    // If we have a normal #include dir/framework/headermap that is shadowed
-    // later in the chain by a system include location, we actually want to
-    // ignore the user's request and drop the user dir... keeping the system
-    // dir.  This is weird, but required to emulate GCC's search path correctly.
+    // When not in MSVC compatibility mode, if we have a normal
+    // #include dir/framework/headermap that is shadowed later in the chain by
+    // a system include location, we actually want to ignore the user's request
+    // and drop the user dir... keeping the system dir.  This is weird, but
+    // required to emulate GCC's search path correctly.
     //
     // Since dupes of system dirs are rare, just rescan to find the original
     // that we're nuking instead of using a DenseMap.
-    if (CurEntry.getDirCharacteristic() != SrcMgr::C_User) {
+    if (!Lang.MSVCCompat && CurEntry.getDirCharacteristic() != SrcMgr::C_User) {
       // Find the dir that this is the same of.
       unsigned FirstDir;
       for (FirstDir = First;; ++FirstDir) {
@@ -484,14 +486,14 @@ void InitHeaderSearch::Realize(const LangOptions &Lang) {
       SearchList.push_back(Include);
 
   // Deduplicate and remember index.
-  RemoveDuplicates(SearchList, 0, Verbose);
+  RemoveDuplicates(Lang, SearchList, 0, Verbose);
   unsigned NumQuoted = SearchList.size();
 
   for (auto &Include : IncludePath)
     if (Include.Group == Angled || Include.Group == IndexHeaderMap)
       SearchList.push_back(Include);
 
-  RemoveDuplicates(SearchList, NumQuoted, Verbose);
+  RemoveDuplicates(Lang, SearchList, NumQuoted, Verbose);
   unsigned NumAngled = SearchList.size();
 
   for (auto &Include : IncludePath)
@@ -510,7 +512,8 @@ void InitHeaderSearch::Realize(const LangOptions &Lang) {
   // Remove duplicates across both the Angled and System directories.  GCC does
   // this and failing to remove duplicates across these two groups breaks
   // #include_next.
-  unsigned NonSystemRemoved = RemoveDuplicates(SearchList, NumQuoted, Verbose);
+  unsigned NonSystemRemoved = RemoveDuplicates(Lang, SearchList, NumQuoted,
+                                               Verbose);
   NumAngled -= NonSystemRemoved;
 
   Headers.SetSearchPaths(extractLookups(SearchList), NumQuoted, NumAngled,
diff --git a/clang/test/Driver/header-search-duplicates.c b/clang/test/Driver/header-search-duplicates.c
new file mode 100644
index 00000000000000..2480d9f24d8200
--- /dev/null
+++ b/clang/test/Driver/header-search-duplicates.c
@@ -0,0 +1,74 @@
+// Test that the gcc-like driver, when not in Microsoft compatibility mode,
+// emulates the gcc behavior of eliding a user header search path when the
+// same path is present as a system header search path.
+// See microsoft-header-search-duplicates.c for Microsoft compatible behavior.
+
+// RUN: rm -rf %t
+// RUN: split-file %s %t
+
+// Test the clang driver to validate that the gcc compatible behavior is enabled
+// by default. The -nostdinc option is used to suppress default search paths to
+// ease testing.
+// RUN: %clang \
+// RUN:     -v -fsyntax-only -nostdinc \
+// RUN:     -I%t/include/w \
+// RUN:     -isystem %t/include/z \
+// RUN:     -I%t/include/x \
+// RUN:     -isystem %t/include/y \
+// RUN:     -isystem %t/include/x \
+// RUN:     -I%t/include/w \
+// RUN:     -isystem %t/include/y \
+// RUN:     -isystem %t/include/z \
+// RUN:     %t/test.c 2>&1 | FileCheck -DPWD=%t %t/test.c
+
+#--- test.c
+#include <a.h>
+#include <b.h>
+#include <c.h>
+
+// The expected behavior is that user search paths are ordered before system
+// search paths, that user search paths that duplicate a (later) system search
+// path are ignored, and that search paths that duplicate an earlier search
+// path of the same user/system kind are ignored.
+// CHECK:      ignoring duplicate directory "[[PWD]]/include/w"
+// CHECK-NEXT: ignoring duplicate directory "[[PWD]]/include/x"
+// CHECK-NEXT:  as it is a non-system directory that duplicates a system directory
+// CHECK-NEXT: ignoring duplicate directory "[[PWD]]/include/y"
+// CHECK-NEXT: ignoring duplicate directory "[[PWD]]/include/z"
+// CHECK:      #include <...> search starts here:
+// CHECK-NEXT: [[PWD]]/include/w
+// CHECK-NEXT: [[PWD]]/include/z
+// CHECK-NEXT: [[PWD]]/include/y
+// CHECK-NEXT: [[PWD]]/include/x
+// CHECK-NEXT: End of search list.
+
+#--- include/w/b.h
+#define INCLUDE_W_B_H
+#include_next <b.h>
+
+#--- include/w/c.h
+#define INCLUDE_W_C_H
+#include_next <c.h>
+
+#--- include/x/a.h
+#if !defined(INCLUDE_Y_A_H)
+#error 'include/y/a.h' should have been included before 'include/x/a.h'!
+#endif
+
+#--- include/x/b.h
+#if !defined(INCLUDE_W_B_H)
+#error 'include/w/b.h' should have been included before 'include/x/b.h'!
+#endif
+
+#--- include/x/c.h
+#if !defined(INCLUDE_Z_C_H)
+#error 'include/z/c.h' should have been included before 'include/x/c.h'!
+#endif
+
+#--- include/y/a.h
+#define INCLUDE_Y_A_H
+#include_next <a.h>
+
+#--- include/z/c.h
+#define INCLUDE_Z_C_H
+#include_next <c.h>
diff --git a/clang/test/Driver/microsoft-header-search-duplicates.c b/clang/test/Driver/microsoft-header-search-duplicates.c
new file mode 100644
index 00000000000000..1b73eeceb37eba
--- /dev/null
+++ b/clang/test/Driver/microsoft-header-search-duplicates.c
@@ -0,0 +1,90 @@
+// Test that the cl-like driver and the gcc-like driver, when in Microsoft
+// compatibility mode, retain user header search paths that are duplicated in
+// system header search paths.
+// See header-search-duplicates.c for gcc compatible behavior.
+
+// RUN: rm -rf %t
+// RUN: split-file %s %t
+
+// Test the clang driver to validate that the Microsoft compatible behavior is
+// enabled when the -fms-compatibility option is used. The -nostdinc option
+// is used to suppress default search paths to ease testing.
+// RUN: %clang \
+// RUN:     -v -fsyntax-only -fms-compatibility -nostdinc \
+// RUN:     -I%t/include/w \
+// RUN:     -isystem %t/include/z \
+// RUN:     -I%t/include/x \
+// RUN:     -isystem %t/include/y \
+// RUN:     -isystem %t/include/x \
+// RUN:     -I%t/include/w \
+// RUN:     -isystem %t/include/y \
+// RUN:     -isystem %t/include/z \
+// RUN:     %t/test.c 2>&1 | FileCheck -DPWD=%t %t/test.c
+
+// Test the clang-cl driver to validate that the Microsoft compatible behavior
+// is enabled by default. The -nobuiltininc option is used instead of -nostdinc
+// or /X because the latter two suppress all system include paths (including
+// those specified by -isystem and -imsvc). The -imsvc option is used because
+// the -nobuiltininc option suppresses search paths specified by -isystem.
+// RUN: %clang_cl \
+// RUN:     -v -fsyntax-only -nobuiltininc \
+// RUN:     -I%t/include/w \
+// RUN:     -imsvc %t/include/z \
+// RUN:     -I%t/include/x \
+// RUN:     -imsvc %t/include/y \
+// RUN:     -imsvc %t/include/x \
+// RUN:     -I%t/include/w \
+// RUN:     -imsvc %t/include/y \
+// RUN:     -imsvc %t/include/z \
+// RUN:     %t/test.c 2>&1 | FileCheck -DPWD=%t %t/test.c
+
+#--- test.c
+#include <a.h>
+#include <b.h>
+#include <c.h>
+
+// The expected behavior is that user search paths are ordered before system
+// search paths and that search paths that duplicate an earlier search path
+// (regardless of user/system concerns) are ignored.
+// CHECK:      ignoring duplicate directory "[[PWD]]/include/w"
+// CHECK-NEXT: ignoring duplicate directory "[[PWD]]/include/x"
+// CHECK-NEXT: ignoring duplicate directory "[[PWD]]/include/y"
+// CHECK-NEXT: ignoring duplicate directory "[[PWD]]/include/z"
+// CHECK-NOT:   as it is a non-system directory that duplicates a system directory
+// CHECK:      #include <...> search starts here:
+// CHECK-NEXT: [[PWD]]/include/w
+// CHECK-NEXT: [[PWD]]/include/x
+// CHECK-NEXT: [[PWD]]/include/z
+// CHECK-NEXT: [[PWD]]/include/y
+// CHECK-NEXT: End of search list.
+
+#--- include/w/b.h
+#define INCLUDE_W_B_H
+#include_next <b.h>
+
+#--- include/w/c.h
+#define INCLUDE_W_C_H
+#include_next <c.h>
+
+#--- include/x/a.h
+#define INCLUDE_X_A_H
+#include_next <a.h>
+
+#--- include/x/b.h
+#if !defined(INCLUDE_W_B_H)
+#error 'include/w/b.h' should have been included before 'include/x/b.h'!
+#endif
+
+#--- include/x/c.h
+#define INCLUDE_X_C_H
+
+#--- include/y/a.h
+#if !defined(INCLUDE_X_A_H)
+#error 'include/x/a.h' should have been included before 'include/y/a.h'!
+#endif
+
+#--- include/z/c.h
+#include_next <c.h>
+#if !defined(INCLUDE_X_C_H)
+#error 'include/x/c.h' should have been included before 'include/z/c.h'!
+#endif

Copy link

github-actions bot commented Aug 22, 2024

✅ With the latest revision this PR passed the C/C++ code formatter.

@AaronBallman AaronBallman requested a review from MaskRay August 23, 2024 13:35
Copy link
Collaborator

@AaronBallman AaronBallman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM but please allow some time for @rnk or @MaskRay to review before landing

@tahonermann tahonermann force-pushed the thonerma-intel-CMPLRLLVM-57471 branch 2 times, most recently from 6377cc2 to 57192d5 Compare August 24, 2024 15:27
@tahonermann tahonermann marked this pull request as draft August 24, 2024 17:33
@tahonermann tahonermann force-pushed the thonerma-intel-CMPLRLLVM-57471 branch from 57192d5 to 79b2ceb Compare August 25, 2024 19:54
@tahonermann tahonermann marked this pull request as ready for review August 26, 2024 01:30
@tahonermann
Copy link
Contributor Author

It took me a few tries to get the tests tweaked just right for the clang-cl driver to pass on Windows, but the tests now pass and this PR is now ready for review.

@tahonermann tahonermann self-assigned this Aug 27, 2024
@tahonermann tahonermann marked this pull request as draft August 27, 2024 19:54
@tahonermann
Copy link
Contributor Author

Sigh, reverted to draft again. I noticed a vestigial file in both test cases, a misplaced #include_next directive, and on further investigation discovered that there is an additional divergence in the MSVC behavior that the patch doesn't address (it looks like a user include path that is later duplicated as a system include path is promoted to a system include path in place such that its position is preserved relative to other system include paths; but I need to investigate further).

@tahonermann tahonermann force-pushed the thonerma-intel-CMPLRLLVM-57471 branch from f5358ed to 7c8f4ea Compare October 17, 2024 19:43
@zygoloid
Copy link
Collaborator

This choice seems like a question of how the driver's command line is interpreted, rather than a language mode, so I wonder if it makes sense for it to consider -fms-compatibility at all or whether this should just be based on the driver mode. Do you have rationale for this also changing under -fms-compatibility?

@tahonermann
Copy link
Contributor Author

This choice seems like a question of how the driver's command line is interpreted, rather than a language mode, so I wonder if it makes sense for it to consider -fms-compatibility at all or whether this should just be based on the driver mode. Do you have rationale for this also changing under -fms-compatibility?

@zygoloid, that is a fair question. My original intent had been to make the behavior contingent on driver mode. I switched to tying it to -fms-compatibility because other preprocessor behavior is contingent on that option and because that would be a less invasive change. Duplicate search path pruning is currently done by cc1 and I don't think that should be moved to the driver. I could be mistaken, but I don't think driver mode is exposed to cc1 (and that is probably a good thing). We could introduce a new cc1 option to opt-in to Microsoft style search path pruning. I'm not opposed to that.

@zygoloid
Copy link
Collaborator

I've been pondering this. On the one hand, people switching from the cl.exe-compatible driver to the GCC-compatible driver might want the MSVC interpretation of the path flags. On the other hand, people enabling -fms-compatibility to accept code written against MSVC in a different environment might be surprised if other command-line flags behave differently (especially include paths unrelated to those files). On the third hand, this only affects cases where the same path is specified in more than one kind of list, which seems like it's probably rare. On the fourth hand, IIRC the choice of whether header search for a relative path looks in the directory of includers as well as the directory of the current file already depends on -fms-compatibility, and this seems sort of similar. So yeah, I dunno what we should do; it doesn't seem clear to me.

@tahonermann
Copy link
Contributor Author

@zygoloid,

On the fourth hand, IIRC the choice of whether header search for a relative path looks in the directory of includers as well as the directory of the current file already depends on -fms-compatibility, and this seems sort of similar.

I confirmed the search of the include stack directories is controlled by -fms-compatibility. The relevant code is

// MSVC searches the current include stack from top to bottom for
// headers included by quoted include directives.
// See: http://msdn.microsoft.com/en-us/library/36k2cdd4.aspx
if (LangOpts.MSVCCompat && !isAngled) {
for (IncludeStackInfo &ISEntry : llvm::reverse(IncludeMacroStack)) {
if (IsFileLexer(ISEntry))
if ((FileEnt = ISEntry.ThePPLexer->getFileEntry()))
Includers.push_back(std::make_pair(*FileEnt, FileEnt->getDir()));
}
}
}
.

So yeah, I dunno what we should do; it doesn't seem clear to me.

Yup. My intuition has been to follow the existing precedent unless/until motivation to do otherwise becomes apparent.

@rnk
Copy link
Collaborator

rnk commented Oct 21, 2024

I would stick with fms-compatibility being the setting here rather than plumbing the driver mode through. Over the years, many build systems have been adapted to work with clang[++] and cl without going through clang-cl.

@tahonermann
Copy link
Contributor Author

There is some interesting MSVC behavior that I'm not planning to replicate with the changes being done for this issue but that I thought are worth documenting (here at least, perhaps in Clang docs as well).

MSVC documentation for the /external suite of options includes the /external:Wn option (where Wn is one of W0, W1, W2,W3, or W4) to specify the warning level that is used for "external" header search paths; paths specified with the /external:I or /external:env option, the %EXTERNAL_INCLUDE% environment variable, or included with a angle bracket enclosed path (#include <file.h>) when the /external:anglebrackets option is enabled. The warning levels are meaningless to Clang, so Clang currently maps /external:W0 to -Wno-system-headers and the rest to -Wsystem-headers. Clang does not yet implement /external:anglebrackets. This all seems fine.

There are two interesting behaviors that MSVC exhibits related to the /I and %INCLUDE% environment variable.

  1. By default, paths specified by /I or %INCLUDE% are treated as user paths; whether warnings are issued for them is subject to use of one of the warning level options like /Wall and /W4; the /external:Wn option has no effect on them.
  2. However, when a path specified by /external:I, /external:env, or %EXTERNAL_INCLUDE% matches the beginning of a path in a /I or %INCLUDE% path, then the /external:Wn specified warning level is applied to that path. The path match does not have to occur at a path component boundary (unless the external path ends in a path separator). Matched paths are treated like system paths.

In the following example, each header file has code that triggers MSVC warning C4245; a warning that is issued at warning level 4.

> type incbar\a.h
const unsigned int sysinc1 = -1;

> type incbaz\b.h
const unsigned int sysinc2 = -1;

> type incfoo\c.h
const unsigned int extinc = -1;

> type t.cpp
#include <a.h>
#include <b.h>
#include <c.h>

> set INCLUDE=incbar;incbaz;incfoo

# No warnings issued by default.
> cl /nologo /c t.cpp
t.cpp

# Warnings issued with `/W4`.
> cl /nologo /c /W4 t.cpp
t.cpp
incbar\a.h(1): warning C4245: 'initializing': conversion from 'int' to 'unsigned int', signed/unsigned mismatch
incbaz\b.h(1): warning C4245: 'initializing': conversion from 'int' to 'unsigned int', signed/unsigned mismatch
incfoo\c.h(1): warning C4245: 'initializing': conversion from 'int' to 'unsigned int', signed/unsigned mismatch

# No warnings issued with `/external:W4`.
> cl /nologo /c /external:W4 t.cpp
t.cpp

# Warnings issued for all three search directories with `/external:W4` and `/external:I inc`.
# Not shown here, but if a `inc` directory existed, it would also be searched for header files.
> cl /nologo /c /external:W4 /external:I inc t.cpp
t.cpp
incbar\a.h(1): warning C4245: 'initializing': conversion from 'int' to 'unsigned int', signed/unsigned mismatch
incbaz\b.h(1): warning C4245: 'initializing': conversion from 'int' to 'unsigned int', signed/unsigned mismatch
incfoo\c.h(1): warning C4245: 'initializing': conversion from 'int' to 'unsigned int', signed/unsigned mismatch

# Warnings issued for only `incbar/a.h` and `incbaz.h` with `/external:W4` and `/external:I incb`.
# Not shown here, but if a `incb` directory existed, it would also be searched for header files.
> cl /nologo /c /external:W4 /external:I incb t.cpp
t.cpp
incbar\a.h(1): warning C4245: 'initializing': conversion from 'int' to 'unsigned int', signed/unsigned mismatch
incbaz\b.h(1): warning C4245: 'initializing': conversion from 'int' to 'unsigned int', signed/unsigned mismatch

# No warnings issued when the external include path ends with a path separator that doesn't match another path.
# Not shown here, but if a `incb` directory existed, it would be searched for header files.
> cl /nologo /c /external:W4 /external:I incb\ t.cpp
t.cpp

My intent with changes made for this issue is to (continue to) treat all paths specified by /I as user paths and all paths specified by /external:I, /external:env, %INCLUDE%, or %EXTERNAL_INCLUDE% as system paths.

Ideally, I think we would do the following at some point to improve compatibility with MSVC.

  1. Treat paths specified by %INCLUDE% as user paths by default.
  2. After building the header search path in InitHeaderSearch::Realize(), alter the SrcMgr::CharacteristicKind value stored in the Lookup member of DirectoryLookupInfo in the IncludePath member of InitHeaderSearch to indicate a system path for each path that is a prefix match for an external path (where the prefix need not match on a path component boundary).

@tahonermann tahonermann force-pushed the thonerma-intel-CMPLRLLVM-57471 branch from ca7d339 to 346c969 Compare October 30, 2024 14:26
@tahonermann tahonermann marked this pull request as ready for review October 30, 2024 15:27
@tahonermann tahonermann requested a review from nico October 30, 2024 15:28
@tahonermann
Copy link
Contributor Author

This PR is finally ready for review. I would especially appreciate it if @rnk and @nico took a close look. Please add other known stakeholders.

@tahonermann tahonermann force-pushed the thonerma-intel-CMPLRLLVM-57471 branch 2 times, most recently from 14538f8 to a9b1711 Compare November 4, 2024 18:34
@tahonermann tahonermann force-pushed the thonerma-intel-CMPLRLLVM-57471 branch from a9b1711 to 35e3cb1 Compare November 5, 2024 18:14
@AaronBallman
Copy link
Collaborator

My intent with changes made for this issue is to (continue to) treat all paths specified by /I as user paths and all paths specified by /external:I, /external:env, %INCLUDE%, or %EXTERNAL_INCLUDE% as system paths.

I think this option is the least disruptive.

Ideally, I think we would do the following at some point to improve compatibility with MSVC.

I'm not opposed, but I am concerned about the potential to subtly break user code that's relying on our current search path behavior. We may need to find some clever diagnostics for cases where lookup would have previously succeeded or found a different file.

@tahonermann tahonermann force-pushed the thonerma-intel-CMPLRLLVM-57471 branch from 35e3cb1 to d5c7035 Compare November 6, 2024 16:53
@tahonermann
Copy link
Contributor Author

Thanks, @AaronBallman,

Ideally, I think we would do the following at some point to improve compatibility with MSVC.

I'm not opposed, but I am concerned about the potential to subtly break user code that's relying on our current search path behavior. We may need to find some clever diagnostics for cases where lookup would have previously succeeded or found a different file.

The changes I was suggesting in that comment would not change what files are found; it would only affect whether the file was to be treated as a user or system header. That being said, the changes in this PR do affect file resolution, but only in cases where the same path is specified multiple times.

@tahonermann
Copy link
Contributor Author

Friendly ping for additional code review. @rnk, @nico, @zmodem, @majnemer, @zygoloid.

The Windows build is failing spuriously. I've rebased and force pushed multiple times to try to get a clean build, but it keeps failing during git checkout.

@AaronBallman
Copy link
Collaborator

Thanks, @AaronBallman,

Ideally, I think we would do the following at some point to improve compatibility with MSVC.

I'm not opposed, but I am concerned about the potential to subtly break user code that's relying on our current search path behavior. We may need to find some clever diagnostics for cases where lookup would have previously succeeded or found a different file.

The changes I was suggesting in that comment would not change what files are found; it would only affect whether the file was to be treated as a user or system header.

Doesn't that impact whether a header is found via "" or <> syntax? e.g., if something went from a user header to a system header, I thought that meant the search order would then change depending on which include syntax you used?

@tahonermann
Copy link
Contributor Author

Doesn't that impact whether a header is found via "" or <> syntax? e.g., if something went from a user header to a system header, I thought that meant the search order would then change depending on which include syntax you used?

No, those are orthogonal concerns, but there is unfortunate terminology overlap. Options like -isystem nominate system include paths and headers found within those paths are treated as system headers. However, a header found via another include path can still be considered a system header. Consider a header file that contains #pragma GCC system_header; such a header file is considered a system header, but doesn't influence how header file inclusion is resolved.

@AaronBallman
Copy link
Collaborator

Doesn't that impact whether a header is found via "" or <> syntax? e.g., if something went from a user header to a system header, I thought that meant the search order would then change depending on which include syntax you used?

No, those are orthogonal concerns, but there is unfortunate terminology overlap. Options like -isystem nominate system include paths and headers found within those paths are treated as system headers. However, a header found via another include path can still be considered a system header. Consider a header file that contains #pragma GCC system_header; such a header file is considered a system header, but doesn't influence how header file inclusion is resolved.

Okay, thank you! Then this seem good to me.

@tahonermann
Copy link
Contributor Author

Further internal testing has indicated that there will likely be backward compatibility problems in practice with the implemented approach of differentiating behavior based on -fms-compatibility. I'm therefore going to look into adding a new -cc1 option, -fms-header-search, to control this behavior. The new option will only be enabled by default for clang-cl.

Opinions on whether the current -fms-compatibility controlled behavior to search the directories of the include stack for #include "xxx" headers should be switched to this new option is welcome. I tend to think it should be.

@AaronBallman
Copy link
Collaborator

Opinions on whether the current -fms-compatibility controlled behavior to search the directories of the include stack for #include "xxx" headers should be switched to this new option is welcome. I tend to think it should be.

My intuition is that it should be switched to the new option; we want -fms-compatibility to mean "be compatible with MSVC" and that shouldn't require additional flags except in exceptional cases. This doesn't feel like an exceptional case to me.

Copy link
Collaborator

@shafik shafik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nitpick

++It) {
OptSpecifier Opt = [It, Matches]() {
if (Matches(*It, frontend::Angled, true, true))
return OPT_F;
if (Matches(*It, frontend::Angled, false, true))
return OPT_I;
if (Matches(*It, frontend::External, false, true))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should be striving to use bugprone-argument-comment for literals like this. It looks like we do so below, it is unfortunate it is not applied consistently.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this case, the call is to a local lambda and there are a lot of such calls. Adding the parameter names pushes the line length therefore requiring line wrapping. I don't think it is an improvement in this case, but if you insist, I'll make the change.

@llvmbot llvmbot added the clang:modules C++20 modules and Clang Header Modules label Dec 10, 2024
@tahonermann tahonermann force-pushed the thonerma-intel-CMPLRLLVM-57471 branch from 9749f3b to cd82a68 Compare December 10, 2024 23:20
@tahonermann tahonermann marked this pull request as draft January 7, 2025 02:03
@tahonermann
Copy link
Contributor Author

I've been continuing to work on this, but have yet to get all of our internal tests to pass. I've reverted the PR to a draft pending successful internal testing.

@tahonermann tahonermann force-pushed the thonerma-intel-CMPLRLLVM-57471 branch from e6a3432 to f2b15af Compare January 7, 2025 21:15
@tahonermann tahonermann force-pushed the thonerma-intel-CMPLRLLVM-57471 branch 3 times, most recently from 21b213c to 1c9813e Compare January 18, 2025 04:39
Clang has historically matched GCC's behavior for header search path order
and pruning of duplicate paths. That traditional behavior is to order user
search paths before system search paths, to ignore user search paths that
duplicate a (later) system search path, and to ignore search paths that
duplicate an earlier search path of the same user/system kind. This differs
from MSVC and can result in inconsistent header file resolution for `#include`
directives.

MSVC orders header search paths as follows:
1) Paths specified by the `/I` and `/external:I` options are processed in
   the order that they appear. Paths specified by `/I` that duplicate a path
   specified by `/external:I` are ignored regardless of the order of the
   options. Paths specified by `/I` that duplicate a path from a prior `/I`
   option are ignored. Paths specified by `/external:I` that duplicate a
   path from a later `/external:I` option are ignored.
2) Paths specified by `/external:env` are processed in the order that they
   appear. Paths that duplicate a path from a `/I` or `/external:I` option
   are ignored regardless of the order of the options. Paths that duplicate a
   path from a prior `/external:env` option or an earlier path from the same
   `/external:env` option are ignored.
3) Paths specified by the `INCLUDE` environment variable are processed in
   the order they appear. Paths that duplicate a path from a `/I`,
   `/external:I`, or `/external:env` option are ignored. Paths that
   duplicate an earlier path in the `INCLUDE` environment variable are
   ignored.
4) Paths specified by the `EXTERNAL_INCLUDE` environment variable are
   processed in the order they appear. Paths that duplicate a path from a
   `/I`, `/external:I`, or `/external:env` option are ignored. Paths that
   duplicate a path from the `INCLUDE` environment variable are ignored.
   Paths that duplicate an earlier path in the `EXTERNAL_INCLUDE
   environment variable are ignored.

Prior to this change, Clang handled the MSVC `/external:I` and `/external:env`
options and the paths present in the `INCLUDE` and `EXTERNAL_INCLUDE`
environment variables as though they were specified with the `-isystem` option.
The GCC behavior described above then lead to a command line such as
`/external:I dir1 /Idir2` having a header search order of `dir2` followed by
`dir1`; contrary to MSVC behavior.

Paths specified by the MSVC `/external:I` or `/external:env` options or by the
`EXTERNAL_INCLUDE` environment variable are not just used to nominate a header
search path. MSVC also uses these paths as external directory prefixes to mark
paths that are to be treated as system directories. When a header file is
resolved to a path for which one of these paths is a prefix match, that header
file is treated as a system header regardless of whether the header file was
resolved against a `/I` specified path. Note that it is not necessary for the
final path component of the external path to match a directory in the filesystem
in order to be used as an external directory prefix, though trailing path
separators are significant. For example:

   Include directive         Command line         System header
   ------------------------  ------------------   -------------
   #include <foobar/file.h>  /I. /external:Ifoo   Yes
   #include <foobar/file.h>  /I. /external:Ifoo\  No

This change adds support for the MSVC external path concept through the addition
of new driver options with the following option syntax.
   clang                     clang-cl
   ------------------------  -------------------
   -iexternal <dir>          /external:I <dir>
   -iexternal-env=<ENV>      /external:env:<ENV>

Paths specified by these options are treated as system paths. That is, whether
warnings are issued in header files found via these paths remains subject to
use of the `-Wsystem-headers` and `-Wno-system-headers` options. Note that the
MSVC `/external:W<N>` options are mapped to these options.

The MSVC behavior described above implies that (system) paths present in the
`INCLUDE` and `EXTERNAL_INCLUDE` environment variables do not suppress matching
user paths specified via `/I`. This contrasts with GCC's behavior, and Clang's
historical behavior, of suppressing user paths that match a system path
regardless of how each is specified. In order to support both behaviors, the
following driver option has been added. The option arguments shown reflect the
default behavior for each driver.

   clang                     clang-cl
   ------------------------  -------------------------
   -fheader-search=gcc       -fheader-search=microsoft

Use of the MSVC compatible header search path order by default for `clang-cl`
is a change in behavior with potential to cause problems for some projects that
build with `clang-cl`. Potentially impacted projects include those that specify
a header search path via either the `/I` option or in the `CPATH` environment
variable, but rely on the path being suppressed and/or treated as a system
header due to a duplicate path specified by another MSVC option or environment
variable or by the `-imsvc` option or one of the `-isystem` family of options.
Such projects can pass the `-fheader-search=gcc` option in their `clang-cl`
invocations to (mostly) restore previous behavior.

Clang emulates the MSVC behavior of resolving quoted header file inclusions
(e.g., `#include "file.h"`) by searching for a matching file in the directories
for the including files in the include stack. See Microsoft's documentation of
this feature in the Microsoft-specific section of
https://learn.microsoft.com/en-us/cpp/preprocessor/hash-include-directive-c-cpp.
Previously, this behavior was implicitly enabled when the `-fms-compatibility`
option is specified (implicitly for Windows targets). This change ties this
behavior to the `-fheader-search=microsoft` option instead. Projects that use
the `clang` (not `clang-cl`) driver to build code targeting Windows that depend
on this local header file lookup may be impacted by this change. Such projects
can try adding `-fheader-search=microsoft` to their `clang` invocations to
restore the prior behavior.

There is at least one behavior that MSVC exhibits that is not currently
replicated with these changes. Clang treats paths specified in the `INCLUDE`
environment variable as system directories; MSVC does not. Clang will therefore
suppress warnings in header files found via these paths by default while MSVC
will not. However, recent MSVC releases set the `EXTERNAL_INCLUDE` environment
variable to have the same paths as `INCLUDE` by default, so, for recent
releases, MSVC appears to suppress warnings for header files found via
`INCLUDE` by default when it does not. Note that many Microsoft provided header
files use `#pragma` directives to suppress warnings.

Finally, this change enables support for the MSVC `/external:anglebrackets`
option with the following option syntax. Use of this option results in all
header files included with angle bracket syntax (`#include <file.h>`) being
treated as system headers for the purposes of warning suppression.
   clang                     clang-cl
   ------------------------  -----------------------
   -iexternal-anglebrackets  /external:anglebrackets
@tahonermann tahonermann force-pushed the thonerma-intel-CMPLRLLVM-57471 branch from 1c9813e to c5d4a8b Compare January 19, 2025 23:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
clang:driver 'clang' and 'clang++' user-facing binaries. Not 'clang-cl' clang:frontend Language frontend issues, e.g. anything involving "Sema" clang:modules C++20 modules and Clang Header Modules clang Clang issues not falling into any other category
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants