Dredd mutates identical source files differently in two directories with same compilation database #338

Open
JonathanFoo0523 opened this issue Sep 19, 2024 · 8 comments

@JonathanFoo0523
Collaborator

JonathanFoo0523 commented Sep 19, 2024

I encountered a problem where Dredd mutates the same source file differently in two directories with the same content but different directory names. The compilation database is generated in the same way for both directories and should be identical, except for references to the directory name. Dredd is applied in default mode (not using mutant tracking) to the same source file from the two different directories, and each is supplied with its own compilation database (which should only differ in the directory name references).

The reduced program (excluding several headers) is shown below. When applying Dredd to this code:

void a () {
  if (1) HOST_WIDE_INT_M1U;
}

the if statement is mutated differently in the two directories:

if (!__dredd_enabled_mutation(6)) { if (__dredd_replace_expr_bool_true(1, 0)) __dredd_replace_expr_unsigned_long_constant(__dredd_replace_expr_unsigned_long_one(HOST_WIDE_INT_M1U, 1), 3); }

compared to

if (!__dredd_enabled_mutation(3)) { if (__dredd_replace_expr_bool_true(1, 0)) __dredd_replace_expr_unsigned_long_one(HOST_WIDE_INT_M1U, 1); }

This is strange because the AST generated by Dredd when mutating the source file is identical, aside from the directory name.

`-FunctionDecl 0x5566ffe12e98 <../../gcc-14.1.0-mutated/gcc/tree-ssa-strlen.cc:12:1, line:14:1> line:12:6 a 'void ()'
  `-CompoundStmt 0x5566ffe12fd0 <col:11, line:14:1>
    `-IfStmt 0x5566ffe12fb0 <line:13:3, <scratch space>:188:1>
      |-ImplicitCastExpr 0x5566ffe12f60 <../../gcc-14.1.0-mutated/gcc/tree-ssa-strlen.cc:13:7> 'bool' <IntegralToBoolean>
      | `-IntegerLiteral 0x5566ffe12f40 <col:7> 'int' 1
      `-UnaryOperator 0x5566ffe12f98 <../../gcc-14.1.0-mutated/gcc/hwint.h:72:45, <scratch space>:188:1> 'unsigned long' prefix '-'
        `-IntegerLiteral 0x5566ffe12f78 <col:1> 'unsigned long' 1

compared to

`-FunctionDecl 0x557f36add5c8 <../../gcc-14.1.0-mutant-tracking/gcc/tree-ssa-strlen.cc:12:1, line:14:1> line:12:6 a 'void ()'
  `-CompoundStmt 0x557f36add700 <col:11, line:14:1>
    `-IfStmt 0x557f36add6e0 <line:13:3, <scratch space>:5:1>
      |-ImplicitCastExpr 0x557f36add690 <../../gcc-14.1.0-mutant-tracking/gcc/tree-ssa-strlen.cc:13:7> 'bool' <IntegralToBoolean>
      | `-IntegerLiteral 0x557f36add670 <col:7> 'int' 1
      `-UnaryOperator 0x557f36add6c8 <../../gcc-14.1.0-mutant-tracking/gcc/hwint.h:72:45, <scratch space>:5:1> 'unsigned long' prefix '-'
        `-IntegerLiteral 0x557f36add6a8 <col:1> 'unsigned long' 1

Let me know if there's anything else I should try to debug the issue.

@afd
Member

afd commented Sep 19, 2024

Here are some ideas:

  • Triple check that you have used the same version of Dredd for both mutation attempts (I'm sure you have, but always good to check!)

  • Run Dredd 100 times in a row with each configuration, saving the mutated files each time, and confirm that Dredd behaves deterministically within each configuration. Nondeterminism in Dredd would be a problem, but it would be good to know about it.

  • See whether the problem goes away when "HOST_WIDE_INT_M1U" is replaced with its value from a header file. If it does, investigate that further. (My hunch is that it has something to do with this.)

  • Find the compilation command from each compilation database, and then use the compiler's -E option to get the preprocessed file under each compilation command and see whether there are any differences. There might be differences that aren't reflected in the AST but that do affect the in-memory representation that Dredd works on.
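The comparison in the last bullet has to ignore the directory name itself, since that is expected to differ. A minimal sketch of such a check is given here; the function names and the "<DIR>" placeholder are hypothetical and are not part of Dredd or of any compiler tooling.

```cpp
#include <cassert>
#include <string>

// Replaces every occurrence of `from` in `text` with `to`.
std::string ReplaceAll(std::string text, const std::string& from,
                       const std::string& to) {
  if (from.empty()) return text;
  std::string::size_type pos = 0;
  while ((pos = text.find(from, pos)) != std::string::npos) {
    text.replace(pos, from.size(), to);
    pos += to.size();  // continue after the text just inserted
  }
  return text;
}

// Returns true if two preprocessed outputs are identical once each
// directory name has been replaced by a common placeholder.
// (Hypothetical helper, for illustration only.)
bool SameModuloDirectory(const std::string& output_a, const std::string& dir_a,
                         const std::string& output_b, const std::string& dir_b) {
  const std::string placeholder = "<DIR>";
  return ReplaceAll(output_a, dir_a, placeholder) ==
         ReplaceAll(output_b, dir_b, placeholder);
}
```

In practice the two strings would be the -E outputs read from disk, and the directory names would be the two checkout names (here gcc-14.1.0-mutated and gcc-14.1.0-mutant-tracking).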

@JonathanFoo0523
Collaborator Author

I can confirm that the Dredd version is the same, and Dredd behaves deterministically within each configuration. When compiling both files with the -E option under their respective compilation commands, the outputs are identical, except for references to the directory name.

With the -E option, HOST_WIDE_INT_M1U is replaced with -1UL. Replacing HOST_WIDE_INT_M1U with -1UL in the source and then applying Dredd eliminates the issue. I’m wondering why Dredd treats HOST_WIDE_INT_M1U as unsigned_long_one, since it actually represents -1.

HOST_WIDE_INT_M1U is defined through a series of macros:

#if INT64_T_IS_LONG   
#   define HOST_WIDE_INT long
#   define HOST_WIDE_INT_C(X) X ## L
#else
# if HOST_BITS_PER_LONGLONG == 64
#   define HOST_WIDE_INT long long
#   define HOST_WIDE_INT_C(X) X ## LL
# else
   #error "Unable to find a suitable type for HOST_WIDE_INT"
# endif
#endif

#define HOST_WIDE_INT_UC(X) HOST_WIDE_INT_C (X ## U)
#define HOST_WIDE_INT_M1U HOST_WIDE_INT_UC (-1)
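To make the expansion concrete, the fragment below reproduces only the INT64_T_IS_LONG branch (an assumption, chosen because -E reports -1UL) and checks at compile time what the macro produces. The two uses of ## first paste 1 and U into 1U, then 1U and L into 1UL, leaving the minus sign as a separate token in front.

```cpp
#include <climits>
#include <type_traits>

// Assumption: only the INT64_T_IS_LONG branch is shown, since the -E output
// reported in this thread is -1UL.
#define HOST_WIDE_INT_C(X) X ## L
#define HOST_WIDE_INT_UC(X) HOST_WIDE_INT_C (X ## U)
#define HOST_WIDE_INT_M1U HOST_WIDE_INT_UC (-1)

// The expansion is unary minus applied to the literal 1UL, so its type is
// unsigned long and its value wraps around to the largest unsigned long.
static_assert(std::is_same<decltype(HOST_WIDE_INT_M1U), unsigned long>::value,
              "HOST_WIDE_INT_M1U has type unsigned long");
static_assert(HOST_WIDE_INT_M1U == ULONG_MAX,
              "HOST_WIDE_INT_M1U is -1UL, i.e. ULONG_MAX");
```

The literal inside the expansion is 1, but the expression as a whole is not, which is why a mutation meant for the constant one does not fit the whole macro use.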

@afd
Member

afd commented Sep 19, 2024

I suggest you try the following:

  • Insert these #ifdefs and #defines into the source file in place of the #include that would normally lead to these definitions.

  • Confirm that the problem persists.

  • If it does, try inserting #error lines into different branches of the #ifdefs.

  • Run Dredd using the two compilation databases and see whether you get the same or different complaints due to the #errors you have added.

This would help to confirm whether anything different is going on in the preprocessor with the two setups.
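One way to lay out such probes is sketched here. The PROBE_BRANCHES switch and the two fallback definitions at the top are assumptions added so that the fragment compiles on its own; in the real experiment the values come from GCC's configuration and the #error lines could simply be added unguarded.

```cpp
// Stand-ins for values that GCC's configuration would normally provide
// (assumptions, so that this fragment is self-contained).
#ifndef INT64_T_IS_LONG
#define INT64_T_IS_LONG 1
#endif
#ifndef HOST_BITS_PER_LONGLONG
#define HOST_BITS_PER_LONGLONG 64
#endif

// Defining PROBE_BRANCHES (a hypothetical switch) makes the preprocessor
// stop with a message naming the branch it took.
#if INT64_T_IS_LONG
# ifdef PROBE_BRANCHES
#  error "probe: took the INT64_T_IS_LONG branch"
# endif
#   define HOST_WIDE_INT long
#   define HOST_WIDE_INT_C(X) X ## L
#else
# if HOST_BITS_PER_LONGLONG == 64
#  ifdef PROBE_BRANCHES
#   error "probe: took the long long branch"
#  endif
#   define HOST_WIDE_INT long long
#   define HOST_WIDE_INT_C(X) X ## LL
# else
#   error "Unable to find a suitable type for HOST_WIDE_INT"
# endif
#endif

// With the stand-in values above, the first branch is the one in effect.
static_assert(sizeof(HOST_WIDE_INT) == sizeof(long),
              "HOST_WIDE_INT is long in this configuration");
```

If both compilation databases report the same probe message, the preprocessor is taking the same path in both setups and the difference must arise later.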

I am wondering whether we might be up against an issue where strange things happen when we apply Dredd to a project that was compiled with gcc and thus has a gcc-oriented compilation database.

@afd
Member

afd commented Sep 19, 2024

Something bad is going on with GetSourceRangeInMainFile.

The HOST_WIDE_INT_M1U define expands to -1UL, which is represented in the AST as a unary minus operator applied to an integer literal with value 1.

The integer literal with value 1 should not have a source range in the main file, because it starts partway through a macro expansion rather than at its beginning. However, GetSourceRangeInMainFile returns a valid source range for it, and that range appears to cover the whole of "HOST_WIDE_INT_M1U". This is why the mutator for the integer literal 1 is being (wrongly) applied to the entire macro use.
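The rule being described can be written down as a toy model. This is not Dredd's implementation and does not use the Clang API; the struct, its fields, and the function name are hypothetical, and the refusal for pasted tokens is included only as one conservative way to treat tokens built with ##, which appear as "<scratch space>" locations in the AST dumps above.

```cpp
// Hypothetical, simplified description of where an AST node's tokens
// come from. Not a Clang or Dredd type.
struct NodeOrigin {
  bool in_macro_expansion;         // node is spelled inside a macro expansion
  bool starts_at_expansion_start;  // first token is the expansion's first token
  bool ends_at_expansion_end;      // last token is the expansion's last token
  bool uses_pasted_token;          // some token was built with ##
};

// Toy decision rule: a node gets a main-file source range only if it is
// written directly in the file, or if it covers a whole macro expansion
// that involves no pasted tokens.
constexpr bool HasMainFileSourceRange(const NodeOrigin& n) {
  return !n.in_macro_expansion ||
         (!n.uses_pasted_token && n.starts_at_expansion_start &&
          n.ends_at_expansion_end);
}

// The literal 1 inside HOST_WIDE_INT_M1U: preceded by '-', built by pasting.
static_assert(!HasMainFileSourceRange(NodeOrigin{true, false, true, true}),
              "the literal must not be given the macro's source range");
// The literal 1 written directly in "if (1)".
static_assert(HasMainFileSourceRange(NodeOrigin{false, false, false, false}),
              "a literal written in the file has a main-file range");
```

Under this model the literal is rejected on two independent grounds: it does not start at the beginning of the expansion, and it is spelled with a pasted token.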

@JonathanFoo0523
Collaborator Author

@afd Am I correct in understanding that there are two problems at play here?

  1. Dredd applies the __dredd_replace_expr_unsigned_long_one mutation to HOST_WIDE_INT_M1U when it shouldn't.
  2. Dredd is inconsistent: some instances of HOST_WIDE_INT_M1U are mutated with __dredd_replace_expr_unsigned_long_constant while others are not.

I can easily isolate problem 1 and produce a minimal reproducible example.

For problem 2, the inconsistency can occur within the same file, and I am trying to produce a minimal reproducible example that contains two instances of HOST_WIDE_INT_M1U being mutated differently in the same file. This is proving to be nontrivial, as even removing commented-out code can make the problem disappear. creduce shows little progress after running for half a day.

@afd
Member

afd commented Sep 20, 2024

Yes, that's right about there being two problems at play - hopefully not interlinked problems!

@JonathanFoo0523
Collaborator Author

I will put this on hold for a while.

For future reference, the Dredd commit used is 0725037.

afd added a commit that referenced this issue Sep 21, 2024
Adjusts the logic for attempting to find the main file source range of
an AST node to take account of "pasted macros", which arise when the
preprocessor operator ## is used. In the presence of pasted macros,
the logic for determining whether an AST node corresponding to a macro
expansion has a meaningful source range in the main file was not
working correctly. This change detects pasted macros and
conservatively refuses to give a main file source range in their
presence.

Fixes #345.
Related issue: #338.

@afd
Member

afd commented Sep 21, 2024

I just pushed a change which resolves the issue regarding:

dredd applies mutation of __dredd_replace_expr_unsigned_long_one to HOST_WIDE_INT_M1U when it shouldn't.

Perhaps you could see whether the other issue persists. I have a hunch they may have been related.
