Skip to content

fix #13333: better handling when source file is repeated in compile_commands.json #7469

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft
wants to merge 13 commits into
base: main
Choose a base branch
from

Conversation

ludviggunne
Copy link
Contributor

No description provided.

@firewave
Copy link
Collaborator

We should have opt-in/opt-out CLI options so this behavior can be overridden. And it should probably be disabled by default (I feel like there was already a discussion where I was flip-flopping on this).

We should also somehow inform the user when a file is omitted because it is a duplicate (within --check-config?). This is something I need to do in a different scope but haven't gotten to it yet.

@firewave
Copy link
Collaborator

We should also somehow inform the user when a file is omitted because it is a duplicate (within --check-config?). This is something I need to do in a different scope but haven't gotten to it yet.

Sorry, I overlooked that there is already a message.

And some Python tests for this need to be added.

@firewave
Copy link
Collaborator

Something for a future improvement.

Some of the settings could be sanitized so they are not different if the order is different. Please do not do this for the defines because they need to be reworked from the bottom up first (simplecpp et al) and that will probably implicitly fix that.

@ludviggunne
Copy link
Contributor Author

Some of the settings could be sanitized so they are not different if the order is different. Please do not do this for the defines because they need to be reworked from the bottom up first (simplecpp et al) and that will probably implicitly fix that.

The hash should be dependent of the order of defines/undefs now.

@ludviggunne
Copy link
Contributor Author

We should have opt-in/opt-out CLI options so this behavior can be overridden. And it should probably be disabled by default (I feel like there was already a discussion where I was flip-flopping on this).

What's a scenario where you'd want it to be disabled?

@firewave
Copy link
Collaborator

What's a scenario where you'd want it to be disabled?

Where it omits files from the analysis it should not (i.e. bugs). The existing handling of duplicates in other areas has been lacking so there should be a way to disable it. There will be more stuff added to FileSettings in the future.

@ludviggunne
Copy link
Contributor Author

So a flag to disable all de-duplication?

@firewave
Copy link
Collaborator

So a flag to disable all de-duplication?

Just for this added logic as it many parts which make up the hash.

The existing de-duplication is based on the (real) path being unique so that will only miss de-duplication but not accidentally omit files it should not because it is just a single value to compare.

while (it != fileSettings.end()) {
fileSettings.erase(std::remove_if(std::next(it), fileSettings.end(), [&](const FileSettings& fs) {
if (fs.hash == it->hash) {
mLogger.printMessage("removing duplicate file " + fs.filename());
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The word "removing" sounds pretty aggressive. I think "skipping duplicated file entry " or something like that may be better.

Copy link
Owner

@danmar danmar Apr 16, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure this should be written unconditionally. It can mess up plugins because the format is not expected.
It does not affect the output so personally I don't understand why we print it out at all. I guess you wanted this @firewave explain why?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure this should be written unconditionally.

It should not (I even suggested it to be put into --check-config - we point out that you should implement such a step in the CI integration guide). BUT as this changes behavior which has been in place for a very long (possible since the import was introduced) and might potentially lead to files wrongly being skipped (given our track record I do not trust any first version of "breaking" changes), it should not be something which happens silently (especially given this is about security).

The messages can be fixed by the user by adjusting the command-line (possibly easy) or the project file (most likely not easy). But having an option to opt-out would also mitigate it.

So I am torn...

That also makes me think if we should provide more extensive filtering of projects as you currently only match the file name.

It can mess up plugins because the format is not expected.

Well - we have been bringing this up a lot but also introduced lots of output which would have broken this in the same time and strangely never gotten a report about that.

I personally had to deal with those in the CLion IDE integration though. So it is actually valid.

This will all be better when these messages are turned into actual findings (but I keep finding other broken stuff).

@@ -635,6 +651,9 @@ CmdLineParser::Result CmdLineParser::parseFromArgs(int argc, const char* const a
else if (std::strcmp(argv[i], "--debug-clang-output") == 0)
mSettings.debugClangOutput = true;

else if (std::strcmp(argv[i], "--deduplicate") == 0)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel that this word "deduplicate" sounds weird. It sounds generic but it only works for imported projects so how how about : --import-project-recheck-duplicate-entries. And can we check that it's only used in combination with --project because otherwise the user is confused?
I would say it should be on by default if there are only downsides and no upsides to turn it off..
@firewave why would anybody turn it off?
we could run cppcheck on all files twice but we don't and there is no flag to turn that on/off.. I don't see why we would do it.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel that this word "deduplicate" sounds weird.

Something along the lines of --exclude-duplicated-project-files. As it only applies to FileSettings it has to mention projects. And if the project does not support it we should bail out.

But this is not just about duplicated files but duplicated "file + configuration" entries. We are currently only allowing input files or a single project to import. After FileWithDetails and FileSettings have been merged and the imports have been reworked a bit I think we could possibly support multiple different inputs. So then having "project" in the name would be misleading.

I am probably overthinking this but as we are not very good with deprecating stuff we should look a bit ahead to avoid having stuff we need to support as legacy down the line.

@firewave why would anybody turn it off?
we could run cppcheck on all files twice but we don't and there is no flag to turn that on/off.. I don't see why we would do it.

#7469 (comment)

@ludviggunne ludviggunne force-pushed the 13333 branch 2 times, most recently from 9772903 to ed50f3b Compare April 22, 2025 09:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants