
Fix #13333: better handling when source file is repeated in compile_commands.json #7117

Draft
wants to merge 3 commits into base: main

Conversation

olabetskyi
Collaborator

No description provided.

@firewave
Collaborator

Just no - for various reasons...

  • we cannot simply compare the file names if we are using projects. We need to consider the whole(!) configuration (which is currently all over the place) as it might be unique to each file. This will lead to issues not being reported
  • modifying FileSettings during the analysis is just wrong
  • no tests added at all
  • this needs to be opt-in (even if the configuration is being considered)

@firewave
Collaborator

  • modifying FileSettings during the analysis is just wrong

The de-duplication should be done in CmdLineParser::fillSettingsFromArgs() - there is a TODO.

Also we are not yet able to fully de-duplicate the non-project input files (see https://trac.cppcheck.net/ticket/12834) so it seems a bit opportunistic that we can suddenly do it for projects.
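For illustration only, here is a minimal sketch of what de-duplication keyed on the whole configuration (rather than just the filename) could look like; the Entry type and its fields are hypothetical stand-ins, not Cppcheck's actual FileSettings:

```cpp
// Hypothetical sketch: drop an entry only if another entry with the exact same
// filename *and* configuration was already seen. Everything that can affect the
// analysis result must go into the key, otherwise distinct entries would be lost.
#include <iostream>
#include <set>
#include <string>
#include <vector>

struct Entry {                 // stand-in for a FileSettings entry
    std::string filename;
    std::string defines;       // e.g. "-DA=1"
    std::string includePaths;  // the rest of the configuration belongs in the key too
};

static void deduplicate(std::vector<Entry> &entries)
{
    std::set<std::string> seen;
    std::vector<Entry> kept;
    for (const Entry &e : entries) {
        const std::string key = e.filename + '\0' + e.defines + '\0' + e.includePaths;
        if (seen.insert(key).second)   // true if this exact configuration is new
            kept.push_back(e);
    }
    entries.swap(kept);
}

int main()
{
    std::vector<Entry> entries = {
        {"foo.c", "-DA=1", "-Iinc"},
        {"foo.c", "-DA=1", "-Iinc"},   // exact duplicate -> dropped
        {"foo.c", "-DA=2", "-Iinc"},   // same file, different configuration -> kept
    };
    deduplicate(entries);
    std::cout << entries.size() << " entries kept\n";   // prints "2 entries kept"
}
```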

@firewave
Collaborator

I wonder if we should generate an information message if a file was de-duplicated (I only added a TODO about ignored files) as it might indicate an invocation or configuration error.

@danmar
Owner

danmar commented Dec 18, 2024

We need to consider the whole(!) configuration (which is currently all over the place) as it might be unique to each file.

I am not sure about this. My idea is that we can add an index for each file: either it's a count so all files get a unique index, or we just ensure that files with the same filename get a unique index. I am thinking that an index that is unique for all files will be easier; however, the risk might be more artifacts in the build folder.

Just for information: if the build dir is not used, an alternative could also be to give each thread a unique id and mark all dump files etc. with <pid>-<thread>. But as that would only work for the "no build dir" use case, I don't think it's the way to go.
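As a rough sketch of the indexing idea (hypothetical types, not Cppcheck's actual FileSettings API): entries that share a filename get increasing indices, so their build-dir artifacts no longer collide.

```cpp
// Hypothetical sketch of the "unique index per repeated filename" idea.
// Entry stands in for a FileSettings entry; the real type looks different.
#include <iostream>
#include <map>
#include <string>
#include <vector>

struct Entry {
    std::string filename;
    std::string flags;  // per-entry preprocessor configuration
    int index = 0;      // 0 for the first occurrence of a filename, 1, 2, ... for repeats
};

static void assignIndices(std::vector<Entry> &entries)
{
    std::map<std::string, int> seen;  // filename -> occurrences seen so far
    for (Entry &e : entries)
        e.index = seen[e.filename]++;
}

int main()
{
    std::vector<Entry> entries = {{"foo.c", "-DA=1"}, {"bar.c", ""}, {"foo.c", "-DA=2"}};
    assignIndices(entries);
    for (const Entry &e : entries)
        std::cout << e.filename << " -> artifact suffix " << e.index << '\n';
    // The second foo.c entry gets suffix 1, so e.g. its dump/analyzer-info file
    // would not overwrite the one written for the first foo.c entry.
}
```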

modifying FileSettings during the analysis is just wrong

yes, I think that's because it is an experimental draft PR right now; it's not meant to be the final implementation.

I agree with you, I also believe the index should be determined earlier.

no tests added at all

yes this will be needed later when it's not experimental anymore

this needs to be opt-in (even if the configuration is being considered)

I am not sure why? The final goal of this work is to distinguish different files properly.

Also we are not yet able to fully de-duplicate the non-project input files (see https://trac.cppcheck.net/ticket/12834) so it seems a bit opportunistic that we can suddenly do it for projects.

I am in favor of fixing that also.. however just for my understanding, it requires some special circumstances like specifying overlapping directories.. right?

I have a couple of customers that are running into this and they can only run with -j1 right now on some projects. They do use --project=... But of course we should make a proper fix here, not just something that makes a few customers happy.

@danmar
Owner

danmar commented Dec 18, 2024

I am in favor of fixing that also.. however just for my understanding, it requires some special circumstances like specifying overlapping directories.. right?

If a user ran into this problem and we determined that the command line was messed up, I'd suggest fixing the command line first :-)

@firewave
Collaborator

Okay, I totally misunderstood the intention here. The ticket lays out what the issue is, but it seems I somehow didn't read it properly (I do not even understand what I meant with one of my comments). Sorry about that.

I am not sure why? The final goal of this work is to distinguish different files properly.

"final" is the keyword here. Currently we cannot do that (and that will take quite a while until we can) - in the way integrated this (de-duplication)

In the context of avoiding conflicts with the dump/analysis information this might be fine, and obviously needs to be done unconditionally. It would rely on the input always being in a fixed order though (which might not be the case depending on the generator). I have the feeling something like a hash might be better than an index (maybe even hashes for the file names in general).

That makes me think of some potential issues. What happens if the project changes and files are no longer contained? The previous output is probably not removed - is it still considered for the analysis? And regarding the index - what if entries for overlapping files are added or removed? In that case the previous indexes would be invalid and all the files would need to be analyzed again.
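A minimal sketch of that hash idea, purely for illustration: deriving the artifact name from the filename plus the entry's configuration makes it independent of the order of entries and of other entries being added or removed. std::hash and the ".analyzerinfo" suffix are placeholders here; a real implementation would want a stable, stronger hash.

```cpp
// Illustrative sketch: name build-dir artifacts after a hash of filename + configuration
// instead of a positional index.
#include <functional>
#include <iostream>
#include <string>

static std::string artifactName(const std::string &filename, const std::string &flags)
{
    const std::size_t h = std::hash<std::string>{}(filename + '\0' + flags);
    return filename + "." + std::to_string(h) + ".analyzerinfo";
}

int main()
{
    // Same file, different configuration -> different artifact names, no collision,
    // regardless of the order in which the entries appear.
    std::cout << artifactName("foo.c", "-DA=1") << '\n';
    std::cout << artifactName("foo.c", "-DA=2") << '\n';
}
```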

On a side note, some context on why my earlier reaction happened:
The problem here is that sometimes a change you initiated is put up and merged within minutes, giving nobody even a chance to look at it. That's why I sometimes have to rush in head over heels (if it overlaps with some of my [planned] work) with limited time to get all the information. Others have to wait weeks or months for any feedback at all. So it would be nice if there were some time for other people to at least have a look.

@danmar
Owner

danmar commented Dec 20, 2024

The problem here is that sometimes a change you initiated is put up and merged within minutes, giving nobody even a chance to look at it. That's why I sometimes have to rush in head over heels (if it overlaps with some of my [planned] work) with limited time to get all the information. Others have to wait weeks or months for any feedback at all. So it would be nice if there were some time for other people to at least have a look.

It's true that sometimes I rush changes in quickly. :-(

I will try to improve. Recently I merged changes into the GUI in such a way... and in hindsight I was too quick; it wasn't that urgent to get those fixes into Cppcheck.

Others have to wait weeks or months for any feedback at all.

It's not easy to keep up with all the changes by the community; it is good that there is so much work but bad that it's too much for me at times. I think it can become better.

@danmar
Owner

danmar left a comment


The hash will remove duplicate file entries right.. but that is not the problem.

If a compile_commands.json says that foo.c is compiled 3 times, then we want to analyse it 3 times. I would assume that the preprocessor flags will be different; otherwise I don't understand why it should be recompiled.
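For illustration (paths and flags made up), a compile_commands.json where the same source file is built twice with different defines; both entries should be analysed:

```json
[
  { "directory": "/src", "command": "gcc -DVARIANT=1 -c foo.c -o foo1.o", "file": "foo.c" },
  { "directory": "/src", "command": "gcc -DVARIANT=2 -c foo.c -o foo2.o", "file": "foo.c" }
]
```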

@danmar
Owner

danmar commented Dec 20, 2024

It would rely on the input always being in a fixed order though (which might not be the case depending on the generator).

It's not really a problem as long as the order is fixed during the analysis. Let's say the generator puts the files in random order for some reason: the first time we execute cppcheck the order is entry1,entry2; the next time we execute cppcheck the order is entry2,entry1.

  • If entry1 and entry2 have the same preprocessor output then cppcheck will reuse the analyzer info, which is fine. If the filename matches and the preprocessor output matches, then as far as I can see the analyzer info matches.
  • If they don't have the same preprocessor output then cppcheck will recheck the files, but that only means some wasted CPU; no real danger as far as I can see.

@firewave
Collaborator

It would rely on the input always being in a fixed order though (which might not be the case depending on the generator).

It's not really a problem as long as the order is fixed during the analysis. Let's say the generator puts the files in random order for some reason: the first time we execute cppcheck the order is entry1,entry2; the next time we execute cppcheck the order is entry2,entry1.

* If entry1 and entry2 have the same preprocessor output then cppcheck will reuse the analyzer info, which is fine. If the filename matches and the preprocessor output matches, then as far as I can see the analyzer info matches.

* If they don't have the same preprocessor output then cppcheck will recheck the files, but that only means some wasted CPU; no real danger as far as I can see.

You are probably right. I will still try to write some tests to cover these cases.
