
Fix #13333: better handling when source file is repeated in compile_commands.json #7117

Draft
wants to merge 3 commits into base: main

Conversation

olabetskyi
Collaborator

No description provided.

@firewave
Collaborator

Just no - for various reasons...

  • we cannot simply compare the file names if we are using projects. We need to consider the whole(!) configuration (which is currently all over the place) as it might be unique to each file. This will lead to issues not being reported
  • modifying FileSettings during the analysis is just wrong
  • no tests added at all
  • this needs to be opt-in (even if the configuration is being considered)

@firewave
Collaborator

  • modifying FileSettings during the analysis is just wrong

The de-duplication should be done in CmdLineParser::fillSettingsFromArgs() - there is a TODO.

Also we are not yet able to fully de-duplicate the non-project input files (see https://trac.cppcheck.net/ticket/12834) so it seems a bit opportunistic that we can suddenly do it for projects.
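For illustration only, here is a minimal sketch of what de-duplication keyed on the whole configuration (rather than just the filename) could look like; the Entry type and its fields are hypothetical stand-ins, not Cppcheck's actual FileSettings:

```cpp
// Hypothetical sketch: drop an entry only if another entry with the exact same
// filename *and* configuration was already seen. Everything that can affect the
// analysis result must go into the key, otherwise distinct entries would be lost.
#include <iostream>
#include <set>
#include <string>
#include <vector>

struct Entry {                 // stand-in for a FileSettings entry
    std::string filename;
    std::string defines;       // e.g. "-DA=1"
    std::string includePaths;  // the rest of the configuration belongs in the key too
};

static void deduplicate(std::vector<Entry> &entries)
{
    std::set<std::string> seen;
    std::vector<Entry> kept;
    for (const Entry &e : entries) {
        const std::string key = e.filename + '\0' + e.defines + '\0' + e.includePaths;
        if (seen.insert(key).second)   // true if this exact configuration is new
            kept.push_back(e);
    }
    entries.swap(kept);
}

int main()
{
    std::vector<Entry> entries = {
        {"foo.c", "-DA=1", "-Iinc"},
        {"foo.c", "-DA=1", "-Iinc"},   // exact duplicate -> dropped
        {"foo.c", "-DA=2", "-Iinc"},   // same file, different configuration -> kept
    };
    deduplicate(entries);
    std::cout << entries.size() << " entries kept\n";   // prints "2 entries kept"
}
```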

@firewave
Collaborator

I wonder if we should generate an information message if a file was de-duplicated (I only added a TODO about ignored files) as it might indicate an invocation or configuration error.

@danmar
Owner

danmar commented Dec 18, 2024

We need to consider the whole(!) configuration (which is currently all over the place) as it might be unique to each file.

I am not sure about this. My idea is that we can add an index for each file: either it's a count so all files get a unique index, or we just ensure that files with the same filename get a unique index. I am thinking that an index that is unique for all files will be easier; however, the risk might be more artifacts in the build folder.

Just for information: if the build dir is not used, an alternative could also be to give each thread a unique id and mark all dump files etc. with <pid>-<thread>. But as that would only work for the "no build dir" use case, I don't think it's the way to go.
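As a rough sketch of the indexing idea (hypothetical types, not Cppcheck's actual FileSettings API): entries that share a filename get increasing indices, so their build-dir artifacts no longer collide.

```cpp
// Hypothetical sketch of the "unique index per repeated filename" idea.
// Entry stands in for a FileSettings entry; the real type looks different.
#include <iostream>
#include <map>
#include <string>
#include <vector>

struct Entry {
    std::string filename;
    std::string flags;  // per-entry preprocessor configuration
    int index = 0;      // 0 for the first occurrence of a filename, 1, 2, ... for repeats
};

static void assignIndices(std::vector<Entry> &entries)
{
    std::map<std::string, int> seen;  // filename -> occurrences seen so far
    for (Entry &e : entries)
        e.index = seen[e.filename]++;
}

int main()
{
    std::vector<Entry> entries = {{"foo.c", "-DA=1"}, {"bar.c", ""}, {"foo.c", "-DA=2"}};
    assignIndices(entries);
    for (const Entry &e : entries)
        std::cout << e.filename << " -> artifact suffix " << e.index << '\n';
    // The second foo.c entry gets suffix 1, so e.g. its dump/analyzer-info file
    // would not overwrite the one written for the first foo.c entry.
}
```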

modifying FileSettings during the analysis is just wrong

yes, I think that's because it is an experimental draft PR right now; it's not meant to be the final implementation.

I agree with you, I also believe the index should be determined earlier.

no tests added at all

yes this will be needed later when it's not experimental anymore

this needs to be opt-in (even if the configuration is being considered)

I am not sure why? The final goal of this work is to distinguish different files properly.

Also we are not yet able to fully de-duplicate the non-project input files (see https://trac.cppcheck.net/ticket/12834) so it seems a bit opportunistic that we can suddenly do it for projects.

I am in favor of fixing that also.. however just for my understanding, it requires some special circumstances like specifying overlapping directories.. right?

I have a couple of customers that are running into this and they can only run with -j1 right now on some projects. They do use --project=... But of course we should make a proper fix here, not just something that makes a few customers happy.

@danmar
Owner

danmar commented Dec 18, 2024

I am in favor of fixing that also.. however just for my understanding, it requires some special circumstances like specifying overlapping directories.. right?

If a user ran into this problem and we determined that the command line was messed up, I'd suggest fixing the command line first :-)

@firewave
Collaborator

Okay, I totally misunderstood the intention here. The ticket lays out what the issue is, but it seems I somehow didn't read it properly (I do not even understand what I meant with one of my comments). Sorry about that.

I am not sure why? The final goal of this work is to distinguish different files properly.

"final" is the keyword here. Currently we cannot do that (and that will take quite a while until we can) - in the way integrated this (de-duplication)

In the context of avoiding conflicts with the dump/analysis information this might be fine, and obviously needs to be done unconditionally. It would rely on the input always being in a fixed order though (which might not be the case depending on the generator). I have the feeling something like a hash might be better than an index (maybe even hashes for the file names in general).

That makes me think of some potential issues. What happens if the project changes and files are no longer contained? The previous output is probably not removed - is it still considered for the analysis? And regarding the index - what if entries for overlapping files are added or removed? In that case the previous indexes would be invalid and all the files would need to be analyzed again.
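A minimal sketch of that hash idea, purely for illustration: deriving the artifact name from the filename plus the entry's configuration makes it independent of the order of entries and of other entries being added or removed. std::hash and the ".analyzerinfo" suffix are placeholders here; a real implementation would want a stable, stronger hash.

```cpp
// Illustrative sketch: name build-dir artifacts after a hash of filename + configuration
// instead of a positional index.
#include <functional>
#include <iostream>
#include <string>

static std::string artifactName(const std::string &filename, const std::string &flags)
{
    const std::size_t h = std::hash<std::string>{}(filename + '\0' + flags);
    return filename + "." + std::to_string(h) + ".analyzerinfo";
}

int main()
{
    // Same file, different configuration -> different artifact names, no collision,
    // regardless of the order in which the entries appear.
    std::cout << artifactName("foo.c", "-DA=1") << '\n';
    std::cout << artifactName("foo.c", "-DA=2") << '\n';
}
```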

On a side note, some context on why my earlier reaction happened:
The problem here is that sometimes a change you initiated is put up and merged within minutes, giving nobody even a chance to look at it. That's why I sometimes have to rush in head over heels (if it overlaps with some of my [planned] work) with limited time to get all the information. Others have to wait weeks or months for any feedback at all. So it would be nice if there were some time for other people to at least have a look.

@danmar
Owner

danmar commented Dec 20, 2024

The problem here is that sometimes a change you initiated is put up and merged within minutes, giving nobody even a chance to look at it. That's why I sometimes have to rush in head over heels (if it overlaps with some of my [planned] work) with limited time to get all the information. Others have to wait weeks or months for any feedback at all. So it would be nice if there were some time for other people to at least have a look.

It's true that sometimes I rush changes in quickly. :-(

I will try to improve. Recently I merged changes into the GUI in such a way... and in hindsight I was too quick; it wasn't that urgent to get those fixes into Cppcheck.

Others have to wait weeks or months for any feedback at all.

It's not easy to keep up with all the changes by the community; it is good that there is so much work but bad that it's too much for me at times. I think it can become better.

@danmar
Owner

danmar left a comment


The hash will remove duplicate file entries right.. but that is not the problem.

If a compile_commands.json says that foo.c is compiled 3 times, then we want to analyse it 3 times. I would assume that the preprocessor flags will be different; otherwise I don't understand why it should be recompiled.
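For illustration (paths and flags made up), a compile_commands.json where the same source file is built twice with different defines; both entries should be analysed:

```json
[
  { "directory": "/src", "command": "gcc -DVARIANT=1 -c foo.c -o foo1.o", "file": "foo.c" },
  { "directory": "/src", "command": "gcc -DVARIANT=2 -c foo.c -o foo2.o", "file": "foo.c" }
]
```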

@danmar
Owner

danmar commented Dec 20, 2024

It would rely on the input always being in a fixed order though (which might not be the case depending on the generator).

It's not really a problem as long as the order is fixed during the analysis. Let's say the generator puts the files in random order for some reason: the first time we execute cppcheck the order is entry1,entry2; the next time we execute cppcheck the order is entry2,entry1.

  • If entry1 and entry2 have the same preprocessor output then cppcheck will reuse the analyzer info, which is fine. If the filename matches and the preprocessor output matches, then as far as I can see the analyzer info matches.
  • If they don't have the same preprocessor output then cppcheck will recheck the files, but that only means some wasted CPU; no real danger as far as I can see.

@firewave
Collaborator

It would rely on the input always being in a fixed order though (which might not be the case depending on the generator).

It's not really a problem as long as the order is fixed during the analysis. Let's say the generator puts the files in random order for some reason: the first time we execute cppcheck the order is entry1,entry2; the next time we execute cppcheck the order is entry2,entry1.

* If entry1 and entry2 have the same preprocessor output then cppcheck will reuse the analyzer info, which is fine. If the filename matches and the preprocessor output matches, then as far as I can see the analyzer info matches.

* If they don't have the same preprocessor output then cppcheck will recheck the files, but that only means some wasted CPU; no real danger as far as I can see.

You are probably right. I will still try to write some tests to cover these cases.
