Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Safety: Make the interface safer by removing old style C buffer inputs #377

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

danmar
Copy link
Owner

@danmar danmar commented Oct 7, 2024

No description provided.

@danmar
Copy link
Owner Author

danmar commented Oct 7, 2024

@firewave I don't feel good about these. It is not safe. A large number of CVEs are caused by old style C buffers..

@danmar
Copy link
Owner Author

danmar commented Oct 7, 2024

Taking a std::string would be better imho but I still feel that std::istream is better for type safety reasons.

@firewave
Copy link
Collaborator

firewave commented Oct 7, 2024

I get that, but I just added these because I need them. The overhead of std::istream is just not acceptable.

I also stated in the other PR that I will provide more modern interfaces but I did not have time to do that yet.

There is also -Wunsafe-buffer-usage which will warn about this. That is something I want to have clean in that context.

@danmar
Copy link
Owner Author

danmar commented Oct 7, 2024

I also stated in the other PR that I will provide more modern interfaces but I did not have time to do that yet.

I would be interested to know what that is however it should be done here in the end so why not start with that so we don't have to rewrite cppcheck again later.

@firewave
Copy link
Collaborator

firewave commented Oct 7, 2024

Taking a std::string would be better imho but I still feel that std::istream is better for type safety reasons.

And as I said - if the string is not terminated or the size is wrong you will have the same issues. If you are coming from a raw buffer and make a mistake the outcome will be the same no matter the wrapper.

And as also raised before - using std::string might be problematic with binary data or different encodings. We also do not have support for std::wstring, std::u16string and std::u32string at all.

@firewave
Copy link
Collaborator

firewave commented Oct 7, 2024

I would be interested to know what that is however it should be done here in the end so why not start with that so we don't have to rewrite cppcheck again later.

It is coming up. But I just have too many open, local or follow-ups (which I can barely take track of) to work on. And you are also waiting on my feedback on your things. So please don't make things even more complicated. I am trying to stay on top of it in a timely manner.

Nothing has to be rewritten. It is just convenience (your side) and reducing overhead (my side) - and guiding users to a more modern way (currently completely missing).

But none of these will make things actually safer.

@danmar
Copy link
Owner Author

danmar commented Oct 7, 2024

I agree it's a big problem with the istream slowness..

@danmar
Copy link
Owner Author

danmar commented Oct 7, 2024

Even though fstream is slowish my hunch is that the changes will not make a significant speedup overall in cppcheck analysis.

Could you try to build cppcheck with your changes and without them.. then run some arbitrary command such as:

time cppcheck -D__GNUC__ -D__CPPCHECK__ lib/token.cpp

To see how much it speeds up the preprocessing specifically it would also be interesting to see the times when you use -E.

@danmar
Copy link
Owner Author

danmar commented Oct 7, 2024

And as I said - if the string is not terminated or the size is wrong you will have the same issues. If you are coming from a raw buffer and make a mistake the outcome will be the same no matter the wrapper.

But we are not usually coming from raw buffers. We are usually coming from string literals or file streams. And then it is less safe to convert to buffers. You could easily forget a -1 in your code when passing the size and there will be no warning.

@firewave
Copy link
Collaborator

firewave commented Oct 7, 2024

It is about core performance since simplecpp is supposed to be embedded. In Cppcheck as soon as the ValueFlow kicks in obviously nothing else matters...

But we are not usually coming from raw buffers.

I am talking external users and not us. We are safe because of the sanitizers, valgrind etc. in the CI.

Please give me a bit to implement the approach via danmar/cppcheck#6379. The non-ASCII stuff will be out-of-scope for now though.

I want to look into the builddir stuff first and finish up my standards stuff so there are at least a few things I finally can put a lid on.

@danmar
Copy link
Owner Author

danmar commented Oct 8, 2024

Please give me a bit to implement the approach

yeah sure feel free to look at a better approach. However in my humble opinion we need to make simplecpp interface safer.

std::vector<uint8_t> or std::string would be better than a C buffer and size. I do not know what problems you see with std::string that are solved by using a raw buffer.

@firewave
Copy link
Collaborator

firewave commented Oct 8, 2024

std::vector<uint8_t> or std::string would be better than a C buffer and size. I do not know what problems you see with std::string that are solved by using a raw buffer.

An unnecessary wrapper (i.e. copy) and not sure what is going on with non-ASCII data. That's what I want to look into.

@danmar
Copy link
Owner Author

danmar commented Oct 10, 2024

An unnecessary wrapper (i.e. copy) and not sure what is going on with non-ASCII data. That's what I want to look into.

About the copy I don't care about the performance hit from that. This is a not a big performance problem.

But if there would be some issues with non-ASCII data that it's not copied properly that is worth fixing.

Anyway it feels like std::vector<uint8_t> would be better than std::string from a type safety point of view to distinguish that it's raw file data not a string.

@danmar
Copy link
Owner Author

danmar commented Oct 10, 2024

This is a not a big performance problem.

If I put on the Cppcheck hat for a little moment..

I have the feeling that the Tokenizer in Cppcheck could be 90% faster if we redesign it.

I just don't know what that redesign means. Rewrite it so that all simplifications are made in 1 pass (how to do it for C++?)? Preallocate memory buffer for tokens and use placement new? Remove the std::string str() and provide a int strId() instead? Stop making various simplifications? Do you have any other ideas?

@firewave
Copy link
Collaborator

I have the feeling that the Tokenizer in Cppcheck could be 90% faster if we redesign it.

The speed of that is fine (except for some extreme cases) - otherwise I would not be looking into getting rid of the std::istream overhead.

The only issue with an actual impact exists in simplecpp and I tried to address that in #305.

@firewave
Copy link
Collaborator

Anyway it feels like std::vector<uint8_t> would be better than std::string from a type safety point of view to distinguish that it's raw file data not a string.

That is under consideration. My IDE is currently broken so I cannot do anything which requires compilation - waiting for reply from support. So might be a while until I get that underway. I am currently trying to clean up my massive local backlog.

@firewave
Copy link
Collaborator

We should keep in mind that this was not based on an external request. I will obviously over-engineer this from the start and I think it will help to come a reasonable solution.

@danmar
Copy link
Owner Author

danmar commented Oct 10, 2024

The speed of that is fine (except for some extreme cases)

thanks.. I measured now again.. I thought it was way worse for some reason.

@firewave
Copy link
Collaborator

There is also #279 regarding performance which is a much more general issue.

@danmar
Copy link
Owner Author

danmar commented Oct 11, 2024

There is also #279 regarding performance which is a much more general issue.

interesting. I wonder if you have ever profiled simplecpp in windows. A customer has reported that preprocessing is way slower in windows.

@firewave
Copy link
Collaborator

I wonder if you have ever profiled simplecpp in windows.

Yes, but not much since enough still falls out of simply doing it on Linux. Also it is confusing at times compared to looking at callgrind (I am also bad at understanding perf output).

I am planning to have a closer look at the Windows performance compared to Linux after I updated the release build to Qt6. I finally what to build that with Boost as well.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants