Safety: Make the interface safer by removing old style C buffer inputs #377

danmar · 2024-10-07T18:44:19Z

No description provided.

danmar · 2024-10-07T18:47:21Z

@firewave I don't feel good about these. It is not safe. A large number of CVEs are caused by old-style C buffers.

danmar · 2024-10-07T18:55:09Z

Taking a std::string would be better imho but I still feel that std::istream is better for type safety reasons.

firewave · 2024-10-07T18:55:16Z

I get that, but I just added these because I need them. The overhead of std::istream is just not acceptable.

I also stated in the other PR that I will provide more modern interfaces but I did not have time to do that yet.

There is also -Wunsafe-buffer-usage which will warn about this. That is something I want to have clean in that context.

danmar · 2024-10-07T19:01:43Z

> I also stated in the other PR that I will provide more modern interfaces but I did not have time to do that yet.

I would be interested to know what that is. However, it has to be done here in the end, so why not start with that so we don't have to rewrite cppcheck again later?

firewave · 2024-10-07T19:04:43Z

> Taking a std::string would be better imho but I still feel that std::istream is better for type safety reasons.

And as I said - if the string is not terminated or the size is wrong you will have the same issues. If you are coming from a raw buffer and make a mistake the outcome will be the same no matter the wrapper.

And as also raised before - using std::string might be problematic with binary data or different encodings. We also do not have support for std::wstring, std::u16string and std::u32string at all.
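To make the binary-data concern concrete, here is a minimal sketch (standard library only; the names `kBytes`, `lengthFromPointerOnly` and `lengthFromPointerAndSize` are made up for illustration and are not part of simplecpp). It shows that a std::string built from a bare pointer stops at the first NUL byte, while one built from pointer plus size keeps every byte:

```cpp
#include <cstddef>
#include <string>

// Three bytes of "binary" input with an embedded NUL in the middle.
const char kBytes[] = {'a', '\0', 'b'};

// The const char* constructor scans for the first '\0', so only "a" is kept.
std::size_t lengthFromPointerOnly()
{
    return std::string(kBytes).size();
}

// The (pointer, count) constructor copies exactly count bytes, NUL included.
std::size_t lengthFromPointerAndSize()
{
    return std::string(kBytes, sizeof(kBytes)).size();
}
```

So std::string can hold such data, but only if the caller passes the length explicitly; wide or multi-byte encodings are a separate question this sketch does not address.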

firewave · 2024-10-07T19:09:37Z

> I would be interested to know what that is. However, it has to be done here in the end, so why not start with that so we don't have to rewrite cppcheck again later?

It is coming up. But I just have too many open, local, or follow-up items (which I can barely keep track of) to work on. And you are also waiting on my feedback on your things. So please don't make things even more complicated. I am trying to stay on top of it in a timely manner.

Nothing has to be rewritten. It is just convenience (your side) and reducing overhead (my side) - and guiding users to a more modern way (currently completely missing).

But none of these will make things actually safer.

danmar · 2024-10-07T19:19:25Z

I agree that the istream slowness is a big problem.

danmar · 2024-10-07T20:14:57Z

Even though fstream is slowish, my hunch is that the changes will not produce a significant overall speedup in cppcheck analysis.

Could you try building cppcheck with and without your changes, then run some arbitrary command such as:

time cppcheck -D__GNUC__ -D__CPPCHECK__ lib/token.cpp

To see how much it speeds up the preprocessing specifically, it would also be interesting to see the times when you use -E.

danmar · 2024-10-07T20:17:08Z

> And as I said - if the string is not terminated or the size is wrong you will have the same issues. If you are coming from a raw buffer and make a mistake the outcome will be the same no matter the wrapper.

But we are not usually coming from raw buffers. We are usually coming from string literals or file streams. And then it is less safe to convert to buffers. You could easily forget a -1 in your code when passing the size and there will be no warning.
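As a sketch of the "forgotten -1" case: the two `countInputBytes` overloads below are hypothetical stand-ins for a buffer-and-size interface and a std::string interface, not simplecpp's actual API. With a string literal, `sizeof` includes the terminating NUL, and the compiler accepts both calls silently:

```cpp
#include <cstddef>
#include <string>

// Hypothetical buffer-and-size interface; returns the number of bytes
// the callee would treat as input.
std::size_t countInputBytes(const unsigned char* data, std::size_t size)
{
    (void)data;
    return size;
}

// Hypothetical std::string interface: the size travels with the data.
std::size_t countInputBytes(const std::string& source)
{
    return source.size();
}

const char kCode[] = "int x;";  // 6 characters plus the terminating '\0'

// sizeof counts the terminator, so the callee is told there are 7 bytes.
std::size_t withTerminator()
{
    return countInputBytes(reinterpret_cast<const unsigned char*>(kCode), sizeof(kCode));
}

// Correct call: the caller has to remember the "- 1".
std::size_t withoutTerminator()
{
    return countInputBytes(reinterpret_cast<const unsigned char*>(kCode), sizeof(kCode) - 1);
}

// No manual size arithmetic is needed here.
std::size_t viaString()
{
    return countInputBytes(std::string(kCode));
}
```

Here `withTerminator()` reports 7 bytes while the other two report 6, and nothing in the first call hints at the difference.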

firewave · 2024-10-07T20:36:18Z

It is about core performance since simplecpp is supposed to be embedded. In Cppcheck as soon as the ValueFlow kicks in obviously nothing else matters...

> But we are not usually coming from raw buffers.

I am talking about external users and not us. We are safe because of the sanitizers, valgrind etc. in the CI.

Please give me a bit to implement the approach via danmar/cppcheck#6379. The non-ASCII stuff will be out-of-scope for now though.

I want to look into the builddir stuff first and finish up my standards stuff so there are at least a few things I finally can put a lid on.

danmar · 2024-10-08T06:49:48Z

> Please give me a bit to implement the approach

Yeah, sure, feel free to look at a better approach. However, in my humble opinion we need to make the simplecpp interface safer.

std::vector<uint8_t> or std::string would be better than a C buffer and size. I do not know what problems you see with std::string that are solved by using a raw buffer.

firewave · 2024-10-08T12:23:47Z

> std::vector<uint8_t> or std::string would be better than a C buffer and size. I do not know what problems you see with std::string that are solved by using a raw buffer.

An unnecessary wrapper (i.e. a copy), and I am not sure what is going on with non-ASCII data. That's what I want to look into.

danmar · 2024-10-10T07:54:34Z

> An unnecessary wrapper (i.e. a copy), and I am not sure what is going on with non-ASCII data. That's what I want to look into.

I don't care about the performance hit from the copy. This is not a big performance problem.

But if non-ASCII data is not copied properly, that is worth fixing.

Anyway it feels like std::vector<uint8_t> would be better than std::string from a type safety point of view to distinguish that it's raw file data not a string.
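A rough sketch of what that distinction could look like. The names `RawData`, `toRawData` and `tokenizeRaw` are invented for illustration and are not simplecpp's interface. A function that accepts only `std::vector<std::uint8_t>` rejects a std::string at compile time, so the conversion from text to raw bytes has to be written out explicitly:

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical alias for "raw file data, not a string".
using RawData = std::vector<std::uint8_t>;

// Hypothetical explicit conversion: copies every byte of the text,
// including bytes above 0x7F, into unsigned storage.
RawData toRawData(const std::string& text)
{
    return RawData(text.begin(), text.end());
}

// Hypothetical consumer that only accepts raw bytes. Passing a std::string
// directly does not compile, because no implicit conversion exists.
std::size_t tokenizeRaw(const RawData& data)
{
    return data.size();
}
```

The cost is the copy discussed above; whether that is acceptable is exactly the point under debate in this thread.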

danmar · 2024-10-10T08:06:02Z

> This is not a big performance problem.

If I put on the Cppcheck hat for a moment...

I have the feeling that the Tokenizer in Cppcheck could be 90% faster if we redesign it.

I just don't know what that redesign means. Rewrite it so that all simplifications are made in one pass (how to do it for C++?)? Preallocate a memory buffer for tokens and use placement new? Remove the std::string str() and provide an int strId() instead? Stop making various simplifications? Do you have any other ideas?

firewave · 2024-10-10T12:32:02Z

> I have the feeling that the Tokenizer in Cppcheck could be 90% faster if we redesign it.

The speed of that is fine (except for some extreme cases) - otherwise I would not be looking into getting rid of the std::istream overhead.

The only issue with an actual impact exists in simplecpp and I tried to address that in #305.

firewave · 2024-10-10T12:34:38Z

> Anyway it feels like std::vector<uint8_t> would be better than std::string from a type safety point of view to distinguish that it's raw file data not a string.

That is under consideration. My IDE is currently broken so I cannot do anything which requires compilation - waiting for reply from support. So might be a while until I get that underway. I am currently trying to clean up my massive local backlog.

firewave · 2024-10-10T12:53:03Z

We should keep in mind that this was not based on an external request. I will obviously over-engineer this from the start, and I think it will help to come to a reasonable solution.

danmar · 2024-10-10T19:58:19Z

> The speed of that is fine (except for some extreme cases)

Thanks. I measured again just now. I thought it was way worse for some reason.

firewave · 2024-10-11T10:10:35Z

There is also #279 regarding performance which is a much more general issue.

danmar · 2024-10-11T14:47:08Z

> There is also #279 regarding performance which is a much more general issue.

Interesting. I wonder if you have ever profiled simplecpp on Windows. A customer has reported that preprocessing is way slower on Windows.

firewave · 2024-10-11T16:37:48Z

> I wonder if you have ever profiled simplecpp on Windows.

Yes, but not much since enough still falls out of simply doing it on Linux. Also it is confusing at times compared to looking at callgrind (I am also bad at understanding perf output).

I am planning to have a closer look at the Windows performance compared to Linux after I have updated the release build to Qt6. I finally want to build that with Boost as well.

Remove dangerous TokenList constructors that take C buffers as inputs

86da36e

danmar force-pushed the safety-c-buffers branch from e45b931 to 86da36e Compare October 7, 2024 18:44
