Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Code sanity #73

Closed
sinkingsugar opened this issue Feb 6, 2020 · 10 comments
Closed

Code sanity #73

sinkingsugar opened this issue Feb 6, 2020 · 10 comments

Comments

@sinkingsugar
Copy link
Contributor

sinkingsugar commented Feb 6, 2020

Since we are talking about "sanitizing" and "security" for instance https://github.com/status-im/nimbus/issues/164
I'm collecting here a list of risky code.
Not doing any change yet, but keeping it under the spotlight and up for discussion.

Wild casting

This is an example of bad practice, while string, seq[byte], seq[uint8] and Taintedstring are all equivalent at a low-level, this is something that might change, will likely change out of our control.
In this specific case both seq and string are also basically references (not exactly but similar behavior for the compiler).
Moreover is lossy as there is no UTF8 sanity check either.
https://github.com/status-im/nim-libp2p/blob/88a030d8fbd76023354f14ff10ba740786eb46a4/libp2p/muxers/mplex/mplex.nim#L88

Wild gcsafe

#68
I see way too many ref and way too many unsafe gcsafe

new pattern

I'm 50/50 on this.. I actually like the idea of newX expressing heap/ref allocations, but I often hear we want to change it into T.init (cc @arnetheduck )
I made a list of those newX in our code base (some are system.nim)

131: newSeq
184: newException
2: newSkContext
43: newConnection
17: newDaemonApi
1: newStringTable
3: newStringOfCap
47: newFuture
1: newPool
7: newAsyncEvent
7: newString
2: newMultistreamHandshakeException
22: newMultistream
2: newInvalidVarintException
1: newInvalidVarintSizeException
2: newChannel
16: newLPStreamEOFError
3: newStreamInternal
4: newLPStreamLimitError
21: newMplex
13: newStream
4: newMuxer
5: newMuxerProvider
9: newSeqOfCap
9: newIdentify
4: newMessage
5: newTimedCache
6: newMCache
3: newAsyncLock
15: newPubSub
63: newBufferStream
12: newPubSubPeer
9: newNoise
2: newPlainText
2: newSecioConn
4: newSecio
7: newStandardSwitch
48: newTransport
5: newSwitch
2: newAlreadyPipedError
4: newNotWritableError
8: newLPStreamIncompleteError
2: newChronosStream
2: newAsyncStreamReader
2: newAsyncStreamWriter
6: newLPStreamReadError
9: newLPStreamIncorrectError
4: newLPStreamWriteError
6: newNoPubSubException
3: newTestSelectStream
2: newTestLsStream
2: newTestNaStream
@arnetheduck
Copy link
Contributor

for casting, see https://github.com/status-im/nim-stew/blob/50562b515a771cfc443557ee8e2dceee59207d52/stew/byteutils.nim#L127 - ie the idea is to transition them into function calls so that we can do something sane about it in the future - the conversion in the other direction might still be missing though - for now, like nim does, we assume a neutral world of byte sequences and no special meaning for utf8 etc.

a core issue is that string usage is prolific in nim even when byte sequence would be more appropriate - again, the idea is that we slowly clean that up by creating alternatives in stew so that we can test what a role model solution would look like in practise.

@sinkingsugar
Copy link
Contributor Author

Yeah that's better (the toBytes) I guess, at least it's in a place that can be localized and triaged if something happens. Still it's far from safe, I think the GC references (on the backing memory) likely get increased/decreased/messed so far from noop.
But the cast[string] for example doubt would work with arc or strv2
https://github.com/nim-lang/Nim/blob/7ec7731f824b933cdd4f4a1f816c1f3e4862bef6/lib/system/strs_v2.nim#L20
As length is before the payload (and payload cap is before etc..etc..)!

Completely true and agree on the string usage that randomly pops :), something like rust &[u8] a slice or basically a string view would be nice.

@arnetheduck
Copy link
Contributor

afair the cast still has value semantics so there's a copy made, but feel free to correct me (and the comment on that function, maybe) - re arc, no idea, but getting rid of the casts is small step in a reasonable direction that works today.

@sinkingsugar
Copy link
Contributor Author

Ah very true, you are completely right, there is likely a copy indeed! So basically no point on cast, just implementing a clean copy is basically the same and future proof!

@zah
Copy link
Contributor

zah commented Feb 6, 2020

Most code accepting read-only inputs should prefer using openarray (a.k.a. slice/view/span). Strings offer toOpenArrayByte which is a safe and legal way to get the bytes of a string. The current issue is that you cannot use openarray parameters in async functions.

@mratsim
Copy link
Contributor

mratsim commented Feb 6, 2020

Casting a string to seq[byte] is OK.
Casting a seq[byte] to string is definitely not OK.
The strings created by casting are not terminated by '\0' and will cause buffer overflow with C APIs.

This already happened in status-im/nim-http-utils#8 and fixed in status-im/nim-http-utils#9 by using the following procedure instead of cast:

proc toString(data: openarray[byte], start, stop: int): string =
  ## Slice a raw data blob into a string
  ## This is an inclusive slice
  ## The output string is nul-terminated for raw C-compat
  let len = stop - start + 1
  result = newString(len)
  copyMem(result[0].addr, data[start].unsafeAddr, len)

It has the same performance properties as casting for lvalues, for rvalues casting would avoid the copy but openarray would work as well.

If casting to string is desired, it should be a type NotNulString = distinct string to ensure it is never used as a public string or passed to cstring / C-API

@sinkingsugar
Copy link
Contributor Author

Cool, was thinking to add something like that @mratsim , maybe can be added into stew or something like that!
@zah Hmm, I didn't use async so much lately, is that a limitation in the current implementation for some reason?

@dryajov
Copy link
Contributor

dryajov commented Feb 11, 2020

The strings created by casting are not terminated by '\0' and will cause buffer overflow with C APIs.

Not implicitly AFAIK, but if converting from string to byte array/seq and back to strings, it should still be safe? This is mostly how this is being used across the codebase.

@zah, I'm also not aware of issues with async and openarray, do you remember what they are @cheatfate might also have an idea.

In general I'm all for cleaning this up, but I was under the impression that this sort of casting was mostly OK in Nim and there were some guarantees to keep seqs and strings casts compatible?

@zah
Copy link
Contributor

zah commented Feb 12, 2020

@sinkingsugar, @dryajov, please read through this rather long RFC where I've explained the problems in detail:
status-im/nim-chronos#2

@dryajov
Copy link
Contributor

dryajov commented Jan 26, 2021

This is addressed by the latest style guide outlined in status-im/nimbus-eth2#2249.

@dryajov dryajov closed this as completed Jan 26, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants