-
Notifications
You must be signed in to change notification settings - Fork 200
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Unconsistent requirements for enum keys when they represented as numbers and as strings #1132
Unconsistent requirements for enum keys when they represented as numbers and as strings #1132
Comments
@Mingun Maybe if you actually read the comment inside the def asLong(src: Any, path: List[String]): Long = {
src match {
case n: Long =>
n
case n: Int =>
n
case str: String =>
// Generally should not happen, but when the data comes from JavaScript,
// all object keys are forced to be strings.
try {
Utils.strToLong(str)
} catch {
case ex: MatchError =>
throw KSYParseError.withText(s"unable to parse `$str` as int", path)
} OK, let me clarify. We never wanted to allow string keys in enums, but since the parsed YAML document that we get from an external YAML parser in JavaScript will be a JavaScript object, and JavaScript objects only allow strings and symbols as keys, enum integer IDs will be stringified, and there's nothing we can do about that - the compiler must be ready for that. But since the YAML parser will actually interpret them and then convert them to string, we know that we can only expect strings as returned by the Yes, this has the side effect that it technically allows you to use actual strings as keys (which prevents the YAML parser from interpreting it as an integer). So what? How does this limit anyone when writing .ksy specs? I don't think it does, because the chance that someone would accidentally wrap enum integer IDs in quotes like |
I don't understand what JavaScript has to do with it. Isn't SnakeYAML used for parsing? String on input -> compiler parses using it's own understanding of YAML rules? As you known, I'm writing an alternative KS compiler. I tries to be as much compatible with existing implementation as I can. That means writing a lot of tests of corner cases, because compatible cannot mean "compatible here and there, but not here". I want to create drop-in replacement (in "compatible" mode). When I try to implement some things in a most straightforward way I find, that original compiler behaves differently. That's need explanation, because I do not want just write code without comments, since this vicious practice has already borne fruit in this project.
This does not explaining why then parser tries to extract HEX numbers from strings or even uses regexps to ensure integers. Wasn't it easier to call
The probability is higher that you think: Actually, I started investigating this issue, because my compiler was stricter in that case and could not parse those keys. So what the expected behavior? What I should test in my tests? Usage of string keys should be discouraged, need to fix all .ksy where they are used? |
SnakeYAML is a Java library, so it only runs on the JVM platform. Theferore, only the JVM build of KSC can use it. With the JavaScript build, you have to use a JavaScript library to do the YAML parsing.
This is only true for the part of KSC code that is JVM-specific (i.e. the Until recently, the Web IDE used (and the stable version at https://ide.kaitai.io/ still does) an abandoned shitty YAML parsing library https://github.com/jeremyfa/yaml.js (I don't even link to it, don't go there). It had a number of very noticeable and annoying bugs (I explained all of which I knew about in kaitai-io/kaitai_struct_webide#165), so https://ide.kaitai.io/devel/ already uses https://github.com/nodeca/js-yaml, which works much better (perhaps even better than SnakeYAML - I really like its error messages).
Well, I've seen a number of times that if there are comments in the code that explain exactly what you're confused about, you just skip them or don't notice them. Then even the best comments in the world are useless.
It happens from time to time that the code you see isn't written in the best and simplest possible way. This often happens when something is refactored or an existing pattern is used and no thought is given to whether something simpler could be used. It happens to the best of us.
Yes. |
Of course it had to be @KOLANICH, who else :) But it's 7 years ago to be fair. |
Is it needed to keep special handling for JS at all? It's 2024, Map is widely available:
Probably modern YAML parsing library already uses it for storing mappings? |
For the reference: there is also one .ksy with underscores in numbers, which should be prohibited in YAML 1.2 (which is js-yaml implements), but it allows them right now: |
I know https://github.com/eemeli/yaml offers to read into a
Yeah, and I like this. I also discuss this in kaitai-io/kaitai_struct_webide#165 (comment). I don't think either 100% YAML 1.1 compliant mode or 100% YAML 1.2 compliant mode makes sense, and I'm pretty happy with the compromise that https://github.com/nodeca/js-yaml offers. Also, https://bitbucket.org/snakeyaml/snakeyaml mostly implements YAML 1.1, but not where it doesn't make sense (for example, it doesn't parse To be honest, I believe the final solution to the annoying differences of YAML 1.1 and YAML 1.2 will be fully adopting the YAML 1.2 failsafe schema, as I described in kaitai-io/kaitai_struct_webide#165 (comment). The failsafe schema means that the YAML parsers will only provide mappings, sequences and strings, and will never try to interpret certain strings as boolean / integer / float / null regardless of whether they are quoted or not, so we'll be able to process everything consistently in KSC ourselves. But this kind of discussion belongs more to #229 again. This will finally fix some edge cases that we're currently not able to fix properly, but I don't think it will make a huge difference to the end user. I believe that the behavior of https://ide.kaitai.io/devel/ using js-yaml and the command-line KSC using SnakeYAML is already pretty well compatible right now, except for some very marginal cases (such as different |
Kaitais struct allows to define enum key either as number, or as string which value is number.
https://github.com/kaitai-io/kaitai_struct_compiler/blob/c23ec2ca88d84042edba76f70c1f003d062b7585/shared/src/main/scala/io/kaitai/struct/format/EnumSpec.scala#L33-L35
Numbers are extracted using
ParseUtils.asLong
method...https://github.com/kaitai-io/kaitai_struct_compiler/blob/c23ec2ca88d84042edba76f70c1f003d062b7585/shared/src/main/scala/io/kaitai/struct/format/ParseUtils.scala#L186-L194
...which uses
Utils.strToLong
which uses simple regular expressions to detect decimal and hexadecimal numbers:https://github.com/kaitai-io/kaitai_struct_compiler/blob/c23ec2ca88d84042edba76f70c1f003d062b7585/shared/src/main/scala/io/kaitai/struct/Utils.scala#L29-L34
That means that you cannot:
X
in hexadecimal numbers (which is allowed in expression language)All of that creates inconsistence between literal numbers and literal strings in YAML markup:
For consistency it would be better reuse either YAML int parser, or int parser from expression language.
Also note, that both YAML parser and regular expression allows leading zeroes for decimal numbers, but expression language not:
https://github.com/kaitai-io/kaitai_struct_compiler/blob/c23ec2ca88d84042edba76f70c1f003d062b7585/shared/src/main/scala/io/kaitai/struct/exprlang/Lexical.scala#L86
This inconsistency probably also should be fixed.
#448 may be related (it is possible to solve both problems at once).
The text was updated successfully, but these errors were encountered: