Trade (tick) timestamps do not follow json conventions (causing overflow) #16
Comments
JSON does not "convert" anything, it's just a data format. Large, high-resolution numbers are totally valid JSON, but your language and/or parser may have limitations. I use Javascript, and it is true that |
JSON does not convert, it is just a format, that is true. However, because JSON originated from Javascript, and Javascript has this quirk, quite a few parsers (on the JVM, in .NET, etc.) also represent the parsed JSON numeric type with a double. Sadly, all of the high-resolution numeric types I output in JSON have had to be quoted over the years to avoid this issue. I think for the sake of the installed base of parsers out there it makes sense to quote. I am using Kotlin and have checked a number of JSON parsers, all of which follow the very flawed behavior of Javascript. To work around this problem I wrote my own parser, which lazily evaluates each number to a long, double, int, or whatever the expected type is.
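A minimal sketch of that lazy-evaluation idea (class and method names are illustrative, not the commenter's actual parser):

```kotlin
// Keep the raw numeric token from the JSON text and only convert it
// when the caller asks for a concrete type.
class LazyNumber(private val raw: String) {
    fun asLong(): Long = raw.toLong()       // exact for any 64-bit integer
    fun asInt(): Int = raw.toInt()
    fun asDouble(): Double = raw.toDouble() // may round, but only on request
}

fun main() {
    val t = LazyNumber("1262812349394000000")
    println(t.asLong())            // 1262812349394000000 -- exact
    println(t.asDouble().toLong()) // 1262812349393999872 -- rounded through a double
}
```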
One more note on why this is important (even if you are not interested in ns resolution): the full timestamp is used as a paging cursor. If there are more than 50K trades per day, one needs to set the starting timestamp to be 1 ns past the last timestamp received in the prior "page".
Yes 😞, yes it is. My code works around that by counting how many records were in the last ms of the previous page, and skipping that many on the next page.
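For illustration, a hedged sketch of the cursor-advance pattern in Kotlin; `fetchPage` and the field names are hypothetical stand-ins for the real endpoint:

```kotlin
// Assumes fetchPage(fromNs) returns trades sorted by full-resolution timestamp.
data class Trade(val tNs: Long)

fun fetchPage(fromNs: Long): List<Trade> = TODO("call the ticks endpoint")

fun pageAll(startNs: Long): Sequence<Trade> = sequence {
    var cursor = startNs
    while (true) {
        val page = fetchPage(cursor)
        if (page.isEmpty()) break
        yieldAll(page)
        // Advance 1 ns past the last timestamp so the next page does not
        // re-deliver it. This only works if the timestamp survives parsing
        // at full resolution; otherwise you must count and skip the records
        // sharing the last millisecond, as described above.
        cursor = page.last().tNs + 1
    }
}
```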
Tested; happens in golang also. Python and Javascript use 64-bit integer types.
Surely you mean 64-bit floating point types?
@kmcminn I don't currently have a problem dealing with this in my code; however, to handle these messages I had to write a new JSON parser to deal with the fact that any number of JSON parsers out there use a double to hold a parsed numeric value. I investigated a number of JVM parsers only to find that they all had this flaw. I've seen the same on the .NET side as well. It's pretty backwards, but I think it was just adopted from Javascript's implementation.
@spazmodius no, the JSON decoders most people use in Python and Javascript have to handle integers greater than 32 bits; the types just have some code under the hood to switch to a 64-bit type. The loss of precision when switching to a 64-bit floating point number isn't a language problem, it's a general computation problem.
@kmcminn The problem here is not related to 32 vs 64 bit; rather, passing a long (or string-version-of-a-long) through a double and back into a long loses resolution. This happens due to an artifact that started on the Javascript side and was carried forward into various JSON parser implementations.
@kmcminn I'm not sure I know what you're saying. That number would easily fit in a 64-bit int, which Javascript doesn't have.
In the case of these timestamps: yeah, to clarify, any ≥13-digit number requires more than 32 bits of precision to store. It's a 32 vs 64 implementation detail in the JSON decoder, however, as the decoder's logic will be based on type inference to some degree, which will select a determined 64-bit type. In cases where that selects a 64-bit float, precision is lost.
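For concreteness, a quick Kotlin check of the ≥13-digit claim:

```kotlin
fun main() {
    // A 13-digit millisecond epoch exceeds Int.MAX_VALUE (2147483647, 10 digits).
    println("1262812349394".toIntOrNull())   // null -- does not fit in 32 bits
    println("1262812349394".toLongOrNull())  // 1262812349394 -- fits a 64-bit Long exactly
}
```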
@kmcminn Yes, that is my point. What I am indicating is that many JSON parsers use a floating-point representation for numeric types; hence, to work around this, large numeric values sent in a JSON stream should be quoted. Ideally I would like to see Javascript, and the many other JSON parsers for the JVM and other environments, not follow Javascript's folly, but until then... Re FP: though a double is 64 bits, the mantissa component is just 52 bits. There are other aspects of floating-point representation that can introduce inaccuracy, even for large integers within the 52 bits of magnitude.
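The 52-bit mantissa limit is easy to see from the ULP (unit in the last place) of a double near these values:

```kotlin
fun main() {
    // Near 1.26e18 the gap between adjacent doubles is 256, so any
    // nanosecond timestamp in this range gets rounded to a multiple of 256.
    println(Math.ulp(1.262812349394e18))  // 256.0
}
```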
Let's dive into solutions:
- We'll now have to explicitly cast these numbers ourselves to a type, which wouldn't be terrible, but the migration would be no fun for all involved.
- I don't like this migration story either, or having to rely on a difficult time conversion in user code and the error-prone black box it creates for them and their customers.
- This is probably the more robust path, in that we get a string type that can be accurately represented everywhere and that is also the most intuitive for customers (a sketch follows below). Migration woes, though. The history of this discussion inside Polygon would be interesting.
- Probably the lowest impact and easiest to accomplish.
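For the string option, a minimal Kotlin sketch of producer-side quoting (the field name `t` and the hand-rolled serializer are illustrative only):

```kotlin
// Emit the nanosecond timestamp as a quoted string so that double-based
// decoders never see it as a number.
data class Trade(val t: Long)

fun toJson(trade: Trade): String = """{"t":"${trade.t}"}"""

fun main() {
    println(toJson(Trade(1262812349394000000L)))  // {"t":"1262812349394000000"}
}
```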
I agree that any change in format will break code. That said, note that other data sources (such as crypto exchanges) provide timestamps in one of two forms: a quoted epoch value, or an ISO-8601 date/time string.
Epoch time is more compact and cheaper to parse, so I think quoted epoch is the better of the two from an efficiency standpoint: more compact bandwidth use and cheaper parsing. Probably this "bug" should resolve into a feature request in a future API release.
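For comparison, a small Kotlin sketch of consuming both forms (the example value is the one from this issue):

```kotlin
import java.time.Instant

fun main() {
    // Quoted epoch nanos: a single, exact integer parse.
    val ns = "1262812349394000000".toLong()

    // ISO-8601 equivalent: human-readable, but longer on the wire and
    // more expensive to parse.
    val iso = Instant.ofEpochSecond(ns / 1_000_000_000, ns % 1_000_000_000)
    println(iso)  // 2010-01-06T21:12:29.394Z
}
```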
Yeah, I feel your pain. I'm working with it using 64-bit integer types that can accurately represent this. I've not seen a timestamp > 13 digits on any websocket feeds.
Indeed, the ability to declare number and string serialization options would be elite. I would want it on all APIs... One quick fix Polygon can do on their side, which would help golang, Java, Kotlin, and .NET users, would be to just stop sending 19-digit nanosecond timestamps. Numbers that big get wrecked by float64 decoding.
Can someone port this to Kotlin or Java to see if you see similar behavior?
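Not a full port of the snippet above, but a minimal Kotlin reproduction of what a double-backed decoder does to the raw token (the example value is the one from this issue):

```kotlin
fun main() {
    val exact = 1262812349394000000L

    // What a double-backed JSON decoder effectively does with the token:
    val viaDouble = "1262812349394000000".toDouble().toLong()

    println(exact)      // 1262812349394000000
    println(viaDouble)  // 1262812349393999872 -- off by 128 ns
}
```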
URL
https://api.polygon.io/v2/ticks/stocks/trades/A/2010-01-06?reverse=false&limit=50000&timestamp=1262812349393999878&apiKey=...
Result
Getting a result that at first glance looks textually correct; however, the timestamps must be quoted, because many JSON parsers convert numbers to doubles under the covers.
This causes a timestamp like 1262812349394000000 to be converted to a double whose mantissa does not have full resolution for the number. The stamp becomes 1262812349393999878 once it goes through the double conversion (close but not exact).
For better or worse, the timestamp must be quoted. So instead it should be:
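Something like this, with the timestamp quoted (the field name `t` and the surrounding structure are assumed for illustration; the real response carries more fields):

```json
{
  "results": [
    { "t": "1262812349394000000" }
  ]
}
```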
Expected Result
Timestamps must be quoted for proper JSON parsing.
Additional Notes
Javascript and many JSON parsers represent the numeric type as a double. This means that most parsers will return a wrong timestamp value (beyond some resolution) due to the 52-bit mantissa of the IEEE floating-point representation. By convention, most market data providers streaming JSON will either quote the epoch value or send an ISO-8601 date/time string.
I think the most compact form would be to continue to use the epoch time in nanoseconds, but quote it, so that Javascript, Java-based JSON parsers, and other parsers following the Javascript convention of holding numerics in a double do not lose resolution when parsing. This would be a minor format change and might break some code, however: APIs that expect a numeric value rather than a string-based long might fail to parse the messages without a minor adjustment to convert the string to a long.