Followup: Return TIMESTAMP columns as native Python datetime objects #437
src/crate/client/cursor.py
@staticmethod
def _transform_date_columns(row, gen_flags):
    """
    Generates iterates over each value from a row and converts timestamps to pandas TIMESTAMP
Can we do something more elegant with lambdas? (I only have basic Python knowledge, though.)
I think using the generator here is fine. Let me know if you think otherwise.
When looking at this again, specifically after cleaning up a bit with f355164, I tend to agree with you. It is probably enough that _transform_result_types, which iterates all records, is a generator, in order to save memory.
On the other hand, _transform_datetime_columns, which iterates all columns per record, does not necessarily have to be a generator. I will keep the original patch unchanged on this matter for now, in order to not change too much prematurely, and will await the official review.
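The trade-off discussed here can be sketched as follows: an outer generator yields one converted record at a time, so the full result set is never copied, while the per-record conversion simply returns a plain list. The function names and the list of boolean flags are hypothetical and only loosely modeled on the names mentioned in this thread; this is not the code from the patch.

```python
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)


def convert_record(record, is_datetime_column):
    """Convert one record eagerly; a record is short, so a plain list is fine."""
    return [
        EPOCH + timedelta(milliseconds=value) if flag and value is not None else value
        for value, flag in zip(record, is_datetime_column)
    ]


def convert_records(records, is_datetime_column):
    """Convert records lazily, one at a time, to avoid copying the whole result."""
    for record in records:
        yield convert_record(record, is_datetime_column)


# Hypothetical result set: the second column holds epoch milliseconds.
rows = [[1, 0], [2, 1600000000500], [3, None]]
converted = list(convert_records(rows, [False, True]))
```

Only the outer loop benefits from laziness, because the number of records is unbounded while the number of columns per record is small.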
This returns the data types of the columns in a query, which allows us to transform dates from CrateDB (long int) to Python datetime objects.
In execute(), transform columns of type timestamp and timestamp without time zone to Python datetime objects. This will correctly display dates in Apache Superset.
src/crate/client/cursor.py
if value < 0:
    yield None
Not sure about this guard. Was it meant to protect against negative timestamps? In practice, they translate to Python just fine, no?
>>> datetime.fromtimestamp(-2342342342415 / 1e3)
datetime.datetime(1895, 10, 10, 14, 20, 57, 585000)
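For a check that does not depend on the system time zone, the same value can be computed by adding a `timedelta` to the epoch. This yields the UTC wall-clock time as a naive object and avoids the platform's C time functions, which `datetime.fromtimestamp` relies on and which may reject negative values on some platforms. The helper name below is hypothetical.

```python
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)


def millis_to_naive_utc(millis):
    """Convert epoch milliseconds (possibly negative) to a naive UTC datetime."""
    return EPOCH + timedelta(milliseconds=millis)


# The negative timestamp from the comment above.
negative = millis_to_naive_utc(-2342342342415)
print(negative)
```

The result is 1895-10-10 13:20:57.585 in UTC, so a negative value is a perfectly valid timestamp and does not need to be mapped to `None`.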
>>> (now - location.datetime_tz).seconds < 4
True
Did this test case protect against anything significant? Currently, it fails on my machine with a difference of 79600s (22.11 h), and I cannot make much sense of it. Most probably, I am missing the point.
Otherwise, the values of the datetime objects will depend on the time zone setting of the system, making it difficult to compare deterministically.
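One possible explanation, not confirmed in this thread: `timedelta.seconds` is only the seconds component of a normalized `timedelta` (always between 0 and 86399), not the total duration. A small negative difference is therefore stored as `days=-1` plus a large positive `seconds` value. The sketch below reproduces the reported number from an assumed difference of -6800 seconds; the actual cause on the reviewer's machine is unknown.

```python
from datetime import timedelta

# Assumed difference: the stored value is 6800 seconds "ahead" of `now`.
delta = timedelta(seconds=-6800)

days = delta.days              # normalized day component
seconds = delta.seconds        # normalized seconds component, never negative
total = delta.total_seconds()  # the actual signed duration

# A comparison on the signed total duration does not have this pitfall.
within_four_seconds = abs(delta.total_seconds()) < 4
```

Here `days` is -1 and `seconds` is 79600, while `total` is -6800.0, so a doctest based on `.seconds` can report a difference of about 22 hours for what is really an offset of under two hours.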
- Naming things
- Slight structural changes
- Use iterator instead of generator for column type flagging
- Improve inline documentation
    Currently, only converting to native `datetime` objects is implemented.
    """
    datetime_column_types = [11, 15]
Does someone have a resource at hand where those column type codes are enumerated? The table at [1] coincidentally lists _timestamp without time zone as 1115, but here the code is apparently expecting two-digit integer numbers.
I will be happy to receive further pointers for better educating myself on this topic.
Footnotes
https://crate.io/docs/crate/reference/en/5.0/interfaces/http.html#column-types
The code here will probably also need to handle arrays of timestamps.
Thanks. I will add the conversion for container types as well.
Reading up on your reference, I am asking myself whether it is appropriate that type=15 (Unchecked object) is handled here as well?
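A minimal sketch of how array handling could look, assuming the column type encoding described in the HTTP endpoint documentation linked above: scalar types are plain integers, and array types are nested lists of the form `[100, inner_type]`. The type codes `11` and `15` are taken from the diff; the function names are hypothetical, and this is not the code from the patch.

```python
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)
DATETIME_TYPES = {11, 15}  # taken from the diff in this thread
ARRAY_TYPE = 100           # assumption: arrays are encoded as [100, inner_type]


def is_datetime_type(col_type):
    """Return True for a datetime type code or an array (of arrays) of them."""
    if isinstance(col_type, (list, tuple)):
        return (
            len(col_type) == 2
            and col_type[0] == ARRAY_TYPE
            and is_datetime_type(col_type[1])
        )
    return col_type in DATETIME_TYPES


def convert_value(value):
    """Convert epoch milliseconds, or a nested list of them, to naive datetimes."""
    if value is None:
        return None
    if isinstance(value, list):
        return [convert_value(item) for item in value]
    return EPOCH + timedelta(milliseconds=value)


def convert_row(row, col_types):
    """Convert only those columns whose type is flagged as datetime."""
    return [
        convert_value(value) if is_datetime_type(col_type) else value
        for value, col_type in zip(row, col_types)
    ]


# Hypothetical row: a text column, a timestamp, and an array of timestamps.
result = convert_row(["a", 0, [1000, None]], [4, 11, [100, 11]])
```

Whether `15` belongs in the set of datetime types is exactly the open question raised above, so it should be checked against the documentation for the server version in use.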
src/crate/client/cursor.py
        break

if flag and value is not None:
    value = datetime.fromtimestamp(value / 1e3)
I think using utcfromtimestamp is the less error prone option here.
There are two scenarios:
- The user stores UTC on the server:
  - fromtimestamp would be wrong, or at least confusing, as it would convert to local time but without attaching any timezone info.
  - utcfromtimestamp would at least not be wrong.
- The user stores local timestamps on the server. Unless they also store the timezone in a separate field, this is error prone, but:
  - utcfromtimestamp would at least preserve the value used on the server.
  - fromtimestamp would be wrong, because it would apply another offset.
It's important that we don't attach a tzinfo given that we don't have one.
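The difference between the two calls can be illustrated as follows. `datetime.utcfromtimestamp` is deprecated as of Python 3.12, so this sketch uses an equivalent spelling that converts in UTC and then drops the `tzinfo`, yielding a naive object as proposed above. The helper name is hypothetical.

```python
from datetime import datetime, timezone


def to_naive_utc(millis):
    """Convert epoch milliseconds to a naive datetime holding the UTC wall-clock time."""
    return datetime.fromtimestamp(millis / 1e3, tz=timezone.utc).replace(tzinfo=None)


millis = 1600000000500

# Identical on every machine: 2020-09-13 12:26:40.500000
naive_utc = to_naive_utc(millis)

# Depends on the time zone setting of the system running the client.
local = datetime.fromtimestamp(millis / 1e3)

# The silently applied shift, equal to the local UTC offset at that instant.
offset = local - naive_utc
```

Because both results are naive, nothing on the `local` object reveals that an offset was applied, which is the confusion described in the first scenario.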
Another option could be to let users configure the timezone on the client (with default UTC) and always create timezone-aware objects.
The default could turn out to be wrong, but at least the user has an easy option to fix it, and it's easier to spot an incorrectly attached timezone than it is to figure out that a conversion happened and why or where it happened.
It's also way more convenient and less error prone to continue working with tz-aware datetimes.
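This alternative could look roughly like the sketch below. The function and its `tz` parameter are hypothetical and do not describe an existing option of the client; they only illustrate a configurable time zone with UTC as the default.

```python
from datetime import datetime, timedelta, timezone


def to_aware_datetime(millis, tz=timezone.utc):
    """Convert epoch milliseconds to a timezone-aware datetime in the given zone."""
    return datetime.fromtimestamp(millis / 1e3, tz=tz)


millis = 1600000000500

# Default: UTC is attached explicitly.
as_utc = to_aware_datetime(millis)

# A user-configured fixed offset of +02:00.
as_plus_two = to_aware_datetime(millis, tz=timezone(timedelta(hours=2)))
```

Both objects denote the same instant and compare equal; only the wall-clock representation differs, and the attached offset makes any conversion visible.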
Thanks. With 3154d32, I've switched to use datetime.utcfromtimestamp instead.
Another option could be to let users configure the timezone on the client (with default UTC) and always create timezone aware objects.
Shall we bring in the timezone awareness (#359, #361) with a subsequent patch right after this one?
I am closing this in favor of other patches / issues, into which the corresponding details have been merged.
Dear @Aymaru,
apologies for the late reply.
On the patch you submitted at #395 the other day, I recently exercised #426 separately. Your other improvements from there have been converged into this very patch, which has been slightly cleaned up to reflect the changes without the gist of #426 and other spurious commits introduced by merging from the master branch.
I've tried to keep your original commits for now, but I might eventually squash them together while working on the patch. The next steps are rebasing upon master and adding fixup commits as needed.
Feel free to also add additional comments or suggestions, specifically if you can spot a place where I failed to reflect the improvements from your original patch #395 correctly.
With kind regards,
Andreas.