-
Notifications
You must be signed in to change notification settings - Fork 31
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Return TIMESTAMP columns as native Python datetime objects to improve support for Apache Superset #395
Conversation
This is to fix the apache superset bug, that doesn't show the cratedb table metadata in the sqleditor
This returns the data type of the columns in a query, to allow us to transform the dates from cratedb (long int) to python datetime
In execute(), transform the columns with type timestamp and timestamp without time zone to python datetime, this will correctly display dates in apache superset
Dear @Aymaru, thanks a stack for your contribution. We will have a look and might come back to you with some comments. With kind regards, |
Dear @Aymaru, first of all, thank you very much for your contribution again. I was able to take a more detailed look at the implementation right now and wanted to start a discussion around your patch. Apologies that the CI job apparently didn't work when you submitted your patch. I enabled it again (see #396 (comment)) and just triggered it by pushing (and then removing) an empty commit to your PR. The outcome is that the patch currently doesn't pass the tests. The spots are:
Will you be able to fix those? I also added some thoughts about the implementation itself. With kind regards, |
src/crate/client/cursor.py
Outdated
col_types = self._result["col_types"] | ||
tmp_data = self._result["rows"] | ||
|
||
rows_to_convert = self._get_rows_to_convert_to_date(col_types) | ||
tmp_data = self._convert_dates_to_datetime(tmp_data, rows_to_convert) | ||
|
||
self._result["rows"] = tmp_data | ||
|
||
self.rows = iter(self._result["rows"]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we can come up with a more elegant and efficient implementation. When trying to follow the code, I recognize that it builds up a list of boolean flags which designates which columns to apply the datetime conversion to, right?
Based on that information, the code will then apply the conversion function on each column while mapping each row into memory through the list()
function.
I am a bit hesitant on this and wanted to share my specific concerns:
a) The code might allocate a significant amount of memory on larger results.
b) On a very large result set, it might even run out of memory.
c) When applying the conversion, there are currently no sanity checks around which test for NULL
- or other emptyness. I am specifically looking at float(str(x)[0:10])
here. Also, there is no other exception handling either. We might want to improve on this aspect.
Maybe we can wrap that conversion into a generator function in order to keep the memory footprint low and, while being at it, add some more bells and whistles on the error handling side?
Dear @Aymaru, thank you very much for continuing your work on this. Please let me know if you are still able to spend time for more iterations on it or whether you would like to see us taking over. On the patch itself, I will add some more comments. With kind regards, |
src/crate/client/cursor.py
Outdated
rows_to_convert = self._get_rows_to_convert_to_date(self._result["col_types"]) | ||
for flag in rows_to_convert: | ||
if flag: | ||
t_rows = (row for row in self._result["rows"]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Line 61 will probably still load the full result into memory, right?
I was thinking about swapping in the generator right in between of self.rows = iter(self._result["rows"])
, which is then responsible for converting between data types (timestamp epoch integer to Python datetime object).
In that manner, the pre-computation using _get_rows_to_convert_to_date()
might become obsolete.
return set(map(lambda el: el[0], rows)) | ||
return list(set(map(lambda el: el[0], rows))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While I don't know about the specific background why there has been a set
used here, can you elaborate why this should now return a list?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Dear @Aymaru,
apologies for the late reply. We've split off this amendment into #426. For you, it apparently solved this issue:
Changing the return type from set to list on get_pk_constraint(), solves a compatibility issue from Apache Superset that doesn't show the metadata from CrateDB tables in the SQL Lab.
With the patch at #426, we are observing a regression. Maybe due to the upgrade to SQLAlchemy 1.4, and maybe other changes in Apache Superset, this is not needed anymore?
With kind regards,
Andreas.
1373932800000, | ||
datetime.datetime(2013, 7, 15, 18, 0), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This currently raises an exception on CI, see [1]. The current code expects datetime.datetime(2013, 7, 16, 0, 0)
here.
[1] https://github.com/crate/crate-python/runs/2114206441?check_suite_focus=true#step:5:388
This pull request introduces 1 alert when merging 57ec090 into 71adf4d - view on LGTM.com new alerts:
|
This pull request introduces 1 alert when merging 9b82ac2 into 71adf4d - view on LGTM.com new alerts:
|
Dear @Aymaru, first things first: As I noticed recent activity from you on this patch again, I wanted to thank you for keeping up to it. Further, I wanted to take the chance to tell you that we recently fixed some flaws within the test suite. Besides it now also works on macOS, more importantly, it received different fixes where the outcome of the software tests was indeterministic before. So, you might want to rebase your branch on top of the current master branch in order to also receive those improvements. With kind regards, P.S.: Because we updated the installation procedure (i.e. removing |
- Adjust and add some badges to README.rst. - For covering the most recent SQLAlchemy 1.3 release, bump to SQLAlchemy version 1.3.23. - Also, enable listening to "workflow_dispatch" events in order to be able to trigger workflows manually from the UI.
Dear @Aymaru, the main aspects of your patch have been converged into #442, #445, and crate/sqlalchemy-cratedb#89. With #426, we evaluated that the specific detail of returning a list from With kind regards, |
Hi again. @hlcianfagna: Do you think that with your recent patch to fix the CrateDB adapter in Apache Superset, this issue can be considered as resolved, or do you think it is referring to a similar but different detail on datetime/timestamp compatibility matters? @Aymaru: May we humbly ask you to check again, using the most recent versions of both Apache Superset, and CrateDB? With kind regards, |
Dear @hlcianfagna and @Aymaru, we would like to thank you once again for your excellent improvements on this matter. It looks like Hernan's recent improvements made it into the upstream codebase of Apache Superset, so we may consider this issue as resolved? @Aymaru: Can you verify if everything you aimed at works well with CrateDB now, when using a recent version of Apache Superset? With kind regards, |
Hi again. Based on those improvements, and that we didn't hear about any other flaws recently, we are closing this PR, in the assumption that everything works well in this regard.
Please let us know if you think anything is still wrong, best raised on the issue tracker of sqlalchemy-cratedb. Thank you again for your efforts on this! |
Summary of the changes / Why this is an improvement
Changing the return type from set to list on
get_pk_constraint()
, solves a compatibility issue from Apache Superset that doesn't show the metadata from CrateDB tables in the SQL Lab.CrateDB types TIMESTAMP and TIMESTAMP WITHOUT TIME ZONE, are returned as python datetime objects, correctly displaying date columns in charts and SQL Lab from Apache Superset.
Checklist