Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

update dialect to get columns based on DESCRIBE #20

Open
wants to merge 5 commits into
base: main
Choose a base branch
from

Conversation

morganda
Copy link

Summary

This attempts to fix #19

Currently, if you try to fetch the columns of the table, it will select * from collection limit 1 which assumes a fixed schema.
This PR adds a version of this implementation, but instead uses describe collection option(max_field_depth=1) where max_field_depth is configurable. There are two minor caveats

  • Rhe describe method, does not return the _meta field whereas the select method would.
  • Rollup collections do not currently support describe so if a RocksetException is thrown, it will fall back to the select method. The format of the response is the same so there shouldn't be too many surprises.

Misc

  • Pulled some executions out to separate functions to make things easier to unittest

Tests

  • Added unittests for RocksetDialect.get_columns

@morganda morganda requested review from pmenglund, sbaldwin-rs and a team May 29, 2024 22:57
@sbaldwin-rs
Copy link

Will review this shortly

@@ -15,16 +15,14 @@
entry_points={
"sqlalchemy.dialects": [
"rockset_sqlalchemy = rockset_sqlalchemy.sqlalchemy:RocksetDialect",
"rockset = rockset_sqlalchemy.sqlalchemy:RocksetDialect"
"rockset = rockset_sqlalchemy.sqlalchemy:RocksetDialect",

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The version needs to be bumped, right?

@@ -115,10 +116,72 @@ def _get_table_columns(self, connection, table_name, schema):
raise e
return columns

def _validate_query(self, connection: Engine, query: str):
import rockset.models

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any reason not to have this import at the top of the file?

if not isinstance(max_field_depth, int):
raise ValueError("Query option max_field_depth, must be of type 'int'")

q = f"DESCRIBE {table_name} OPTION(max_field_depth={max_field_depth})"

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The schema needs to be in the DESCRIBE query, no?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Getting column definitions assumes fixed schema for all rows
2 participants