-
Notifications
You must be signed in to change notification settings - Fork 170
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add list_views
for hive catalog
#1251
base: main
Are you sure you want to change the base?
Conversation
Hello @kevinjqliu |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the PR! Could you add integration tests to ensure that pyiceberg is using the same semantics as the Java implementation?
It would be great to test these scenarios
- Create an Iceberg view using Spark with Glue catalog, list the view using pyiceberg and glue catalog. Ensure that it's the same view definition
- Create an Iceberg table and Iceberg view in the same namespace, make sure
list_table
only shows the table andlist_views
only shows the view.
Hi @kevinjqliu, Could you please explain how the list_tables method handles this situation? Specifically, how does it differentiate between tables and views when the parameter for the view results in a NoSuchPropertyException due to the missing table_type property? |
Im not familiar with how the iceberg view is implemented in the hive catalog. Take a look at the HiveCatalog implementation in Java, listTables and listViews.
it looks like views are determined by the |
Hi @kevinjqliu , Previous SituationIn the existing implementation of the list_tables method within hive catalog, the functionality aimed to retrieve all tables under a specified namespace. However, it did not account for the distinction between Iceberg tables and views. Specifically, the method included both Iceberg tables and virtual views in its results, which could lead to confusion and incorrect data handling in downstream processes. Changes MadeTo improve the accuracy of the list_tables method, the following changes have been implemented: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the PR! I left a few comments
dbname=database_name, | ||
tbl_names=open_client.get_all_tables(db_name=database_name) | ||
) | ||
if view.parameters.get(TABLE_TYPE, "").lower() == ICEBERG and view.tableType.lower() == "virtual_view" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: can we do something like this here?
d5b51f9#diff-5dc4045c0eac182d6c22d8a06f62fcc79c369ad8bff80d357252147e2d9a76f2R785
for both __is_iceberg_table
and __is_iceberg_view
with self._client as open_client: | ||
return [ | ||
(database_name, view.tableName) | ||
for view in open_client.get_table_objects_by_name( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this right? or is there a "view" equivalent function
catalog._client.__enter__().get_table_objects_by_name.assert_called_with( | ||
dbname="database", tbl_names=["view1", "view2", "non_iceberg_view", "table"] | ||
) | ||
def test_list_views(hive_table: HiveTable)-> None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you can run make lint
to fix linter issue
@@ -1233,3 +1233,84 @@ def test_create_hive_client_failure() -> None: | |||
with pytest.raises(Exception, match="Connection failed"): | |||
HiveCatalog._create_hive_client(properties) | |||
assert mock_hive_client.call_count == 2 | |||
|
|||
def test_list_views(hive_table: HiveTable) -> None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This test sets specific parameters then test those parameters.
I think creating an integration test using the HMS would be useful. That way we can test the actual parameter for HMS
list_views
for hive catalog
Added list_views and test_list_views for hive catalog