Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add list_views for hive catalog #1251

Open
wants to merge 5 commits into
base: main
Choose a base branch
from
Open

Add list_views for hive catalog #1251

wants to merge 5 commits into from

Conversation

omkenge
Copy link
Contributor

@omkenge omkenge commented Oct 26, 2024

Added list_views and test_list_views for hive catalog

@omkenge
Copy link
Contributor Author

omkenge commented Oct 26, 2024

Hello @kevinjqliu
As we discussed earlier, I created a PR for list_view for hive catalog
Could you plz test it ?

Copy link
Contributor

@kevinjqliu kevinjqliu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR! Could you add integration tests to ensure that pyiceberg is using the same semantics as the Java implementation?

It would be great to test these scenarios

  • Create an Iceberg view using Spark with Glue catalog, list the view using pyiceberg and glue catalog. Ensure that it's the same view definition
  • Create an Iceberg table and Iceberg view in the same namespace, make sure list_table only shows the table and list_views only shows the view.

@omkenge
Copy link
Contributor Author

omkenge commented Oct 28, 2024

Hi @kevinjqliu,
( All tested with AWS Glue Catalog )
When I run the list_tables method, the output includes both Iceberg tables and views. However, when I check the list_tables method, it distinguishes between the tables based on a specific table_type parameter.
When I check the parameter for the view, I encounter the following exception:
pyiceberg.exceptions.NoSuchPropertyException: Property table_type missing, could not determine type.

Could you please explain how the list_tables method handles this situation? Specifically, how does it differentiate between tables and views when the parameter for the view results in a NoSuchPropertyException due to the missing table_type property?

@kevinjqliu
Copy link
Contributor

When I run the list_tables method, the output includes both Iceberg tables and views. However, when I check the list_tables method, it distinguishes between the tables based on a specific table_type parameter.

Im not familiar with how the iceberg view is implemented in the hive catalog. Take a look at the HiveCatalog implementation in Java, listTables and listViews.

When I check the parameter for the view, I encounter the following exception:
pyiceberg.exceptions.NoSuchPropertyException: Property table_type missing, could not determine type.

it looks like views are determined by the VIRTUAL_VIEW type instead of the table properties
https://github.com/apache/iceberg/blob/86a8560315432d6c63af8f6bc1379618d7b432cf/hive-metastore/src/main/java/org/apache/iceberg/hive/HiveCatalog.java#L188C75-L188C87

@omkenge
Copy link
Contributor Author

omkenge commented Oct 28, 2024

Hi @kevinjqliu ,

Previous Situation

In the existing implementation of the list_tables method within hive catalog, the functionality aimed to retrieve all tables under a specified namespace. However, it did not account for the distinction between Iceberg tables and views. Specifically, the method included both Iceberg tables and virtual views in its results, which could lead to confusion and incorrect data handling in downstream processes.

Changes Made

To improve the accuracy of the list_tables method, the following changes have been implemented:
1) Exclusion of Views: The condition for filtering tables now explicitly excludes views by checking that the tableType is not "VIRTUAL_VIEW"
2) Implemented list_views Method: A new list_views method has been implemented to specifically retrieve Iceberg views under the given namespace. This method ensures that only valid Iceberg views are returned

Copy link
Contributor

@kevinjqliu kevinjqliu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR! I left a few comments

dbname=database_name,
tbl_names=open_client.get_all_tables(db_name=database_name)
)
if view.parameters.get(TABLE_TYPE, "").lower() == ICEBERG and view.tableType.lower() == "virtual_view"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: can we do something like this here?
d5b51f9#diff-5dc4045c0eac182d6c22d8a06f62fcc79c369ad8bff80d357252147e2d9a76f2R785
for both __is_iceberg_table and __is_iceberg_view

with self._client as open_client:
return [
(database_name, view.tableName)
for view in open_client.get_table_objects_by_name(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this right? or is there a "view" equivalent function

catalog._client.__enter__().get_table_objects_by_name.assert_called_with(
dbname="database", tbl_names=["view1", "view2", "non_iceberg_view", "table"]
)
def test_list_views(hive_table: HiveTable)-> None:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you can run make lint to fix linter issue

@@ -1233,3 +1233,84 @@ def test_create_hive_client_failure() -> None:
with pytest.raises(Exception, match="Connection failed"):
HiveCatalog._create_hive_client(properties)
assert mock_hive_client.call_count == 2

def test_list_views(hive_table: HiveTable) -> None:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test sets specific parameters then test those parameters.
I think creating an integration test using the HMS would be useful. That way we can test the actual parameter for HMS

@omkenge omkenge changed the title Add list_views for hive catalog Add list_views for hive catalog Oct 31, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants