multiple partitions for hive not delimited correctly #234

Closed
Robert-Christensen-visa opened this issue Aug 30, 2021 · 0 comments · Fixed by #834
Labels
bug (Something isn't working) · hive (Improvements to or issues with Hive functionality) · needs triage (Awaiting triage by a dask-sql maintainer)

Comments

@Robert-Christensen-visa

To get the list of Hive partitions for a table, the command SHOW PARTITIONS is used:

def _parse_hive_partition_description(
    self,
    cursor: Union["sqlalchemy.engine.base.Connection", "hive.Cursor"],
    schema: str,
    table_name: str,
):
    """
    Extract all partition information for a given table
    """
    cursor.execute(f"USE {schema}")
    result = self._fetch_all_results(cursor, f"SHOW PARTITIONS {table_name}")
    return [row[0] for row in result]

For tables with multiple partition keys, each entry in the returned list joins its key=value pairs with /. For example, if the table has two partition keys, date and region, one of the returned entries might look like this:

date=20210101/region=south

This string is passed unmodified to DESCRIBE FORMATTED {table_name} PARTITION ({partition}) to get additional information about each partition. When a table has multiple partition keys, the key=value pairs in that command must be separated by , rather than /.

if partition:
    result = self._fetch_all_results(
        cursor, f"DESCRIBE FORMATTED {table_name} PARTITION ({partition})"
    )

Because of this, dask-sql raises an error when I try to read a table that has multiple Hive partition keys.
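One possible way to bridge the two formats is to convert each SHOW PARTITIONS entry before it is interpolated into the DESCRIBE FORMATTED statement. The sketch below is only an illustration, not the fix that was merged: `to_partition_spec`, its `quote_values` flag, and the table name `my_table` are hypothetical names, and it assumes partition values do not themselves contain an unescaped / character. Whether values need to be quoted as string literals depends on the partition column types, so quoting is left optional here.

```python
def to_partition_spec(partition: str, quote_values: bool = False) -> str:
    """Convert 'k1=v1/k2=v2' (SHOW PARTITIONS style) into 'k1=v1, k2=v2'.

    Hypothetical helper for illustration; not part of dask-sql.
    """
    pairs = []
    for item in partition.split("/"):
        # Split only on the first '=' so values containing '=' survive.
        key, sep, value = item.partition("=")
        if not sep:
            raise ValueError(f"Unexpected partition entry: {item!r}")
        if quote_values:
            # Assumption: values are passed as single-quoted string literals.
            value = "'" + value.replace("'", "\\'") + "'"
        pairs.append(f"{key}={value}")
    return ", ".join(pairs)


spec = to_partition_spec("date=20210101/region=south")
# 'my_table' is a placeholder table name.
query = f"DESCRIBE FORMATTED my_table PARTITION ({spec})"
```

A table with a single partition key passes through unchanged, since there is no / to replace.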

Issue #179 might also be related to the example above.

@charlesbluca added the bug, needs triage, and hive labels on Oct 15, 2021