-
Notifications
You must be signed in to change notification settings - Fork 3.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(ingestion/iceberg): Several improvements to iceberg connector #12744
feat(ingestion/iceberg): Several improvements to iceberg connector #12744
Conversation
Codecov ReportAttention: Patch coverage is
📢 Thoughts on this report? Let us know! |
iceberg integration tests
) | ||
return | ||
def _try_processing_dataset( | ||
dataset_path: Tuple[str, ...], dataset_name: str |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tuple[str, ...]
seems equivalent to List[str]
I guess only difference is the tuple ensures a size >= 1 and that's relevant here, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are more differences between a tuple and a list and this annotation reflects that. For one, tuple is immutable and passed by value. Moreover this annotation matches what we are getting from pyiceberg
, so overall I would like to keep it like this.
except ValueError as e: | ||
if "Could not initialize FileIO" not in str(e): | ||
raise | ||
self.report.warning( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
may you have forgot the redundant LOGGER.warning
for this except case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
warning
method will automatically log, unlike report_warning
. I should change all lines to use warning
I think, see:
datahub/metadata-ingestion/src/datahub/ingestion/api/source.py
Lines 247 to 267 in 5309ae0
def report_warning( | |
self, | |
message: LiteralString, | |
context: Optional[str] = None, | |
title: Optional[LiteralString] = None, | |
exc: Optional[BaseException] = None, | |
) -> None: | |
self._structured_logs.report_log( | |
StructuredLogLevel.WARN, message, title, context, exc, log=False | |
) | |
def warning( | |
self, | |
message: LiteralString, | |
context: Optional[str] = None, | |
title: Optional[LiteralString] = None, | |
exc: Optional[BaseException] = None, | |
) -> None: | |
self._structured_logs.report_log( | |
StructuredLogLevel.WARN, message, title, context, exc, log=True | |
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Adding locks feels repetitive code.
Not a big fan of decorators, but they may be helpful here.
Considering they are singe lines |
Unexpectedly, I haven't found any! 😅 |
datasetProperties
information with creation and last modified times (if available)ValueError
exception, in long term we should probably changepyiceberg
library to throw unique exception type in such casedatasetProperties
withqualifiedName
(set asnamespace.table
)