-
Couldn't load subscription status.
- Fork 935
Set up type hinting: add fixes in schema-registry module (mostly already typed) #2107
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
|
🎉 All Contributor License Agreements have been signed. Ready to merge. |
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
54fab2a to
a5ff8aa
Compare
|
|
||
| def __init__( | ||
| self, key_uri: Optional[str], token: Optional[str], ns: Optional[str] = None, | ||
| self, key_uri: str, token: Optional[str], ns: Optional[str] = None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I removed the optional part as we are already checking emptiness for key_url in https://github.com/confluentinc/confluent-kafka-python/blob/master/src/confluent_kafka/schema_registry/rules/encryption/hcvault/hcvault_driver.py#L53, and I think HcVaultKmsClient is supposed to be only created with a specified key_uri, according to the function doc
Not sure if this is considered a breaking change (if customers initializes HcVaultKmsClient directly in their code). @rayokota would love to hear your thoughts on this
|
|
||
| def __init__( | ||
| self, key_uri: Optional[str], credentials: TokenCredential | ||
| self, key_uri: str, credentials: TokenCredential |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similar change as the one in hcvault_client.py
This comment has been minimized.
This comment has been minimized.
| return parsed_schema | ||
|
|
||
| named_schemas = _resolve_named_schema(schema, self._registry) | ||
| if schema.schema_str is None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In practice the schema_str field should never be empty, and even it is I think it makes sense to raise the error to fail early here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A couple questions on the PR. We are introducing a lot of ignore comments in places like avro that feel off but maybe not worth tackling in this pass anyway.
| referenced_schema = await schema_registry_client.get_version(ref.subject, ref.version, True) | ||
| ref_named_schemas = await _resolve_named_schema(referenced_schema.schema, schema_registry_client) | ||
| # References in registered schemas are validated by server to be complete | ||
| referenced_schema = await schema_registry_client.get_version(ref.subject, ref.version, True) # type: ignore[arg-type] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is there a need for type ignoring here? Can we set the ref/referenced_schema type to avoid this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In practice, subject and version of SchemaReference never be None, but probably for historical reason (or tech debt) they have been typed as optional. I think updating the types will be a breaking change
Here, get_version() requires non-empty subject and version, so that's why I added the type ignore there. Alternatively we can do:
if ref.subject is None or ref.version is None:
# maybe log something
continue
referenced_schema = await schema_registry_client.get_version(ref.subject, ref.version, True)
| schema_name = parsed_schema.get("name", schema_dict.get("type")) | ||
| if schema.schema_str is not None: | ||
| schema_dict = json.loads(schema.schema_str) | ||
| schema_name = parsed_schema.get("name", schema_dict.get("type")) # type: ignore[union-attr] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe add a comment that 'type' in the dict is for the schema type and not any language type hinting? Seeing it might be confusing without context right next to a type ignore call.
| def field_transformer(rule_ctx, field_transform, msg): return ( # noqa: E731 | ||
| transform(rule_ctx, parsed_schema, msg, field_transform)) | ||
| value = self._execute_rules(ctx, subject, RuleMode.WRITE, None, | ||
| value = self._execute_rules(ctx, subject, RuleMode.WRITE, None, # type: ignore[arg-type] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should spread this function args out per line if we're type commenting one. It read weird like this and I think is error prone to refactor bugs
| buffer = fo.getvalue() | ||
|
|
||
| if latest_schema is not None: | ||
| if latest_schema is not None and ctx is not None and subject is not None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This changed the logic here. Are we certain it's a correct / tested change?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
_execute_rules_with_phase() already requires SerializationContext and subject to build RuleContext (
| def _execute_rules_with_phase( |
… entrypoint init files
a5ff8aa to
cdbd203
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.
This comment has been minimized.
This comment has been minimized.
| ValueError: If ctx is None. | ||
| """ | ||
| if ctx is None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I refactored the name_strategy functions a bit: they have to follow the same function signature but SerializationContext is only required for some
| self.bearer_field_provider = _StaticFieldProvider(static_token, logical_cluster, identity_pool) | ||
| if not isinstance(static_token, string_type): | ||
| raise TypeError("bearer.auth.token must be a str, not " + str(type(static_token))) | ||
| if self.bearer_auth_credentials_source == 'OAUTHBEARER': |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Moving the indentation is intended to address the type issue of logical_cluster and identity_pool for building _AsyncOAuthClient: they must be non-empty, which we already check in line 280 and 284
This doesn't affect code logic: we check self.bearer_auth_credentials_source in {'OAUTHBEARER', 'STATIC_TOKEN'} for the outer block, and those if-else branches are for OAUTHBEARER and STATIC_TOKEN respectively
|
|
||
|
|
||
| def get_type(schema: JsonSchema) -> FieldType: | ||
| if isinstance(schema, list): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
JsonSchema union type is either bool or dict. This was likely coped from avro.py
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
Copilot reviewed 42 out of 42 changed files in this pull request and generated 7 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| await self.generate_access_token() | ||
|
|
||
| if self.token is None: | ||
| raise ValueError("Token is not set after the at") |
Copilot
AI
Oct 24, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Corrected incomplete error message from 'after the at' to 'after the attempt to generate it'.
| raise ValueError("Token is not set after the at") | |
| raise ValueError("Token is not set after the attempt to generate it") |
| self.token_endpoint, logical_cluster, identity_pool, | ||
| self.max_retries, self.retries_wait_ms, | ||
| self.retries_max_wait_ms) | ||
| else: # STATIC_TOKEN |
Copilot
AI
Oct 24, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The elif condition check has been replaced with else without verifying all possible values. This assumes only 'OAUTHBEARER' and 'STATIC_TOKEN' are valid values for bearer_auth_credentials_source. Consider adding an explicit elif condition for 'STATIC_TOKEN' to make the code more maintainable and guard against unexpected values.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's already the outer block if self.bearer_auth_credentials_source in {'OAUTHBEARER', 'STATIC_TOKEN'}:
| self.token_endpoint, logical_cluster, identity_pool, | ||
| self.max_retries, self.retries_wait_ms, | ||
| self.retries_max_wait_ms) | ||
| else: # STATIC_TOKEN |
Copilot
AI
Oct 24, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The elif condition check has been replaced with else without verifying all possible values. This assumes only 'OAUTHBEARER' and 'STATIC_TOKEN' are valid values for bearer_auth_credentials_source. Consider adding an explicit elif condition for 'STATIC_TOKEN' to make the code more maintainable and guard against unexpected values.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's already the outer block if self.bearer_auth_credentials_source in {'OAUTHBEARER', 'STATIC_TOKEN'}: (line 273)
| if subject_name is not None: | ||
| query['subject'] = subject_name | ||
| query: dict[str, Any] = {'offset': offset, 'limit': limit} | ||
| if subject_name is not None: query['subject'] = subject_name |
Copilot
AI
Oct 24, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The indentation is incorrect on line 867. The 'query['subject'] = subject_name' statement should be on its own line with proper indentation.
| if subject_name is not None: query['subject'] = subject_name | |
| if subject_name is not None: | |
| query['subject'] = subject_name |
| if subject_name is not None: | ||
| query['subject'] = subject_name | ||
| query: dict[str, Any] = {'offset': offset, 'limit': limit} | ||
| if subject_name is not None: query['subject'] = subject_name |
Copilot
AI
Oct 24, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The indentation is incorrect on line 868. The 'query['subject'] = subject_name' statement should be on its own line with proper indentation.
| if subject_name is not None: query['subject'] = subject_name | |
| if subject_name is not None: | |
| query['subject'] = subject_name |
src/confluent_kafka/admin/_group.py
Outdated
| def __init__(self, topic_partitions: List[TopicPartition] = []) -> None: | ||
| self.topic_partitions = topic_partitions or [] |
Copilot
AI
Oct 24, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using a mutable default argument (empty list) is dangerous as it will be shared across all instances. Use None as default and initialize inside the function instead.
| def __init__(self, topic_partitions: List[TopicPartition] = []) -> None: | |
| self.topic_partitions = topic_partitions or [] | |
| def __init__(self, topic_partitions: Optional[List[TopicPartition]] = None) -> None: | |
| self.topic_partitions = topic_partitions if topic_partitions is not None else [] |
| inline_tags = get_inline_tags(reader_schema) if reader_schema is not None else None | ||
| obj_dict = self._execute_rules(ctx, subject, RuleMode.READ, None, | ||
| reader_schema_raw, obj_dict, | ||
| inline_tags,field_transformer) |
Copilot
AI
Oct 24, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Missing space after comma between 'inline_tags' and 'field_transformer' arguments.
| inline_tags,field_transformer) | |
| inline_tags, field_transformer) |
What
Follow-up PR after #2041: adding missing types + correcting existing types, according to mypy static checker, in schema-registry module
What's left are functions that might require refactoring and more thorough investigation to get the types right:
Checklist
References
JIRA: https://confluentinc.atlassian.net/browse/DGS-22076
Test & Review
Open questions / Follow-ups